When a company releases a new AI video generator, it's not long before someone uses it to make a video of actor Will Smith eating spaghetti.
It's become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.
Now Google Veo 2 has done it.
We are now eating spaghetti at last. pic.twitter.com/AZO81w8JC0
— Jerrod Lew (@jerrod_lew) December 17, 2024
Will Smith and pasta is but one of several bizarre "unofficial" benchmarks to take the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AIs play games like Pictionary and Connect 4 against each other.
It's not as if there aren't more academic tests of an AI's performance. So why did the weirder ones blow up?
For one, many of the industry-standard AI benchmarks don't tell the average person very much. Companies often cite their AI's ability to answer questions on Math Olympiad exams, or to figure out plausible solutions to PhD-level problems. Yet most people, yours truly included, use chatbots for things like responding to emails and basic research.
Crowdsourced industry measures aren't necessarily better or more informative.
Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, like creating a web app or generating an image. But raters tend not to be representative (most come from AI and tech industry circles), and they cast their votes based on personal, hard-to-pin-down preferences.
Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don't compare a system's performance to that of the average person.
"The fact that there aren't 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless," Mollick wrote.
Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn't mean it'll generate, say, a burger well.
One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI rather than its abilities in narrow domains. That's sensible. But I have a feeling that weird benchmarks aren't going away anytime soon. Not only are they entertaining (who doesn't like watching AI build Minecraft castles?), they're also easy to understand. And as my colleague Max Zeff wrote recently, the industry continues to grapple with distilling a technology as complex as AI into digestible marketing.
The only question in my mind is which strange new benchmarks will go viral in 2025.