this post was submitted on 27 Mar 2026
563 points (96.8% liked)

Technology


The ARC Prize organization designs benchmarks specifically crafted to demonstrate tasks that humans complete easily but that are difficult for AIs like LLMs, "reasoning" models, and agentic frameworks.

ARC-AGI-3 is the first fully interactive benchmark in the ARC-AGI series. ARC-AGI-3 represents hundreds of original turn-based environments, each handcrafted by a team of human game designers. There are no instructions, no rules, and no stated goals. To succeed, an AI agent must explore each environment on its own, figure out how it works, discover what winning looks like, and carry what it learns forward across increasingly difficult levels.
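The interaction model described above — observe, act, and infer the goal from scratch — is a classic agent loop. Below is a minimal sketch against a made-up toy environment; `ToyEnv`, `explore`, and the action names are all hypothetical, and the real ARC-AGI-3 agent API will differ:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for an ARC-AGI-3 environment: the agent gets
    raw observations and a win signal, but no instructions or stated goal."""
    def __init__(self, target=3):
        self.pos = 0
        self.target = target

    def observe(self):
        return {"frame": self.pos}  # raw state only, no rules attached

    def act(self, action):
        self.pos += 1 if action == "right" else -1
        return self.pos == self.target  # True once the level is solved

def explore(env, max_steps=1000):
    """Pure random exploration: discover what 'winning' means by trial."""
    for step in range(max_steps):
        action = random.choice(["left", "right"])
        if env.act(action):
            return step + 1  # number of actions taken before a win
    return None  # never stumbled on the goal

random.seed(0)
print(explore(ToyEnv()))
```

A competitive agent would of course replace `random.choice` with something that builds a model of the environment from the observations it collects, which is exactly the capability the benchmark is probing.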

Previous ARC-AGI benchmarks predicted and tracked major AI breakthroughs, from reasoning models to coding agents. ARC-AGI-3 points to what's next: the gap between AI that can follow instructions and AI that can genuinely explore, learn, and adapt in unfamiliar situations.

You can try the tasks yourself here: https://arcprize.org/arc-agi/3

Here is the current leaderboard for ARC-AGI-3, using state-of-the-art models:

  • OpenAI GPT-5.4 High - 0.3% success rate at $5.2K
  • Google Gemini 3.1 Pro - 0.2% success rate at $2.2K
  • Anthropic Opus 4.6 Max - 0.2% success rate at $8.9K
  • xAI Grok 4.20 Reasoning - 0.0% success rate at $3.8K

ARC-AGI 3 Leaderboard
(Logarithmic cost on the horizontal axis. Note that the vertical scale goes from 0% to 3% in this graph. If human scores were included, they would be at 100%, at a cost of approximately $250.)

https://arcprize.org/leaderboard

Technical report: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

In order for an environment to be included in ARC-AGI-3, it needs to pass the minimum “easy for humans” threshold. Each environment was attempted by 10 people. Only environments that could be fully solved by at least two human participants (independently) were considered for inclusion in the public, semi-private, and fully-private sets. Many environments were solved by six or more people. As a reminder, an environment is considered solved only if the test taker was able to complete all levels upon seeing the environment for the very first time. As such, all ARC-AGI-3 environments are verified to be 100% solvable by humans with no prior task-specific training.
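The inclusion rule described above reduces to a simple threshold filter over first-attempt results. A sketch with invented numbers (the environment names and counts are made up):

```python
# Hypothetical counts of human testers (out of 10) who solved every level
# of an environment on their very first exposure.
solved_by = {"env_a": 6, "env_b": 2, "env_c": 1, "env_d": 0}

MIN_SOLVERS = 2  # the "easy for humans" bar: at least two independent solvers

eligible = [name for name, n in solved_by.items() if n >= MIN_SOLVERS]
print(eligible)  # only env_a and env_b qualify for inclusion
```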

top 50 comments
[–] communist@lemmy.frozeninferno.xyz 3 points 30 minutes ago

They got to 85% on the last benchmark, which was also specifically designed to stump them, and when it came out everyone said the same things as this go-around.

Will anyone be retracting their statements when they get to 85% on this one?

[–] fox2263@lemmy.world 15 points 15 hours ago (2 children)

I can’t see AI actually being intelligent until they no longer need to send a built up prompt of guides and skills and the chat history on every submission.

It’s no different from Alexa 15 years ago with skills. Just a better protocol and interface and ability to parse the current user prompt.

In my opinion of course.

[–] NotMyOldRedditName@lemmy.world 1 points 3 hours ago* (last edited 3 hours ago) (1 children)

Ya, I agree. The whole infrastructure of how these work is flawed for a true AI/AGI.

It might be able to do a lot of cool things, but it's fundamentally flawed at its core.

Someone will need to figure out something completely different for a true AI.

[–] NotMyOldRedditName@lemmy.world 1 points 3 hours ago* (last edited 3 hours ago) (1 children)

Oh also, I remember Elon once talked about how the upcoming cars would get bored when they weren't doing anything with all that compute while parked, so they could use that compute and pay people for it.

Paying for the compute isn't a terrible idea in the future, but becoming bored? LOL. Fucking crazy talk.

Like even if it was a true AI that could be bored. You're now going to enslave it to do what you want on its free time?

[–] lordbritishbusiness@lemmy.world 1 points 41 minutes ago

Yeah, if it's got the capacity to be bored it's not going to stick around waiting for you. Pets act out when bored, as will AI, better to let the ghost in the machine go have fun in an arcade or something.

Current models can pretend to be bored when directed to, but they're only facsimiles of thought at the moment, and the current approach probably won't change that.

[–] PhoenixDog@lemmy.world 3 points 5 hours ago (1 children)

Right? I have a Google Home Mini in our kitchen and if we ask it a question it just pulls a source from a website and tells us. That's it. Nothing intelligent about it.

AI now is no different. It's just pulling more complex wording from more sources (albeit often illegally obtained) to give a better (albeit sometimes incorrect) answer to the question asked.

AI is just as stupid as Alexa is/was 15 years ago. It just has more information to pull from and still fucks it up.

[–] Grimtuck@lemmy.world 3 points 3 hours ago

LLMs are just very well-read morons.

[–] mechoman444@lemmy.world 21 points 19 hours ago (1 children)

I know Lemmy's very anti-AI, but this is really fascinating stuff.

[–] PhoenixDog@lemmy.world 2 points 5 hours ago (1 children)

We're anti-AI because AI is fucking stupid. Both literally and figuratively.

[–] mechoman444@lemmy.world -1 points 4 hours ago (1 children)

It really isn't. But you do you boo.

[–] PhoenixDog@lemmy.world 0 points 4 hours ago* (last edited 4 hours ago) (1 children)

Someone else in the comments said it perfectly. AI is just data regurgitation. It's like calling me highly intelligent because I read you a paragraph from Wikipedia. I didn't know anything. I just read a thing and said it out loud.

[–] mechoman444@lemmy.world 1 points 1 hour ago (2 children)

No. You're not just wrong, you're aggressively uninformed.

Repeating the same tired “AI is just regurgitating data” line makes it clear you don't understand what you're criticizing. Calling large language models “AI” the way you are doing it just exposes that you do not know what you are talking about. It is like a creationist smugly saying “orangutang” instead of “orangutan” and thinking they sound informed. You are not demonstrating insight. You are advertising ignorance.

What you’re describing, reading a paragraph off Wikipedia, is literal retrieval. That is not how modern language models operate. They are not databases with a search bar attached. They are probabilistic systems trained to model patterns, structure, and relationships across massive datasets. When they generate a response, they are not pulling a stored paragraph. They are constructing output token by token based on learned representations.
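That token-by-token loop can be illustrated with a toy model. The bigram table below is nothing like real transformer weights, but the shape of the generation loop — sample from a learned distribution, append, repeat — is the same:

```python
import random

# Toy "learned representation": bigram counts standing in for model weights.
bigram_counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 2},
    "dog": {"ran": 4},
}

def generate(start, max_tokens=5, rng=None):
    """Autoregressive sampling: each token is drawn from a distribution
    conditioned on what came before -- not looked up as a stored sentence."""
    rng = rng or random.Random(0)
    out = [start]
    while len(out) < max_tokens:
        choices = bigram_counts.get(out[-1], {})
        if not choices:
            break  # no continuation learned for this token
        tokens, weights = zip(*choices.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return " ".join(out)

print(generate("the"))
```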

If it were just regurgitation, you would constantly see verbatim copies of training data. You do not. What you see instead is synthesis. Concepts are recombined, abstracted, and adapted to context. The system can explain the same idea multiple ways, shift tone, handle novel prompts, and connect ideas that were never explicitly paired in the source material. That is fundamentally different from reading something out loud.

Your analogy fails because it assumes nothing is being transformed. In reality, transformation is the entire mechanism. Information is compressed into weights and then expanded into new outputs.

Is it human intelligence? No. Is it perfect? No. But reducing it to “just reading Wikipedia out loud” is not skepticism. It is a basic failure to understand how the technology works.

If you are going to criticize something, at least learn what it is first.

[–] lordbritishbusiness@lemmy.world 2 points 21 minutes ago

Counterpoint: Why should they learn about it?

It is a good thing to reduce ignorance, but there is more to learn in the world than there is time to learn or space in the brain. People must specialise.

You must accept that not everyone will understand everything, and this is okay.

The nature of a Large Language Model is very specialist knowledge; “data regurgitation” is apt from a distance, especially when most publicly available models are primarily used for search.

Criticism must be accepted, even from those who do not understand, so long as it's in good faith. It is after all an opportunity to reduce ignorance to someone with the time and interest to learn.

Don't rudely lord your intelligence over someone else, it might not end well, and invalidates the delivery of your entire argument.

[–] PhoenixDog@lemmy.world 0 points 1 hour ago* (last edited 1 hour ago)

This might be the most comprehensive comment I've ever read of someone telling the world how utterly stupid they are. It's incredibly impressive how articulately you described your absolute lack of critical thinking.

It's almost like intentionally shooting yourself in the nuts and then releasing the video of it while saying you promote gun safety.

[–] Bubbaonthebeach@lemmy.ca 19 points 18 hours ago (1 children)

I tend to be anti-AI because it doesn't seem to me to be anything other than a super fast regurgitator of data. If a database can be searched for an answer, AI can do that faster than a human. However, it doesn't seem to be able to take some portion of that database, understand it, and then use that information to solve a novel problem.

[–] cmhe@lemmy.world 15 points 15 hours ago (2 children)

Well... It cannot even search databases without errors.

LLMs just produce plausible replies in natural language very quickly, and this is useful in certain situations. Sometimes it helps humans get started with a task, but as it is now, it cannot replace them, however much the capital class wants it to and sinks our money into it.

[–] fruitycoder@sh.itjust.works 3 points 7 hours ago

The better setups generate "semantic embeddings" that try to map how stored data relate to each other (by mapping how each piece relates to the others within the model's own weights and biases). That, and knowledge-graph lookups, in which the links between different articles of data are evaluated in the same way.

The very expensive LLM portion really does just give a rough approximation of the information's language in that setup.
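The embedding lookup described here boils down to nearest-neighbour search over vectors. A toy sketch — these three-dimensional "embeddings" are hand-made, where a real trained encoder would produce hundreds of dimensions:

```python
import math

# Hand-made 3-d "embeddings"; in a real system a trained encoder produces these.
docs = {
    "cooking":   [0.9, 0.1, 0.0],
    "baking":    [0.8, 0.2, 0.1],
    "astronomy": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: high when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec):
    # Rank stored documents by similarity to the query embedding.
    return max(docs, key=lambda name: cosine(query_vec, docs[name]))

print(nearest([0.0, 0.0, 1.0]))  # lands in the astronomy region
```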

[–] jj4211@lemmy.world 2 points 7 hours ago

Yes, the key thing is it might have extracted useful info from otherwise confusing data, it might have mixed up info from the data incorrectly or it might have just made it up.

So it can be useful, if you can then validate the info provided in more traditional means, but it's dubious as a first pass, and sometimes surprisingly bad when it's a scenario you thought it would work well at.

[–] SaraTonin@lemmy.world 39 points 1 day ago (3 children)

Tell me again how AGI is just around the corner, Sam

[–] Tollana1234567@lemmy.today 4 points 14 hours ago

Just when he had to shut down Sora, because making AI videos is too expensive.

[–] XLE@piefed.social 10 points 19 hours ago (1 children)
[–] pyre@lemmy.world 5 points 6 hours ago (1 children)

To be fair, he's not human, so he's just guessing based on his observations of Earth as a demon.

[–] Corngood@lemmy.ml 2 points 5 hours ago

machines will be able to 'think like humans' when it happens

Maybe AGI is just a brain-destroying pandemic?

[–] Vupware@lemmy.zip 17 points 1 day ago (3 children)

When Sammy fuck says “we’re so close to AGI, I can just feel it. Like a tingle on the tip of my shrimpdick it’s getting so close to blossoming into something guys”, just ignore him. He’s crazy man!

[–] kilgore_trout@feddit.it 2 points 9 hours ago

a tingle on the tip of my shrimpdick

mhh that's erotic ASMR on Youtube

[–] arcine@jlai.lu 13 points 23 hours ago (2 children)

Try spelling things phonetically (example: faux net tick alley); that's one of my benchmarks, and AI fails it almost every time.

If the input is at all long, or purposefully includes a lot of words about a specific theme unrelated to the coded message, it's impossible.

[–] bss03@infosec.pub 2 points 9 hours ago (1 children)

Wait, I thought phonetically (example: papa hotel oscar november echo tango india charlie alfa lima lima yankee) meant using a phonetic alphabet, not using word(s) with the same Soundex encoding.

[–] ugandan_airways@lemmy.zip 3 points 9 hours ago (1 children)
[–] bss03@infosec.pub 2 points 9 hours ago (1 children)

Yeah, there was some phonics in my primary school education, and I continue to approach new words in that way sometimes. But, they said Phonetically.

[–] gozz@lemmy.world 2 points 4 hours ago

Phonetics is the study of speech sounds. The phonetic alphabet is called that because each word in it was chosen to start with the corresponding phoneme, and so that the set of words would be phonetically unambiguous between them. Phonics is a way of teaching reading and writing that is based on the phonetics of words and how they relate to the written form.


It's almost as if a chatbot isn't actually thinking.
