this post was submitted on 24 Jun 2026

282 points (98.0% liked)

Technology





Advanced AI models suffer a near-total collapse on classic psychology test as cognitive demands increase (www.psypost.org)

submitted 2 days ago by sanitation@lemmy.today to c/technology@lemmy.world

107 comments


[–] khornechips@sh.itjust.works 16 points 2 days ago (1 children)

So… last week then?

[–] communist@lemmy.frozeninferno.xyz -4 points 1 day ago (3 children)

I get that you hate AI but there's no reason to lie about its capabilities.

[–] criss_cross@lemmy.world 10 points 1 day ago (2 children)

A lot of products like Claude or ChatGPT have internal tools they call when they do math (or they run a Python script) rather than having the model actually compute anything.

The underlying tech itself can’t do it because you can’t do math by token probability.
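To make the "internal tool" idea concrete, here is a minimal sketch of what a host-side calculator tool might look like: the model would only emit an arithmetic expression as text, and ordinary code (not the model) computes the result. The function name `calculator_tool` is hypothetical, and this is not how Claude or ChatGPT actually implement their tools; it only illustrates the division of labor.

```python
import ast
import operator

# Arithmetic the hypothetical tool is willing to perform, keyed by AST node type.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculator_tool(expression):
    """Hypothetical calculator tool: evaluate a plain arithmetic expression.

    Only numeric literals, + - * / ** and unary minus are accepted; other
    elements such as names or function calls raise ValueError instead of running.
    """

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


# The model would only have to produce the string; exact integer math happens here.
print(calculator_tool("123456789 * 987654321"))
```

Parsing into an AST and walking it, rather than calling `eval`, keeps the "tool" limited to arithmetic even if the model emits something unexpected.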

[–] SpaceDuck@feddit.org 1 points 1 day ago

Is that relevant? Mathematicians will use tools and computers that calculate for them too. Are we saying they should all do it in their heads?

[–] communist@lemmy.frozeninferno.xyz 1 points 1 day ago

Whether they use tools to do it or not is entirely unimportant; that's just how they do it?

[–] expr@programming.dev 9 points 1 day ago (1 children)

That's not lying. There's nothing linguistic about numerical computation.

[–] communist@lemmy.frozeninferno.xyz -2 points 1 day ago (1 children)

No.

https://www.nature.com/articles/d41586-025-02343-x

It's lying

[–] zbyte64@awful.systems 1 points 1 day ago (1 children)

You know, "DeepMind and OpenAI models" is the hint that the LLM is not the one doing the math. The LLM provides a hypothesis, and the DeepMind model provides grounding, or feedback on whether the hypothesis even makes sense or works.
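The hypothesis-plus-feedback arrangement described above can be sketched as a generic generate-then-check loop. This is only an illustration of that general pattern under stated assumptions, not a documented account of how DeepMind or OpenAI set up their systems: `propose_and_verify`, `guess`, and `check` are hypothetical names, and the toy problem (find x with x * x == 2025) stands in for real mathematics.

```python
import random


def propose_and_verify(propose, verify, max_attempts):
    """Generate-then-check loop with a fixed budget of attempts.

    `propose` stands in for the hypothesis generator and `verify` for the
    grounding/feedback step. Returns (answer, attempts_used), or
    (None, max_attempts) if the budget runs out without a verified answer.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = propose()
        if verify(candidate):
            return candidate, attempt
    return None, max_attempts


# Toy problem: find a positive integer x with x * x == 2025.
rng = random.Random(0)


def guess():
    # Hypothetical "hypothesis" source: a blind random guess between 1 and 100.
    return rng.randint(1, 100)


def check(x):
    # Exact verification, with no model involved.
    return x * x == 2025


print(propose_and_verify(guess, check, max_attempts=500))
```

How much credit the generator deserves depends heavily on how good its proposals are and how many attempts it gets, which is roughly what the rest of the thread argues about.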

[–] communist@lemmy.frozeninferno.xyz 1 points 1 day ago (1 children)

It is totally irrelevant that the model calls tools to do the math. That is still a success.

[–] zbyte64@awful.systems 1 points 19 hours ago* (last edited 19 hours ago) (1 children)

It's relevant to what the parent was saying about LLMs. The success of the LLM in using mathematical tools does not contradict what they were saying. To then accuse them of lying because of a misunderstanding is... bad form.

[–] communist@lemmy.frozeninferno.xyz 1 points 16 hours ago

It does the math, it just uses a calculator.

[–] kayohtie@pawb.social 4 points 1 day ago (2 children)

None of these features are something the models themselves can do; they're grafted on.

I could easily write a Home Assistant automation that pattern-matches nearly every way someone could say "how many Rs are in strawberry", depluralizes the plural letter, and runs it against "wc" in a bash terminal.

That doesn't mean it's smarter. It's that I've added something specific to it.
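The kind of special-casing described above could look something like this minimal sketch: a regular expression catches a few phrasings of "how many Rs are in strawberry", consumes the plural "s" after the letter, and counts occurrences with plain string code in place of the `wc` step. The pattern and the function name `answer_letter_count` are illustrative assumptions, not anyone's real implementation.

```python
import re

# Matches phrasings like "how many Rs are in strawberry",
# "How many r's are there in 'strawberry'?" and "how many letter r in strawberry".
PATTERN = re.compile(
    r"how many (?:letter )?(?P<letter>[a-z])(?:'?s)? (?:are )?(?:there )?in "
    r"['\"]?(?P<word>[a-z]+)['\"]?",
    re.IGNORECASE,
)


def answer_letter_count(question):
    """Return the letter count if the question matches, else None (hand off to the model)."""
    match = PATTERN.search(question)
    if match is None:
        return None
    # The plural "s" / "'s" sits outside the named group, so the letter is already singular.
    letter = match.group("letter").lower()
    word = match.group("word").lower()
    return word.count(letter)


print(answer_letter_count("How many Rs are in strawberry?"))
```

Every phrasing the pattern misses falls through to `None`, which is the commenter's point: each edge case needs its own hand-written rule.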

MCP and the like are just that too: gluing on functions, or the ability to (hopefully) invoke a function. That's why so many hilariously mundane ones exist.
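A rough sketch of that glue, assuming a made-up JSON format for tool calls (this is not the actual MCP wire protocol, which is built on JSON-RPC 2.0 and has its own schema): the model emits text that hopefully parses as a tool request, and the host looks the name up in a registry of ordinary functions. The registry, the tool names, and `dispatch` are all hypothetical.

```python
import json

# Hypothetical registry: ordinary functions the host is willing to run for the model.
TOOLS = {
    "count_letter": lambda word, letter: word.lower().count(letter.lower()),
    "add": lambda a, b: a + b,
}


def dispatch(model_output):
    """Interpret model output as a tool call of the made-up form
    {"tool": "<name>", "args": {...}} and run it.

    Returns the tool's result, or an error string the host could hand back to the model.
    """
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: output was not valid JSON"
    if not isinstance(request, dict):
        return "error: expected a JSON object"
    name = request.get("tool")
    func = TOOLS.get(name) if isinstance(name, str) else None
    if func is None:
        return f"error: unknown tool {name!r}"
    try:
        return func(**request.get("args", {}))
    except TypeError as exc:  # missing, extra, or malformed arguments
        return f"error: bad arguments ({exc})"


print(dispatch('{"tool": "count_letter", "args": {"word": "strawberry", "letter": "r"}}'))
print(dispatch('{"tool": "fly_to_moon", "args": {}}'))
```

The model never runs anything itself; whether a call "works" depends on it producing text that happens to match what the host expects.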

At the core, it's still a large language model: a statistical model of frequency of word and word chunk (token) patterns.
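For readers unfamiliar with the "token probability" part, here is a toy sketch of the final step of text generation: the model assigns a score (logit) to every token in its vocabulary, softmax turns those scores into probabilities, and one token is picked. The five-token vocabulary and the scores are invented for illustration; a real model has a vocabulary of tens of thousands of tokens or more and computes the scores with a neural network, which this sketch does not attempt to show.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(vocab, logits, temperature=1.0, rng=random):
    """Sample the next token; temperature < 1 sharpens, > 1 flattens the distribution."""
    probs = softmax([x / temperature for x in logits])
    return rng.choices(vocab, weights=probs, k=1)[0]


# Invented scores for the token that follows "2 + 2 =".
vocab = [" 4", " 5", " four", " 22", " fish"]
logits = [6.0, 2.0, 3.5, 1.0, -4.0]

for token, p in sorted(zip(vocab, softmax(logits)), key=lambda pair: -pair[1]):
    print(f"{token!r}: {p:.3f}")
print("sampled:", repr(next_token(vocab, logits, temperature=0.7)))
```

Nothing in this step performs arithmetic: " 4" is chosen only because it received the highest score.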

Sometimes one model can invoke another via that tooling, but it's still grafted on. It isn't a single unified system but disjointed pieces that are completely detached from how brains work.

This isn't AI hate, it's reality. I love the field of artificial intelligence and machine learning. It's cool as hell. But an LLM is fundamentally incapable of being anything more than an LLM with glued on pieces that invoke functionality.

OpenAI saw people mock the inability to count, so they wrote a specialized tool to count letters and glued it on.

The world is full of endless edge cases. The inability to simply resolve them without gluing on every single one means it just isn't doing anything new.

[–] MangoCats@feddit.it 3 points 1 day ago

I believe the progress of the last year is largely attributable to the appropriate "grafting on" of these wrappers around the LLM cores.

[–] communist@lemmy.frozeninferno.xyz 0 points 1 day ago (1 children)

They've gone from not standing a chance at olympiad mathematics to regularly winning it, and they just created a novel solution to the Erdős conjecture. Counting the R's in strawberry is inconsequential, but it's also something they can do even if you just use the raw API or a local model.

[–] zbyte64@awful.systems 3 points 1 day ago (1 children)

Using computers to search for a counterexample to a conjecture isn't exactly new ground, and I suspect they did so with the aid of some harness tweaks, like some numerical LSP. Cool, it pushed the envelope, but as the parent said, they grafted on the ability to do a specific task.
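As a point of reference for computer-assisted counterexample search, here is the classic textbook case, unrelated to the Erdős work mentioned above: Euler's polynomial n² + n + 41 produces primes for n = 0 through 39, which might tempt someone to conjecture that it always does. A brute-force search finds the first failure. This only illustrates the general technique; it says nothing about how the models in the article were set up.

```python
def is_prime(n):
    """Trial division; fine for the small numbers searched here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def first_counterexample(limit=1000):
    """Return the smallest n < limit for which n*n + n + 41 is not prime, or None."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None


n = first_counterexample()
print(n, n * n + n + 41)  # prints: 40 1681  (1681 = 41 * 41)
```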

[–] communist@lemmy.frozeninferno.xyz 0 points 1 day ago* (last edited 1 day ago) (1 children)

That doesn't change the fact that LLMs are capable of acing math olympiads. So what if they use tools? You probably would too. I doubt anybody there did it without a calculator.

https://www.nature.com/articles/d41586-025-02343-x

[–] zbyte64@awful.systems 1 points 21 hours ago* (last edited 21 hours ago) (1 children)

Aren't you the least bit curious what tools they gave the LLM and how the LLM used those tools? It's like back in math class when you're asked to solve a quadratic equation but you've forgotten the formula. So you use the calculator to try different numbers, and the calculator tells you whether you're getting closer. Sure, I got the right answer, but it's hardly a testament to my math skills.
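The two approaches in that analogy can be put side by side in a short sketch: the quadratic formula computes a root directly, while the "calculator says warmer or colder" approach only evaluates the polynomial and checks the sign, halving the search interval each time (bisection). The equation x² − 2x − 3 = 0 (roots −1 and 3) and the starting interval [0, 10] are chosen for illustration only; this models the analogy, not the actual olympiad setup.

```python
import math


def quadratic_roots(a, b, c):
    """Closed-form roots of a*x**2 + b*x + c = 0 (real roots only)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


def bisect_root(f, lo, hi, tol=1e-12):
    """Guess-and-check with feedback: keep the half of [lo, hi] where f changes sign.

    Requires f(lo) and f(hi) to have opposite signs. Returns (root, steps).
    """
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must differ in sign")
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # the sign change (the "getting closer" signal) is in the left half
        else:
            lo = mid
        steps += 1
    return (lo + hi) / 2, steps


def f(x):
    return x * x - 2 * x - 3


print(quadratic_roots(1, -2, -3))
print(bisect_root(f, 0.0, 10.0))
```

The formula gets there in one evaluation; bisection needs dozens of feedback-guided guesses to reach the same root, which is the distinction the analogy is drawing.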

[–] communist@lemmy.frozeninferno.xyz 1 points 16 hours ago* (last edited 16 hours ago) (1 children)

The calculator does not tell them if they're getting closer. This isn't how anything works. No, I can't say I'm very interested in whether or not the LLM has access to Python or a calculator; as long as it completes the task, that doesn't matter.

[–] zbyte64@awful.systems 1 points 15 hours ago (1 children)

If you are not interested in how it completes the task then you are not an authority on how it works.

[–] communist@lemmy.frozeninferno.xyz 1 points 13 hours ago* (last edited 13 hours ago) (1 children)

I'm academically interested. What I mean when I say I'm not interested is that I just don't see the significance when we're talking about whether it's capable of the task.

[–] zbyte64@awful.systems 1 points 12 hours ago (1 children)

How are you able to understand its capability without understanding what tools it can manipulate, and to what effect?

[–] communist@lemmy.frozeninferno.xyz 1 points 11 hours ago (1 children)

You aren't, and that's exactly what I'm saying: it's capable of doing these things with tools, therefore it's capable of doing these things.

[–] zbyte64@awful.systems 1 points 7 hours ago (1 children)

So why are you allergic to people talking about the quality of the tools in regards to capability?

[–] communist@lemmy.frozeninferno.xyz 1 points 7 hours ago (1 children)

I don't know what you mean. I wasn't the one who claimed they couldn't do something they clearly can.

[–] zbyte64@awful.systems 1 points 7 hours ago (1 children)

You are the one collapsing tool use into a binary when there are varying degrees of competency and hand-holding.

[–] communist@lemmy.frozeninferno.xyz 1 points 5 hours ago

I am not. You inaccurately said that the math olympiad was not bested by LLMs because they had a tool that told them if they were close but incorrect, and they could just try an infinite number of times. That is incorrect; they had a limited number of tries with Python. This just isn't a true statement. I think them besting it with the use of Python is equally significant and still counts as them besting it, and saying they can't do math work is absurd.