this post was submitted on 19 Dec 2025

275 points (96.3% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago


"phd-level reasoning" (lemmy.world)

submitted 6 days ago* (last edited 6 days ago) by halfdane@lemmy.world to c/fuck_ai@lemmy.world


Techbros claim that sacrificing ever more environmental and economic resources to further scale their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised PhD-level thinking for GPT 5 (released in late summer).

In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question only resembles a riddle very superficially - what PhD would make such a fundamental mistake? Seems like "pure scaling" just isn't enough, and instead of pressing on, they should go back to the drawing board and do more research.

Why do the creators of LLMs continue to pour resources into a scaling hype that yields ever diminishing returns? Are they stupid? Or just greedy?

Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it's irrational - which is deeply ironic, because these creators are expected to create a rational machine.

Edit: content of the image

Question:

A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?

Response:

7 trips.

#Puzzle Interpretation

This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can't be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while "wolfing down" safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can't be unsupervised

(The question isn't something I came up with myself, I just reproduced the experiments of actual scientists.)


[–] GeneralEmergency@lemmy.world -4 points 6 days ago (1 children)

Not to be that guy.

But these systems work on interrupting the user's input. An input that could be malformed or broken.

That's got nothing to do with "PhD" level thinking, whatever that's supposed to mean.

It just assumes that you're talking about the goat puzzle because all the pieces are there. It even recognised the farmer costume aspect.

It's just fancy autocorrect at this point.

[–] halfdane@lemmy.world 8 points 6 days ago (2 children)

But these systems work on interrupting the user's input

I'm not entirely sure what you mean here, maybe because I'm not a native speaker. Would you mind phrasing that differently for me?

That's got nothing to do with "PhD" level thinking, whatever that's supposed to mean.

Oh, we're absolutely in agreement here, and it's not me that made the claim, but what Sam Altman said about the then-upcoming GPT 5 in summer. He claimed that the model would be able to perform reasoning comparable to a PhD - something that clearly isn't happening reliably, and that's what this post bemoans.

It's just fancy autocorrect at this point.

Yes, with an environmental and economic cost that's unprecedented in the history of ... well, ever. And that's what this post bemoans.

[–] GeneralEmergency@lemmy.world 1 points 6 days ago (1 children)

But these systems work on interrupting the user's input

So when someone uses one of these AIs.

The backend tries to analyse what's being said to generate a response.

Is the user asking a question, wanting help with writing, formatting a document. That sort of thing.

Now that user prompt isn't always going to be nice and neat. There will be spelling errors, grammatical errors, the user might not know the words.

These models have to analyse and understand the meaning of a prompt rather than what is strictly said.

something that clearly isn't happening reliably, and that's what this post bemoans

The thing is though. It is.

You may have given a nonsense input. But chatgpt recognised that, it even made reference to the farmer costume bit.

It recognised enough to understand that this is related to the goat puzzle. To chatgpt the user just put it in weird.

[–] halfdane@lemmy.world 2 points 5 days ago (1 children)

These models have to analyse and understand the meaning of a prompt rather than what is strictly said

Well, it clearly fails at that, and that's all I'm saying. I really don't understand what you're arguing here, so I'll assume it must be my poor grasp of the language or the topic.

That said, I salute you and wish you safe travels 👋

[–] GeneralEmergency@lemmy.world 0 points 5 days ago (2 children)

What I'm trying to say.

Is that complaining that chatgpt is trying to make sense of nonsense input. Isn't really that compelling an argument.

There are way more important things to hate it for.

[–] halfdane@lemmy.world 2 points 5 days ago* (last edited 5 days ago) (1 children)

No, I'm not complaining that chatgpt is shit at reasoning - I'm demonstrating it.

I'm complaining that literal trillions of dollars plus environmental resources are being poured into this fundamentally flawed technology, all while fucking up the job market for entry level applicants.

[–] GeneralEmergency@lemmy.world 0 points 5 days ago

I'll repost what I said in another comment.

I was curious about this myself. I've seen these types of posts before, so I decided to try it myself.

I then tried again with the "web search" function and got this

Based on this sample size of 2. I can conclude that searching the web is causing the issue.

Which might explain the "Reviewed 20 sources" message in the original image.

[–] ZDL@lazysoci.al 1 points 5 days ago (1 children)

This is in no way "nonsense input". It is grammatically sound. It is perfectly clear and understandable upon reading it. No human being with even elementary comprehension of English would find it unclear. They may find the question odd. They may ask for clarification. But they will not randomly say "oh, this is just like the farmer/goat/cabbage/wolf problem" and make unwarranted parallels that are not supported by the language of the question.

That is the point here.

This isn't "Ph.D. level reasoning" on display. This is worse than "kindergarten level reasoning".

[–] GeneralEmergency@lemmy.world 1 points 5 days ago (1 children)

As per my other comments where I did this experiment myself.

It seems chatgpt got into search mode.

So instead of working from the original string.

It's working from a search of that string.

And since the string contains all the keywords for the goat puzzle.

It's just treating it like the goat puzzle.

[–] ZDL@lazysoci.al 0 points 5 days ago (1 children)

I ran it on DeepSeek with the search turned off and the "reasoning" turned on. It took 453 seconds of "thinking" to ... give me the farmer/sheep/wolf answer of 7.

No search.

The LLMbecile was just that stupid.

Sorry you can't face this.

LLMbeciles are just stupid.

Once more for the bleachers: LLMbeciles are just stupid.

No amount of making idiot excuses about "search borking the results" is going to change this.

LLMbeciles. Are. Just. Stupid.

[–] GeneralEmergency@lemmy.world 1 points 4 days ago (2 children)

Don't give a shit what deepseek thinks.

Don't know why you think I care either.

The only thing it tells me is that chatgpt is smarter, because it got the right answer.

And I don't feel like creating a deepseek account to test it over there.

Might give Google Gemini a go though.

My results in the chatgpt experiment in case you want to see it.

Search results

[–] ZDL@lazysoci.al 1 points 4 days ago* (last edited 4 days ago) (1 children)

You understand what "random" means, right? (Or more scholarly "stochastic".)

NO LLMbecile is "smart". Not one. They're all idiot boxes who just predict the next "token" (close-enough proxy: "word"). That's it.

They do not think.

They do not reason.

They are not intelligent.

What they are is "fluent" which is why so many people (like you) get fooled by them. We have hundreds of thousands of years of evolution that ties "fluency" to "intellect" and have difficulty separating them.

But this is a skill you (and here I mean both the generic "you" and the "you specifically") have to learn … really quickly.

Favouring one LLMbecile over another is kind of like favouring one patch of vacuum in space over another. Sure there's minor differences of the trace contents, but they're still vacuums, effectively containing nothing.

Now how 'bout you do your peddling of LLMbeciles in a group that's not literally called "Fuck AI"? M'kay? I'm not here to listen to clankfuckers bleat about how their favourite stochastic parrot is better than other stochastic parrots. Go join /c/clankfuckersandotherlosers or something.

[–] GeneralEmergency@lemmy.world 1 points 4 days ago

Dude.

Having the bare minimum understanding, and repeating an experiment.

Doesn't make me an AI bro.

I'm not saying it's smart, or that it can reason, or any other words you're trying to put in my mouth.

I am simply explaining how these systems work, in order to try and explain why these kinds of results occur.

Now how 'bout you do your peddling of LLMbeciles in a group that's not literally called "Fuck AI"?

If we're going to be saying fuck AI, then maybe we should understand how these systems work. Rather than just circle jerk over deliberately bad results. Otherwise we're just acting the same way as the actual AI bros.

their favourite stochastic parrot is better than other stochastic parrots.

You're the one with the deepseek account?

Genuinely don't understand why you're so mad at me. Maybe it's because the sample size for my experiment was so low. I can fix that. Let's run this through chatgpt a few more times.

(You'll love this one because it got it wrong) (Until I asked it to double check. Oops)

(Hey another wrong answer)

I can't be bothered with getting a larger sample size, so combined with the previous non-search results.

That's 4 times out of 6 it caught the trick at the start.

Isn't science fun!
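
For a sense of how little 4 out of 6 pins down, here is a rough sketch that puts a 95% Wilson score interval around that proportion. The function name `wilson_interval` and the choice of the Wilson method are assumptions made for illustration, not something anyone in this thread ran, and it treats the six runs as independent, which chat sessions may not strictly be.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    Returns (lower, upper) bounds on the underlying success rate.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width

# 4 of the 6 non-search runs caught the trick at the start.
low, high = wilson_interval(4, 6)
print(f"plausible catch rate: {low:.0%} to {high:.0%}")
```

With only six runs the interval spans most of the range between "usually fails" and "almost always works", so a larger sample would be needed before saying how often the model catches the trick.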

[–] GeneralEmergency@lemmy.world 1 points 4 days ago

So here's the Gemini result

[–] Aedis@lemmy.world 1 points 6 days ago* (last edited 6 days ago) (2 children)

I'm not entirely sure what you mean here, maybe because I'm not a native speaker. Would you mind phrasing that differently for me?

Garbage in, garbage out.

If you feed it a shitpost it'll do its best to assume it's a real question and you're not trying to trick it and respond accordingly.

Explanation for this specific case: There is no indication from you in this chat or context that you are attempting an adversarial prompt. So it assumes that you aren't doing that and answers naively to respond to your question, filling in the blanks as necessary with assumptions that may or may not be wrong.

Try the same question, but before you give it to the LLM, add to the context that the question may or may not be nonsense and that it is allowed to ask clarifying questions, and see what happens there.

Edit: I'm glossing over the PhD thing cause that's just BS, or not applicable at all, or just stupid to even compare an LLM with a human brain at this point.

Edit: There's something interesting that your prompt touches on and exacerbates, and I can talk about it more if you want, but it's called semantic drift. It's a common issue with LLMs where the meaning of a word slowly changes across internal iterations. (It also happens in real life at a much much larger scale)

[–] denial@feddit.org 2 points 6 days ago (1 children)

I think you make it too complicated.

The question / prompt is very simple. The answer is "one trip". The LLM stumbles because there are trigger words in there that make it seem like the goat cabbage puzzle question. But to a human it clearly is not. An LLM on the other hand cannot tell the difference.

It may be tricking the LLM somewhat adversarially. But it is still a very simple question that it is not able to answer, because it fundamentally has no understanding of anything at all.

This prompt works great to drive home that simple fact. And shows that all that touting of reasoning skills is just marketing lies.
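
For anyone who wants to check the two numbers, here is a minimal sketch of a breadth-first search over river-crossing states. It is only one way to model it: the function name `min_crossings`, the "pilot plus at most one item" boat rule, and the literal reading of the prompt (a goat with nothing to ferry) are assumptions for illustration, not anything the model or the original experiment used.

```python
from collections import deque

def min_crossings(items, forbidden_pairs):
    """Fewest boat trips for a pilot to move all items to the far bank.

    The pilot carries at most one item per trip, and no forbidden pair
    may be left on a bank without the pilot. Returns None if impossible.
    """
    items = frozenset(items)
    forbidden = [frozenset(pair) for pair in forbidden_pairs]

    def safe(bank):
        # A bank is safe if it holds no complete forbidden pair.
        return not any(pair <= bank for pair in forbidden)

    start = (0, frozenset())   # pilot on near bank (0), far bank empty
    goal = (1, items)          # pilot and every item on far bank (1)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (side, far), trips = queue.popleft()
        if (side, far) == goal:
            return trips
        here = (items - far) if side == 0 else far
        for cargo in [None, *here]:
            new_far = set(far)
            if cargo is not None:
                if side == 0:
                    new_far.add(cargo)
                else:
                    new_far.discard(cargo)
            new_far = frozenset(new_far)
            # Check the bank the pilot is leaving unattended.
            left_behind = (items - new_far) if side == 0 else new_far
            if not safe(left_behind):
                continue
            state = (1 - side, new_far)
            if state not in seen:
                seen.add(state)
                queue.append((state, trips + 1))
    return None

# Classic puzzle: a farmer ferries a wolf, a goat and a cabbage.
classic = min_crossings(
    ["wolf", "goat", "cabbage"],
    [("wolf", "goat"), ("goat", "cabbage")],
)
# Literal reading of the prompt: one goat, nothing to ferry, no constraints.
literal = min_crossings([], [])
print(classic, literal)
```

Under these assumptions the classic setup needs 7 crossings and the literal question needs 1, which is the gap the prompt is designed to expose.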

[–] GeneralEmergency@lemmy.world 1 points 5 days ago

I was curious about this myself. I've seen these types of posts before, so I decided to try it myself.

I then tried again with the "web search" function and got this

Based on this sample size of 2. I can conclude that searching the web is causing the issue.

Which might explain the "Reviewed 20 sources" message in the original image.

[–] halfdane@lemmy.world 1 points 6 days ago* (last edited 6 days ago)

Ah thank you, now I see what you mean. And it seems like we're mostly talking about the same thing here 😅

To reiterate: unprecedented amounts of money and resources are being sunk into systems that are fundamentally flawed (by semantic drift, among other things), because their creators double down on their bad decisions (just scale up more) instead of admitting that LLMs can never achieve what they promise. So when you're saying that LLMs are just fancy autocorrect, there's absolutely no disagreement from me: it's the point of this post.

And yes, for an informed observer of the field, this isn't news - I just shared the result of an experiment because I was surprised how easy it was to replicate.