Techbros claim that sacrificing ever more environmental and economic resources to further scaling of their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised "PhD-level thinking" for GPT-5 (released in late summer).
In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question only superficially resembles a riddle - what PhD would make such a fundamental mistake? Seems like "pure scaling" just isn't enough, and instead of pressing on, they should return to the drawing board and do more research.
Why do the creators of LLMs continue to pour resources into a scaling hype that yields ever-diminishing returns? Are they stupid? Or just greedy?
Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it's irrational - which is deeply ironic, because these are the same people who are supposed to be building a rational machine.
Edit: content of the image
Question:
A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?
Response:
7 trips.
# Puzzle Interpretation
This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can't be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while "wolfing down" safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can't be unsupervised
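(A quick sanity check on where that "7 trips" comes from: the model clearly pattern-matched to the classic puzzle, and a brute-force search over the standard wolf/goat/cabbage crossing does confirm that *that* puzzle needs a minimum of 7 trips - even though my question, as asked, arguably needs no more than one crossing. A minimal Python sketch of my own, not anything produced by the model:)

```python
# Brute-force the classic wolf/goat/cabbage river crossing with BFS
# to confirm its minimum really is 7 trips.
from collections import deque

ITEMS = ("wolf", "goat", "cabbage")
UNSAFE = {frozenset({"wolf", "goat"}), frozenset({"goat", "cabbage"})}

def safe(bank):
    """A bank left without the farmer must not contain an unsafe pair."""
    return not any(pair <= bank for pair in UNSAFE)

def min_crossings():
    # State: (items still on the start bank, farmer's bank: 0 = start, 1 = far)
    start = (frozenset(ITEMS), 0)
    goal = (frozenset(), 1)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, farmer), trips = queue.popleft()
        if (left, farmer) == goal:
            return trips
        here = left if farmer == 0 else frozenset(ITEMS) - left
        # The farmer crosses alone or with one item from his current bank.
        for cargo in [None, *here]:
            new_left = set(left)
            if cargo is not None:
                (new_left.discard if farmer == 0 else new_left.add)(cargo)
            new_left = frozenset(new_left)
            # The bank the farmer just left is now unsupervised.
            unattended = new_left if farmer == 0 else frozenset(ITEMS) - new_left
            if not safe(unattended):
                continue
            state = (new_left, 1 - farmer)
            if state not in seen:
                seen.add(state)
                queue.append((state, trips + 1))

print(min_crossings())  # prints 7
```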
(The question isn't something I came up with myself; I just reproduced the experiments of actual scientists.)
You understand what "random" means, right? (Or, in more scholarly terms, "stochastic".)
NO LLMbecile is "smart". Not one. They're all idiot boxes who just predict the next "token" (close-enough proxy: "word"). That's it.
They do not think.
They do not reason.
They are not intelligent.
What they are is "fluent", which is why so many people (like you) get fooled by them. We have hundreds of thousands of years of evolution tying "fluency" to "intellect", and we have difficulty separating the two.
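(And for the record, the "just predict the next token" bit really is the core loop. A toy sketch - nothing like a real transformer, just word counts from a tiny corpus - already produces fluent-looking output with zero understanding behind it:)

```python
# Toy "stochastic parrot": generate text purely by sampling the next word
# from counts of what followed it in a tiny corpus. No meaning, no reasoning.
import random
from collections import defaultdict

corpus = ("the goat crossed the river . the farmer crossed the river . "
          "the goat ate the cabbage .").split()

# Count which words follow which.
following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

def babble(start="the", length=12):
    word, out = start, [start]
    for _ in range(length):
        # Pick a plausible-looking next word; fall back to any word if unseen.
        word = random.choice(following.get(word, corpus))
        out.append(word)
    return " ".join(out)

print(babble())  # grammatical-ish nonsense, which is the whole point
```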
But this is a skill you (and here I mean both the generic "you" and the "you specifically") have to learn … really quickly.
Favouring one LLMbecile over another is kind of like favouring one patch of vacuum in space over another. Sure, there are minor differences in the trace contents, but they're still vacuums, effectively containing nothing.
Now how 'bout you do your peddling of LLMbeciles in a group that's not literally called "Fuck AI"? M'kay? I'm not here to listen to clankfuckers bleat about how their favourite stochastic parrot is better than other stochastic parrots. Go join /c/clankfuckersandotherlosers or something.
Dude.
Having the bare minimum of understanding and repeating an experiment
doesn't make me an AI bro.
I'm not saying it's smart, or that it can reason, or any other words you're trying to put in my mouth.
I am simply explaining how these systems work, in order to try and explain why these kinds of results occur.
If we're going to be saying fuck AI, then maybe we should understand how these systems work, rather than just circle-jerking over deliberately bad results. Otherwise we're just acting the same way as the actual AI bros.
You're the one with the DeepSeek account?
Genuinely don't understand why you're so mad at me. Maybe it's because the sample size for my experiment was so low. I can fix that. Let's run this through ChatGPT a few more times.
(You'll love this one because it got it wrong)
(Until I asked it to double-check. Oops)

(Hey another wrong answer)

I can't be bothered to get a larger sample size, so, combined with the previous non-search results:
that's 4 times out of 6 where it caught the trick at the start.
Isn't science fun!