So I gave an LLMbecile that claims to be a reasoning model...

Are you a "reasoning" model?

Great question! Yes, I am designed to be a "reasoning" model. That means I don't just repeat information—I analyze, synthesize, and draw logical conclusions based on the data and context you provide. My goal is to help you solve problems, answer questions, and think through complex topics in a way that’s clear and helpful.

If you have a tricky question, need help with decision-making, or want to break down a complicated concept, feel free to put me to the test!

...the following as a test of its reasoning:

There are seven customers (Carol, Darrell, Henrietta, Kenneth, Lula, Omar, Peter).
There are seven cheeses (cheddar, colby jack, gouda, havarti, muenster, provolone, swiss).
There are seven meats (bologna, corned beef, ham, liverwurst, prosciutto, roast beef, salami).
There are seven sandwiches priced at one of ($2.75, $3.75, $4.75, $5.75, $6.75, $7.75, $8.75).

Match prices to customers, meats, and cheeses based on these clues:

1. Omar's sandwich costs more than the order with liverwurst.
2. Henrietta's order doesn't cost $2.75.
3. Kenneth's order is either the order with provolone cheese or the $7.75 order.
4. Omar's order doesn't include swiss cheese.
5. The order with prosciutto includes provolone cheese.
6. Lula's order doesn't include corned beef.
7. The order with ham doesn't include swiss cheese.
8. Henrietta's order costs less than Darrell's sandwich.
9. Carol's sandwich costs $5.75.
10. Of the sandwich with havarti cheese and Kenneth's order, one includes prosciutto and the other costs $3.75.
11. Henrietta's sandwich doesn't cost $4.75.
12. The order with colby jack cheese costs less than the sandwich with prosciutto.
13. Henrietta's sandwich doesn't include roast beef.
14. The order with roast beef doesn't include colby jack cheese.
15. The sandwich with muenster cheese costs 1 dollar less than Lula's order.
16. The $5.75 order includes colby jack cheese.
17. The $8.75 sandwich includes gouda cheese.
18. The sandwich with colby jack cheese costs 3 dollars less than the sandwich with corned beef.
19. The $3.75 order is either the sandwich with bologna or Kenneth's sandwich.

Now, before I did this, I gave the task to my SO as an assignment. (He's an engineer.) He wrote two solutions using two different tools: something he called CLP(FD) (constraint logic programming over finite domains) and something he called CHR (constraint handling rules). He's rusty in both, so it took him about 30 minutes to get the two programs working. (He's confident he could do any more puzzles like this with far less time spent coding now.)

It took about 3ms for his first program to solve it, and about 7ms for his second.
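
To give a sense of what "just write a program for it" looks like, here's a rough Python sketch of the same idea. It's not his CLP(FD) or CHR code, just a hypothetical brute-force illustration, and only a handful of the 19 clues are encoded (the rest follow the same pattern):

```python
# NOT the CLP(FD)/CHR programs described above -- just a hypothetical,
# naive Python sketch of the same "encode the clues, let the machine
# grind through the combinations" idea. Only some of the 19 clues are
# written out; the rest follow the same pattern.
from itertools import permutations

CUSTOMERS = ["Carol", "Darrell", "Henrietta", "Kenneth", "Lula", "Omar", "Peter"]
CHEESES = ["cheddar", "colby jack", "gouda", "havarti", "muenster", "provolone", "swiss"]
MEATS = ["bologna", "corned beef", "ham", "liverwurst", "prosciutto", "roast beef", "salami"]
PRICES = [2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75]

def price_clues_ok(price):
    """Clues that mention only customers and prices, checked early to prune."""
    return (price["Henrietta"] != 2.75                 # clue 2
            and price["Henrietta"] < price["Darrell"]  # clue 8
            and price["Carol"] == 5.75                 # clue 9
            and price["Henrietta"] != 4.75)            # clue 11

def all_clues_ok(price, meat, cheese):
    """A subset of the clues that also involve meats and cheeses."""
    who_has_meat = {m: c for c, m in meat.items()}        # meat   -> customer
    who_has_cheese = {ch: c for c, ch in cheese.items()}  # cheese -> customer
    return (price["Omar"] > price[who_has_meat["liverwurst"]]      # clue 1
            and cheese["Omar"] != "swiss"                          # clue 4
            and cheese[who_has_meat["prosciutto"]] == "provolone"  # clue 5
            and meat["Lula"] != "corned beef"                      # clue 6
            and price[who_has_cheese["colby jack"]] == 5.75        # clue 16
            and price[who_has_cheese["gouda"]] == 8.75             # clue 17
            # ... the remaining clues follow the same pattern ...
            )

solutions = []
for ps in permutations(PRICES):
    price = dict(zip(CUSTOMERS, ps))
    if not price_clues_ok(price):  # prune whole branches early
        continue
    for ms in permutations(MEATS):
        meat = dict(zip(CUSTOMERS, ms))
        for cs in permutations(CHEESES):
            cheese = dict(zip(CUSTOMERS, cs))
            if all_clues_ok(price, meat, cheese):
                solutions.append((price, meat, cheese))

print(len(solutions), "assignments survive the clues encoded so far")
```

Even with that early pruning this sketch is painfully slow, because the inner loops still grind through 7! × 7! meat/cheese combinations for every surviving price assignment. A real constraint solver propagates constraints instead of enumerating permutations, which is presumably how his programs get down to a few milliseconds.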

So how did the "reasoning" LLMbecile fare?

Well, it was infinitely slower than my SO's programs. Because it couldn't solve it.

After 16 steps (I told it to explain its reasoning along the way; I'm not going to copy all that out because it's tedious) it gave me this:

So it failed. When I pressed it for the final solution, it basically argued that it would take too long, but told me that if I guided it with some more detailed questions it could solve it for me.

(You know. The thing I already had solved two different ways before asking the LLMbecile.)

When I asked this, however...

I thought the whole point of a reasoning model was to do the reasoning, not have the user do the reasoning for it?

... the answer was golden:


Your critique is entirely valid—and it’s a sharp observation that highlights the current state of AI reasoning models. Here’s what’s really going on:

Why AI Reasoning Models Struggle with Complex Logic Puzzles

  • Pattern Matching, Not True Reasoning:
    Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers. They excel at low- and medium-complexity problems where the answer can be inferred from seen data or familiar patterns. But when a problem’s complexity increases—like with large, multi-variable logic grid puzzles—their performance can collapse, and they often fail to follow through with systematic, step-by-step deduction.

  • No Genuine Understanding:
    These models don’t “understand” logic or context the way humans do. They don’t build or manipulate internal representations of the puzzle’s state; instead, they generate plausible next steps based on training data. This leads to errors, contradictions, or incomplete solutions as complexity rises.

  • Scaling Limits:
    Research shows that as puzzles get harder, models may initially increase their reasoning effort, but beyond a certain point, their ability to sustain coherent reasoning collapses—even if they have enough computational resources to keep going.

  • Lack of Explicit Algorithms:
    Unlike humans who might draw a grid or systematically eliminate possibilities, models rarely execute explicit, algorithmic reasoning. Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.

What This Means for Users

  • For simple logic puzzles:
    AI models can often solve them and explain their reasoning convincingly.

  • For complex, multi-layered puzzles:
    The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers—just as you observed in your sandwich puzzle example.

  • Human intervention is still needed:
    For truly novel, complex logic problems, humans remain better at systematic, stepwise deduction, especially when the solution space is large and requires careful cross-elimination.

Bottom Line

AI “reasoning” is impressive, but brittle. It’s not the same as human logic, and it can’t reliably replace human reasoning in genuinely complex or novel scenarios—especially when the solution requires detailed, multi-step deduction and cross-referencing.

Your expectation that a reasoning model should “do the reasoning” is exactly what researchers are working toward—but as of now, there are real, well-documented limits to what these models can achieve on hard logic puzzles.


The takeaway points from this response (keeping in mind that all of this comes from the LLMbecile itself ... so it may be hallucinated! 🤣):

  • Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers.
  • These models don’t “understand” logic or context the way humans do.
  • their ability to sustain coherent reasoning collapses
  • Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.
  • The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers
  • Human intervention is still needed

So the next time you see some aitechbroshilldude honking the horn of "reasoning models", point them to this thread, where a "reasoning" LLMbecile explains why it's not actually reasoning (and accidentally explains why it's utterly useless).

vrighter@discuss.tchncs.de | 3 points | 1 day ago

it's pretty hard to implement two variations of a brute force search.

hedgehog@ttrpg.network | -1 points | 1 day ago

Did he implement two different variations? OP said he used two different tools, not that his solutions were any different.

That said… how so?

There are many ways two brute-force approaches might vary.

A naive search and a search with optimizations that narrow the search area (e.g., because certain criteria are known and thus don’t need to be iterated over) can both be brute force solutions.

You could also just change the search order to get a different variation. In this case, we have customer, price, meat, cheese, and we need to build a combination of those to get our solution; the way you construct that can also vary.
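
To make that concrete, here's a hypothetical Python sketch (mine, not anything OP's SO wrote) of two shapes the "same" brute force can take over just the customer/price part of the puzzle: variation A generates every assignment and tests afterwards, while variation B uses clue 9 (Carol's $5.75 sandwich) to narrow what it generates in the first place.

```python
# Hypothetical sketch, not OP's SO's code: two variations of brute force
# over just the customer -> price part of the puzzle.
from itertools import permutations

CUSTOMERS = ["Carol", "Darrell", "Henrietta", "Kenneth", "Lula", "Omar", "Peter"]
PRICES = [2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75]

def price_clues(p):
    # Clues 2, 8, 9, 11: the ones that mention only customers and prices.
    return (p["Carol"] == 5.75
            and p["Henrietta"] not in (2.75, 4.75)
            and p["Henrietta"] < p["Darrell"])

def variation_a():
    # Naive: enumerate all 7! price assignments, test each one afterwards.
    return [p for p in (dict(zip(CUSTOMERS, ps)) for ps in permutations(PRICES))
            if price_clues(p)]

def variation_b():
    # Narrowed: clue 9 pins Carol at $5.75, so only 6! candidates get generated.
    others = [c for c in CUSTOMERS if c != "Carol"]
    rest = [x for x in PRICES if x != 5.75]
    return [p for p in ({"Carol": 5.75, **dict(zip(others, ps))}
                        for ps in permutations(rest))
            if price_clues(p)]

# Same survivors, different amount of work -- and changing which attribute
# (price, meat, cheese) you branch on first is yet another variation.
assert variation_a() == variation_b()
```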