Gemini 3 (Fast) got it right for me; it said that unless I wanna carry my car there, it's better to drive, and it suggested that I could use the car to carry cleaning supplies, too.
Some takeaways:
Sonar (Perplexity's models) says you are stealing energy from AI whenever you exercise (you should drive because eating pollutes more), i.e. it gets the right answer for the wrong reason.
US humans and the 55-65 age group score high on the international scale, probably for the same reason: "I like lazy".
I hope this is satire.
I asked my locally hosted Qwen3 14B, it thought for 5 minutes and then gave the correct answer for the correct reason (it did also mention efficiency).
Hilariously, one of the suggested follow-ups in Open WebUI was "What if I don't have a car - can I still wash it?"
My locally hosted Qwen3 30B said “Walk”, including this awesome line:
Why you might hesitate (and why it’s wrong):
- X “But it’s a car wash!” -> No, the car doesn’t need to drive there—you do.
Note that I just asked through the Ollama app; I didn’t alter or remove the default system prompt, nor did I force it to answer in a specific format like in the article.
A follow up I got from my Open WebUI was "Is walking the car to the wash safer than driving it there?"
My kid got it wrong at first, saying walking is better for exercise, then got it right after being asked again.
Claude Sonnet 4.6 got it right the first time.
My self-hosted Qwen 3 8B got it wrong consistently until I asked it how it thinks a car wash works, what is the purpose of the trip, and can that purpose be fulfilled from a distance. I was considering using it for self-hosted AI coding, but now I’m having second thoughts. I’m imagining it’ll go about like that if I ask it to fix a bug. Ha, my RTX 4060 is a potato for AI.
There's a difference between 'language' and 'intelligence', and mixing the two up is why so many people think LLMs are intelligent when they aren't.
The thing is, you can't train an LLM on math textbooks and expect it to understand math, because it isn't reading or comprehending anything. An AI doesn't know that 2+2=4 because it's doing math in the background; it has learned that when presented with the string "2+2=", the statistically likely next token is "4". It can construct a textbook-like paragraph around that equation that does a decent job of explaining the concept, but only through statistical analysis of sentence structure and vocabulary choice.
It's why LLMs are so downright awful at legal work.
If 'AI' were actually intelligent, you should be able to feed it a few series of textbooks and all the case law since the US was founded, and it should be able to talk about legal precedent. But LLMs constantly hallucinate when trying to cite cases, because the LLM doesn't actually understand the information it's trained on. It just builds a statistical model of what legal writing looks like and tries to mimic it. Same for code.
People think they're 'intelligent' because they seem like they're talking to us, and we've equated 'ability to talk' with 'ability to understand'. And until now, that's been a safe thing to assume.
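To make the "2+2=" point concrete, here is a toy next-character predictor in Python. It is a count-based n-gram model, which is a big simplification: real LLMs are neural networks over subword tokens, not lookup tables. The function names, the four-character context window, and the tiny training corpus are all made up for illustration. What it shows is that the "answer" is simply whatever most often followed the same context in the training text, with no arithmetic involved.

```python
from collections import Counter, defaultdict


def train(corpus, context_len=4):
    """Count which character follows each `context_len`-character context (hypothetical toy model)."""
    counts = defaultdict(Counter)
    for text in corpus:
        for i in range(context_len, len(text)):
            counts[text[i - context_len:i]][text[i]] += 1
    return counts


def predict_next(counts, prompt, context_len=4):
    """Return the most frequent next character seen after the prompt's last
    `context_len` characters, or None if that context never appeared in training."""
    context = prompt[-context_len:]
    if context not in counts:  # `in` does not create a key in a defaultdict
        return None
    return counts[context].most_common(1)[0][0]


# Made-up training text: "2+2=" is followed by "4" twice and by "5" once.
corpus = ["2+2=4", "2+2=4", "2+2=5", "3+3=6", "1+1=2"]
model = train(corpus)

print(predict_next(model, "2+2="))  # most frequent continuation, not a computed sum
print(predict_next(model, "7+8="))  # unseen context, so no prediction at all
```

If the made-up corpus contained "2+2=5" more often than "2+2=4", this toy would confidently predict "5", and a sum it has never seen gets no answer.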
A person who posted after you is using 14B and got the correct answer.
I just tried it on Brave's AI:
The obvious choice, said the motherfucker 😆
This is why computers are expensive.
Dirtying the car on the way there?
The car you're planning on cleaning at the car wash?
Like, an AI not understanding the difference between walking and driving almost makes sense. This, though, seems like such a weird logical break that I feel like it shouldn't be possible.
You're assuming AI "think" "logically".
Well, maybe you aren't, but the AI companies sure hope we do
And what's going to happen is that some engineer will band-aid the issue, all the AI-crazy people will shout “See! It’s learnding!”, and the AI snake-oil salesman will use that as justification for all the waste and demand more from all systems.
Just like what they did with the full glass of wine test. And no, AI did not fundamentally improve. The issue is fundamental to its design, not an issue with the data set.
Half the issue is they're calling 10 in a row "good enough" to treat it as solved in the first place.
A sample size of 10 is nothing.
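To put a number on "a sample size of 10 is nothing", here is a small Python sketch. It assumes each of the 10 runs is an independent pass/fail trial and uses a one-sided 95% exact (Clopper-Pearson) lower bound; the function names are hypothetical. It shows that a perfect 10-for-10 run is still consistent with a true pass rate as low as about 74%, and that a model which fails one time in five still goes 10 for 10 roughly 11% of the time.

```python
def lower_bound_all_successes(n, confidence=0.95):
    """One-sided exact (Clopper-Pearson) lower bound on the true pass rate
    when all n independent trials passed: the p that solves p**n == 1 - confidence."""
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n)


def prob_perfect_run(p, n):
    """Chance that a model with true per-trial pass rate p passes n trials in a row,
    assuming the trials are independent."""
    return p ** n


print(f"{lower_bound_all_successes(10):.3f}")  # 0.741
print(f"{prob_perfect_run(0.8, 10):.3f}")      # 0.107
```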
Frankly, I'd like to see some error bars on the "human polling". How many of the people rapiddata is polling are just hitting the top or bottom answer?
AI is not human. It does not think like humans and does not experience the world like humans. It is an alien from another dimension that learned our language by looking at text and books, not reading them.
It's dumber than that, actually. LLMs are the autocomplete on your cellphone keyboard, but on steroids. It's literally a model that predicts which word should come next, with zero actual understanding of what the words mean in context.
And a large chunk of human beings have no understanding of contextual meaning, so it seems like genius to them.
Hey LLM, if I have a 16 ounce cup with 10oz of water in it and I add 10 more ounces, how much water is in the cup?
What a great idea! Would you like me to write up a business plan for your new water company?
They didn’t take into account the “thinking mode”; most models pass when thinking is activated.
Sure they did. They even had a note on the results table that Grok passed except when reasoning mode was off.
ETA: they even posted all the reasoning texts for the models they tested.