this post was submitted on 07 Apr 2025

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as "trivial", even when their validity was crucial."

[–] swlabr@awful.systems 26 points 1 week ago* (last edited 1 week ago) (3 children)

You didn't link to the study; you linked to the press release for the study. This and this are the papers linked in the blog post.

Note that the papers haven't been published anywhere other than on Anthropic's online journal. Also, what the papers are doing is essentially tea leaf reading. They take a look at the swill of tokens, point at some clusters, and say, "there's a dog!" or "that's a bird!" or "bitcoin is going up this year!". It's all rubbish dawg

[–] bitofhope@awful.systems 18 points 1 week ago (1 children)

To be fair, the typesetting of the papers is quite pleasant and the pictures are nice.

[–] froztbyte@awful.systems 10 points 1 week ago

they gotta make up for all those scary cave-wall pictures somehow

[–] swlabr@awful.systems 9 points 1 week ago

It's an anti-fun version of listening to Dark Side of the Moon while watching The Wizard of Oz.