Multiple LLMs voting together on content validation catch each other’s mistakes to achieve 95.6% accuracy. (arxiv.org)

submitted 3 weeks ago by Lugh@futurology.today to c/futurology@futurology.today

27 comments

dustyData@lemmy.world 9 points 3 weeks ago

Not a very good or easy comparison to make. Against the average, sure, the AI is above average. But a domain expert like a doctor or an accountant is much more accurate than that, in the 99+% range. Sure, everyone makes mistakes. But when we are good at something, we are really good.

Anyway, this is a ridiculous amount of effort and energy wasted just to reduce hallucinations to 4.4%.

Lugh@futurology.today 3 points 3 weeks ago

But a domain expert like a doctor or an accountant is much more accurate

Actually, not so.

If the AI is trained on narrow datasets, it beats humans. There are quite a few recent examples of this across different types of medical expertise.

dustyData@lemmy.world 8 points 3 weeks ago (last edited 3 weeks ago)

Cool, where are the papers?

Lugh@futurology.today -2 points 3 weeks ago

Large language models surpass human experts in predicting neuroscience results

A small study found ChatGPT outdid human physicians when assessing medical case histories, even when those doctors were using a chatbot.

massive_bereavement@fedia.io 7 points 3 weeks ago

Are you kidding me? How did the NYT reach those conclusions when the chair-flipping conclusion of said study quite clearly states: "The use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources."

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395

I mean, c'mon!

On the Nature one:

"we constructed a new forward-looking (Fig. 2) benchmark, BrainBench."

and

"Instead, our analyses suggested that LLMs discovered the fundamental patterns that underlie neuroscience studies, which enabled LLMs to predict the outcomes of studies that were novel to them."

and

"We found that LLMs outperform human experts on BrainBench"

In reality, this is saying: we made a benchmark, and LLMs know how to cheat around it better than experts do. Nothing more, nothing else.

this post was submitted on 01 Dec 2024

48 points (88.7% liked)
