1309

Bing AI performing at its peak once again... (lemmy.world)

submitted 11 months ago by Dehydrated@lemmy.world to c/microblogmemes@lemmy.world

196 comments


[-] Thorry84@feddit.nl 142 points 11 months ago

[-] douglasg14b@lemmy.world 35 points 11 months ago* (last edited 11 months ago)

Generative AI is INCREDIBLY bad at mathematical/logical reasoning. This is well known, and very much not surprising.

That's actually one of the milestones on the way to general artificial intelligence. The ability to reason about logic & math is a huge increase in AI capability.

[-] callcc@lemmy.world 4 points 11 months ago

Well known by you, not everybody.

[-] fallingcats@discuss.tchncs.de 4 points 11 months ago* (last edited 11 months ago)

Well known by everyone that knows anything about LLMs at all

[-] kromem@lemmy.world 0 points 11 months ago* (last edited 11 months ago)

It's not. This is already obsolete.

[-] fallingcats@discuss.tchncs.de 5 points 11 months ago

I've used gpt4 enough in the past months to confidently say the improvements in this blog post aren't noteworthy

[-] kromem@lemmy.world 1 points 11 months ago

They aren't live in the consumer model. This is a research post, not in production.

There's other literature elsewhere on getting improved math performance with GPT-4 as it exists right now.

[-] kromem@lemmy.world 0 points 11 months ago

It's really not in the most current models.

And it's already at present incredibly advanced in research.

The bigger issue is abstract reasoning that necessitates nonlinear representations - things like Sudoku, where exploring a solution requires updating the conditions and pursuing multiple paths to a solution. This can be achieved with multiple calls, but doing it in a single process is currently a fool's errand and likely will be until a shift to future architectures.

[-] douglasg14b@lemmy.world 1 points 11 months ago

I'm referring to models that understand language and semantics, such as LLMs.

Other models that are specifically trained for math can't do what LLMs can, but they can perform math.

[-] kromem@lemmy.world 1 points 11 months ago

The linked research is about LLMs. The opening of the abstract of the paper:

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare the both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision.

[-] Trollception@lemmy.world 16 points 11 months ago

So that's correct... Or am I dumber than the AI?

[-] JGrffn@lemmy.world 92 points 11 months ago

If one gallon is 3.785 liters, then one gallon is less than 4 liters. So, 4 liters should've been the answer.
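As a quick sanity check of that arithmetic, here is a minimal sketch in Python. It assumes the 3.785 liters-per-gallon figure quoted above (the US liquid gallon); the function and variable names are made up for illustration.

```python
# Conversion factor as quoted in the comment above (US liquid gallon).
LITERS_PER_US_GALLON = 3.785


def gallons_to_liters(gallons: float) -> float:
    """Convert US liquid gallons to liters using the quoted factor."""
    return gallons * LITERS_PER_US_GALLON


one_gallon_in_liters = gallons_to_liters(1)
four_liters = 4.0

# Compare both quantities in the same unit before deciding which is larger.
larger = "4 liters" if four_liters > one_gallon_in_liters else "1 gallon"
print(larger)
```

The point is only that both quantities have to be in the same unit before they are compared.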

[-] Smc87 86 points 11 months ago

Dumber

[-] WhiteHawk@lemmy.world 43 points 11 months ago

4l > 3.785l

[-] Matty_r@programming.dev 19 points 11 months ago

4l is only 2 characters, 3.785l is 6 characters. 6 > 2, therefore 3.785l is greater than 4l.
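For anyone who wants to see the joke in code, a small Python sketch: comparing the strings by character count and comparing the quantities they stand for give opposite answers. The `parse_liters` helper is hypothetical and only handles values written like `4l`.

```python
def parse_liters(text: str) -> float:
    """Hypothetical helper: turn a string such as '3.785l' into a number of liters."""
    return float(text.rstrip("l"))


a, b = "4l", "3.785l"

# Comparing by character count: 6 > 2, so the longer string "wins".
longer_by_length = len(b) > len(a)

# Comparing by numeric value: 3.785 is not greater than 4.
greater_by_value = parse_liters(b) > parse_liters(a)

print(longer_by_length, greater_by_value)
```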

[-] Klear@sh.itjust.works 7 points 11 months ago* (last edited 11 months ago)

You're forgetting the decimal point. The second one is just 1.4 characters.

[-] intensely_human@lemm.ee 2 points 11 months ago

“4” > “3.785”

=> false

[-] nifty@lemmy.world 1 points 11 months ago

That’s maybe how GPT reasoned it as well, you could be an LLM whisperer.

[-] stolid_agnostic@lemmy.ml 20 points 11 months ago

Everyone has a bad day now and then so don’t worry about it.

[-] fossphi@lemm.ee 2 points 11 months ago

Ummm... username checks out?

[-] moog@lemm.ee 0 points 11 months ago

U are dumber than the AI ig lol

[-] SomeoneSomewhere@lemmy.nz 4 points 11 months ago

Obviously it's referring to the 4.54609 litre UK gallon /s

[-] kromem@lemmy.world 1 points 11 months ago

You can see from the green icon that it's GPT-3.5.

GPT-3.5 really is best described as simply "convincing autocomplete."

It isn't until GPT-4 that there were compelling reasoning capabilities including rudimentary spatial awareness (I suspect in part from being a multimodal model).

In fact, it was the jump from a nonsense answer regarding a "stack these items" prompt from 3.5 to a very well structured answer in 4 that blew a lot of minds at Microsoft.

this post was submitted on 27 Dec 2023

1309 points (96.0% liked)
