Today's Large Language Models are Essentially BS Machines

[–] scrubbles@poptalk.scrubbles.tech 79 points 2 years ago (5 children)

And everyone in tech who has worked on ML before collectively says "yeah that's what we've been trying to tell you". Don't get me wrong, LLMs are a huge leap, but god did it show how greedy corporations are, just immediately jumping to "how quick can we lay people off?". The tech is not to that spec. Yet. It will get there, but goddamn do we need to be demanding some regulations now

[–] Dark_Arc@social.packetloss.gg 39 points 2 years ago (1 children)

The tech is not to that spec. Yet.

I'm not sure it will. At least, not this tech, not this approach to the problem. From my understanding there's fundamentally no comprehension; it's not bugged, broken, or incomplete, it's just not there... it's missing from the design.

[–] communist@beehaw.org 17 points 2 years ago (3 children)

We don't know that for sure yet. We saw a lot of emergent intelligent properties appear as we scaled up, and we're nowhere near done scaling LLMs. I'm not saying it will be solved, just that we don't know one way or the other yet.

[–] Veraticus@lib.lgbt 13 points 2 years ago (27 children)

LLMs are fundamentally different from human consciousness. It isn't a problem of scale, but kind.

They are like your phone's autocomplete, but very very good. But there's no level of "very good" for autocomplete that makes it a human, or will give it sentience, or allow it to understand the words it is suggesting. It simply returns the next most-likely word in a response.

If we want computerized intelligence, LLMs are a dead end. They might be a good way for that intelligence to speak pretty sentences to us, but they will never be that themselves.
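
The "autocomplete" comparison above can be made concrete with a toy sketch. This is only an illustration of "return the next most-likely word" using simple word-pair counts; the function names `train_bigrams` and `next_word` and the tiny corpus are made up for this example. Real LLMs use neural networks over tokens and typically sample from a probability distribution, so this is not how any actual model is implemented.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus: "fish" follows "the" more often than any other word.
corpus = "the fish swim in the lake and the fish eat the worm"
model = train_bigrams(corpus)

print(next_word(model, "the"))       # most frequent follower of "the"
print(next_word(model, "bluegill"))  # unseen word, so there is nothing to suggest
```

The sketch picks words purely from co-occurrence counts; whether scaling that idea up can ever amount to understanding is exactly what this thread disagrees about.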

[–] communist@beehaw.org 4 points 2 years ago* (last edited 2 years ago) (1 children)

You're guessing, you don't actually know that for sure, it seems intuitively correct, but we simply do not know enough about cognition to make that assumption.

Perhaps our ability to reason exclusively comes from our ability to predict, and by scaling up the ability to predict, we become more and more able to reason.

These are guesses; all we have now are guesses. You can say "it doesn't reason" and "it's just autocorrect" all you want, but if that were the case, why did scaling it up eventually enable it to perform basic math? Why did scaling it up significantly improve its ability to problem-solve (GPT-3 vs GPT-4)? There are so many unknowns in this field. Saying "nah, can't be, it works differently from us" doesn't mean it can't do the same things as us given enough scale.

[–] Veraticus@lib.lgbt 8 points 2 years ago (10 children)

I'm not guessing. When I say it's a difference of kind, I really did mean that. There is no cognition here; and we know enough about cognition to say that LLMs are not performing anything like it.

Believing LLMs will eventually perform cognition with enough hardware is like saying, "if we throw enough hardware at a calculator, it will eventually become alive." Even if you throw all the hardware in the world at it, there is no emergent property of a calculator that would create sentience. So too LLMs, which really are just calculators that can speak English. But just like calculators they have no conception of what English is and they do not think in any way, and never will.

[–] Dark_Arc@social.packetloss.gg 9 points 2 years ago* (last edited 2 years ago) (1 children)

I don't believe in scaling as a way to discover understanding. Doing that is just praying that the machine comes alive... these machines weren't programmed to come alive in that way. That's my fundamental argument, the design of LLMs ignores understanding of the content... it doesn't matter how much content it's been scaled up to.

If I teach a real AI about fishing, it should be able to reason about fishing and it shouldn't need to have read a supplementary knowledge of mankind to do it.

What the LLMs seem to be moving towards is more of a search and summary engine (for existing content). That's a very similar and potentially quite useful thing, but it's not the same thing as understanding.

It's the difference between the kid that doesn't know much but is really good at figuring it out based on what they know vs the kid that's read all the text books front to back and can't come up with anything original to save their life but can quickly regurgitate and summarize anything they've ever read.

[–] communist@beehaw.org 4 points 2 years ago* (last edited 2 years ago) (1 children)

If I teach a real AI about fishing, it should be able to reason about fishing and it shouldn’t need to have read a supplementary knowledge of mankind to do it.

This is a faulty assumption.

In order for you to learn about fishing, you had to learn a shitload about the world. Babies don't come out of the womb able to do such tasks, there is a shitload of prerequisite knowledge in order to fish, it's unfair to expect an ai to do this without prerequisite knowledge.

Furthermore, LLMs have been shown to do many things that aren't in their training data, so the notion that it's a stochastic parrot is also false.

[–] Dark_Arc@social.packetloss.gg 4 points 2 years ago* (last edited 2 years ago) (1 children)

Furthermore, LLMs have been shown to do many things that aren’t in their training data, so the notion that it’s a stochastic parrot is also false.

And (from what I've seen) they get things wrong with extreme regularity, increasingly so as things diverge from the training data. I wouldn't say they're a "stochastic parrot" but they don't seem to be much better when things need to be correct... and again, based on my (admittedly limited) understanding of their design, I don't anticipate this technology (at least without some kind of augmented approach that can reason about the substance) overcoming that.

In order for you to learn about fishing, you had to learn a shitload about the world. Babies don’t come out of the womb able to do such tasks, there is a shitload of prerequisite knowledge in order to fish, it’s unfair to expect an ai to do this without prerequisite knowledge.

That's missing the forest for the trees. Of course an AI isn't going to go fishing. However, I should be able to assert some facts about fishing and it should be able to reason based on those assertions. e.g. a child can work off of facts presented about fishing, "fish are hard to catch in muddy water" -> "the water is muddy, does that impact my chances of catching a bluegill?" -> "yes, it does, bluegill are fish, and fish don't like muddy water".

There are also "teachings" brought about by how these are programmed that make the flaws less obvious, e.g., if I try to repeat the experiment in the post here, Google's Bard outright refuses to continue because it doesn't have information about Ryan McGreal. I've also seen Bard get notably better as it's been scaled up; early on I tried asking it about RuneScape and it spewed absolute nonsense. Now... It's reasonable-ish.

I was able to reproduce a nonsense response (once again) by asking about RuneScape. I asked how to get 99 firemaking, and it invented a mechanic that doesn't exist "Using a bonfire in the Charred Stump: The Charred Stump is a bonfire located in the Wilderness. It gives 150% Firemaking experience, but it is also dangerous because you can be attacked by other players." This is a novel (if not creative) invention of Bard likely derived from advice for training Prayer (which does have something in the Wilderness which gives 350% experience).

[–] BotCheese@beehaw.org 7 points 2 years ago (2 children)

And we're nowhere near done scaling LLM's

I think we might be. I remember hearing OpenAI was training on so much literary data that they didn't and couldn't find enough for testing the model. Though I may be misremembering.

[–] newde@feddit.nl 5 points 2 years ago (2 children)

No, that's definitely the case. However, Microsoft is now working on making LLMs more dependent on several high-quality sources. For example: encyclopedias will be more important sources than random reddit posts.

[–] Veraticus@lib.lgbt 21 points 2 years ago* (last edited 2 years ago) (3 children)

I was mostly posting this because the last time LLMs came up, people kept on going on and on about how much their thoughts are like ours and how they know so much information. But as this article makes clear, they have no thoughts and know no information.

In many ways they are simply a mathematical party trick; formulas trained on so much language, they can produce language themselves. But there is no “there” there.

[–] lily33@lemm.ee 10 points 2 years ago* (last edited 2 years ago) (1 children)

have no thoughts

True

know no information

False. There's plenty of information stored in the models, and plenty of papers that delve into how it's stored, or how to extract or modify it.

I guess you can nitpick over the word "know", and what it means, but as someone else pointed out, we don't actually know what that means in humans anyway. But LLMs do use the information stored in context; they don't simply regurgitate it verbatim. For example (from this article):

If you ask an LLM what's near the Eiffel Tower, it'll list locations in Paris. If you edit its stored information to think the Eiffel Tower is in Rome, it'll actually start suggesting sights in Rome instead.

[–] Veraticus@lib.lgbt 6 points 2 years ago (4 children)

They only use words in context, which is their problem. It doesn't know what the words mean or what the context means; it's glorified autocomplete.

I guess it depends on what you mean by "information." Since all of the words it uses are meaningless to it (it doesn't understand anything of what it either is asked or says), I would say it has no information and knows nothing. At least, nothing more than a calculator knows when it returns 7 + 8 = 15. It doesn't know what those numbers mean or what it represents; it's simply returning the result of a computation.

So too LLMs responding to language.

[–] sincle354@beehaw.org 9 points 2 years ago

Sadly we don't even know what "knowing" is, considering human memory changes every time it is accessed. We might just need language and language only. Right now they're testing if generating verbalized trains of thought helps (it might?). The question might change to: Does the sum total of human language have enough consistency to produce behavior we might call consciousness? Can we brute force the Chinese room with enough data?

[–] pbjamm@beehaw.org 6 points 2 years ago

They are the perfect embodiment of the internet.

They know everything, but understand nothing

[–] MasterBuilder@lemmy.one 14 points 2 years ago (3 children)

I've been unemployed for 7 months. Every online job I see that's been posted for at least 6 hours has over 200 applications. I'm a senior Dev with 30 years experience, and I can't find work.

I'd say generative AI is an existential threat as bad as offshoring was for steel in the early 80s. I'm now left with the prospect of spending the last 20 years of my work life at or near minimum wage.

After all, I can't afford to spend $250,000 on a new bachelor's degree, and a community college degree might get me to $25/hr, and still costs thousands. This is causing impoverishment on a massive scale.

Ignore this threat at your peril.

[–] seang96@spgrn.com 17 points 2 years ago* (last edited 2 years ago) (3 children)

Your issue sounds more like a capitalism issue. FANG companies lay off thousands of employees to cut costs and prepare for changes in the economy. AI didn't make them lay off all those employees, just corporate greed. Until AI can gather requirements, produce code with at least 80% accuracy, and compile the software itself, it isn't a threat.

Edit fix autocorrect

[–] scrubbles@poptalk.scrubbles.tech 12 points 2 years ago

I'm a senior dev too, and at first I thought the same, but really it's a market downturn. Companies are just afraid to hire right now. I'd look into generative AI, try to understand how it works. That's how I've been spending my time, and yeah, it's intuitive the way they do it but the more you understand how it works the more you realize that it's not ready to take our jobs. Yet. Again maybe someday, but there is a lot of work that needs to be done to get something semi up and running, and the models that Google uses are not going to be usable for every company. (Take a look at all the specialized models already).

Our job never goes away, but it does constantly evolve. This is just another point where we have to learn new skills, and that may be that we all need to be model tuners some day. At the end of the day the user still needs to correctly describe what they want to have happen on the screen, and there are currently no ways to take what they describe into a full piece of software.

[–] HelixTitan@beehaw.org 8 points 2 years ago (1 children)

Hard to believe a senior dev can't find work. Those positions are the most needed. Also 25 an hour is 50k a year. Nowhere in the US are senior devs paid that little. I suppose you may not be US based, but your cost for college seems to imply US, albeit at an expensive school.

[–] p03locke@lemmy.dbzer0.com 13 points 2 years ago (1 children)

And everyone in tech who has worked on ML before collectively says “yeah that’s what we’ve been trying to tell you”.

Everybody in tech who had even a passing understanding of the technology was collectively saying that. We understand the limits of technology and can feel out the bounds easily. But too many of these dumbasses with dollar signs in their eyes are all "to the moon!", and tripping and failing on implementing the tech in unreasonable ways.

It was never a factoid machine, like some people wanted to believe. It was always about creatively writing something, and only one with so much attention.

[–] interolivary@beehaw.org 10 points 2 years ago

It was never a factoid machine

Funny tidbit about the word "factoid": its original meaning was "an item of unreliable information that is reported and repeated so often that it becomes accepted as fact", but the modern usage is "a brief or trivial item of news or information".

This means that the modern usage of "factoid" is in itself a factoid, and that in the old sense LLMs sort of are factoid machines.

Note that I'm not saying the modern use is wrong. Languages evolve, and words taking on new meanings doesn't mean the new meanings are "wrong" (and surprisingly words changing to mean the opposite of what they used to mean isn't all that uncommon either.)

[–] biddy@feddit.nl 7 points 2 years ago

I disagree, a lot of white collar work is simply writing bullshit.

[–] bpalmerau@aussie.zone 26 points 2 years ago (21 children)

“has a model of how words relate to each other, but does not have a model of the objects to which the words refer.

It engages in predictive logic, but cannot perform syllogistic logic - reasoning to a logical conclusion from a set of propositions that are assumed to be true”

Is this true of all current LLMs?

[–] Veraticus@lib.lgbt 31 points 2 years ago

Yes, this is how all LLMs function.

[–] Blapoo@lemmy.ml 24 points 2 years ago

They're glorified autocompletes. Way too much attention is being given to LLMs in isolation. By themselves: Not a silver bullet.

But when called in a chain . . . eyebrows

[–] Fizz@lemmy.nz 21 points 2 years ago (2 children)

Humans are bullshit machines as well.

[–] Thorny_Thicket@sopuli.xyz 12 points 2 years ago (1 children)

This is what I find the most amusing about the criticism of LLMs and many other AI systems as well. People often talk about them as if they're somehow uniquely flawed, while in reality what they're doing isn't that different from what humans do as well. The biggest difference is that when a human hallucinates it's often obvious but when ChatGPT does that it's harder to spot.

[–] dr_catman@beehaw.org 14 points 2 years ago (1 children)

This is… really not true at all.

LLMs differ from humans in a very very important way when it comes to language: we know the meanings of the words we use. LLMs do not “know” things, are unconcerned with “meanings”, and thus cannot be said to be “using” words in any meaningful way.

[–] Zaktor@sopuli.xyz 8 points 2 years ago

we know the meanings of the words we use.

Uh, but we don't? Not really. People use the wrong words all the time and each person's definition (i.e., encoding) is slightly different. We mimic phrases and structures we've heard to sound smarter and forge on with uncertain statements because frequently they go unchallenged or simply aren't important.

We're more structurally complex than an LLM, but we fool ourselves into thinking we're somehow uniquely thoughtful and reliable.

[–] renard_roux@beehaw.org 6 points 2 years ago

A chip off the ol' block, then 🙂

[–] StringTheory@beehaw.org 20 points 2 years ago (1 children)

This reminds me of an article about journalism and the internet, from ages ago. A class was asked how they would research a topic (it was some recent political event, I don’t remember). The class confidently answered “the internet.” The professor struggled to get them to understand that wasn’t enough. Yes, there is all kinds of stuff about this event on the internet, but how did it get there? And more importantly, what is missing?

Sure, all the sexy AI stuff gives us goosebumps and sounds great. But how did it get there, and what is missing? Someone somewhere has to do the actual original work first, or it’s just making collages from the same library over and over and over again.

[–] Veraticus@lib.lgbt 8 points 2 years ago* (last edited 2 years ago) (2 children)

And also it's no replacement for actual research, either on the Internet or in real life.

People assume LLMs are like people, in that they won't simply spout bullshit if they can avoid it. But as this article properly points out, they can and do. You can't really trust anything they output. (At least not without verifying it all first.)

[–] HalJor@beehaw.org 10 points 2 years ago

People assume LLMs are like people, in that they won’t simply spout bullshit if they can avoid it.

There are plenty of people who spout bullshit every chance they get.

[–] upstream@beehaw.org 5 points 2 years ago (1 children)

As with any tool it is how you use it that matters.

Today’s LLMs are capable of fairly amazing stuff.

It’s a BS machine? Sure. Have you read or written stuff for higher education?

You don’t get points for being short and concise, even though you should. You get points for following the BS formula.

You know who else is good at BS?

LLMs. If you manage to provide it enough meaningful input it can do a great lot of BS legwork for you.

I see people who overuse it, don’t edit, aren’t critical. Sure. Then you end up with just BS.

But there’s plenty of useful applications, like writing boilerplate code (see also Copilot), structuring code, tests, etc.

Is it worth all the hype? Nope.

Some of it? Probably.

[–] lloram239@feddit.de 18 points 2 years ago* (last edited 2 years ago) (2 children)

Today's Large Language Models are Essentially BS Machines

Apparently so are today's bloggers and journalists. Since they just keep repeating the same nonsense and seem to lack any sense of understanding. I am really starting to question if humans are capable of original thought.

The responses all came with citations and links to sources for the fact claims. And the responses themselves all sound entirely reasonable. They are also entirely made up.

This does not compute. Bing Chat provides sources, as in links you can click on and that work. It doesn't pull things out of thin air, it pulls information out of Bing search and summarizes it. That information is often wrong, incomplete and misleading, as it will only take a tiny number of websites to source that information. But so would most humans using Bing search. So not really a problem with the bot itself.

ChatGPT gives most of the time far better answers, as it bases the answers on knowledge gained from all the sources, not just specific ones. But that also means it can't provide sources and if you pressure it to give you some, it will make them up. And depending on the topic, it might also not know something for which Bing can find a relevant website.

LLMs are trained not to produce answers that meet some kind of factual threshold, but rather to produce answers that sound reasonable.

And guess what answer sounds the most reasonable? A correct one. People seriously seem to have a hard time grasping how freakishly difficult it is to generate plausible language and how much stuff has to be going on behind the scenes to make that possible. That does not mean GPT will be correct all the time or be an all-knowing oracle, but you'll have to be rather stupid to expect that to begin with. It's simply the first chatbot that actually kind of works a lot of the time. And yes, it can reason and understand within its limits; it making mistakes from time to time does not refute that, especially when badly prompted (e.g. asking it to solve a problem step by step can dramatically improve the answers).

LLMs are not people, but neither are they BS generators. In plenty of areas they already outperform humans and in others not so much. But you are not learning that from articles that treat every little mistake from an LLM like some huge gotcha moment.

[–] Veraticus@lib.lgbt 10 points 2 years ago (1 children)

No one is saying there's problems with the bots (though I don't understand why you're being so defensive of them -- they have no feelings so describing their limitations doesn't hurt them).

The problem is what humans expect from LLMs and how humans use them. Their purpose is to string words together in pretty ways. Sometimes those ways are also correct. Being aware of what they're designed to do, and their limitations, seems important for using them properly.

[–] lloram239@feddit.de 8 points 2 years ago

they have no feelings so describing their limitations

These kinds of articles, which all repeat exactly the same extremely basic points and make lots of fallacious ones, are absolute dogshit at describing the shortcomings of AI. Many of them don't even bother actually testing the AI themselves, but just repeat what they heard elsewhere. Even with this one I am not sure what exactly they did, as Bing Chat works completely differently for me from what is reported here. It won't hurt the AI, but it certainly hurts me reading the same old minimum-effort content over and over and over again, and they are the ones accusing AI of generating bullshit.

The problem is what humans expect from LLMs and how humans use them.

Yes, humans are stupid. They saw some bad sci-fi and now they expect AI to be capable of literal magic.

[–] FlashMobOfOne@beehaw.org 8 points 2 years ago

These AI systems do make up bullshit often enough that there's even a term for it: Hallucination.

Kind of a euphemistic term, like how religious people made up the word 'faith' to cover for the more honest term: gullible.

[–] gaytswiftfan@beehaw.org 18 points 2 years ago (1 children)

hmm i think we need twelve more articles on this

[–] RickyRigatoni@lemmy.ml 10 points 2 years ago

We should feed the ones already made to an LLM and have it write the next 12 for the irony.

[–] radix@lemm.ee 9 points 2 years ago* (last edited 2 years ago) (1 children)

What else should they be?? They reflect human language.

[–] Veraticus@lib.lgbt 9 points 2 years ago (6 children)

People think they are actually intelligent and perform reasoning. This article discusses how and why that is not true.

[–] crow@beehaw.org 7 points 2 years ago (3 children)

And what does that mean about the jobs it can replace?

[–] avidamoeba@lemmy.ca 18 points 2 years ago* (last edited 2 years ago)

They can replace the bullshit jobs of which we have many, serving the essential purpose of keeping the people doing them fed and thus the economy and society stable. 🥲

[–] Zaktor@sopuli.xyz 4 points 2 years ago (1 children)

They're both BS machines and fact generators. It produced bullshit when asked about him because as far as I can tell he's kind of a nobody, not because it's just a stylistic generator. If he asked about a more prominent person likely to exist more significantly within the training corpus, it would likely be largely accurate. The hallucination problem stems from the system needing to produce a result regardless of whether it has a well trained semantic model for the question.

LLMs encode both the style of language and semantic relationships. For "who is Einstein", both paths are well developed and the result is a reasonable response. For "who is Ryan McGreal", the semantic relationships are weak or non-existent, but the stylistic path is undeterred, leading to the confidently plausible bullshit.

[–] Veraticus@lib.lgbt 7 points 2 years ago (8 children)

They don't generate facts, as the article says. They choose the next most likely word. Everything is confidently plausible bullshit. That some of it is also true is just luck.
