We have to stop ignoring AI’s hallucination problem
(www.theverge.com)
"We invented a new kind of calculator. It usually returns the correct value for the mathematics you asked it to evaluate! But sometimes it makes up wrong answers for reasons we don't understand. So if it's important to you that you know the actual answer, you should always use a second, better calculator to check our work."
Then what is the point of this new calculator?
Fantastic comment, from the article.
It's a nascent-stage technology that reflects the world's words back at you in statistical order by way of parsing user-generated prompts. It's a reactive system with no autonomy to deviate from a template upon reset. It's no Roko's Basilisk inherently, just because
Am I understanding correctly that it's just a fancy random word generator?
More or less, yes.
Not random, more so probabilistic, which is almost the same thing, granted.
It's like letting auto complete always pick the next word in the sentence without typing anything yourself. But fancier.
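To make the autocomplete comparison concrete, here is a minimal sketch of a word-level bigram generator. It is a toy, not how a real LLM is built: the corpus, the function names `build_bigram_counts` and `generate`, and the fixed seed are all assumptions made up for illustration. It only shows the "pick the next word by probability, repeat" loop the comments describe.

```python
import random
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, length, rng):
    """Repeatedly sample the next word in proportion to its observed frequency."""
    output = [start]
    for _ in range(length - 1):
        followers = counts.get(output[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        candidates = list(followers.keys())
        weights = list(followers.values())
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return output

# Hypothetical toy corpus; a real model is trained on vastly more text.
CORPUS = "the cat sat on the mat and the dog sat on the cat"
counts = build_bigram_counts(CORPUS)
sample = generate(counts, "the", 8, random.Random(0))
print(" ".join(sample))
```

Every adjacent word pair in the output was seen in the corpus, so the text looks locally plausible, yet nothing in the loop checks whether the sentence as a whole is true.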
Yes, but it's, like, really fancy.
It's not just a calculator, though.
Image generation requires no fact checking whatsoever, and some of the tools can do it well.
That said, LLMs will always have limitations and true AI is still a ways away.
The biggest disappointment with image generation was realising that there is no object permanence for the components that make up an image. For any specificity you're just playing whack-a-mole with iterations that introduce other undesirable shit, no matter how specific you make your prompts.
They are also now heavily nerfing the models to avoid lawsuits by ignoring anything relating to specific styles that may be considered trademarks. The problem is that those are often industry jargon, so now you have to craft more convoluted prompts and you get more mid results.
It does require fact-checking. You might ask for a human and get someone with 10 fingers on one hand; you might ask for people in the background and get blobs merged into each other. The fact check in images is absolutely necessary and consists of verifying that the generated image adheres to your prompt and that the objects in it match their intended real counterparts.
I do agree that it's a different type of fact checking, but that's because an image is not inherently correct or wrong, it only is if compared to your prompt and (where applicable) to reality.
It doesn't? Have you not seen any of the articles about AI-generated images being used for misinformation?
Sure it does. Let's say IKEA wants to use midjourney to generate images for its furniture assembly instructions. The instructions are already written, so the prompt is something like "step 3 of assembling the BorkBork kitchen table".
Would you just auto-insert whatever it generated and send it straight to the printer for 20000 copies?
Or would you look at the image and make sure that it didn't show a couch instead?
If you choose the latter, that's fact checking.
I can't agree more strongly with this point!
Some problems lend themselves to "guess-and-check" approaches. This calculator is great at guessing, and it's usually "close enough".
The other calculator can check efficiently, but it can't solve the original problem.
Essentially this is the entire motivation for numerical methods.
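As one illustration of the guess-and-check idea in numerical methods, here is a minimal sketch using Newton's iteration for a square root. The function name `newton_sqrt`, the tolerance, and the starting guess are assumptions chosen for the example. The point is only that the check (squaring the guess) is cheap, even though it cannot produce the root on its own.

```python
def newton_sqrt(value, tolerance=1e-12, max_steps=50):
    """Approximate sqrt(value) by refining a guess until it passes the check."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0.0
    guess = max(value, 1.0)  # crude starting guess
    for _ in range(max_steps):
        # The check: squaring the guess is cheap even though finding it is not.
        if abs(guess * guess - value) <= tolerance * max(value, 1.0):
            return guess
        guess = 0.5 * (guess + value / guess)  # Newton's update for x^2 = value
    raise RuntimeError("did not converge")

root = newton_sqrt(2.0)
print(root, root * root)
```

Here the guesser and the checker are both trustworthy and fully understood, which is exactly the property the thread is arguing about for LLMs.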
In my personal experience, that's how I generally manage to shortcut a lot of labour-intensive intellectual tasks: I use intuition to guess possible answers and then work backwards from them to determine which one is right, and even prove it. That is generally faster, because it's usually quicker to show that a result is correct than to arrive at it (and if it isn't, you just do it the old-fashioned way). How often it pays off depends on how good one's intuition is in a given field, which in turn correlates with experience in it.
That said, it's far from guaranteed to be faster, and for problems with more than one solution it might yield working but sub-optimal ones.
Further, the intuition step alone does not yield a result that can be trusted without validation.
Maybe LLMs, used the way intuition is in this process, can help accelerate the search for results in subjects where one lacks the experience to have good intuition but has enough experience (or domain-specific methods or tools) to do the "validation of possible results" part.
That's not really right, because verifying solutions is usually much easier than finding them. A calculator that can take in arbitrary sets of formulas and produce answers for variables, but is sometimes wrong, is an entirely different beast than a calculator that can plug values into variables and evaluate expressions to check if they're correct.
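A small sketch of that asymmetry, under made-up assumptions: the equation, the `proposed` list standing in for an unreliable solver's output, and the names `residual` and `verify` are all hypothetical. The checker never solves anything; it only substitutes each candidate and sees whether the equation holds.

```python
def residual(x):
    """Left-hand side of the toy equation x^2 - 5x + 6 = 0."""
    return x * x - 5 * x + 6

def verify(candidates, tolerance=1e-9):
    """Keep only the candidates that actually satisfy the equation."""
    return [x for x in candidates if abs(residual(x)) <= tolerance]

# Hypothetical output of an unreliable solver: two real roots and one made-up answer.
proposed = [2, 3, 4]
accepted = verify(proposed)
print(accepted)  # [2, 3]
```

This works because the equation itself defines what "correct" means. Whether an equally cheap check exists for free-form text is the open question in the replies.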
As a matter of fact, I'm pretty sure that argument would also make quantum computing pointless - because quantum computers are probability based and can provide answers for difficult problems, but not consistently, so you want to use a regular computer to verify those answers.
Perhaps a better comparison would be a dictionary that can explain entire sentences, but requires you to then check each word in a regular dictionary and make sure it didn't mix them up completely? Though I guess that's actually exactly how LLMs operate...
It's only easier to verify a solution than to come up with one when you can trust and understand the algorithms that are developing the solution. Simulation software for thermodynamics is orders of magnitude faster than hand calculations, but you know what the software is doing. The creators of the software aren't saying "we don't actually know how it works".
In the case of an LLM, I have to verify everything with no trust whatsoever. And that takes longer than just doing it myself. Especially because an LLM is writing something for me, it isn't doing complex math.
If a solution is correct then a solution is correct. If a correct solution was generated randomly that doesn't make it less correct. It just means that you may not always get correct solutions from the generating process, which is why they are checked after.
Except when you're doing calculations, a calculator can run through an equation substituting the given answers and see that the values match... Which is my point of calculators not being a good example. And the case of a quantum computer wasn't addressed.
I agree that LLMs have many issues, are being used for bad purposes, are overhyped, and we've yet to see if the issues are solvable - but I think the analogy is twisting the truth, and I think the current state of LLMs being bad is not a license to make disingenuous comparisons.
It's left to be seen in the future, then.
The problem is people thinking the tool is a "calculator" (or fact-checker or search engine) while it's just a text generator. It's great for generating text.
But even then it can't keep a paragraph stable during the conversation. For me personally, the best antidote against the hype was to use the tool.
I don't judge people for believing it's more than it is, though. The industry is intentionally deceiving everyone about this, and we also intuitively see intelligence when someone can eloquently express themselves. Seeing that in software seems magical.
We now have a great Star Trek like human machine interface. We only need real intelligence in the backend.
No scientific discovery has value
It would be a great comment if it represented reality, but as an analogy it's completely off.
LLM-based AI represents functionality that nothing other than the human mind and extensive research or singular expertise can replicate. There is no already existing 'second, better calculator' that has the same breadth of capabilities, particularly in areas involving language.
If you're only using it as a calculator (which was never the strength of an LLM in the first place), for problems you could already solve with a calculator because you understand what is required, then uh... yeah, I mean, use a calculator, that is the appropriate tool.
do you know what an analogy is??
Analogous. Which the above commenter just explained why it isn't.