this post was submitted on 07 Jun 2026

644 points (98.6% liked)

Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them (ea.rna.nl)

submitted 1 day ago by Trilogy3452@lemmy.world to c/technology@lemmy.world

176 comments

[–] vermaterc@lemmy.ml 2 points 1 day ago (7 children)

So are we assuming here that LLMs won't become more efficient over time? GPT-3 was a frontier model just a few years ago, and its performance blew everyone's mind at the time. I can now run an equivalent LLM on my personal computer. Why can't we expect that, after a few years, a Claude Sonnet level of capability will be possible to accomplish locally?

[–] pinball_wizard@lemmy.zip 1 points 20 hours ago* (last edited 20 hours ago)

So are we assuming here that LLMs won't become more efficient over time?

Mostly. Moore's law ran up against the physical limits of the materials we make chips out of - so desktops of today just do what the desktops of yesterday do, mostly.

We should keep seeing improvements in highly specialized models. There are interesting outcomes to be had here, with the right setup and ollama.

but -

The really promising, impressive models today are just running with long contexts on shitloads of hardware - which is neither coming to home PCs any time soon nor going to actually be profitable to run any time soon.

There's an argument to be made that running the really interesting model on a ton of hardware might make money for really specific uses - but when we talk about specific uses that are worth lots of money, those use cases tend to tolerate difficult interfaces and reward accuracy. LLMs invariably reduce accuracy in exchange for ease of use. There might be a sweet spot for a huge, expensive, hallucination-prone LLM in some of these uses, but I doubt it (the entire approach) competes, long term.

There are a few specific use cases where inaccuracy is desirable - largely forms of shifting accountability and some kinds of gambling. Things that either are or should be crimes have a higher tolerance for AI hallucination.

But - a small cheap local model has all the same desirable attributes for doing these things (crimes) poorly as a big expensive model. So there's probably not even much money to be made there.

I expect that this tech is not going away, but it's also not earning back the current investment.

[–] mabeledo@lemmy.world 1 points 23 hours ago (1 children)

They could, but what’s the plan here, exactly? That all these for-profit companies that are currently publishing models for free, like Qwen, will continue to do so in the future?

[–] vermaterc@lemmy.ml 1 points 15 hours ago (1 children)

Why not? Why does Microsoft develop its .NET ecosystem? Why does Google develop Go/Dart? It costs them lots of money and they give it away for free.

The answer is: they don't earn money on it directly, but these tools are a way to tie programmers to their cloud services. If you use .NET you'll probably end up on Azure. If you use Go, you'll probably use GCP.

So I suspect the same will happen with LLMs. At some point they will say: "hey, you can use this LLM however you want, but as you are already using it, you may want to know our platform is optimized for it"

[–] mabeledo@lemmy.world 1 points 11 hours ago

That’s not an accurate analogy.

LLM providers are SaaS providers, meaning that even if they were to give you the source of all the tools they use, there’s a fundamental limit to how much you can self host.

A better comparison would be Google giving away their indexed search data: you might be able to run an infinitesimal portion of it on your hardware, and will never ever match the results Google offers on their website, and since it’s a monopoly, you would be at a permanent disadvantage.

Same goes for all these AI companies. They are an oligopoly that gives away free models that are subpar compared to their cloud offerings. Self-hosted LLMs will never stand a chance.

[–] ag10n@lemmy.world 9 points 1 day ago (3 children)

What’s the cost of the compute you have to run something locally?

The majority of people don’t have 32G of VRAM to run something remotely as capable

[–] MrQuallzin@pie.eyeofthestorm.place 6 points 1 day ago (1 children)

I've got an old 1060ti in my server. Ollama shares it with just a couple other containers. Electricity here is majority hydro with some natural gas, $0.08/kWh.

It's a little slow, but I can comfortably run qwen3:14b. Of course, that's not all done on the GPU; a large part is offloaded to server RAM (generally 32GB available, so more than enough headroom)

My server and my gaming PC combined last month came out to $13.32
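
As a rough sanity check on those numbers, the sketch below backs out the energy use implied by the bill. It assumes the whole $13.32 is energy billed at the flat $0.08/kWh rate (no fixed fees) and a 30-day month; neither is stated above, and the function names are made up for illustration.

```python
def implied_kwh(bill_usd: float, rate_usd_per_kwh: float) -> float:
    """Energy use implied by a bill, assuming a flat per-kWh rate and no fixed fees."""
    return bill_usd / rate_usd_per_kwh


def average_watts(kwh: float, days: int = 30) -> float:
    """Average continuous power draw over the period, in watts."""
    return kwh * 1000 / (days * 24)


if __name__ == "__main__":
    # Figures taken from the comment above; the 30-day month is an assumption.
    kwh = implied_kwh(13.32, 0.08)
    print(f"{kwh:.1f} kWh, about {average_watts(kwh):.0f} W on average")
```

Under those assumptions the two machines together used roughly 166 kWh, which works out to a bit over 230 W averaged around the clock.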

[–] ag10n@lemmy.world 4 points 1 day ago* (last edited 1 day ago)

How does that compare to the closed models that Anthropic offers, at the context and scale they offer?

I run Qwen3.6 27B locally and it’s usable with 16G vram but still not the same as a data centre of Blackwell clusters.
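
For a sense of why VRAM is the bottleneck, here is a back-of-the-envelope estimate of the memory needed just to hold a model's weights. It is only a sketch: it ignores the context (KV cache), activations, and runtime overhead, and the bit widths are illustrative guesses, not something stated in this thread. The function name is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # 27B parameters at a few common precisions (assumed, for illustration).
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
```

By this estimate a 27B model needs about 54 GB at 16-bit and about 13.5 GB at 4-bit for weights alone, so fitting one into 16G would plausibly require aggressive quantization, with little room left over for context.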

[–] greyscale@lemmy.grey.ooo 2 points 1 day ago (1 children)

lfm2 works like greased lightning on the NPU built into the current macbook M5.

[–] ag10n@lemmy.world 1 points 1 day ago (1 children)

Describe greased lightning, because it’s much slower and needs to handle compression for context

We’re moving in that direction but an M5 is not what the majority of people are running at home

[–] greyscale@lemmy.grey.ooo 0 points 1 day ago (1 children)

I dunno man, I'm not a slopjockey so I don't know the minutiae of the addiction.

All of our devs appear to have M5s right now. All of those copilot+ laptops have NPUs too.

[–] ag10n@lemmy.world 1 points 1 day ago (1 children)

Your company has bought you the latest and greatest and likely supports commercial token usage too

You can’t compare LLMs at scale to running one locally; it’s not the same experience or capabilities

[–] greyscale@lemmy.grey.ooo -3 points 1 day ago

"Latest and greatest" my fucking sides lmao

My company gave me some US shitware and I've got some local shitware instead.

If you can't make that work and are dependent on the teat of the slopgenerators, that's a skill issue on you, buddy.

[–] blackbeans@lemmy.zip 0 points 1 day ago* (last edited 1 day ago) (1 children)

I remember my computer not being fast enough to even play an MP3 file. Two years later, my computer was capable of running 3D accelerated games, browsing the internet at broadband speeds and playing videos.

Sometimes technology advances fast. We could be entering such an era as there are major investments taking place and global competitors will rise to the occasion to market these to a broader audience.

I think it will be entirely possible for consumers to use a decent LLM on their computer in a few years' time.

[–] ag10n@lemmy.world 5 points 1 day ago (2 children)

It’s not the 90s anymore. Unless there’s a compression algorithm putting billions of relationships into a manageable size, local AI under 8G of VRAM is limited to highly specific tasks (text-to-speech, for example, fits under 1G), let alone the context required for keeping a conversation or writing code.

[–] ThirdConsul@lemmy.zip 1 points 1 day ago (1 children)

If text-to-speech is what YouTube uses to autogenerate the subtitles, it is worthless for anything that uses a slightly richer vocabulary.

[–] pirat@lemmy.world 2 points 1 day ago

No. Autogenerated subtitles would be speech-to-text, rather than text-to-speech.

[–] blackbeans@lemmy.zip -2 points 1 day ago (2 children)

To be clear, I wasn't talking about a leap in LLM design. I was talking about a leap in hardware capabilities...

[–] KRAW@linux.community 2 points 1 day ago

Improved hardware capabilities used to come very quickly (see Moore's Law and Dennard Scaling). However, that trend is basically over, so getting higher-performance hardware now takes a lot of effort spent on specializing hardware for certain tasks. That's why you see inference accelerators like Groq, SambaNova, Cerebras, etc. However, this is hardware that is still going to go into data centers. Something innovative has to happen on the AI side for commercial-grade models to be runnable on consumer hardware.

[–] ag10n@lemmy.world 2 points 1 day ago

Which are increasingly out of reach for a normal person. Prices for phones, let alone PC hardware, have increased exponentially in recent history

[–] potustheplant@feddit.nl 4 points 1 day ago* (last edited 1 day ago) (1 children)

Wake me up when this says "yes".

[–] Anti_Iridium@lemmy.world 0 points 20 hours ago* (last edited 9 hours ago) (1 children)

Profit ≠ success

Edit: to clarify, even if profitable it will still be a failure of society in some way/shape/form.

[–] potustheplant@feddit.nl 1 points 16 hours ago

Lol

[–] greyscale@lemmy.grey.ooo 3 points 1 day ago

It already happened, small language models are busy dragging their nutsack on frontier models, running on a macbook and costing nothing

Where's the fucking product, Sam?

[–] givesomefucks@lemmy.world -2 points 1 day ago (1 children)

Why can’t we expect that, after a few years, a Claude Sonnet level of capability will be possible to accomplish locally?

Because when you're old enough to remember what AIM chatbots could do 25 years ago, it stops being impressive what today's chatbots can do...

It seems "new" because everyone hated it and it was just a novelty back then.

But if you read up on them, they did 90% of what modern ones do. And if they had access to today's computing, the only explanation for why they still suck so much is that no one has ever wanted them.

The oligarchs just decided it didn't matter

[–] unpossum@sh.itjust.works 4 points 1 day ago (1 children)

Because when you're old enough to remember what AIM chatbots could do 25 years ago, it stops being impressive what today's chatbots can do...

C’mon, that’s just silly.