this post was submitted on 03 Jun 2026
808 points (99.6% liked)

Managers (media.piefed.zip)
submitted 2 days ago* (last edited 2 days ago) by inari@piefed.zip to c/whitepeopletwitter@sh.itjust.works
 
[–] Lysergid@lemmy.ml 32 points 1 day ago (5 children)

Honestly IDK why companies, especially medium-to-big ones, don’t do this. They could plug in RAG with internal/confidential data and get better results and security. I guess the question is what the capital plus maintenance cost would be of running such infra for, say, 10k+ employees.

[–] jj4211@lemmy.world 1 points 12 hours ago

Because in the feeding frenzy, every company with a product/marketing budget is trying to make the customers pay by the token and companies are doing jack to help "mere mortal" companies get going with this stuff on premise.

You are right that the technical hurdles are not insane to get this going, but most companies don't know where to begin and there's no huge marketing blitz telling the business leaders this is realistically on the table and here's the company you can call to make it happen for you.

Even if you overcame that and proposed a concrete way to get going, you would still probably hit the aversion to CapEx that has persisted since Amazon told the industry that CapEx is toxic and you really want all your money to be spent on OpEx. Big companies like Amazon will take on that scary CapEx for you and your expenses will be nice and just OpEx. Coincidentally, the companies that spend the most on CapEx manage to pull in more revenue and profit than you will ever dream to, but still, remember CapEx is toxic.

[–] Zos_Kia@jlai.lu 19 points 1 day ago (1 children)

I think the issue is also that you need some serious hardware to get good inference speed when your devs are working, but then most of the time this hardware will be underutilized.

That being said you can get good performance from indie inference farms, at a fraction of the cost of the big US labs. I think it's a great compromise and in a few months the open models will be near parity with opus 4.6 which is really all you need for most tasks.

[–] plyth@feddit.org 6 points 1 day ago (1 children)

opus 4.6 which is really all you need for most tasks.

The same tasks that can fit into 640KB.

[–] Zos_Kia@jlai.lu 2 points 1 day ago (1 children)

Not sure what you're referring to?

[–] EisFrei@lemmy.world 4 points 1 day ago (1 children)
[–] Zos_Kia@jlai.lu 0 points 1 day ago (1 children)

Aha, thanks for sharing, that's a cool anecdote. But i think my point still stands, as there are threshold effects in LLM "intelligence" which don't directly map to the RAM comparison.

Opus 4.6 is comparable to a mid-level developer. It requires some guidance and will sometimes get things wrong, but is also suitable to work in most business environments: most projects are not that complicated or high stakes in the first place.

In the future you'll probably have Opus 7.5 or some shit, which will be at a mega-senior level but also considerably more expensive. And given the price difference, companies will suddenly discover that they don't really need expert level coding at a high price tag, and that a reliable workhorse at a fraction of the cost is largely enough for their needs.

[–] jj4211@lemmy.world 1 points 12 hours ago (1 children)

Opus 4.6 is comparable to a mid-level developer.

Not really...

Yes, it pays attention to certain details that humans will tend to flub, so it's better than juniors when it comes to that...

But broadly speaking, it's a moron. It's like a junior dev pasting 15 year old stack overflow answers into a project, but better at making it fit in, but still doing pretty dumb approaches.

I spent a bunch of tokens to try to get Opus 4.7 to do a task for me last week. The result had mistakes and the test case that should be near instant took 3 minutes to complete (indicating that a user would be staring at a spinner for 3 minutes). It did save me the trouble of trying to figure out the basic structure of the thing I was going to interact with (the documentation was dense and lacking specific examples, and Opus did output something that let me see how it basically worked in a to-the-point way), but I had to rewrite the "meat" of the task to get correct execution in under a second.

In the future you’ll probably have Opus 7.5 or some shit, which will be at a mega-senior

My impression has been less about it being more "senior" over time and more about being able to consistently deliver junior level work for longer amounts of output. Error rate remains problematic so you end up with more to review that in a way tortuously "looks right" for longer. When it digs itself into a hole, it's very bad at trying to amend the mess that has accumulated.

[–] Zos_Kia@jlai.lu 1 points 11 hours ago

I mean obviously mileage does vary from project to project and task to task, but i think you might be overestimating mid-level developers. Or you've been really lucky with your recruitment! Cause i would describe them just the way you described Opus. Pretty eager, kind of try-hard, decent engineering chops but often misdirected with dumb approaches.

Of course my experience is limited and i've never really been in a managing role but i've been the adult in a fair number of rooms and i've done my share of "grooming sprints" and dispatching tasks.

That being said, there are projects that are horribly resistant to agentic coding. It's pretty rare as most codebases nowadays are bog standard and rely on roughly the same abstractions, but i've seen it happen. It can come from the complexity of the domain, or of the codebase, or from the way documentation and tribal knowledge clash, or a myriad other reasons. Often it's the kind of project that requires more mature devs and can't really onboard juniors/mids.

When it digs itself into a hole, it’s very bad at trying to amend the mess that has accumulated

Oh yeah definitely. Once it's in the hole you better scratch that branch off and restart with more specific instructions cause agents are very "additive", they don't often think to remove stuff and change their approach. Again, kind of like mid devs once they're committed to an implementation plan.

[–] bountygiver@lemmy.ml 3 points 1 day ago

Because the people selling the AI want to make sure their customers don't know about this. It's all about creating a dependency so they get subscription income forever.

[–] MalReynolds@slrpnk.net 10 points 1 day ago* (last edited 1 day ago)

Bigs definitely do, and anyone with confidential data should be.

[–] sobchak@programming.dev 0 points 1 day ago* (last edited 1 day ago) (1 children)

Probably more expensive than the subsidized costs. Hmm...

H100 GPUs cost $25k, and have 80GB of RAM. Kimi k2.6 has 1.1T parameters. Assuming 8 bit quantization, you would need 14 GPUs to run a single agent at a time (I'm not sure the cloud models use quantization; it could be double). So, $350k per vibecoding dev on GPUs alone. Life expectancy is ~4 years, so ~$90k/year amortized. This is ignoring the significant electrical/HVAC cost of handling 10 kW of electricity and heat per vibecoding dev (and tons of other costs as well).
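
For anyone who wants to check the arithmetic, here is a rough sketch of the estimate above. The prices, memory size, parameter count and lifetime are the guesses from this comment, not measured values. The 700 W per GPU figure is an added assumption (roughly an H100 SXM board power) used only to reproduce the ~10 kW number, and the function name is just illustrative.

```python
import math

# Rough figures taken from the comment above; all of them are estimates.
PARAMS = 1.1e12           # model parameters (1.1T)
GPU_MEMORY_BYTES = 80e9   # 80 GB per GPU
GPU_PRICE_USD = 25_000
LIFETIME_YEARS = 4
GPU_POWER_WATTS = 700     # assumption, not stated in the comment


def gpus_needed(params: float, bytes_per_param: int, gpu_memory_bytes: float) -> int:
    """GPUs needed to hold the weights alone (ignores KV cache and activations)."""
    return math.ceil(params * bytes_per_param / gpu_memory_bytes)


gpus = gpus_needed(PARAMS, 1, GPU_MEMORY_BYTES)         # 8-bit: 1 byte per parameter
gpus_16bit = gpus_needed(PARAMS, 2, GPU_MEMORY_BYTES)   # the "could be double" case
capex_usd = gpus * GPU_PRICE_USD
usd_per_year = capex_usd / LIFETIME_YEARS
power_kw = gpus * GPU_POWER_WATTS / 1000

print(f"{gpus} GPUs at 8-bit, {gpus_16bit} at 16-bit")
print(f"${capex_usd:,} up front, ${usd_per_year:,.0f}/year amortized, {power_kw:.1f} kW")
```

The amortized figure comes out at $87,500/year, which is where the "~$90k" rounding comes from. Note that this only sizes the hardware for the weights; it says nothing about how many people that hardware can serve at once.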

[–] theunknownmuncher@lemmy.world 5 points 1 day ago* (last edited 1 day ago) (1 children)

Probably more expensive than the subsidized costs.

Of course, but that's exactly the problem. OpenAI and Anthropic are preparing to IPO, so they must now demonstrate profits on inference. The time to take advantage of subsidized compute is in the past, and the subscription and per-token prices that they offer for inference are skyrocketing, overwhelming the budgets of companies that somehow did not see this bait-and-switch pricing coming.

per vibecoding dev

No lol. These same hardware requirements would apply to the cloud hosted models as well, so if that's how it worked, you're suggesting that Anthropic, OpenAI, Meta, and Google have purchased ~14 H100 GPUs per user that they serve???

That would be literally billions of GPUs, while it is estimated that in 2024, Google's AI division owned only 26,000 H100 GPUs and Meta owned the most H100 GPUs of any company at 350,000 units. These GPUs have very high throughput for inference and can serve many users, because that is exactly what they have been designed to do.

I’m not sure the cloud models use quantization

they absolutely do, yeah

[–] sobchak@programming.dev 0 points 1 day ago (1 children)

14 H100 GPUs per user that they serve

Not per user, but it's probably a decent rough estimate per vibecoding dev who is continually running agents 8+ hours/day. Some people's "workflows" involve running multiple parallel agents sometimes, or even a significant portion of the time (using the git worktree feature). I imagine the limit would be serving 10 of these types of "devs." Of course, there's batching and stuff that can be done, but I think it still slows everybody else down near linearly. H100s aren't the only accelerators used for inference; I just chose them as an example. Google has ~5 million H100 equivalent accelerators, Microsoft has 3.5 million, and Amazon has 2.5 million (https://www.networkworld.com/article/4156949/google-owns-the-most-ai-compute-and-it-built-it-its-way.html).
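
To put a number on the "10 devs" guess, here is a quick sketch of how the amortized hardware cost changes if one 14-GPU setup is shared. It reuses the rough figures from earlier in this thread (14 GPUs, $25k each, 4 year life); the sharing ratio is the imagined limit above, not a benchmark, and `cost_per_dev` is just an illustrative name.

```python
# Rough thread figures: 14 GPUs at $25k each, amortized over 4 years.
NODE_COST_PER_YEAR_USD = 14 * 25_000 / 4


def cost_per_dev(devs_per_node: int) -> float:
    """Amortized GPU cost per dev per year when one node is shared evenly."""
    if devs_per_node < 1:
        raise ValueError("devs_per_node must be at least 1")
    return NODE_COST_PER_YEAR_USD / devs_per_node


for devs in (1, 10):
    print(f"{devs:>2} dev(s) per node: ${cost_per_dev(devs):,.0f}/year each")
```

So the per-dev figure is only as good as the sharing assumption: it drops in direct proportion to however many heavy users one node can actually keep up with.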

[–] theunknownmuncher@lemmy.world 2 points 1 day ago* (last edited 1 day ago)

Even so, your numbers are still a tiny fraction of GPU units compared to concurrent users, and the limit you "imagine" is just that, imagined.

And you do need to remember that the majority of the compute at these companies is used for model training and not used for inference.