People Twitter




Managers (media.piefed.zip)

submitted 2 days ago (last edited 2 days ago) by inari@piefed.zip to c/whitepeopletwitter@sh.itjust.works


theunknownmuncher@lemmy.world 5 points 1 day ago (last edited 1 day ago)

> Probably more expensive than the subsidized costs.

Of course, but that's exactly the problem. OpenAI and Anthropic are preparing to IPO, so they now have to show a profit on inference. The window for taking advantage of subsidized compute has closed, and the subscription and per-token prices they charge for inference are skyrocketing, overwhelming the budgets of companies that somehow didn't see this bait-and-switch pricing coming.

> per vibecoding dev

No lol. The same hardware requirements would apply to the cloud-hosted models too, so if that's how it worked, you're suggesting that Anthropic, OpenAI, Meta, and Google have bought ~14 H100 GPUs for every user they serve???

That would be literally billions of GPUs, whereas estimates for 2024 put Google's AI division at only 26,000 H100 GPUs and Meta, the largest H100 owner of any company, at 350,000 units. These GPUs have very high inference throughput and can serve many users at once, because that is exactly what they were designed to do.
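To make the arithmetic behind "literally billions of GPUs" explicit, here is a small back-of-envelope sketch. The user count is a hypothetical round number (100 million), not a figure from the thread; the 14-GPU figure and the 2024 H100 estimates are the ones quoted above, and the model deliberately assumes dedicated, unshared hardware per user, which is the scenario being argued against.

```python
# Back-of-envelope check of the "billions of GPUs" claim.
# ASSUMPTION: HYPOTHETICAL_USERS is an illustrative round number,
# not a figure taken from the thread.
GPUS_PER_USER = 14                 # figure quoted in the thread
HYPOTHETICAL_USERS = 100_000_000   # assumed, for illustration only

# 2024 H100 estimates quoted in the comment
H100_FLEET = {
    "Google AI division": 26_000,
    "Meta": 350_000,
}

def gpus_needed(users: int, gpus_per_user: int = GPUS_PER_USER) -> int:
    """Total GPUs if every user had dedicated hardware (no sharing)."""
    return users * gpus_per_user

def shortfall(needed: int, owned: int) -> float:
    """How many times larger the requirement is than the fleet."""
    return needed / owned

needed = gpus_needed(HYPOTHETICAL_USERS)
print(f"needed: {needed:,} GPUs")
for owner, owned in H100_FLEET.items():
    print(f"{owner}: {owned:,} H100s, requirement is {shortfall(needed, owned):,.0f}x larger")
```

With 100 million users, the dedicated-hardware model needs 1.4 billion GPUs, about 4,000 times Meta's estimated 2024 H100 count, which is why the comment argues that each GPU must be shared across many users.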

> I’m not sure the cloud models use quantization

they absolutely do, yeah

sobchak@programming.dev 0 points 1 day ago

> 14 H100 GPUs per user that they serve

Not per user, but probably a decent rough estimate per vibecoding dev who is continually running agents 8+ hours/day. Some people's "workflows" involve running multiple agents in parallel some of the time, or even for a significant portion of it (using the git worktree feature), which is why I think the estimate holds up. I imagine a setup like that would top out at serving about 10 of these types of "devs." Of course, batching and similar techniques can help, but I think each extra heavy user still slows everybody else down nearly linearly. H100s aren't the only accelerators used for inference; I just chose them as an example. Google has ~5 million H100-equivalent accelerators, Microsoft has 3.5 million, and Amazon has 2.5 million (https://www.networkworld.com/article/4156949/google-owns-the-most-ai-compute-and-it-built-it-its-way.html).
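The capacity estimate in this comment can be turned into a quick calculation. This is a rough sketch built only on the commenter's own guesses (~14 H100s serving at most ~10 heavy agent users) and the fleet sizes cited above; it assumes, purely for illustration, that "H100-equivalent" counts are interchangeable H100s and that the entire fleet is devoted to inference.

```python
# Rough capacity estimate from the commenter's guesses:
# ~14 H100s can serve at most ~10 heavy agent users.
# ASSUMPTIONS: "H100-equivalent" counts are treated as interchangeable
# H100s, and the whole fleet is assumed to be used for inference.
GPUS_PER_NODE = 14
HEAVY_DEVS_PER_NODE = 10

# H100-equivalent fleet sizes cited in the comment
FLEET_H100_EQUIV = {
    "Google": 5_000_000,
    "Microsoft": 3_500_000,
    "Amazon": 2_500_000,
}

def heavy_dev_capacity(gpus: int) -> int:
    """Heavy agent users a fleet could serve under the 14-GPU/10-dev guess."""
    nodes = gpus // GPUS_PER_NODE  # only whole 14-GPU groups count
    return nodes * HEAVY_DEVS_PER_NODE

total = sum(FLEET_H100_EQUIV.values())
print(f"combined fleet: {total:,} H100-equivalents")
print(f"heavy-dev capacity: {heavy_dev_capacity(total):,}")
for owner, gpus in FLEET_H100_EQUIV.items():
    print(f"{owner}: {heavy_dev_capacity(gpus):,} heavy devs")
```

Under these assumptions the combined fleet covers roughly 7.9 million heavy users at once; since the 10-devs-per-node figure is the commenter's guessed upper limit, the result is only as good as that guess.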

theunknownmuncher@lemmy.world 2 points 1 day ago (last edited 1 day ago)

Even so, your numbers still add up to a tiny number of GPUs compared to the number of concurrent users, and the limit you "imagine" is just that: imagined.

And remember that the majority of the compute at these companies goes to model training, not inference.