this post was submitted on 01 Jul 2025
2115 points (98.4% liked)

Microblog Memes

A place to share screenshots of Microblog posts, whether from Mastodon, tumblr, ~~Twitter~~ X, KBin, Threads or elsewhere.

Created as an evolution of White People Twitter and other tweet-capture subreddits.

Rules:

  1. Please put at least one word relevant to the post in the post title.
  2. Be nice.
  3. No advertising, brand promotion or guerrilla marketing.
  4. Posters are encouraged to link to the toot or tweet etc in the description of posts.

[–] jsomae@lemmy.ml 12 points 4 days ago* (last edited 4 days ago) (62 children)

I know she's exaggerating, but this post yet again underscores how nobody understands that it's training AI that is computationally expensive. Deploying an AI model has a power draw comparable to running a high-end video game. How can people hope to fight back against things they don't understand?

[–] FooBarrington@lemmy.world 20 points 4 days ago (29 children)

It's closer to running 8 high-end video games at once. Sure, that's nowhere near the scale of training, but it's still fairly expensive.

[–] brucethemoose@lemmy.world 1 points 4 days ago* (last edited 4 days ago)

Not at all. Not even close.

Image generation is usually batched and takes seconds, so roughly 700 W (a single H100 SXM) for a few seconds, producing a batch of a few images for multiple users at once. Maybe more for the absolute biggest (but SFW, no porn) models.

LLM generation takes more VRAM, but is MUCH more compute-light. Typically you have banks of 8 GPUs across multiple servers serving many, many users at once. Even my lowly RTX 3090 can serve 8+ users in parallel with TabbyAPI (and a modestly sized model) before becoming compute-bound.

So in a nutshell, imagegen (on an 80 GB H100) is probably more like 1/4 to 1/8 of a video game per user (not 8 games at once), and only for a few seconds.
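To make that concrete, here's a rough back-of-envelope in Python. Every number in it (batch size, generation time, gaming-PC wattage) is an assumed round figure for illustration, not a measurement:

```python
# Back-of-envelope: per-user share of an H100 during batched image generation.
# All figures are illustrative assumptions, not measurements.

H100_POWER_W = 700   # single H100 SXM under load
BATCH_SIZE = 8       # images generated for 8 users at once (assumed)
GEN_TIME_S = 5       # seconds per batch (assumed)
GAMING_PC_W = 350    # high-end gaming PC under load (assumed)

per_user_power_w = H100_POWER_W / BATCH_SIZE          # ~88 W per user while generating
fraction_of_a_game = per_user_power_w / GAMING_PC_W   # ~0.25, i.e. ~1/4 of a video game

energy_per_image_wh = H100_POWER_W * GEN_TIME_S / 3600 / BATCH_SIZE  # ~0.12 Wh per image

print(f"{per_user_power_w:.0f} W per user, {fraction_of_a_game:.2f}x a gaming PC")
print(f"~{energy_per_image_wh:.2f} Wh per image")
```

And that draw only lasts for the few seconds the batch is actually running, unlike a game that holds the GPU for hours.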

Text generation is similarly efficient, if not more so. Responses take longer (many seconds, except on special hardware like Cerebras CS-2s), but they're parallelized over dozens of users per GPU.
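Same kind of napkin math for text, again with assumed numbers (GPU power, concurrency, and response time are illustrative, and real serving setups vary a lot):

```python
# Back-of-envelope: energy per LLM response when one GPU serves many users at once.
# Power, concurrency, and response-time figures are assumptions for illustration.

GPU_POWER_W = 700        # one inference GPU under load (assumed)
CONCURRENT_USERS = 32    # requests batched together on that GPU (assumed)
RESPONSE_TIME_S = 20     # wall-clock time for one response (assumed)
GAMING_PC_W = 350        # high-end gaming PC for comparison (assumed)

per_user_power_w = GPU_POWER_W / CONCURRENT_USERS                    # ~22 W per active user
energy_per_response_wh = per_user_power_w * RESPONSE_TIME_S / 3600   # ~0.12 Wh per response

seconds_of_gaming = energy_per_response_wh * 3600 / GAMING_PC_W      # ~1.3 s of gaming

print(f"~{per_user_power_w:.0f} W per user, ~{energy_per_response_wh:.2f} Wh per response")
print(f"roughly {seconds_of_gaming:.1f} s of high-end gaming per response")
```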


This excludes more specialized hardware like Google's TPUs, Huawei NPUs, Cerebras CS-2s and so on. These are clocked far more efficiently than Nvidia/AMD GPUs.


...The worst are probably video generation models. These are extremely compute-intensive and take a long time (at the moment), so you're burning something like a few minutes of gaming time per output.
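One more sketch for video, with the same caveat that the generation time and power figures are assumed round numbers:

```python
# Back-of-envelope: energy per generated video clip vs. minutes of gaming.
# Generation time and power are assumptions; actual models and hardware vary widely.

GPU_POWER_W = 700      # one H100-class GPU occupied by the job (assumed)
GEN_TIME_MIN = 3       # minutes of GPU time per short clip (assumed)
GAMING_PC_W = 350      # high-end gaming PC for comparison (assumed)

energy_per_clip_wh = GPU_POWER_W * GEN_TIME_MIN / 60           # ~35 Wh per clip
equivalent_gaming_min = energy_per_clip_wh / GAMING_PC_W * 60  # ~6 minutes of gaming

print(f"~{energy_per_clip_wh:.0f} Wh per clip, about {equivalent_gaming_min:.0f} min of gaming")
```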

ollama/sd-web-ui are terrible analogs for all this because they're single-user and relatively unoptimized.
