But is anyone who voted for Trump going to see this, and if they do, worry?
I don’t know a good alternative, but it feels like they’re preaching to the choir.
One thing about Anthropic/OpenAI models is that they go off the rails with lots of conversation turns or long contexts, like when they need to remember a long vending-machine conversation, I guess.
A more objective look: https://arxiv.org/abs/2505.06120v1
https://github.com/NVIDIA/RULER
Gemini is much better. TBH the only models I’ve seen that are half decent at this are:
“Alternate attention” models like Gemini, Jamba Large or Falcon H1, depending on the iteration. Some recent versions of Gemini kinda lose this, then get it back.
Models finetuned specifically for this, like roleplay models or the Samantha model trained on therapy-style chat.
But most models are overtuned for one-shots like "fix this table" or "write me a function," and labs don't invest much in long-context performance because it's not very flashy.
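FWIW, you can eyeball this yourself with a crude needle-in-a-haystack probe (what RULER does far more rigorously): bury a single fact in growing amounts of filler and see when the model stops retrieving it. Rough sketch below; the base_url, API key, model name, and filler text are all placeholders for whatever OpenAI-compatible server you happen to run.

```python
# Crude long-context probe in the spirit of RULER: hide one fact ("needle") in
# growing amounts of filler and check whether the model can still pull it out.
# Everything here is a placeholder sketch: point base_url/model at whatever
# OpenAI-compatible server (TabbyAPI, vLLM, etc.) you actually run.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed-locally")

NEEDLE = "The vending machine's restock code is 7341."
FILLER = "The customer asked about snack prices and then walked away. "

for n in (50, 500, 5000):  # very roughly ~0.7k, ~7k, ~70k tokens of filler
    haystack = FILLER * n
    mid = len(haystack) // 2
    prompt = haystack[:mid] + NEEDLE + " " + haystack[mid:]

    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[{"role": "user", "content": prompt + "\n\nWhat is the restock code?"}],
        max_tokens=20,
    )
    answer = resp.choices[0].message.content or ""
    print(f"filler x{n:>5}: found={'7341' in answer}  reply={answer!r}")
```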
And 500 are about to get fired?
Most of the US believes in this, or is just unaware. That's how it's been for most of history around the world.
...The remarkable issue here is that the elites/rulers we handed the reins to now drink their own Kool-Aid. The very top of most authoritarian regimes is at least cognisant of some hypocrisy, even if the ideology eats at them some.
The other is that people are more 'connected' than ever, but to disinformation streams. I feel like a lot of the world (especially the US) fancy themselves super smart about shit they know nothing about because of something they saw on Facebook or YouTube.
Not at all. Not even close.
Image generation is usually batched and takes seconds, so that's 700W (a single H100 SXM) running for a few seconds to serve a batch of a few images to multiple users. Maybe more for the absolute biggest (but SFW, no porn) models.
LLM generation takes more VRAM, but is MUCH more compute-light. Typically one has banks of 8 GPUs in multiple servers serving many, many users at once. Even my lowly RTX 3090 can serve 8+ users in parallel with TabbyAPI (and a modestly sized model) before becoming compute-bound.
So in a nutshell, imagegen (on an 80GB H100) is probably more like 1/4-1/8 of a video game at once (not 8 at once), and only for a few seconds.
Text generation is similarly efficient, if not more so. Responses take longer (many seconds, except on special hardware like Cerebras CS-2s), but it's parallelized over dozens of users per GPU.
This is excluding more specialized hardware like Google's TPUs, Huawei NPUs, Cerebras CS-2s and so on. These are clocked far more efficiently than Nvidia/AMD GPUs.
...The worst are probably video generation models. These are extremely compute-intensive and take a long time (at the moment), so you are burning something like a few minutes of gaming time per output.
ollama/sd-web-ui are terrible analogs for all this because they are single-user and relatively unoptimized.
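To put very rough numbers on it (all of these are illustrative assumptions pulled from the figures above, not measurements):

```python
# Back-of-envelope Wh-per-user, using the rough figures from the comment above.
# Every number here is an illustrative assumption, not a measurement.

H100_WATTS = 700            # one H100 SXM under load
IMG_BATCH_SECONDS = 5       # "a few seconds" per diffusion batch
IMG_BATCH_SIZE = 4          # images (users) served per batch

wh_per_image = H100_WATTS * IMG_BATCH_SECONDS / 3600 / IMG_BATCH_SIZE
print(f"imagegen: ~{wh_per_image:.2f} Wh per image")        # ~0.24 Wh

GAMING_GPU_WATTS = 350      # a desktop GPU gaming over the same 5 seconds
print(f"gaming:   ~{GAMING_GPU_WATTS * IMG_BATCH_SECONDS / 3600:.2f} Wh")  # ~0.49 Wh

# LLM serving: one 700W GPU shared across dozens of concurrent streams.
LLM_USERS = 30              # assumed concurrent users per GPU
LLM_RESPONSE_SECONDS = 20   # "many seconds" per response

wh_per_response = H100_WATTS * LLM_RESPONSE_SECONDS / 3600 / LLM_USERS
print(f"LLM:      ~{wh_per_response:.2f} Wh per response")  # ~0.13 Wh
```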
TBH most people still use old SDXL finetunes for porn, even with the availability of newer ones.
Also, one other thing is that Nvidia clocks their GPUs (aka the world's AI accelerators) very inefficiently, because they have a pseudo monopoly, and they can.
It doesn't have to be this way, and likely won't be in the future.
The UC paper above touches on that. I will link a better one if I find it.
But specifically:
streaming services
Almost all the power from this is from internet infrastructure and the end device. Encoding videos (for them to be played thousands/millions of times) is basically free since it's only done once, with the exception being YouTube (which is still very efficient). Storage servers can handle tons of clients (hence they're dirt cheap), and (last I heard) Netflix even uses local cache boxes to shorten the distance.
TBH it must be less per capita than CRTs. Old TVs burned power like crazy.
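Quick sketch of why the encode side basically rounds to zero (the numbers are guesses, purely to show the shape of the math):

```python
# Why "encode once, stream a million times" makes the server side almost free.
# Numbers are guesses, just to show the amortization.

ENCODE_KWH = 2.0        # assumed energy to encode one title across all bitrates
VIEWS = 1_000_000       # times that encode gets streamed

print(f"encode, amortized: {ENCODE_KWH * 1000 / VIEWS:.3f} Wh per view")  # 0.002 Wh

# Compare with the end device, which dominates:
TV_WATTS = 100          # a modern TV (a CRT would pull several times this)
HOURS_WATCHED = 2
print(f"playback:          {TV_WATTS * HOURS_WATCHED:.0f} Wh per view")   # 200 Wh
```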
Bingo.
Altman et al want to kill open source AI for a monopoly.
This is what the entire AI research space already knew even before Deepseek hit, and why they (largely) think so little of Sam Altman.
The real battle in the space is not AI vs. no AI, but exclusive use by AI Bros vs. open models that bankrupt them. Which is what I keep trying to tell /c/fuck_ai, as the "no AI" stance plays right into the AI Bros' hands.
I think that’s going a bit far. ML models are tools to augment people, mostly.
Only because of brute force over efficient approaches.
Again, look up Deepseek's FP8/multi-GPU training paper, and some of the code they published. They used a microscopic fraction of what OpenAI or xAI are using.
And models like SDXL or Flux are not that expensive to train.
It doesn't have to be this way, but they can get away with it because being rich covers up internal dysfunction/isolation/whatever. Chinese trainers, and other GPU-constrained ones, are forced to be thrifty.
Sadly this means it's not accessible to like 95% of people, even if they're driven to install it.