I know.
Sure, and that's why many cloud providers - even ones that don't train their own models - are only slowly onboarding new customers onto bigger models. Sure. Makes total sense.
Given that cloud providers are desperately trying to get more compute but are limited by chip production - yes, of course? Why would they be trying to expand their capacity if their existing resources weren't already maxed out?
Seriously, it feels like it's gotten much worse over the last few months.
My guy, we're not talking about just leaving a model loaded; we're talking about actual usage in a cloud setting, with far more GPUs and users involved.
Thanks for the suggestion, I'll have to give that a try!
They are; it'd be uneconomical not to keep them fully utilized the whole time. Look up how batching works.
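If it helps, here's a toy numpy sketch (sizes and batch count are arbitrary, nothing like a real serving stack) of why batching is the whole trick: pushing 32 requests through a weight matrix in one matmul is far cheaper than 32 separate passes, because the weights only stream out of memory once for the whole batch.

```python
# Toy illustration of batching (arbitrary sizes, not real serving code).
import time
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4096, 4096)).astype(np.float32)  # one "layer"
requests = rng.standard_normal((32, 4096)).astype(np.float32)   # 32 users

# Serve each user separately: the weight matrix is re-read 32 times.
t0 = time.perf_counter()
for row in requests:
    _ = row @ weights
sequential = time.perf_counter() - t0

# Serve all 32 users in one batch: the weights stream through once.
t0 = time.perf_counter()
_ = requests @ weights
batched = time.perf_counter() - t0

print(f"sequential: {sequential:.4f}s  batched: {batched:.4f}s")
```

On any machine with a decent BLAS, the batched pass should come out far cheaper per request, which is exactly why idle GPU time is money left on the table.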
I compared the TDP of an average high-end graphics card with the GPUs required to run big LLMs. Do you disagree?
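For a rough sense of scale (published spec-sheet numbers, purely illustrative): a single RTX 4090 is rated at 450 W, while one 8×H100 SXM node draws up to 8 × 700 W = 5,600 W for the GPUs alone - over twelve times the consumer card, before counting CPUs, networking, and cooling.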
BCD?