this post was submitted on 29 Dec 2025
22 points (100.0% liked)

Technology

1332 readers

A tech news sub for communists

founded 3 years ago
[–] yogthos@lemmygrad.ml 6 points 2 days ago

From the research I've seen published already, I'm fairly confident that smaller models, around 32 billion parameters or possibly even fewer, should be able to match the quality we currently get from full-blown 600+ billion parameter models. A lot of it seems to come down to tracking context out of band so it doesn't need to live in memory, and people have come up with a number of approaches for doing that. We'll probably also see more work on the MoE approach, which efficiently loads only the parts of the model that are actually relevant to the task being worked on. It's also possible we'll see novel approaches like this that could significantly reduce memory requirements further.
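
To illustrate the MoE idea: a router scores a set of experts per input and only the top-k of them actually run, so the unselected experts never need to be resident in memory. This is just a toy numpy sketch of top-k routing, not any specific model's implementation; all names and shapes here are made up for the example.

```python
import numpy as np

def top_k_routing(x, router_w, experts, k=2):
    """Route input x to the top-k experts by router score.

    Only the selected experts are evaluated; in a real MoE system
    the remaining experts' weights need not be loaded at all.
    """
    scores = x @ router_w                       # one score per expert
    top = np.argsort(scores)[-k:]               # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                    # softmax over selected experts
    # Weighted sum over just the k chosen experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
router_w = rng.normal(size=(d, num_experts))
# Each "expert" is a tiny linear layer standing in for a full FFN block.
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

x = rng.normal(size=d)
y = top_k_routing(x, router_w, experts, k=2)    # 2 of 4 experts execute
```

With k=2 out of 4 experts, only half the expert parameters are touched per token; production MoE models push that ratio much further.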