this post was submitted on 29 Dec 2025
22 points (100.0% liked)

Technology

1332 readers

A tech news sub for communists

founded 3 years ago
[–] yogthos@lemmygrad.ml 6 points 2 days ago

From the research I've seen published already, I'm fairly confident that smaller models, around 32 billion parameters or possibly even fewer, should be able to match the quality we currently get from full-blown 600+ billion parameter models. A lot of it seems to come down to tracking context out of band so it doesn't need to live in memory, and people have come up with a number of approaches for doing that. We'll probably also see more work on the MoE approach, which efficiently loads only the parts of the model that are actually relevant to the task being worked on. It's also possible we'll see novel approaches like this that could significantly reduce memory requirements further.
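
To illustrate the MoE idea: a router scores a set of experts per input and only the top-k of them actually run, so the unselected experts never need to be resident in memory. This is just a toy numpy sketch of top-k routing, not any specific model's implementation; all names and shapes here are made up for the example.

```python
import numpy as np

def top_k_routing(x, router_w, experts, k=2):
    """Route input x to the top-k experts by router score.

    Only the selected experts are evaluated; in a real MoE system
    the remaining experts' weights need not be loaded at all.
    """
    scores = x @ router_w                       # one score per expert
    top = np.argsort(scores)[-k:]               # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                    # softmax over selected experts
    # Weighted sum over just the k chosen experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
router_w = rng.normal(size=(d, num_experts))
# Each "expert" is a tiny linear layer standing in for a full FFN block.
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

x = rng.normal(size=d)
y = top_k_routing(x, router_w, experts, k=2)    # 2 of 4 experts execute
```

With k=2 out of 4 experts, only half the expert parameters are touched per token; production MoE models push that ratio much further.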