[–] ksynwa@lemmygrad.ml 4 points 1 week ago (2 children)

> For DeepSeek-R1-0528-Qwen3-8B, the model can pretty much fit in any setup, even those with as little as 20GB RAM.

What's the target audience here?

[–] yogthos@lemmygrad.ml 6 points 1 week ago (1 children)

People with 20+ gigs of RAM who want to run local models?

[–] ksynwa@lemmygrad.ml 2 points 1 week ago (1 children)

Generally speaking, would people want to run it on their own personal hardware? Or are there VPS-like services that let people run their own models?

[–] burlemarx@lemmygrad.ml 3 points 1 week ago

For sure there are VPS services available for this. However, it's sad that these days we need a powerful rig with 32GB of RAM just to do regular development or try out new FOSS stuff (to be melodramatic about it).

[–] KrasnaiaZvezda@lemmygrad.ml 4 points 1 week ago

First of all, DeepSeek-R1-0528-Qwen3-8B is not the DeepSeek model people mean when they talk about DeepSeek, so the title is misleading to say the least. The actual DeepSeek model is the 671B-parameter one, which the article only briefly mentions and which is not its main topic, contrary to what the title suggests. That model is really good, the best open-source model and one of the best in general, and it is possible to run locally, but it requires some 200GB of RAM/VRAM at the smallest quantizations and 800GB+ of RAM/VRAM at full quality.
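
Those figures line up with a back-of-the-envelope estimate (a rough sketch: the bits-per-weight values are approximations, q4_k_m averages roughly 4.5 bits, and KV cache and activation overhead are ignored):

```python
# Rough weight-memory estimate: bytes ~= parameters x bits-per-weight / 8.
# Ignores KV cache and activations, which add more on top.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    # 1e9 params x (bits/8) bytes each == that many GB
    return params_billions * bits_per_weight / 8

for label, bits in [("fp8 (native)", 8.0), ("q4_k_m (~4.5 bpw)", 4.5), ("~2-bit dynamic", 2.2)]:
    print(f"DeepSeek-R1 671B @ {label}: ~{weight_gb(671, bits):.0f} GB")
    print(f"Qwen3-8B     @ {label}: ~{weight_gb(8, bits):.1f} GB")
```

That puts the 671B model at roughly 185-377GB for the heavier quants, and the 800GB+ full-quality figure presumably includes KV cache and context overhead on top of the ~671GB of fp8 weights.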

As for the model the article is actually about, the one you mentioned: it is based on Qwen3-8B, which can run in as little as ~5GB of available RAM/VRAM when quantized to q4_k_m, i.e. on ordinary computers and even some phones.
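
For a concrete idea of what running it looks like, here's a minimal sketch using llama-cpp-python (the GGUF filename and settings are placeholders; you'd first download a q4_k_m quant of DeepSeek-R1-0528-Qwen3-8B from Hugging Face):

```python
# Minimal local-inference sketch with llama-cpp-python
# (pip install llama-cpp-python); the model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,        # context window; raise it if you have RAM to spare
    n_gpu_layers=-1,   # offload all layers to a GPU if present; 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=2048,   # thinking models emit long chains of reasoning
)
print(out["choices"][0]["message"]["content"])
```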

As for the target audience: anyone who wants privacy in their LLM use, or who simply doesn't want to pay for API access for automation tasks or research. Since this is a thinking version, though, it takes quite a few tokens to get to an answer, so it's better suited to people who have a GPU, or those who only occasionally need something more powerful locally.
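
For the automation use case, one common pattern (a sketch, assuming you already have llama.cpp's llama-server running locally with the model loaded) is to point the standard OpenAI client at the local endpoint, so no paid API is involved:

```python
# Sketch: talk to a local llama-server (llama.cpp) through its
# OpenAI-compatible endpoint -- no paid API key needed.
# Assumes `llama-server -m <model>.gguf --port 8080` is already running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local",  # llama-server accepts an arbitrary model name
    messages=[{"role": "user", "content": "Classify this commit message as fix/feature/chore: 'update deps'"}],
)
print(resp.choices[0].message.content)
```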