this post was submitted on 23 Feb 2026
187 points (99.5% liked)
Slop.
For posting all the anonymous reactionary bullshit that you can't post anywhere else.
Rule 1: All posts must include links to the subject matter, and no identifying information should be redacted.
Rule 2: If your source is a reactionary website, please use archive.is instead of linking directly.
Rule 3: No sectarianism.
Rule 4: TERF/SWERFs Not Welcome
Rule 5: No bigotry of any kind, including ironic bigotry.
Rule 6: Do not post fellow hexbears.
Rule 7: Do not individually target federated instances' admins or moderators.
Also, 16 GB of VRAM? You'll be able to load a better model like https://huggingface.co/mradermacher/Skyfall-31B-v4-i1-GGUF which is a little stronger than the ones I linked in the guide. If the "i1-Q4_K_S" quant is too large, try the "i1-IQ4_XS" one.
Probably try offloading just the down tensors (the top option in the guide). Make sure your KV batch size is 1024 (or higher) so the context gets offloaded to the GPU faster, cutting down on response times. Otherwise everything else in the guide applies to you as-is. If you find you have a little VRAM to spare at 16k context and a 1024 batch size, try upping the context a little at a time until you're sitting at around 15 GB of VRAM used or better.
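If you want a rough sense of how much extra VRAM each context bump costs before you start nudging it up, the KV cache grows linearly with context length. Here's a quick back-of-the-envelope sketch; the layer/head/dim numbers below are placeholders for a ~31B-class model, not Skyfall's actual config (check what llama.cpp/KoboldCpp prints at model load for the real values):

```python
# Rough KV-cache size estimate, to judge how far you can push context
# before filling up a 16 GB card. All model numbers are hypothetical
# placeholders -- read your model's metadata for the real ones.

def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the K and V caches for a given context length.
    The leading 2 covers K and V; bytes_per_elem=2 assumes an fp16 cache."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical numbers (NOT taken from Skyfall's config):
n_layers, n_kv_heads, head_dim = 64, 8, 128

for n_ctx in (16384, 24576, 32768):
    gib = kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim) / 1024**3
    print(f"{n_ctx:6d} ctx -> ~{gib:.2f} GiB KV cache")
```

With these placeholder numbers each 8k of extra context costs about 2 GiB, which is why a small context bump can tip you over the edge even when the model weights themselves fit fine.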