this post was submitted on 23 Feb 2026
187 points (99.5% liked)

Thanks capitalism for doing the stupidest implementation of this technology possible

[–] LaughingLion@hexbear.net 1 points 1 week ago* (last edited 1 week ago)

Also, 16 GB of VRAM? You'll be able to load a stronger model like https://huggingface.co/mradermacher/Skyfall-31B-v4-i1-GGUF, which is a little better than the ones I linked in the guide. If the "i1-Q4_K_S" quant is too large, try the "i1-IQ4_XS" one.

Probably try offloading just the down tensors (the top option in the guide). Make sure your KV batch size is 1024 or higher so the context gets offloaded to the GPU faster, which cuts down response times. Everything else in the guide applies as-is. If you find you still have a little VRAM free at 16k context and a 1024 batch size, bump the context up gradually until VRAM utilization sits around 15 GB or so.
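To get a feel for how much VRAM each bump in context actually costs, you can ballpark the KV cache size yourself. A rough sketch of the arithmetic (the layer count, KV head count, and head dimension below are placeholder numbers for illustration, not Skyfall-31B's real config):

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV cache size: a K and a V vector per token, per KV head,
    per layer; bytes_per_elem=2 assumes an fp16 cache."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Placeholder dims (NOT Skyfall's actual architecture):
# 64 layers, 8 KV heads (GQA), head dim 128, fp16 cache, 16k context.
gib = kv_cache_bytes(64, 16384, 8, 128) / 1024**3
print(f"~{gib:.1f} GiB of KV cache at 16k context")  # prints "~4.0 GiB of KV cache at 16k context"
```

The size scales linearly with context, so halving the context roughly halves the cache; that's the knob to turn when you're a gig or two over budget.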