[Help] Trying to run a local Story telling model with KoboldCpp (kbin.social)

submitted 1 year ago* (last edited 1 year ago) by darkeox@kbin.social to c/localllama@sh.itjust.works


Hi,

Just like the title says:

I'm trying to run:

https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-SuperHOT-8K-GGML

With:

koboldcpp:v1.43 using HIPBLAS on a 7900XTX / Arch Linux

Running:

--stream --unbantokens --threads 8 --usecublas normal

I get very limited output with lots of repetition.

Illustration

I mostly didn't touch the default settings:

Settings

Does anyone know how I can make things run better?

EDIT: Sorry for multiple posts, Fediverse bugged out.


[-] micheal65536@lemmy.micheal65536.duckdns.org 2 points 1 year ago

Yeah, I think you need to set the contextsize and ropeconfig. The documentation isn't completely clear, and in some places it implies that recent versions autodetect these from the model, but the first thing I would try is setting them explicitly, as this definitely looks like an encoding issue.
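As a rough sketch of what "setting these explicitly" could look like: SuperHOT 8K models are usually described as using linear RoPE scaling, so the scale would be the trained context divided by the target context. The helper name `linear_rope_flags` is made up for illustration, and the 2048 trained context and 10000 frequency base are assumptions about this LLaMA-1 based model, so check the model card before relying on them.

```python
def linear_rope_flags(trained_ctx: int, target_ctx: int, freq_base: int = 10000) -> list[str]:
    """Build extra KoboldCpp flags for a linearly scaled (SuperHOT-style) context.

    Hypothetical helper: the defaults are assumptions, not values read from the model.
    """
    if target_ctx < trained_ctx:
        raise ValueError("target context should not be smaller than the trained context")
    # Linear scaling: positions are compressed by trained_ctx / target_ctx.
    scale = trained_ctx / target_ctx
    return ["--contextsize", str(target_ctx), "--ropeconfig", str(scale), str(freq_base)]


# Assumed values: 2048 trained context (LLaMA 1), 8192 target (the "8K" in the model name).
flags = linear_rope_flags(2048, 8192)
print(" ".join(flags))  # --contextsize 8192 --ropeconfig 0.25 10000
```

The printed flags would be appended to the ones you already pass (--stream --unbantokens --threads 8 --usecublas normal). If the output is still repetitive after that, the sampler settings are the next thing to look at.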

this post was submitted on 11 Sep 2023
