LocalLLaMA

[Help] Trying to run a local Story telling model with KoboldCpp (kbin.social)

submitted 2 years ago* (last edited 2 years ago) by darkeox@kbin.social to c/localllama@sh.itjust.works

Hi,

Just like the title says:

I'm trying to run:

https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-SuperHOT-8K-GGML

With:

koboldcpp:v1.43 using HIPBLAS on a 7900XTX / Arch Linux

Running:

--stream --unbantokens --threads 8 --usecublas normal

I get very limited output with lots of repetition.

Illustration

I mostly didn't touch the default settings:

Settings

Does anyone know how I can make things run better?

EDIT: Sorry for multiple posts, Fediverse bugged out.

[–] darkeox@kbin.social 2 points 2 years ago (1 children)

Don't be sorry, you're being so helpful, thanks a lot.

I finally replicated your config:

localhost/koboldcpp:v1.43 --port 80 --threads 4 --contextsize 8192 --useclblas 0 0 --smartcontext --ropeconfig 1.0 32000 --stream "/app/models/mythomax-l2-kimiko-v2-13b.Q5_K_M.gguf"

And had satisfying results! The performance of LLaMA2 really is nice to have here as well.
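
For anyone reading along, here is a rough sketch of what the `--ropeconfig 1.0 32000` part of that command plausibly changes. It assumes the two values are the RoPE frequency scale and frequency base (my reading of the KoboldCpp help text), that the usual default base is 10000, and that the head size is 128; the function names are made up for illustration and this is not KoboldCpp's actual code.

```python
import math


def rope_frequencies(head_dim, base, scale=1.0):
    """Rotary angular frequency for each pair of dimensions.

    Standard RoPE formula: scale * base ** (-2i / head_dim) for pair index i.
    """
    return [scale * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, base, scale=1.0):
    """Token distance after which the slowest-rotating pair completes a full turn."""
    slowest = rope_frequencies(head_dim, base, scale)[-1]
    return 2 * math.pi / slowest


HEAD_DIM = 128  # assumption: typical head size for LLaMA-family models

default_span = longest_wavelength(HEAD_DIM, 10000.0)   # assumed default base
extended_span = longest_wavelength(HEAD_DIM, 32000.0)  # base from the command above

print(f"base 10000: slowest pair wraps after ~{default_span:,.0f} tokens")
print(f"base 32000: slowest pair wraps after ~{extended_span:,.0f} tokens")
```

A larger base makes every pair except the first rotate more slowly, so positions further apart still map to distinguishable angles. As far as I understand it, that is the idea behind raising the base when running with `--contextsize 8192`.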

[–] rufus@discuss.tchncs.de 1 points 2 years ago* (last edited 2 years ago) (1 children)

Looks good to me.

For reference: I think I got the settings in my screenshot from Reddit, but they seem to have updated the post since. The current recommended settings have a temperature and some other values that are closer to what I've seen in the defaults. I've tested those (new to me) settings and they also work for me. Maybe I also adapted the settings from here.

And I've linked a 33b MythoMax model in the previous post that's probably not working properly. I've edited that part and crossed it out. But you seem to use a 13b version anyways. That's good.

I've tried a few models today. I think another promising model for writing stories is Athena. For your information: I get inspiration from this list. But beware, that's for ERP, so erotic role play. So some models from that ranking are probably not safe for work (or for minors). But other benchmarks often test for factual knowledge and answering questions. And in my experience the models good at those things are not necessarily good at creative tasks. But that's more my belief. I don't know if it's actually true. And this ranking also isn't very scientific.

[–] darkeox@kbin.social 2 points 2 years ago (1 children)

Ah thank you for the trove of information. What would be the best general knowledge model according to you?

[–] rufus@discuss.tchncs.de 1 points 2 years ago* (last edited 2 years ago) (1 children)

Well, I'm not that up to date anymore. I think MythoMax 13b is pretty solid. Also for knowledge. But I can't be bothered anymore to read up on things twice weekly. That news is probably already 3 weeks old and there will be a (slightly) better one out there now. And it gets outperformed by pretty much every one of the big 70b models. But I can't run them on my hardware, so I wouldn't know.

This benchmark ranks them by several scientific tests. You can hide the 70b models and scarlett-33b seems to be a good contender. Or the older Platypus models directly below. But be cautious, sometimes these models look better on paper than they really are.

Also regarding 'knowledge': I don't know about your application. Just in case you're not aware of this... Language models hallucinate and regularly just make up stuff. Even expensive and big models will do this. The models we play with, even more so. Just be aware of it.

And lastly: There is another good community here on Lemmy: !fosai@lemmy.world You can find a few tutorials and more people there, too. And have a look at the 'About' section or stickied posts there. They linked more benchmarks and info.

[–] darkeox@kbin.social 2 points 2 years ago

Alright, thanks for the info & additional pointers.