LocalLLaMA

2957 readers

1 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

founded 2 years ago

MODERATORS

SkySyrup@sh.itjust.works

pax@sh.itjust.works

noneabove1182@sh.itjust.works

Smokeydope@lemmy.world

MonsterBug@sh.itjust.works

What model to grade practice test? (sh.itjust.works)

submitted 3 days ago by HumanPerson@sh.itjust.works to c/localllama@sh.itjust.works

11 comments fedilink hide all child comments

I took a practice test (math) and would like to have it be graded by a LLM since I can't find the key online. I have 20GB VRAM, but I'm on intel Arc so I can't do gemma3. I would prefer models from ollama.com 'cause I'm not deep enough down the rabbit hole to try huggingface stuff yet and don't have time to right now.

you are viewing a single comment's thread
view the rest of the comments

[–] Smokeydope@lemmy.world 1 points 2 days ago* (last edited 2 days ago)

Models running on gguf should all work with your gpu assuming its set up correctly and properly loaded into the vram. It shouldnt matter if its qwen or mistral or gemma or llama or llava or stable diffusion. Maybe the engine you are using isnt properly configured to use your arc card so its all just running on your regular ram which limits things? Idk.

Intel arc gpu might work with kobold and vulcan without any extra technical setup. Its not as deep in the rabbit hole as you may think, a lot of work was put in to making one click executables with nice guis that the average person can work with..

Models

Find a bartowlski made quantized gguf of the model you want to use. Q4_km is recommended average quant to try first. Try to make sure it all can fit within your card size wise for speed. Shouldnt be a big problem for you with 20gb vram to play with. Hugging face gives the size in gb next to each quant.

Start small with like high quant of qwen 3 8b. Then a gemma 12b, then work your way up to a medium quant of deephermes 24b.

Thinking models are better at math and logical problem solving. But you need to know how to communicate and work with llms to get good results no matter what. Ask it to break down a problem you already solved and test it for comprehension.

kobold engine

Download kobold.cpp, execute it like a regular program and adjust settings in graphical interface that pops up. Or make a startup script with flags.

For input processing library, see if Vulcan processing works with Intel arc. Make sure flash attention is enabled too. Offload all layers of the model I make note of exactly how many layers each model has during startup and specify it but it should figure it out smartly even if not.