Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
-
Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
-
No spam posting.
-
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
-
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
-
Submission headline should match the article title (donβt cherry-pick information from the title to fit your agenda).
-
No trolling.
-
No low-effort posts. This is subjective and will largely be determined by the community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
view the rest of the comments
I like that you are so focused on local models but I can't find any info on setting up local models in the clients setup https://github.com/Dark-Alex-17/loki/wiki/Clients
What am I missing?
Edit: well it seems this post is an entirely fictional origin story. Here is the first time OP posted about his project 6 months ago https://piefed.zip/c/rust/p/663115/loki-an-all-in-one-batteries-included-llm-cli
So actually, this was the original purpose of it. But all the help I tried to get on it didn't really have much interest in doing anything outside of the usual big model providers, so I tried advertising a more general use case to attract more input. I can't deny that agnostic support for even the big providers is helpful when you're trying to stay current with the rapid advances in LLMs.
After that, I kind of gave up on getting feedback on local-first models. So, instead, I just dove in head-first the way I wanted;Trying new things, building new agents to try and rival Claude Code, adding features as I found them useful and necessary to improve that reliability, etc., and iterating. Then, with the most recent release on Friday, I had done so many changes and improvements specifically for local models that I thought I finally had a strong enough tool to maybe pique enough people's interest to get some feedback and input. π
Oh, and the config example shows how to add Ollama models here
Ollama is enshittifying at a rate of knots, have you got a way to use llama-server (or preferably llama-swap) instead ?
Looking at Llama-swap, since it says it supports OpenAI-compatible API, it should just work natively already. Just set up the client to be
type: openai-compatibleand fill in the URL and provide the models. Should work out of the box!Hope so, bet it doesn't without some tweaking though, OpenAI-compatible seldom is, and ollama is bad for that. Still, worth checking out, I'll have a go at it sometime soonish and perhaps you'll see a PR (or some doco in the best case scenario).
Looking forward to it! Heads up in case you missed it: I had settled on renaming it to Coyote, so sometime this week will be a breaking change and release to get that done.
Biggest pains are just going to be updating the repo tokens for Crates.io and renaming the homebrew repo.
Crap. I was just starting to play with Ollama and thought it might be a good balance between running local models and using one of the proprietary services.
Could you elaborate on what's happening with them / what to watch out for?
If it gets you started with local models, by all means go ahead, their onboarding is the easiest and it works. Also a lot of 3rd party stuff uses it as a first class citizen allowing you to try out other things (e.g. Open WebUI) easily as you explore what's possible. Currently try the Qwen 3.6 and Gemma4 models as best bang for buck, somewhere there's a does it fit in my machine website that can help (search for it).
That said, basically all roads in local LLM lead to llama.cpp, which gets the innovations first and then others copy their homework. Ollama (looks like they're angling to go commercial) for a long time used it internally without attribution, now they use a bodged up engine of their own that is less performant and almost certainly a copy (possibly vibe coded) of llama.cpp. They heavily encourage using their own models / quantizations and don't let you play with a lot of parameters without a lot of friction (possibly because they're not implemented yet, but who knows, low transparency). You get the picture, wannabe techbros. That's off the top of my head, search for more authoritative sources.
After you've gotten the hang of things, have a look at llama-swap which just wraps llama.cpp, lemonade if you're on AMD, vLLM for nvidia, LM Studio for mac.
I suggest using unsloth studio to get a friendly GUI for not just downloading models and running inference but also finetuning and such. Underneath it just uses llama.cpp which is supported by a lot of apps but it also adds other APIs IIRC. You can run claude code, github codex, mistral vibe off either the llama.cpp API or the unsloth API depending on which agent you're using and they've got tutorials for setting those up. Other tools too.
That's not to say it's the only one or the best one, but I really like the UI, because it's both simple and advanced (if you look for it, you can set KV cache type, temperature, etc, but you can also run default settings without ever looking at the advanced stuff).