Selfhosted

50711 readers

647 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
No low-effort posts. This is subjective and will largely be determined by the community member reports.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago

MODERATORS

HybridSarcasm@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz

259

I wanted Claude Code-style workflows without sending code to the cloud, so I built Coyote (lemmy.world)

submitted 4 days ago* (last edited 2 days ago) by aclarke@lemmy.world to c/selfhosted@lemmy.world

72 comments fedilink hide all child comments

For the longest time, I've been trying to figure out a way to "survive" in this new AI age without having to fork over a ton of money just to keep up. I've tried using local models via Ollama, and while they definitely work to a degree, they're (unsurprisingly) not as good as the big model providers.

The local models tend to

Forget what they're doing
Struggle to break larger tasks into smaller ones
Lose focus easily
Have weaker coding performance
Drift over longer sessions

So to improve the reliability of fully local, smaller models (and to keep all my data local and in my own network), I created Coyote.

It's a local-first, batteries-included command line tool and runtime for building and running LLM workflows locally. It's model agnostic and supports things like

Agents and agent delegation
Roles/personas
MCP Servers
RAG
Custom tools
Macros
Workflow Scripting

A lot of the features it supports are specifically designed to compensate for weaknesses in smaller local models. For example:

Auto continuation to keep pushing models to completion instead of stopping halfway through problems
Parallel agent delegation so tasks can be split into smaller, focused scopes
Workflow-based execution ("If this, do that") for building more reliable and repeatable automations

It also supports the major cloud providers if you want them (which definitely helped while testing 😄), but my long-term goal is simple:

Get as close as possible to Claude Code-style reliability using fully local models.

I'm always open to feedback, questions, or ideas.

Repo: https://github.com/Dark-Alex-17/coyote

you are viewing a single comment's thread
view the rest of the comments

[–] JollyForeheadRidges@lemmy.zip 1 points 3 days ago (2 children)

Crap. I was just starting to play with Ollama and thought it might be a good balance between running local models and using one of the proprietary services.

Could you elaborate on what's happening with them / what to watch out for?

[–] MalReynolds@slrpnk.net 5 points 3 days ago

If it gets you started with local models, by all means go ahead, their onboarding is the easiest and it works. Also a lot of 3rd party stuff uses it as a first class citizen allowing you to try out other things (e.g. Open WebUI) easily as you explore what's possible. Currently try the Qwen 3.6 and Gemma4 models as best bang for buck, somewhere there's a does it fit in my machine website that can help (search for it).

That said, basically all roads in local LLM lead to llama.cpp, which gets the innovations first and then others copy their homework. Ollama (looks like they're angling to go commercial) for a long time used it internally without attribution, now they use a bodged up engine of their own that is less performant and almost certainly a copy (possibly vibe coded) of llama.cpp. They heavily encourage using their own models / quantizations and don't let you play with a lot of parameters without a lot of friction (possibly because they're not implemented yet, but who knows, low transparency). You get the picture, wannabe techbros. That's off the top of my head, search for more authoritative sources.

After you've gotten the hang of things, have a look at llama-swap which just wraps llama.cpp, lemonade if you're on AMD, vLLM for nvidia, LM Studio for mac.

[–] boonhet@sopuli.xyz 1 points 3 days ago

I suggest using unsloth studio to get a friendly GUI for not just downloading models and running inference but also finetuning and such. Underneath it just uses llama.cpp which is supported by a lot of apps but it also adds other APIs IIRC. You can run claude code, github codex, mistral vibe off either the llama.cpp API or the unsloth API depending on which agent you're using and they've got tutorials for setting those up. Other tools too.

That's not to say it's the only one or the best one, but I really like the UI, because it's both simple and advanced (if you look for it, you can set KV cache type, temperature, etc, but you can also run default settings without ever looking at the advanced stuff).