this post was submitted on 24 Jun 2026

136 points (81.8% liked)

Selfhosted

60093 readers

498 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam.
Posts here are to be centered around self-hosting. Please ensure it is clear in your post how it relates to self-hosting.
Don't duplicate the full text of your blog or git here. Just post the link for folks to click.
Submission headline should match the article title.
No trolling.
Promotion posts require your active participation in selfhosting or related communities, or the post will be removed. No more than 10% of your posts or comments may be self-promotional, or your post will be removed. F/LOSS Exception: If your post is about a project that is completely open source & can be self-hosted in full without payment, and your account is at least 30 days old, your post is exempt from this rule as long as you continue to engage in comments.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago

MODERATORS

curbstickle@anarchist.nexus

curbstickle_lw@lemmy.world

136

Do you host your own AI? (aussie.zone)

submitted 2 days ago by SuspiciousCarrot78@aussie.zone to c/selfhosted@lemmy.world

199 comments fedilink hide all child comments

Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

top 50 comments

sorted by: hot top controversial new old

[–] Kazumara@discuss.tchncs.de 3 points 1 day ago

No, I'm not interested in that topic

[–] fluxx@mander.xyz 13 points 1 day ago

I do, but I am becoming increasingly more disappointed as time goes on. Not just self hosted, llms in general. They sometimes help, but they mislead so many times and waste time that you don't even notice. I think that's the trap. When you succeed at a task, you become impressed but don't notice how many times it failed doing a simple task. And as soon as you scratch the surface, you see how you would have done it differently and perhaps in a better way. Even just googling is bad. It does research for you, but it has no critical thinking and can't decide what is better from the results it gets (other than google ranking) so it often leads you to think it did as good as you would, when it's nowhere near as good. Every time I did the googling myself after it did, I did it much better. And I mean MUCH better. Ask it to find the app, it misses the most important ones, hallucinates a bunch, for ex. I found this to be the case with frontier models as well.

Self hosting has its benefits, but seeing how the ecosystem looks right now, concluding this is a huge bubble is inevitable. It reminds me of crypto so much. It looks rich and plentiful, but as soon as you dig a mm under the surface - nobody has tested it, it's got a critical bug, it is overblown and there are issues with no response. No docs, no info, no nothing. For the biggest thing in technology in history, it is awfully hollow. I don't mean it in a condescending way, in fact community is enthusiastic and very helpful, it's just that it doesn't live up to what most would expect.

A caveat I need to mention is I have not used it for coding - I have an irrational fear and resistance towards it, being a programmer. I just won't touch it, even if it means the end of my career. I'm trying to be grown-up about it, but so far, I dont want to use it, for good and bad reasons.

[–] bier@lemmy.blahaj.zone 3 points 1 day ago

Hell naw my homelab is already sucking way too much power and running too hot.

[–] dotAlexX@lemmy.world 1 points 1 day ago

I would love to run and host a local LLM on my phone just to tinker and learn. I found a tutorial on setting DeepSeek on your Android phone using Termux but it is a year old. I'm sure there are better more efficient LLMs that can run on a phone now.

[–] edgyspazkid@lemmy.wtf 3 points 1 day ago

No I don't. Unforunetly using Claude (asking myself everyday why tf cuz I don't do crazy shit) but trying to move on to LumoAI even meaby will buy a premium version to check this out formyself.

[–] brucethemoose@lemmy.world 68 points 2 days ago* (last edited 2 days ago) (7 children)

An aside for anyone reading this:

https://sleepingrobots.com/dreams/stop-using-ollama/

And that barely scratches the surface. Please.

Use anything but Ollama. Even APIs.

[–] vagabond@lemmy.dbzer0.com 5 points 1 day ago

Didn't know this. Going to switch this weekend, thanks for sharing this!

[–] SchwertImStein@lemmy.dbzer0.com 2 points 1 day ago

thank you

[–] plasma8726@lemmy.today 10 points 2 days ago (3 children)

Thanks for this link. Because of this article, I had claude stand up a llama.cpp container next to my already running ollama container. It ran side by side tests with the same model and parameters, and the results blew ollama out of the water. I'm in the process of moving hermes and openwebgui over to the llama.cpp instance to see how it goes day to day.

load more comments (3 replies)

[–] SuspiciousCarrot78@aussie.zone 11 points 2 days ago (2 children)

Llama.cpp or death!

[–] tristynalxander@mander.xyz 4 points 2 days ago (1 children)

It's not that hard to use llama.cpp directly anyway. Why would I use a wrapper when I can just run a python script?

[–] BlackLaZoR@lemmy.world 2 points 1 day ago* (last edited 1 day ago)

I use LMStudio, because it has quality of life improvements like nice GUI and huggingface search engine. Also they have Vulkan backend that at least on 7900XTX is ~10% faster than rocm (on LLama 3 8b Q4_0 it gets 115Tokens/s vs 105 on rocm)

load more comments (1 replies)

[–] pinball_wizard@lemmy.zip 9 points 2 days ago

I agree that the concerns listed there are smells, and I wasn't aware of some of the options listed there.

Thank you for sharing this!

load more comments (2 replies)

[–] algernon@lemmy.ml 96 points 2 days ago (12 children)

Yes. My Actual Intelligence lives in my head, and runs mostly on coffee.

[–] portifornia@piefed.social 33 points 2 days ago (2 children)

Just coffee?!? That's cool.

Mine runs on:

coffee
spite
tortilla chips
& shame

[–] algernon@lemmy.ml 8 points 2 days ago (2 children)

Mostly on coffee, not exclusively. Noticable amounts of spite & tortilla chips are also present, yes, but... no shame.

[–] Diurnambule@jlai.lu 1 points 1 day ago

I replace tortilla by "raclette" but that cultural.

load more comments (1 replies)

[–] searabbit@piefed.social 11 points 2 days ago

If that's not already on a shirt it should be

[–] tal@lemmy.today 14 points 2 days ago (1 children)

Do you get many hallucinations?

[–] algernon@lemmy.ml 15 points 2 days ago (2 children)

Only when I'm deprived of coffee.

load more comments (2 replies)

load more comments (10 replies)

[–] BlackLaZoR@lemmy.world 2 points 1 day ago (1 children)

I was hosting LLM with LMStudio occasionally but can't access it anymore due to some fuckery with CORS and http vs https in browsers.

[–] irelephant@lemmy.dbzer0.com 2 points 1 day ago (1 children)

I googled it, and it seems like you can just enable cors.

[–] BlackLaZoR@lemmy.world 2 points 1 day ago (2 children)

Yes you can enable cors in LMStudio. But since few months it's blocked by all major web browsers if you aren't using HTTPS.

Which I don't. I had LMStudio server open to local network so I can use it on my phone or laptop via third party website.

load more comments (2 replies)

[–] PetteriPano@lemmy.world 11 points 2 days ago (3 children)

Running qwen3.6 27b through llama.cpp.

It's about as capable as sonnet 3.5.

I use it for light scripting, but real coding is done by cloud models.

I'm also using it as the brain for my Hermes agent. It sends me digests of news, subreddits, chats that I'd like to read but don't have time for. It does a great job researching things on the web for me, too.

load more comments (3 replies)

[–] Strider@lemmy.world 10 points 2 days ago

No. I still have no use for it and everything I use is automated without at a far lower footprint.

[–] orenj@leminal.space 6 points 1 day ago

If I wanted AI for some reason, it'd be self-host or nothing.

[–] domi@lemmy.secnd.me 12 points 2 days ago (8 children)

Yes, I got a Strix Halo machine before the RAM price hike and use it to run all my ML stuff on it.

Currently using llama-swap with llama.cpp/ComfyUI and opencode/Open WebUI as frontend.

I'm running Qwen3.6-27b, Voxtral Mini 4b, Piper and Qwen Image. Also, some embedding and reranking models.

I use them for:

Tagging and classification of my documents in Paperless
Home Assistant (voice assistant)
Translations (both text and image)
Transcriptions
Some light coding and debugging
Avatar/Backdrop generation for DnD sessions

load more comments (8 replies)

[–] frongt@lemmy.zip 43 points 2 days ago (8 children)

Yes. Openwebui/ollama for LLM, comfyui for stable diffusion. I just dick around with it as a toy.

load more comments (8 replies)

[–] Sabata11792@ani.social 5 points 1 day ago* (last edited 1 day ago)

Running decencored Qwen3.6-27b and a 9b Gemma for RAG and scrapes on Ollama with a mostly vibe coded discord bot. Just got it to run tools and scrape and post news on a schedule. The first model I can run locally that's smart enough to be useful. May give Jan a try for the back end after reading that other guys rant.

Mostly use it for stupid questions I could have googled and to brag to friends.

[–] D_Air1@lemmy.ml 23 points 2 days ago

Yeah, I'm using qwen 31b a3b on an amd 9070xt requires a bit of cpu offloading, but still plenty fast. Using it wall llama.cpp. Combine that with some mcp's such as ddg-search to make it truly useful by actually being able to search online.

I mostly use it for small tedious tasks with well defined inputs and outputs. For example when hyprland recently changed from their own configuration language to lua. At first I started going line by line translating my config to the new lua language until I realized oh wait this is exactly the type of thing that ML is useful for. Going from the well defined hyprland configuration language to their also well defined lua syntax. It banged it out in less than a minute with only a single mistake which I easily fixed. The mistake it made was that it forgot to translate the comments to lua. It did it in less than a minute and worked first try. Where as I had made several typos and gotten a few lines wrong when I was doing it by hand.

Not to say that I couldn't do it. I would have gotten it done in about half an hour, but less than a minute is a lot faster.

I also used it to transform a bunch of unstructured data into json data, so that I could then use purpose built tools like jq to parse that. If I'm having trouble finding certain information. I'll ask it to find me some resources to look at.

Basically small well defined tasks and parsing data is what I use it for and it seems to be pretty good at that.

What I don't like is the way companies try to market it to people. I don't believe people should be trying to summarize emails or messages from loved ones, writing essays or any other creative tasks for the most part. Translating is okay. I don't expect a machine to be able to decide things for me or to be some filter between me and others.

[–] robber@lemmy.ml 5 points 2 days ago

I currently run Qwen3.6-27b on llama.cpp and use it via openwebui. Mostly, I use it for web research via tavily, to a lesser extent for coding and interactively learning about things that are new to me but common in training data (such as basic math or ML concepts).

[–] Jakeroxs@sh.itjust.works 5 points 2 days ago

Yes, llama-swap and I use it for home assistant text-gen notifications, basic coding tasks, etc

If anyone here self-hosts definitely check out llama-swap as it has some nifty features for hotswapping LLMs, image generation models and voice models.

[–] chaospatterns@lemmy.world 4 points 2 days ago* (last edited 2 days ago)

Partially. I started with hosting my own llama3.2 + granite4 models using Ollama for my Home Assistant smart home and for general chat with OpenWebUI. I also run whisper for speech-to-text locally on my 1080 Ti GPU. I like the privacy and ownership of my self-hosted models, but I started to run into limitations with the small weights. So I built some tools that allow me to selectively route traffic to larger models hosted on DeepInfra depending on my need. For example, to GLM/Kimi models for code reviews or for my custom harnesses or harder problems.

[–] slazer2au@lemmy.world 17 points 2 days ago

Nope.

[–] jaykrown@lemmy.world 4 points 2 days ago

I hosted Qwen 3.5 9b uncensored on my site at https://masland.tech/ for a while. I didn't really use it and no one else used it so I took it down. These days I'm spending most of my time finding uses for AI and accessibility. One of the next things I'm planning is a video to text reasoning system, primarily for the purpose of grading used electronic devices.

load more comments