I think they know it's a somewhat viable option, and that's part of the reason they're doing the hardware cartel/circlejerk thing.
Sure, but all these self-hosted AIs are still made by companies that used massive amounts of power and water to train them.
No.
Even the biggest open weights models are trained on pennies compared to OpenAI and Claude. They just don’t have the hardware to be so wasteful.
In fact, the Nvidia GPU ban was the best thing to ever happen to “small” AI devs. It made them thrifty.
Which is an interesting dilemma: those AIs are already trained. That power and water was used. If you use them, you will not pollute anything. But you may encourage those companies to train another AI.
P100s are dirt cheap on ebay fyi
In practice, they’re not very good because of broken FP16, broken kernels, high idle usage and a bunch of other things.
Same with the AMD MI50 and MI100. Looks great on paper, not practical IRL, unless you want to pay a whole team of software devs to fix them for you.
Better to just save up for a 2080 TI or 3090, sadly.
Huh - cheaper than the P40s (though less VRAM). Good looking out
They rip
I was looking at that. Does it end up faster than something like a 1080?
Numbers about 3-4x. The P100 is near 800 GB/s. The 1080 is what... 192GB/s? Hell, even if it were double that, HBM2 simply has larger bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.
Yeah.
It’s not even about efficiency, really, but independence from corporations, privacy, and principle. Kind of like Lemmy.
My issue with the orphan-crushing machine isn't only that it's not in my children's bedroom
not gonna self host bullshit that wastes resources and makes me dumber.
Me, looking at my Jellyfin server…
Oh. Ok.
NO that makes you dumber in a GOOD WAY THO.
'People will buy intelligence from us on a meter'
We have governmental surveillance and we have surveillance capitalism. Surveillance capitalism works so well that governments are now very interested in the data they collect, which is alarming. Unfounded conspiracy theory: It's probably one of the reasons that governments don't seem interested in AI's regulation. If I had the proper equipment to run AI entirely local and efficiently so that the expenditure would justify it, I would.
You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (the card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.
https://www.youtube.com/watch?v=8F_5pdcD3HY
If you're not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.
I think I mentioned elsewhere but right now I'm trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.
Thing is, if I were going to do in house AI, I'd want to do it up right and from what I can gather, a system like that is going to cost me some jack.
If you're already using node-red, the Wake On Lan node works well, and with node-red it's easy to trigger the magic packet based on whatever trigger condition you want.
The only limitation I know of is that WOL doesn't work after a power outage, because the switch and RPi don't know where to find the target machine
Thanks for the tips on reusable enterprise cards btw
The only limitation I know of is that WOL doesn't work after a power outage, because the switch and RPi don't know where to find the target machine
maybe, but the pi does not need to know that, only the MAC address and the interface. the switch doesn't need to know either, because it's a broadcast frame, so it's forwarded out of every port. the problem sometimes is that if you configure WOL from linux, the network adapter will probably forget on power cycling that it is supposed to react to magic packets. I think not all hardware is susceptible to that, but even then it could help to configure WOL in the BIOS
Maybe something else is going on then, but I've never gotten WOL to work after a blackout when there are two switches between sender and receiver. After powering up the receiver once, WOL works again
Switches probably need to figure out which way a particular MAC is (unlike a hub, which just sends everything everywhere). That's the switching part. If they power off, the tables will be empty.
that's probably the BIOS only loading the configuration on the first boot. you could try enabling fast boot or disabling the right energy saving settings in the BIOS and see if that fixes it.
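For anyone who wants to test this from the Pi without node-red, here is a minimal Python sketch of the magic packet discussed above. The packet layout (six 0xFF bytes followed by the target MAC repeated 16 times) is the standard one; the broadcast address, UDP port 9, and the example MAC are assumptions you would swap for your own network, and this only helps if WOL is actually enabled on the target's NIC/BIOS.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN payload: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "").replace(".", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the packet as a UDP broadcast.

    The broadcast address and port are assumptions; port 9 (discard) is a
    common convention, and some setups use port 7 instead.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be explicitly allowed on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example call (hypothetical MAC address, replace with the Lenovo's):
# send_magic_packet("AA:BB:CC:DD:EE:FF")
```

Because the packet goes out as a broadcast, the sender only needs the target's MAC address, which matches the point made above about the Pi not needing to know where the machine is.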
Good tips - thanks!
PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you're going that end of town, a 1080 is about on par, less fussy and probably cheaper - it just won't fit in a uSFF.
Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.
Depends on how much CPU RAM you have, and how fast it is.
As others said, Qwen 35B at the very least. But you can get better models with more CPU RAM.
I've got 32GB of DDR5 at 6000MHz
Probably Qwen 35B then. ~9GB free VRAM + (let's say) ~16GB of free CPU RAM is a good size for that, and squeezing bigger models in would be hard unless it's a headless linux server.
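A rough back-of-the-envelope version of that sizing, as a Python sketch. The ~9GB VRAM and ~16GB RAM figures come from the comment above; the 4.5 bits per weight (roughly a 4-bit quant) and the 2GB allowance for context and runtime overhead are my assumptions, not measured numbers, so treat the result as a sanity check rather than a guarantee.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed overhead fit in VRAM + system RAM."""
    needed = quantized_size_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb + ram_gb


# 35B parameters at an assumed ~4.5 bits/weight, against ~9GB VRAM + ~16GB RAM
size = quantized_size_gb(35, 4.5)
print(f"approx. weights: {size:.1f} GB, fits: {fits(35, 4.5, vram_gb=9, ram_gb=16)}")
```

With those assumptions the 35B model lands a few GB under the 25GB budget, which is why anything much bigger gets tight unless more RAM is free.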
There's an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) as at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.
PS: I'm actually in the process of designing an expert system (not a LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.
MoEs can be very fast with hybrid inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB weights) on my single 3090 + 7800 CPU, and it outputs faster than I can read it.
It's also easier to fit long context, if you need that.
It's best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.
Qwen 3.6-35B-A3B (which OP mentioned) would work great as long as you have some system RAM to offload it.
Altman can try to hype up how everyone's going to subscribe to them someday, all the while their subscriber base is being eaten up by competitors.
Local stuff. I still believe the small, ~1B parameter, free local ones will suffice for the vast majority of how people use LLMs, and there's still going to be a few years of improvements there until investments dry up. Eventually I bet more and more phone companies will include one of these small ones out of the box. Pretty much like a nice search engine that works offline, like if you're out on a major hike. Cloud stuff, there'll be stuff like Proton's Lumo where they're taking free open weight stuff and piecing it together for users.
OpenAI's thing is they'll make up for falling subscribers with advertising. So pretty much we're advancing fast through the search engine race of the 90s/early aughts. We'll at least have Gemini. ChatGPT maybe ends up crashing in value someday and gets bought up by Microsoft or some other company. Deepseek, Qwen, Kimi. Claude, like ChatGPT, maybe survives, or crashes and gets absorbed by another company. Proton continues to exist as the company making AI products out of free stuff. Eventually the pace of improvements slows to a crawl and it's pointless to be paying for the best paywalled stuff. Just use the free stuff, like how everyone mostly uses free search engines
Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my pi, before thinking a bit harder and realising I can probably code something bespoke that doesn't need a stochastic parrot as a squawk box at all.
https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF
Still, as an SLM, it's perfectly cromulent and does well with tool calling etc., which is what I wanted it for.
You're still paying for electricity and a big part of the world is in an electricity crisis. "AI" has few real uses and LLMs are not one of them.
This is a “feel guilty about missing recycling” kind of complaint.
Having a server run for an hour or two (?) a day is negligible. You use more energy running a fridge, or leaving a few lights on, or browsing Lemmy for a while. Or running a docker container for other services. You release more greenhouse gases eating beef, or driving anywhere, or even opening your front door a few times, and individual industries are going to use vastly more electricity than a few self hosters ever would. If you own an EV, you’ve probably blown out your entire zip code of self hosters.
…But if it still bothers you, you can find an e-waste smartphone (or a few) and host on that. This is actually a very neat use case IMO.
However, if you get to the homelab scale of “an EPYC + 3090s running all the time”, that electricity use does start to add up. But that’s quite a rare hobbyist tier, I’d say, and it really shouldn’t be running 24/7.
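To put rough numbers on the "hour or two a day" versus "running 24/7" difference, here is a small Python sketch. The 350 W load figure is purely a hypothetical placeholder, not a measurement of any particular box; plug in your own wall-meter reading.

```python
def daily_kwh(watts: float, hours_per_day: float) -> float:
    """Energy per day in kWh for a device drawing `watts` for `hours_per_day`."""
    return watts * hours_per_day / 1000


ASSUMED_LOAD_WATTS = 350  # hypothetical draw under load; measure your own

occasional = daily_kwh(ASSUMED_LOAD_WATTS, 2)   # used an hour or two a day
always_on = daily_kwh(ASSUMED_LOAD_WATTS, 24)   # left running around the clock

print(f"2 h/day: {occasional:.1f} kWh, 24 h/day: {always_on:.1f} kWh")
```

Whatever wattage you assume, the always-on case is 12 times the two-hours-a-day case, which is the whole argument for waking the machine only when needed.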