You are right. On the one hand, it's a kind of bad, naive distributed architecture (distributed architecture is my day job), and it could have been done much better. On the other hand, the more important point is that it demonstrates an alternative to centralized platforms. We'll learn a lot about usage patterns here, get new ideas, and either improve Lemmy or build something better from the ground up. Big thanks to Reddit for driving users this way to test scalability and give us much better knowledge of usage.
What makes a distributed system good that Lemmy hasn't done? Seems like a pretty robust system to me; the scaling issues seem to be on the instance hosts themselves. With Reddit's experience, I don't see how there are issues
If there was an easy solution that balanced decent UX and performance, we'd have it by now!
It's not distributed architecture as you normally think of it: it's a decentralised federation. That's an important distinction from your typical distributed architecture app.
it could have been done much better.
Care to expand on this point?
Disclaimer: I've only looked a bit at the protocols and high-level descriptions of how it works, and this is just my understanding of it. But it seems to track.
Let's take Selfhosted@lemmy.world as an example. Right now lemmy.world is the source of truth for it, which means that if you subscribe to it from a different host, say myawersomeinstance.com, that host first contacts lemmy.world, copies over the posts, and then subscribes to new posts. I'm actually not 100% sure whether lemmy.world contacts myawersomeinstance.com when there's a new post or myawersomeinstance.com polls lemmy.world. Either way, the point is that lemmy.world is the authority. myawersomeinstance.com also has the Selfhosted@lemmy.world data, but only as a copy, and lemmy.world is the only authority. So if you post something, your server sends it to lemmy.world and waits for a reply. Then lemmy.world contacts every instance that has at least one user following the community to tell it about the new post. That new post now exists in a few hundred databases.
The problem is that the scaling is whack. Okay, you can have 5000 federated servers with users subscribing to Selfhosted@lemmy.world, but that means lemmy.world needs to update 5000 servers per post, there'll be 5000x the storage used for that post, and ALL 5000 servers contact lemmy.world to get the new good stuff.
Frankly, it's a scaling nightmare. As for a different approach, you could have private/public keys, sign updates from lemmy.world, and allow the other instances to fetch the new data from each other. That would also allow more relaxed caching, since it would generally be lower cost to re-fetch the data. Right now you need aggressive caching because you don't want lemmy.world to keel over and die from every server on the planet wanting to hear the latest and greatest posts all the time.
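A rough back-of-the-envelope sketch of that fan-out cost, in Python. The function name and the 2 kB post size are made up for illustration; the only figure taken from the comment above is the 5000 subscribed instances.

```python
def fanout_cost(subscribed_instances: int, post_bytes: int) -> dict:
    """Cost of one post when the origin pushes it to every subscribed instance."""
    stored_copies = subscribed_instances + 1  # every subscriber plus the origin itself
    return {
        "deliveries_from_origin": subscribed_instances,
        "stored_copies": stored_copies,
        "total_bytes_stored": stored_copies * post_bytes,
    }


# Hypothetical numbers: 5000 subscribed instances, a 2 kB post.
cost = fanout_cost(subscribed_instances=5000, post_bytes=2_000)
print(cost)
```

Both the number of deliveries and the total storage grow linearly with the number of subscribed instances, and all of the delivery work lands on the one origin server.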
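To make the signing idea concrete, here is a minimal sketch, assuming the origin signs each post once and any other instance can verify a copy it got from a peer. It uses toy textbook RSA purely so it runs on the standard library; the key, the helper names and the payload are all hypothetical, and a real deployment would use a vetted signature scheme from a proper crypto library, not this.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes.
# Illustration only: far too small and unpadded for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the origin


def digest(payload: bytes) -> int:
    """Hash the payload down to an integer smaller than N."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign(payload: bytes) -> int:
    """The origin instance signs a post with its private exponent."""
    return pow(digest(payload), D, N)


def verify(payload: bytes, signature: int) -> bool:
    """Any instance checks a post using only the public values (N, E)."""
    return pow(signature, E, N) == digest(payload)


# The origin signs once; the copy can then travel via any peer.
post = b"Selfhosted@lemmy.world: new post"
sig = sign(post)

print(verify(post, sig))                       # True: untouched copy from a peer
print(verify(b"edited by a rogue peer", sig))  # False: payload was altered
```

Because the check only needs the origin's public key, an instance no longer has to ask lemmy.world itself for the data; it only has to trust the signature on whatever copy it fetched.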
A network of (“thousands of”) servers has — like most things — pros and cons.
Some of the pros are:
- The network is more resilient against outages. If lemmy.ml is down, all other users can still access the network.
- It's hard to take legal action against the network or to buy it out (like Big Players™ like to do to get rid of potential competitors).
- It allows various similar or even conflicting moderation policies. The network, i.e. the infrastructure, doesn't allow or prohibit any specific opinion (the communities do).
- It allows for different ways to pay the bills: goodwill of the admin, donations, ads, fees or self-hosting. The latter also gives you great control over the data, so you control your privacy.
Some of the cons are:
- Content is replicated across servers, which increases the total amount of data stored.
- Latency and speed suffer.
- Interoperability with the wider Fediverse is less than 100%, which can create confusion and frustration.
- Discovery is more difficult.
Yeah, and this post is about how to get some (a lot of) servers that are doing nothing to participate in the "pros" while the top 20 servers are suffering from these cons.
I just commented on this in another thread: https://lemmy.world/comment/76011
TL;DR: The server-to-client interactions on Lemmy are a lot heavier than the server-to-server interactions, so even if you're just using your own server to interact with communities on other servers, it should still take load off of the servers you would have been using directly.
That's news to me. I thought server-to-server interactions would be heavier, since other instances will keep fetching content from your instance once they start federating. I guess it's better to join less populated instances instead of crowding onto a single instance.
This has definitely been a problem with communities being created on the bigger instances and not utilising smaller instances. Happy for someone to say I'm wrong etc, but I think there would be merit in capping instances to x number of users or communities, to force the user base to spread out.
Also, the way signups work (i.e. you find a community you like and click sign up, but that signs you up to that instance) further exacerbates the issue and the confusion around how federation works. The sign-up links on each instance should lead either to a page with an instance finder, or to a random instance that matches the profile of, and is already federated with, the instance you were on. Otherwise the larger instances have a monopoly and are just going to deliver a bad user experience when they can't cope with the traffic.
It's a self-fulfilling prophecy if users only want to sign up to the instances with the big communities, because then everyone is going to keep creating communities there and nobody is going to want to join a smaller instance.
I might be talking nonsense and am happy to be told why that is all wrong :)
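A minimal sketch of that "random federated instance" sign-up redirect, in Python. Everything here is hypothetical: the `Instance` fields, the host names, and especially the choice to weight smaller instances more heavily, which is an extra assumption on top of the comment above.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    host: str
    users: int
    federated_with: set[str]
    open_signups: bool = True


def suggest_instance(current_host, candidates, rng=random):
    """Pick a random open instance that already federates with the one being browsed."""
    eligible = [
        c for c in candidates
        if c.open_signups and c.host != current_host and current_host in c.federated_with
    ]
    if not eligible:
        return None
    # Assumption: favour smaller instances so sign-ups spread out.
    weights = [1 / (1 + c.users) for c in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0]


candidates = [
    Instance("small.example", 200, {"lemmy.world"}),
    Instance("mid.example", 5_000, {"lemmy.world"}),
    Instance("closed.example", 50, {"lemmy.world"}, open_signups=False),
    Instance("island.example", 10, set()),  # not federated with lemmy.world
]
pick = suggest_instance("lemmy.world", candidates, random.Random(0))
print(pick.host)
```

The closed and unfederated instances are never suggested, so the user always lands somewhere that can already see the community they came for.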
Yes, there should be instance caps, and they should be visible to users.
That way users can choose, and the load can spread, without much thinking.
This same technique works everywhere, for example in MMO games. You have availability visible and choose servers according to it.
This would partially fix scaling without many technical changes.
If that cap idea were to exist, it would make sense to base it on the balance of users across the federated servers, so if there are enough servers with a similar number of users, the cap gets raised
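One way such a balance-based cap could look, as a small Python sketch. The function and its `tolerance`, `quorum` and `step` parameters are invented for illustration; nothing like this exists in Lemmy as far as this thread says.

```python
def adjusted_cap(current_cap, peer_user_counts, tolerance=0.2, quorum=0.5, step=1.25):
    """Raise the user cap once enough federated peers have roughly caught up to it."""
    if not peer_user_counts:
        return current_cap
    # A peer counts as "caught up" if it is within `tolerance` of the current cap.
    caught_up = sum(1 for users in peer_user_counts if users >= current_cap * (1 - tolerance))
    if caught_up / len(peer_user_counts) >= quorum:
        return int(current_cap * step)
    return current_cap


print(adjusted_cap(1000, [900, 950, 300, 850]))  # 1250: three of four peers are close
print(adjusted_cap(1000, [100, 200, 900]))       # 1000: most peers are still small
```

The cap only moves when the rest of the network has filled up too, so a single big instance cannot keep growing while its peers sit empty.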
What's the alternative? You go full-banana decentralised or mega-site Reddit. I think Lemmy is a nice middle ground
Since Lemmy instances are not backed by commercial interests, but rather by nice volunteers and donors who have money and time to spare, they will be heavily affected by economic downturns (though we can still see commercial interests affecting users negatively with Reddit). Here are my thoughts on the matter:
- as far as I understand the owner of the domain: https://lemmy.world even has to pay for this fancy domain name in the DNS system ... every month subscription service style
- (and tbh I hate the Domain name system) why should I fund it with my own money?
- if you hosted with an onion site over tor that expenditure would not exist, but how would users discover your site then? Let me know if you know something about this
- in times of deflation (meaning money becomes worth more), spending some money on a self-hosted Lemmy instance becomes nonsensical
- tbh if I hosted a lemmy instance and the users of my instance posted high quality content in quantity I would use it to train my own LLM, that would at least create some economic incentive for me to host such a page ... but managing spam and bots will be HARD
That is why you should always back up your comments on your personal device. It would be nice if Lemmy had an automated way of doing this (I should look into this more)
Domain names are cheap, like $25/year.
I've suggested a routing protocol to the lemmy devs - to use federated instances to route all the messages to other federated instances. The idea was received with some interest, but it seems that people believe that there's still a ton of performance that can be squeezed out from the current architecture through optimisations.
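A toy sketch of what routing through federated instances could look like, in Python. This is not the protocol that was proposed to the devs (its details aren't given here); it is just one simple relay scheme, with hypothetical names, where every instance forwards a post to at most `fanout` others.

```python
from collections import deque


def relay_plan(origin: str, subscribers: list[str], fanout: int) -> dict[str, list[str]]:
    """Assign each subscriber a sender so nobody forwards to more than `fanout` peers.

    Assumes fanout >= 1.
    """
    plan: dict[str, list[str]] = {origin: []}
    senders = deque([origin])      # instances that already hold the post
    for instance in subscribers:
        while len(plan[senders[0]]) >= fanout:
            senders.popleft()      # this sender is full, move on to the next one
        plan[senders[0]].append(instance)
        plan[instance] = []
        senders.append(instance)   # the recipient can now forward as well
    return plan


subscribers = [f"instance{i}.example" for i in range(5000)]
plan = relay_plan("lemmy.world", subscribers, fanout=10)
print(len(plan["lemmy.world"]))                         # 10 sends from the origin
print(max(len(targets) for targets in plan.values()))   # 10 at most for anyone
```

With direct delivery the origin makes 5000 sends per post; here it makes 10 and the rest of the work is spread across the network, at the price of extra hops and needing some way to trust relayed copies.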
I'm quite worried about how well this federation system will work in the long run, especially as more people come over from Rexxit. As people make more posts and comments, every federated instance will have to cache more redundant content from the others, which will also use more storage and so increase the costs of every instance host. There's also the problem of visibility in search engines. Because Lemmy/Kbin can be hosted by anyone, searching on a specific domain is impossible, unlike how I can just add "reddit" to the search query. And since there are multiple Lemmy/Kbin instances, there's a chance similar communities will be spread across them, fragmenting the communities even further. Until they can find a way to fix those problems, I don't think federation is suited for large-scale communities.
As for the fragmentation problem, maybe adding a global search for communities like this will help reduce fragmentation. Users can still make their own community on their instance, while other people who don't need to can easily find the community they want.
I've created my own instance in order to not create more load on others and it took a minute to realise I needed to populate it myself, would be nice to have a default view aggregating popular posts etc. across instances. But maybe I'm just asking for too much hehe
I did the exact same thing. Ended up looking up the more popular communities on the bigger instances and searched for them on mine to index them.
I wish there was an easier way, but for now there isn't.
That's an interesting idea. Maybe you could even choose the "default subs" for your instance from across lemmy.
That would be awesome, I think! I am toying with the idea of building a proper instance and putting my devops skills to use, but at the same time, a few features are missing!
Matrix suffers badly from this issue. It feels like 95% or more of users are on the matrix.org instance, and all the major chat rooms are hosted there too.
I think something like a weekly cap for new registrations as an option would be good. With a hint to other instances.
It's kinda the same issue that some games have, like MMOs. People tend to make new accounts on the biggest and overloaded servers because there is the most activity even though stability could be an issue, or login queues.
But that doesn't make sense on matrix or Lemmy. Because you can still access all content no matter where you are.
I have my own Lemmy instance running on my home server, but I'm here. "But Bizzle," you may be asking yourself, "why go through all the trouble of configuring your own instance just to wind up on Lemmy.World anyway?"
I'm glad you asked! And the answer is that federation only fetches parent comments. I'm glad Lemmy exists, and I'm going to keep using it, but we need federated sibling comments for this to actually be good, in my opinion.
I'm not sure I understand what you're saying. Did you mean that child comments are not federated?
That sounds like a huge oversight, if so.
Privacy-wise, it is more convenient for me to run my own instance and have my own private communities.