submitted 2 years ago* (last edited 2 years ago) by Averrin@lemmy.world to c/selfhosted@lemmy.world

Correct me if I'm wrong. I read the ActivityPub standards and dug a little into the Lemmy sources to understand how federation works, and I'm a bit disappointed. Every server just has a cache and the ability to fetch something from another known server. So if you start your own instance, it brings no benefit to the network as a whole until it has a significant audience (think of private instances or servers with no users). Are there any "balancers" that put these empty instances to use? Should we promote (or create in the first place) a way to passively help Lemmy cope with such fast growth?

[-] theterrasque@infosec.pub 6 points 2 years ago

Disclaimer: I've only looked a bit at the protocols and high-level descriptions of how it works, and this is just my understanding of it. But it seems to track.

Let's take Selfhosted@lemmy.world as an example. Right now lemmy.world is the source of truth for it. If you subscribe to it from a different host, say myawersomeinstance.com, that host first contacts lemmy.world, copies over the posts, and then subscribes to new posts. I'm actually not 100% sure whether lemmy.world contacts myawersomeinstance.com when there's a new post, or whether myawersomeinstance.com polls lemmy.world. Either way, the point is that lemmy.world is the authority. myawersomeinstance.com also has the Selfhosted@lemmy.world data, but only as a copy; lemmy.world is the only authority. So if you post something, your server sends it to lemmy.world and waits for a reply. Then lemmy.world contacts every instance that has at least one user following the community to tell it about the new post. That new post now exists in a few hundred databases.

The problem is the scaling is whack. Okay, you can have 5000 federated servers with users subscribed to Selfhosted@lemmy.world, but that means lemmy.world needs to update 5000 servers per post, that post gets stored 5000 times over, and ALL 5000 servers contact lemmy.world to get the new good stuff.
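To put numbers on that, here is a rough sketch of the fan-out in Python. It assumes the push model (the origin delivers to each follower), which is only one of the two possibilities mentioned above, and the `Instance` and `Community` classes and the `instance-N.example` hosts are made up for illustration; none of this is Lemmy's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    host: str
    posts: list = field(default_factory=list)  # local copy of community posts


@dataclass
class Community:
    origin: Instance                            # the authoritative instance
    followers: list = field(default_factory=list)
    deliveries: int = 0                         # outbound requests made by origin

    def submit(self, post: str) -> None:
        """The origin accepts a post, then pushes it to every follower."""
        self.origin.posts.append(post)
        for inst in self.followers:
            inst.posts.append(post)             # one stored copy per follower
            self.deliveries += 1                # one delivery per follower


origin = Instance("lemmy.world")
followers = [Instance(f"instance-{i}.example") for i in range(5000)]
community = Community(origin, followers)
community.submit("hello")

# Count how many follower databases now hold the post.
copies = sum(post == "hello" for inst in followers for post in inst.posts)
print(community.deliveries, copies)
```

Both the delivery count and the number of stored copies grow linearly with the number of subscribed instances, and all of that work lands on the one origin server.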

Frankly, it's a scaling nightmare. As a different approach, lemmy.world could sign its updates with a private key, and other instances could then fetch new data from each other and verify it against lemmy.world's public key. That would also allow more relaxed caching, since re-fetching the data would generally cost less. Right now you need aggressive caching, because you don't want lemmy.world to keel over and die from every server on the planet wanting to hear the latest and greatest posts all the time.
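A minimal sketch of that idea, to show why a peer doesn't have to be trusted if the origin's signature travels with the data. Big caveat: the key pair below is textbook RSA built from two known Mersenne primes with no padding, chosen only so the example runs on the standard library. It is not secure, and a real system would use a vetted signature scheme from a crypto library. The function names and the peer "stores" are hypothetical.

```python
import hashlib

# TOY key pair, for illustration only (assumption, not secure).
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest(payload: bytes) -> int:
    """SHA-256 of the payload as an integer."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def sign(payload: bytes) -> int:
    """Origin server signs an update with its private key."""
    return pow(digest(payload), D, N)


def verify(payload: bytes, signature: int) -> bool:
    """Any instance checks an update against the origin's public key (N, E)."""
    return pow(signature, E, N) == digest(payload)


def fetch_from_peer(peer_store: dict, post_id: str):
    """Accept a post from an untrusted peer only if the origin's signature holds."""
    entry = peer_store.get(post_id)
    if entry is None:
        return None
    payload, signature = entry
    return payload if verify(payload, signature) else None


post = b"new post in Selfhosted@lemmy.world"
honest_peer = {"post-1": (post, sign(post))}
tampering_peer = {"post-1": (b"altered post", sign(post))}

print(fetch_from_peer(honest_peer, "post-1") == post)
print(fetch_from_peer(tampering_peer, "post-1") is None)
```

The origin still has to sign every post once, but after that any instance holding a valid copy can serve it, so the read load no longer has to hit lemmy.world.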

[-] ultraHQ@beehaw.org 3 points 2 years ago* (last edited 2 years ago)

Thanks for the in-depth write-up! I haven't looked too far into the docs or the subscription model, but is this a fault on Lemmy's end, or is it a function of how ActivityPub handles federated communication? (I'm very new to ActivityPub/federation, just now reading through the ActivityPub docs.)

I do like your idea of distributed replication via keys, much better than what I had brainstormed.

Edit: yeah, it does look like it's a function of ActivityPub. I wonder if there's a more scalable federation protocol out there.

[-] Fizz@lemmy.nz 2 points 2 years ago

Could lemmy.world put a load balancer in front and use that to direct requests to different instances of lemmy.world? Not sure if that question is dumb; I'm not a technical guy.

[-] theterrasque@infosec.pub 3 points 2 years ago

It's not dumb at all, and it's a common scaling technique. But the software needs to support it, and I have no idea whether Lemmy supports running multiple instances for one server.
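For anyone curious what that looks like, here is a bare-bones round-robin sketch. The `Worker` and `RoundRobinBalancer` names are invented for the example, and the shared dictionary stands in for a common database; whether Lemmy can actually run this way is exactly the open question.

```python
from itertools import cycle


class Worker:
    """One copy of the server process behind the balancer."""

    def __init__(self, name: str, shared_db: dict):
        self.name = name
        self.db = shared_db  # assumption: every worker reads the same store

    def handle(self, post_id: str):
        return self.name, self.db.get(post_id)


class RoundRobinBalancer:
    """Hands each incoming request to the next worker in turn."""

    def __init__(self, workers):
        self._next = cycle(workers)

    def route(self, post_id: str):
        return next(self._next).handle(post_id)


shared_db = {"post-1": "hello fediverse"}
balancer = RoundRobinBalancer([Worker(f"worker-{i}", shared_db) for i in range(3)])
results = [balancer.route("post-1") for _ in range(6)]
print(results)
```

The "software needs to support it" part is the shared state: any worker must be able to answer any request, which only holds if nothing important lives in a single process's memory.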

this post was submitted on 12 Jun 2023
128 points (89.5% liked)
