This post is a sort of partial dump of my efforts towards an idea/proposal for improving discoverability and onboarding for the Fediverse while avoiding new users just being dumped on a centralised instance. I've seen people suggest that one of our secondary defenses from megacorp social media (like Meta) is improving our UI, so this is part of my attempt to do that.

We can use our non-monetizability to construct algorithms specifically for the purposes of people finding the content and groups they want, rather than for the purposes of selling them shit.

I actually started working on this during the Reddit Migration, but got sidetracked with other things ^.^, so I'm dumping it here for everyone else to make more progress!

I want to discuss a rough proposal/idea that eases the onboarding of new users to the fediverse, and discovery of groups, while hopefully distributing them across more instances for better load balancing and decentralization. More generally, it should enable easier discovery of groups and instances aligned with your own sentiments and interests, with a transparent algorithm focused on user control and directly connecting people with entities that align with what they want to see.

I may interleave some ActivityPub terms in here because I've been working on a much larger proposition for architectural shifts (capable of incremental change from current) that might allow multi-instance actors and sharding of large communities' storage - I want the fediverse to be capable of arbitrary horizontal scaling. Though of course that will depend heavily on my attention span and time and energy. I might also just dump my incomplete progress because honestly my attention is on other projects related to distributed semiconductor manufacturing atm ^.^

What this post addresses is the current issue of onboarding new users ^.^, and helping users discover communities/instances/other users. These users typically are pointed to one of about 5 or 6 major instances, which causes those instances to have to eat costs, especially since loads of users in one place means loads of communities - and the associated storage needs - in one place (as users create communities on their instances).

My proposition/idea consists of the following:

A mechanism by which instances can declare their relevant purposes in a hierarchical, "refinement" manner
A mechanism by which instances can declare what sort of instance they are - lemmy, mastodon, kbin, etc.
A mechanism to specify those purposes such that different terms can be merged in a given instance - for example, multi-language terms for the same item
A relatively simple algorithm that lets instances select other (hopefully reliable) instances that are relevant to someone, and automatically link over to them on signup.
A proposition for a hopefully intuitive UI with sensible defaults ^.^
(maybe in another post) an idea for simplified Fedi signin.

Self-Tagging Structure

The first part of the proposal is specifying a way for instances to tag their general topics and category at varying levels of specificity.

Tagging the "Type" of Social Media an Instance is Running

Each instance should have a descriptor of what software it is running.

This serves as a proxy for what "type" of social media it is (reddit-like, twitter-like, whatever kbin is, etc.), taking into account that users are likely to have visited an instance based on reports that the type of software it runs is what they want.

I propose some string endpoint like instance_software in the top-level instance actor.

Tagging the Focus of Instances

Generally speaking, instances fall into several categories:

General purpose instances
Instances which lean towards some topics but are general purpose.
Instances that are very focused towards some topics to the exclusion of others.

There are also instances with varying levels of moderation, which may be encompassed in this. ^.^

To solve this problem, instances should provide an endpoint (for now, let's call it instance_focus) in their representative actor that produces a collection of so-called subject trees with associated weights.

Subject Trees/Sentiment Trees

Each subject tree is a nested list that looks like the following:

{
  "weight": 1,
  "polarisability": -0.7,

  "subject-tree": [
    {
      "subject": "programming",
      "terms": [
        ["en", "programming"],
        ["en", "coding"],
        ["en", "software-development"]
      ]
    },
    {
      "subject": "language",
      "terms": [
        ["en", "language"]
      ]
    },
    {
      "subject": "rust",
      "terms": [
        ["*", "rust"],
        ["*", "rustlang"]
      ]
    }
  ]
}

This indicates an instance (or other group) that is interested in programming, specifically programming languages, and more specifically the programming language Rust. It also gives this instance's estimated polarisability for /programming/language/rust/ as -0.7, i.e. it estimates that people who feel a certain way towards one subtopic of /programming/language/rust/ will likely feel a similar way towards its other subtopics unless explicitly specified. There may be other fields for some of the more complex and specific parameters documented in the proto-algorithm I wrote, such as polarisability with specific sibling subjects (e.g. if rust had antagonistic sentiments toward cpp, it may have a "sibling-polarisability": { "cpp": 0.5 } field, or something similar).

A useful compact syntax to indicate the tree (in config files, for example) might look something like the following: /programming{en:programming,en:coding,en:software-development}/language{en:language}/rust{*:rust,*:rustlang}/

This encodes the terms that it knows for these concepts, within the context of the subject above it, along with the language that term is in (star indicating many human languages where the same term is used, e.g. with proper names).
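To make the compact form concrete, here's a rough sketch of a parser for it in Python. The function name and the output shape (a list of levels, each with a subject and (language, term) pairs) are my own assumptions for illustration, and it assumes subjects and terms never contain `/`, `{`, `}`, `,` or `:`.

```python
import re

# Hypothetical parser for the compact form, e.g.
# /programming{en:programming,en:coding}/language{en:language}/
# One segment is "subject" optionally followed by "{lang:term,...}".
SEGMENT = re.compile(r"([^/{}]+)(?:\{([^{}]*)\})?")


def parse_subject_path(path: str) -> list:
    """Turn a compact path into a list of {"subject", "terms"} levels."""
    levels = []
    for part in path.strip("/").split("/"):
        if not part:
            continue  # "/" on its own is the root ("general interest") path
        match = SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed segment: {part!r}")
        subject, raw_terms = match.groups()
        terms = []
        for item in (raw_terms or "").split(","):
            if item:
                # "en:coding" -> ("en", "coding"); "*" means "many languages"
                lang, _, term = item.partition(":")
                terms.append((lang, term))
        levels.append({"subject": subject, "terms": terms})
    return levels
```

Feeding it the example path above gives three levels (programming, language, rust), and the root path `/` gives an empty list.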

For this system to work, there must be a roughly agreed-upon set of names to use as keys.

The "subject-tree" for "general interest" is just an empty list ([] in JSON) ^.^

PART 2


[–] sapient_cogbag@infosec.pub 12 points 2 years ago* (last edited 2 years ago) (1 children)

Common Interest Algorithm

The weighting system indicates how much interest (or avoidance) an instance has for a topic as specified by the subject tree. The weight for each subject tree should be a value from -1 to 1 (inclusive), and applies to the deepest component of the tree. We'll call this the sentiment of the instance towards that specific level of the tree.

The common interest algorithm specifies a rough way to estimate how "aligned" in sentiment a given pair of entities are, using an incomplete collection of nested topic paths ^.^ and heuristics to fill in the "gaps" needed for direct comparison. It takes the partially specified trees (along with estimated polarisabilities) from federated instances and combines them together. It then uses the result to "complete" the sentiment weights specified by users and instances so they can be compared directly, and the common interests found that way help direct users to the instances that are right for them.

The default option should be that users are assumed to want "general sentiment/general topic/root topic" instances (i.e. with path /), and then they can specify much more refined interests using various methods, like taking search terms and using the collected known topics for them in various languages to construct a user-friendly search function based off the common interest algorithm heuristic, or allowing direct specification of interests, for more advanced users ^.^.

The full (but slightly incomplete) details of my approximate proposed Common Interest Algorithm are in this gitlab snippet, written in poorly-organised Rust code.
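To give a feel for the "fill in the gaps, then compare" idea without reading the Rust, here is a deliberately tiny Python toy. It is NOT the algorithm from the snippet: the gap-filling rule (inherit the weight of the nearest specified ancestor), the scoring rule (mean product of weights), and every name in it are my own stand-ins, and it ignores polarisability entirely.

```python
# Toy stand-in for the Common Interest Algorithm, for illustration only.
# A topic path is a tuple of subjects; () is the root ("general") path.


def completed_weight(weights: dict, path: tuple) -> float:
    """Weight at `path`, falling back to the nearest specified ancestor."""
    for depth in range(len(path), -1, -1):
        prefix = path[:depth]
        if prefix in weights:
            return weights[prefix]
    return 0.0  # nothing specified anywhere along the path: neutral


def alignment(a: dict, b: dict) -> float:
    """Mean product of completed weights over every path either side names.

    Weights are in [-1, 1], so the result is too; negative means anti-aligned.
    """
    paths = set(a) | set(b)
    if not paths:
        return 0.0
    total = sum(completed_weight(a, p) * completed_weight(b, p) for p in paths)
    return total / len(paths)


user = {(): 0.2, ("programming", "language", "rust"): 1.0}
instance = {("programming",): 0.8}
score = alignment(user, instance)  # positive: both like programming
```

The user never said anything about /programming/ directly and the instance never mentioned rust, yet both gaps get filled from an ancestor so the two can still be compared.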

Tagging the Willingness for New Users

Different instances have a different level of desire (and gatekeeping) for new users.

Some don't allow any new users at all. Others require filling out a form and waiting for approval. Many require an email or captcha, and some don't require anything whatsoever.

Some don't want any new users, some do accept new users but only can handle a small number, and others are free-for-all open registration.

Many users will want the ability to create communities without needing to seek approval. For defaults on the "maximum" level of "inconvenience" an instance presenting other instances should show to the user, it makes sense for an instance to use its own level of "inconvenience".

nodeinfo2 (also see here for all keys) already exists to provide some basic information, but it's not enough for this feature ;p

As such, I suggest we instead construct a property on the main server actor, for now called instance_onboarding_meta. This is an object of the form:

{
    "accepting_new_users": bool, // If this is false, no other fields need be present.
    "capacity_used": float (>= 0), // Must be present. One minus the fraction of total estimated user capacity still available; alternatively, an approximate fraction of resource usage. A value > 1 means the server is over capacity.
    "preferred_max_users": integer (>= 0), // If present, the approximate maximum number of users this instance wants to host. If unset, assume unlimited, but perform estimates based on capacity_used.
    "signup_requirements": [
        "captcha",
        "email",
        "approval"
    ], // Must be present, a list of the signup requirements. May need more options as new authentication and validation mechanisms are added to the various Fedi servers ^.^
    "signup_uri": "https://example.com/signup/finalized" // "Final" signup page, rather than one providing alternate instance suggestions. Should take e.g. a `?username=<new username>` parameter.
}
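For anyone wanting to poke at this, a minimal Python sketch of reading such an object might look like the following. The class and function names are hypothetical, and treating signup_uri as optional is my assumption, since the description above doesn't say whether it must be present.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OnboardingMeta:
    """Hypothetical in-memory form of instance_onboarding_meta."""
    accepting_new_users: bool
    capacity_used: float = 0.0
    preferred_max_users: Optional[int] = None
    signup_requirements: List[str] = field(default_factory=list)
    signup_uri: Optional[str] = None


def parse_onboarding_meta(raw: str) -> OnboardingMeta:
    """Parse and sanity-check the JSON text of an onboarding object."""
    data = json.loads(raw)
    if not data["accepting_new_users"]:
        # When registrations are closed, the other fields may be absent.
        return OnboardingMeta(accepting_new_users=False)
    capacity = float(data["capacity_used"])  # required when accepting users
    if capacity < 0:
        raise ValueError("capacity_used must be >= 0")
    return OnboardingMeta(
        accepting_new_users=True,
        capacity_used=capacity,
        preferred_max_users=data.get("preferred_max_users"),  # optional
        signup_requirements=list(data["signup_requirements"]),  # required
        signup_uri=data.get("signup_uri"),
    )
```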

Instance Signup Redirection Algorithm

Now that a system has been proposed for instances to describe how much effort it takes to sign up, how many new users they can really take, and what kind of community they're interested in, we can use this data to construct a method to split signup across the fediverse.

We'll describe things in terms of what happens either as the list of instance values is changed while they are polled, or finally what happens when a user actually looks for an instance ^.^. Though, a lot of the ideas are also mentioned in the Common Interest Algorithm Snippet, which also at least partially discusses some other things.

Step 1 - Candidate Instance Collation

The first step is to collate information about potential candidate instances, by making requests to the endpoints described above to instances the current instance is federated with - including itself! (it might be useful to combine all the metadata into one endpoint as well, but that's all bikeshedding):

instance_software - the software of each instance
instance_focus - the list of weighted subject-trees that indicate what the community is oriented around - see the algorithm snippet for efficiently merging in information from instances without having to recalculate the full weights every time, via use of BTrees/BTreeMap.
instance_onboarding_meta - Information about how the instance accepts new users, and its resources to do so.

Instances shouldn't poll this very frequently - certainly not on every attempted user signup! - and instead should cache it and poll periodically (say, every hour or so ^.^). This avoids slamming large portions of the network.
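The "cache it and poll periodically" part could be as small as the sketch below. Everything here is illustrative: `PolledCache` and its parameters are made-up names, `fetch` stands in for whatever actually requests the metadata from a remote instance, and the one-hour default just mirrors the number suggested above.

```python
import time
from typing import Callable, Dict, Tuple


class PolledCache:
    """Remember each instance's metadata and refetch only after `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, instance: str) -> dict:
        """Return cached metadata, polling the instance only if it is stale."""
        now = self._clock()
        entry = self._entries.get(instance)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._fetch(instance))
            self._entries[instance] = entry
        return entry[1]
```

A signup request then only ever calls `get`, so a burst of new users hits the cache rather than the rest of the network.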

Step 2 - Software Filtering

The next step is filtering out candidate instances running different fediverse software than ourselves.

Step 3 - User Acceptance Filtering & Weighting

Our instance should then filter out instances that aren't accepting users, and perform the following steps to assign weights to instances (may be configurable if the user is ok with accepting more effort than our instance requires - as most users are likely to use the default settings it should be cached too):

For each instance, if it requires more things to sign up (email when we don't need it, etc.), then remove it from the list.

For captcha, mark that instance with a "0.5" weight multiplier rather than eliminating it, if we don't also require captcha.

From a user-configurability perspective, each possible signup requirement can either:
- Eliminate the instance from the list (a user doesn't want to deal with forms) - this is the default for things required by another instance that aren't required by ours, except captcha.
- Reduce its chance of selection (as in captcha) - this is the default if the respective instance has captcha but the current instance doesn't.
- Have no effect - this is the default if we also have that requirement.

For each instance, if it has a preferred max user count, then calculate the current approximate user count by multiplying it by the resource usage capacity.

Then, calculate the approximate available user slots by subtracting the approximate user count from the preferred maximum. Note that this value may be negative in the case of an overloaded server.
Find the instance with the largest preferred max user count (if none exists, then use the current server's user count instead, though remember that if your server does have such a preferred max count, it should be in the list). If any server's estimated number of consumed user slots is greater than that largest preferred max user count, use that number instead.

Then, assume that the preferred maximum for servers with no specified maximum is approximately 2x that value. Calculate the approximate available user slots of instances without an existing preferred maximum, using this estimate in combination with the resource consumption fractions.
For any instance with available user slots <0 - that is, overloaded servers - divide those (negative) available user slots by some value such as 4.

If any instance has a negative number of available user slots, add the magnitude of the most-negative number back on to every instance's count of available user slots, so that the smallest value is zero.

The division by 4 (or some other number) means that all overloaded servers are avoided more than they would be if we just added the most-negative value back directly.
Assign weights to each instance depending on their proportion of available user slots compared to the total. If the instance has already been tagged by a weight (from e.g. having captcha), then multiply by that weight.
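Since Step 3 is a lot of words, here is how I'd roughly sketch it in Python using the default rules only (no per-user configuration). The `Candidate` class and the function name are hypothetical; the fields mirror instance_onboarding_meta, and the 0.5 captcha multiplier, the 2x estimate and the divide-by-4 are the values suggested above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Candidate:
    name: str
    accepting_new_users: bool
    capacity_used: float
    signup_requirements: Set[str]
    preferred_max_users: Optional[int] = None


def onboarding_weights(
    candidates: List[Candidate],
    our_requirements: Set[str],
    our_user_count: int,
    overload_divisor: float = 4.0,
    captcha_multiplier: float = 0.5,
) -> Dict[str, float]:
    """Weight each acceptable instance by its share of free user slots."""
    kept, multipliers = [], {}
    for c in candidates:
        if not c.accepting_new_users:
            continue
        extra = c.signup_requirements - our_requirements
        if extra - {"captcha"}:
            continue  # asks for something we don't (other than captcha)
        multipliers[c.name] = captcha_multiplier if "captcha" in extra else 1.0
        kept.append(c)
    if not kept:
        return {}

    # Reference size: largest preferred max, or the largest estimated
    # user count if some server is already bigger than that.
    with_max = [c for c in kept if c.preferred_max_users is not None]
    reference = max((c.preferred_max_users for c in with_max), default=our_user_count)
    for c in with_max:
        reference = max(reference, c.preferred_max_users * c.capacity_used)
    assumed_max = 2 * reference  # guess for servers with no stated maximum

    slots = {}
    for c in kept:
        cap = c.preferred_max_users if c.preferred_max_users is not None else assumed_max
        free = cap * (1 - c.capacity_used)  # negative when overloaded
        slots[c.name] = free / overload_divisor if free < 0 else free

    lowest = min(slots.values())
    if lowest < 0:
        # Shift everything up so the most overloaded server sits at zero.
        slots = {name: free - lowest for name, free in slots.items()}
    total = sum(slots.values())
    if total <= 0:
        return {name: 0.0 for name in slots}
    return {name: multipliers[name] * free / total for name, free in slots.items()}
```

Note the weights don't have to sum to 1 once the captcha multiplier is applied; that's fine for weighted random selection later.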

PART 3

[–] sapient_cogbag@infosec.pub 10 points 2 years ago* (last edited 2 years ago) (1 children)

Step 4 - Term Merging

Each instance has provided subject trees of what its community is meant to be like. Moreover, it has provided the terms it believes refer to the various concepts within its subject trees.

This step is where all those terms get merged together to then be used later via some kind of search algorithm, for the more sophisticated cases.

The steps are as follows.

Collect all the subject trees from each instance into some way of iterating over them.
Construct a BTree-based map of topic paths plus associated term information, merging in new values for every level from every federated server ^.^. Much more sophisticated versions of doing this efficiently are documented in the Common Interest Algorithm snippet, even if not for the terms, so just look at that :)
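As a rough illustration of the merge (not the efficient version from the snippet), here is a Python sketch. Python has no built-in BTreeMap, so this swaps in a plain dict that gets sorted by path at the end. The function name is made up, and it assumes each term is a [language, term] pair inside a list of levels under "subject-tree".

```python
from typing import Dict, Iterable, List, Set, Tuple

Path = Tuple[str, ...]
Term = Tuple[str, str]  # (language, term), "*" meaning many languages


def merge_terms(instance_trees: Iterable[List[dict]]) -> Dict[Path, Set[Term]]:
    """Merge the terms every instance knows for every level of every path.

    `instance_trees` holds one list of subject trees per federated instance.
    Terms are keyed by the full path, since a term only has meaning in the
    context of the subjects above it.
    """
    merged: Dict[Path, Set[Term]] = {}
    for trees in instance_trees:
        for tree in trees:
            path: Path = ()
            for level in tree["subject-tree"]:
                path = path + (level["subject"],)
                known = merged.setdefault(path, set())
                known.update((lang, term) for lang, term in level["terms"])
    # Stand-in for a BTreeMap: order the paths lexicographically.
    return {path: merged[path] for path in sorted(merged)}
```

Because the result is ordered by path, all the subtopics of a subject end up next to each other, which is what makes prefix lookups cheap in the BTree version.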

Step 5 - Common Interest Weighting

Apply Common Interest Weighting via the Common Interest Algorithm between the user and each possible instance.

There may be a way to use Heaps or some hierarchical datastructure to sort the instances to do this more efficiently, but as long as the implementation of the Common Interest Algorithm uses BTrees and pre-calculates lexicographically ordered maps of data it can be ensured that the cost of this kind of commonality assessment only grows with the size of the tree specified by the user and the single instance to be compared, rather than all instances (for an individual instance/user comparison ^.^).

There may also be ways to compare the user against all instances at once more efficiently that I don't know of. But the point is, we can use the Common Interest Algorithm to assign weights for each instance/group/etc. relative to each user.

We could also use some way to convert a user search query into their Common Interest Algorithm tree weights, using the list of known terms. This is for slightly more advanced terms or people perhaps searching for communities or other groups too.

Step 6 - Elimination of Anti-Aligned Instances

Any instances/groups/communities/etc. with alignment <0 should be immediately eliminated from the list of suggested instances/groups/communities/etc. to the user.

Step 7 - Combining Sentiment Alignment Weights & Other Ranking, plus Final Selection

We already have some ranking information based on how willing and able an instance is to take new users, plus we have information on how aligned each instance is with this hypothetical new user - now always a fraction from 0 to 1, as we cut out instances that have a negative alignment with the user ^.^. Then I suggest we find some simple way to join those two values together. For now, I suggest simply multiplying the alignment fraction with the weights for each instance, and then using probabilistic selection to direct the user to an instance that aligns with what they want ^.^
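Steps 6 and 7 together are small enough to sketch in a few lines of Python. The function and argument names are hypothetical: `onboarding` is the per-instance weight from Step 3 and `alignment` is the per-instance score from Step 5, and instances with no alignment score are treated as 0 here, which is my own assumption.

```python
import random
from typing import Dict, Optional


def pick_instance(
    onboarding: Dict[str, float],
    alignment: Dict[str, float],
    rng=random,
) -> Optional[str]:
    """Drop anti-aligned instances, multiply the two scores, sample one."""
    combined = {}
    for name, weight in onboarding.items():
        score = alignment.get(name, 0.0)
        if score < 0:
            continue  # Step 6: anti-aligned, never suggested
        combined[name] = weight * score
    names = [name for name, value in combined.items() if value > 0]
    if not names:
        return None  # nothing suitable; the caller needs some fallback
    weights = [combined[name] for name in names]
    # Weighted random choice, so well-matched instances with spare
    # capacity are likelier but smaller ones still get picked sometimes.
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` as `rng` makes the selection reproducible for testing.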

It may also be desirable for instances to prioritise somewhat older instances with better uptime, or more trustability (e.g. using some kind of heuristic to detect bot instances or similar), and modify the weightings based on that, or eliminate some instances ^.^

For non-instance searching or discovery, we can use the alignment ranking directly as a form of search ranking :)

Step 8 - Redirection

Redirect the user to the "final" signup page as listed in the instance metadata, along with the parameter for their desired username. Perhaps it would be worth using webfinger to make sure the username isn't taken on any selected instance, and automatically selecting different instances from the list until you find one without the username taken already, with a warning.
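Building the redirect target is the easy bit. A sketch in Python, assuming the parameter really is called `username` as in the signup_uri example earlier (the function name is made up, and the WebFinger availability check is left out because it needs network access):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def signup_redirect(signup_uri: str, username: str) -> str:
    """Append the desired username to an instance's final signup URL."""
    parts = urlsplit(signup_uri)
    # Keep any query parameters the instance already put in its signup_uri.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("username", username))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Going through `urlencode` rather than string concatenation means odd characters in the username get escaped properly.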

If we're talking about discoverability of communities or similar, you just put those in order of their direct sentiment alignment rank ^.^

[–] Igotz80HDnImWinning@kbin.social 4 points 2 years ago (1 children)

I would LOVE to see a user tuneability control for continued content discovery along these same weighted relationships. Kind of like a Discover Weekly meter that you could adjust/threshold to see suggested content from instances that are more vs less similar to ones we follow. You may have said that in here but either way this seems really useful for instance steering/selection and distribution.

[–] CaptBobbers@mas.to 1 points 2 years ago

@Igotz80HDnImWinning @sapient_cogbag
Seems super useful.

[–] Xylight@lemmy.xylight.dev 10 points 2 years ago

I do not have the patience to read all of this but I upvoted because it looks cool

[–] Sterile_Technique@lemmy.world 7 points 2 years ago (2 children)

New user here. I don't understand code, but I like the sound of everything else.

In my "whatthefuck is the Fediverse..." stage of onboarding, I had a real hard time actually deciding where to start. Most of the advice I found was "It doesn't matter - just pick any instance!"

...kbin looked neat, so I started there; but I downloaded Jerboa and couldn't log in with a kbin account (which went against the whole 'you can use one chunk of the fed to engage with others!' spiel).

Okay, so I need a Lemmy account, of which there are still quite a few instances, but now I'm suspicious that my selection will actually work, so I just go with the popular one.

So, feedback from a newbie:

It would have been REALLY helpful to have a flow-chart (or questions I can click through) that started me out on deciding which platform best matched what I was after, and then work its way through the subcategories: do you like a social feed ala twitter? → Mastodon!; do you like topic-specific forums ala Reddit? → Lemmy/Kbin!; are you a waste of fucking oxygen? → exploding.heads! lol you get the point. Something to guide me through the TON of options that the fediverse represents would have been great.

[–] Kichae@kbin.social 5 points 2 years ago (2 children)

couldn’t log in with a kbin account (which went against the whole 'you can use one chunk of the fed to engage with others!' spiel).

In fairness, it goes against the spiel in the same way that "I tried to download Internet Explorer on Linux" goes against the "you can use any computer to access the internet" spiel.

You can access all of the same content from websites running Lemmy or kbin, but they're still totally different pieces of kit.

[–] Sterile_Technique@lemmy.world 7 points 2 years ago (1 children)

For sure - the point is I was clueless. My first steps navigating the fediverse boiled down to trial and error, even after reading a few posts attempting to unmuddy the waters.

Completing those little milestones was kind of gratifying tbh, but most potential users aren't looking for a challenge, and will be turned off by a counterintuitive or nonexistent onboarding process.

[–] TrueStoryBob@lemmy.world 3 points 2 years ago

My experience looking to join Mastodon instances (after the FB whistle blower a few years ago) was very similar. I ended up in a large "more general" instance, ignoring the local feed in favor of my home feed. I slowly built up my follows and now I feel right at home, but that was a process... as opposed to, say, when I signed up for Twitter ten-ish years ago where an algorithm held my hand for days or weeks until it figured out what it could sell me. I think OP is really onto something, so they've got my attention and my up vote.

[–] skullone@lemmy.world 4 points 2 years ago

I think the point was that it wasn't very intuitive for a new user.

[–] adonis@kbin.social 1 points 2 years ago

Same here!

I even went so far as to think I could log in to Jerboa with my existing Mastodon account.

Asking on the Internet, I got a reply like "this should work, maybe Jerboa has a bug".

Doing a bit more research, I found the rabbit hole is deeper than I thought.

[–] larlyssa@lemmy.world 6 points 2 years ago* (last edited 2 years ago) (1 children)

I’d love for you to post this as a proposal on the Lemmy GitHub.

[–] sapient_cogbag@infosec.pub 4 points 2 years ago (1 children)

This is actually kind of a general activitypub thing. I might do that but that feels like I'd have to make it way more refined and go through some formal process, and I hate doing that kind of thing and find it quite difficult, especially since my attention is elsewhere now.

I kinda just want to put it out here so people with more attention and time and knowledge can push it forward or e.g. boost it onto mastodon. Though if it really goes nowhere I might do something? Idk ;p

[–] sapient_cogbag@infosec.pub 0 points 2 years ago (1 children)

In particular, I've figured out a way to specify sentiment/interests efficiently and combine it reliably over federation, and the data structures required to do that.

I've also provided some ideas for sensible defaults (automatic selection of instances, and accounting for instance load), with incremental enhancements to specificity for more advanced users ^.^, as well as a general search mechanism that can be derived from this - though for efficiency, it might be worth trying to develop some sort of probabilistic reverse index to avoid a linear scan, if we're talking about discovering entities like users or groups where there may be very large numbers.

I hope that if people are interested they will boost the post onto Mastodon, which afaik is where the devs and ActivityPub standards people are, and try and get the ball rolling, because my focus is elsewhere right now, and the social aspects of developing things like this are much more difficult for me than the algorithmic and architectural parts ;3

[–] sapient_cogbag@infosec.pub 0 points 2 years ago

Also it might be worth attaching this as a secondary standard, "DiscoveryPub" or something, to avoid scope creep.

[–] Emotional_Series7814@kbin.cafe 5 points 2 years ago

I like this idea, would probably do well if proposed on the kbin codeberg as well.

I really hope we don’t force users who sign up to pick one of a few preselected communities to subscribe to. No Skip button, no option to search for other communities, you must select some communities from this small list in order to move on. I’ve seen the same pattern in habit tracker apps with preselected habits instead of communities, and likely in other contexts that I’m forgetting right now. Walk the user through a tutorial to get them up and running as soon as possible, but no option to skip it or customize anything if you’re tech-savvy and don’t like the default options, and don’t need to be handheld through. It was always incredibly annoying.

[–] misericordiae@kbin.social 5 points 2 years ago

This looks great!

[...] There are also instances with varying levels of moderation, which may be encompassed in this.

I know you didn't really touch on what the moderation aspect might involve (and maybe you have provisions for this in a section I skimmed), but ideally I would want defederation lists to be included somehow, at least as an advanced option. Does the user want/not want an instance federated with [other instance]? And then, before final instance selection, present the full list just for verification.

[–] kglitch@kglitch.social 3 points 2 years ago

That sounds cool.

In the meantime:
https://fediverse.observer/
https://fedidb.org/software

[–] MilkToastGhost@lemmy.world 3 points 2 years ago

I was huge into StumbleUpon, and even though that didn't pan out, the best thing it did was start you off with the easiest, most basic, likable subs to click and follow to craft your interests: C/memes C/pics C/shitpost C/food

And then you use that to expand on to relatable subs and on and on until you've gotten as specific as possible.

Then users have 200 subs they've already subscribed to and have crafted a full experience for themselves. On reddit I followed like 50 subs for the first 2 years. I often scrolled through posts as far as a week back and got bored. The moment I expanded and started actively looking at new subs regularly I was rarely getting past the day mark on a full scroll.

We also have to get rid of the porn posts on the regular scroll. That's something that if I want to see I don't want it to be the default and not on my main account. Let me get my scuzball on outside of my work scroll and when I'm not sitting at a table with friends wondering why there's strap on porn on my phone.

[–] Spzi@lemm.ee 1 points 2 years ago (1 children)

I very much approve the effort, which continues in comments explaining more of the idea. I also see great value in "the onboarding of new users to the fediverse, and discovery of groups, while hopefully distributing them across more instances for better load balancing and decentralization."

It's interesting, because I came to the rather opposite conclusion. Instead of refining the process and giving users more options, I voted for dumbing down the process and ~~giving~~ showing users fewer options. The (slightly provocative) proposal is: Don't let people choose their server, but choose one at random.

We have two types (and the spectrum in between) of people who try to join lemmy: Those who understand everything and can make an informed decision, and those who understand nothing. Group 1 already finds ways to achieve their goals. This could be eased, but it exists.

People from group 2 have a less comfortable position. They are faced with lots of technical explanations of alien concepts before making an account, before they can relate what this is all about, what impacts their decisions will have.

I feel the default way to join the fediverse should be to let the fediverse make that choice for you. The fediverse can make a random choice, or use a system with attributes and weights which you proposed.

In different aspects, a random choice can be better or worse than letting users decide, who have incomplete and sometimes inaccurate information, and do not want to spend energy on the topic.

An advanced signup method with user-defined choices should still be available. The default signup way should be big and easily visible, the advanced less prominent but still available.

I wrote a more lengthy comment on why I think signup should be as quick and easy as possible here.

[–] sapient_cogbag@infosec.pub 1 points 2 years ago

My idea is meant to allow for a spectrum from simply "pick an instance for me" using the weightings for an assumed "the user is interested primarily in general discussion", to "search for an instance for me related to xyz topics as a search query", to fine-tuned discovery ^.^

The weighting is always necessary to use because it allows instances to have more control over who they accept and avoid overloading smaller instances. But you can make the default UI very simple.
