RoundSparrow

joined 2 years ago
[–] RoundSparrow@lemmy.ml 1 points 2 years ago

It could also be a filtered view based on the subscribed/all feed which provides a single API call that can return material from multiple communities.

"that can return material from multiple communities" - that's exactly how Reddit does multi-reddit, what feature do you think multi-reddit is?

[–] RoundSparrow@lemmy.ml 2 points 2 years ago

But it should definitely be off by default and have a clear warning when you try to enable it.

I was afraid people would say that. The easier path is to not touch it at all; adding opt-in/opt-out code means more Rust programming, and Rust developer time is in rare supply.

The easiest solution is to avoid it and not introduce sharing of personal community lists at all, which is what I was afraid this discussion would yield. So we start fresh with empty MultiPass lists and build them up from scratch.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago

the amount of low-effort drive by comments and off-topic posts communities gets just because they are similarly named is bad enough as it is.

which is why I actually want it.

I think a well-cultivated list of quality communities that people share is a means to escape the heavy amount of noise that grew out of the explosion in the number of low-effort barely-any-moderation instances.

Another way to look at this feature is really simple: multiple subscribe lists, the ability to organize what you subscribe to into your cultivated groups. I don't see why anyone thinks a limitation of having only one community list per login is beneficial in organizing the duplicate choices all over the place.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago* (last edited 2 years ago) (1 children)

why does a multi-reddit need multiple instances to collaborate to create the feed?

by "create the feed", I assume you mean "provide posts" when API call post/list is called?

content is replicated in all federated instances. You only need to use the local copy and merge all the communities of the multi-reddit.

Yes, that is what MultiPass would do: query the local PostgreSQL database. Right now Lemmy only allows a single Subscribe/Follow list per user; you have to create 3 different logins if you want 3 different lists of communities, for example a "games" list, a "music" list, a "news" list. Plus, the current design does not accommodate logged-out users: they have no way to view a list of multiple communities (other than "All", local or merged remote+local).
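A minimal sketch of what such a local query could look like, assuming a hypothetical multi_community_list table (list_id, person_id, community_id) that does not exist in Lemmy's schema today:

```sql
-- Hypothetical table: multi_community_list(list_id, person_id, community_id).
-- One local query returns a merged feed for every community in a list,
-- no extra logins required:
SELECT p.*
FROM post p
JOIN multi_community_list mcl ON mcl.community_id = p.community_id
WHERE mcl.list_id = $1      -- e.g. the user's "games" list
  AND p.removed = false
  AND p.deleted = false
ORDER BY p.published DESC
LIMIT 20;
```

The post.removed/post.deleted/post.published columns follow Lemmy's existing schema; everything else here is illustration.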

[–] RoundSparrow@lemmy.ml 2 points 2 years ago* (last edited 2 years ago) (6 children)

Multi-reddits as they exist on Reddit itself could be implemented entirely client-side, the server side stuff just syncs the behavior of multiple client apps.

Can you explain how? The only way I can see this working is doing 50 different API requests for all 50 subreddits, merging the results, and then re-sorting them in the desired order.

[–] RoundSparrow@lemmy.ml 1 points 2 years ago (3 children)

Why does the concept of a multi-reddit need to extend outside of the user’s instance?

it doesn't need to. But why would you not want it when communities are multi-instance?

Perhaps I made a mistake introducing the privacy concern first, as now the whole topic seems to negate the very reason so many people have requested MultiReddit on Lemmy. The privacy issue isn't even essential; I just wanted to have a discussion about it as a general topic. I'm already building the code so that it can be done entirely without anyone sharing their personal subscribed list.

[–] RoundSparrow@lemmy.ml 6 points 2 years ago* (last edited 2 years ago) (1 children)

report a bug to lemmy about the broken quoting.

I have, weeks ago.

Consider posting to the official rust playground and creating a shared link.

I did share a link to GitHub, is that not good enough or something? Here is a screen shot for you.

[–] RoundSparrow@lemmy.ml 35 points 2 years ago* (last edited 2 years ago) (3 children)

Most common cause is people changing their language settings in their profile. It's a daily occurrence. The app really needs to tell people "25 messages not displayed because you are only viewing in Spanish".

[–] RoundSparrow@lemmy.ml 6 points 2 years ago

Recently I’ve noticed my feed has become almost entirely the main meme instance. The algorithm gives me 4 meme posts then a technology post then load more memes

Yes, same issue, and I'm using lemmy-ui...

Lemmy's backend Top/Active/Hot rankings are pretty primitive. I'm experimenting with some ways to weight smaller, less-popular communities more heavily... because +20 votes on a meme topic is noise, but +20 on some focused community can be a big deal. hot_rank doesn't take that into account and just looks at published date and score. It's pretty tricky to get new things into the backend, so it may be a while.
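As a rough sketch of the kind of weighting I mean (assumptions: post_aggregates.score and community_aggregates.subscribers as in Lemmy's schema; the logarithmic dampening itself is just an experiment, not anything Lemmy does today):

```sql
-- Experiment: dampen the raw score by community size, so +20 in a small
-- focused community can outrank +20 in a huge meme community.
-- greatest(..., 2) avoids dividing by log(1) = 0 for tiny communities.
SELECT p.id,
       pa.score / log(greatest(ca.subscribers, 2)) AS weighted_score
FROM post p
JOIN post_aggregates pa      ON pa.post_id = p.id
JOIN community_aggregates ca ON ca.community_id = p.community_id
ORDER BY weighted_score DESC
LIMIT 20;
```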

[–] RoundSparrow@lemmy.ml 2 points 2 years ago* (last edited 2 years ago) (1 children)

I personally would need to dig into testing and code again to give answers with confidence. What I'm trying to say more than anything is: don't assume. The level of mistakes in Lemmy's more technical back-end code is pretty high in my experience, especially when multiple servers are involved (comment deletes not being sent to all servers was one situation I tracked down). What I do know is that there is very little written out there by people actually tearing it apart and showing what works... a lot of stuff gets logged in server logs as errors that almost nobody can explain. Either it's mistakes in the ActivityPub JSON, other non-Lemmy servers, older versions of Lemmy, etc.

When you say packages get forwarded to whatever instance wanted (if I understand correctly) you don’t “unpack” (e.g check if it’s a valid request)

The pack metaphor isn't that great. But it is signed, and the receiving server checks the signature. What I really have not seen anyone discuss is how those signatures are exchanged in the first place. I've seen people say they re-installed their entire instance, which I assume generates a new set of signing keys for the same domain name... and I know Lemmy starts its post, comment, and person IDs at 1, so a re-install would end up generating the same numbers for different content.

I haven't seen much eye towards auditing any of this works, and if it even is a good design. Even 2 months ago there were some aggressive timeouts that were causing delivery to fail. And when something fails, the person who comments or posts doesn't get notified....

There is some deep stuff in Lemmy: every community has a private key/public key pair, as does each person, but I'm not even sure that is used at all or was just an ambition. I rarely see the topic actually come up, and I've been listening for these kinds of deeper technical topics... I created !lemmyfederation@lemmy.ml to try to better organize them.

Thanks again, and sorry for the ramblings.

I'm pretty much rambling myself... my repeat point is: don't assume. I would not describe Lemmy as battle-hardened against attacks or spoofing that someone can find to bypass the current logic.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago

I wouldn't trust that assumption, goarmy.com

[–] RoundSparrow@lemmy.ml 1 points 2 years ago* (last edited 2 years ago) (1 children)

It’s still early days

Lemmy has been on GitHub since February 2019, over four years. It isn't new at all. Several instances go way back.

The answer is: ORM.

 

Right now querying posts has logic like this:

WHERE (((((((((("community"."removed" = $9) AND ("community"."deleted" = $10)) AND ("post"."removed" = $11)) AND ("post"."deleted" = $12)) AND (("community"."hidden" = $13)

Note that a community can be hidden or deleted, separate fields. And it also has logic to see if the creator of the post is banned in the community:

LEFT OUTER JOIN "community_person_ban" ON (("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id"))

And there is both a deleted boolean (end-user delete) and removed boolean (moderator removed) on a post.

Much of this also applies to comments, which are owned by the post, which in turn is owned by the community.

 

This is taking longer than other aggregate updates, and I think the join can be eliminated:

CREATE FUNCTION public.community_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
begin
  IF (was_restored_or_created(TG_OP, OLD, NEW)) THEN
    update community_aggregates ca
    set comments = comments + 1
    from comment c, post p
    where p.id = c.post_id
      and p.id = NEW.post_id
      and ca.community_id = p.community_id;
  ELSIF (was_removed_or_deleted(TG_OP, OLD, NEW)) THEN
    update community_aggregates ca
    set comments = comments - 1
    from comment c, post p
    where p.id = c.post_id
      and p.id = OLD.post_id
      and ca.community_id = p.community_id;
  END IF;
  return null;
end $$;

pg_stat_statements shows it as:

update community_aggregates ca set comments = comments + $15 from comment c, post p where p.id = c.post_id and p.id = NEW.post_id and ca.community_id = p.community_id

TRIGGER:

CREATE TRIGGER community_aggregates_comment_count AFTER INSERT OR DELETE OR UPDATE OF removed, deleted ON public.comment FOR EACH ROW EXECUTE FUNCTION public.community_aggregates_comment_count();
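To illustrate the eliminated join: the trigger already receives NEW.post_id (or OLD.post_id), so the pass through the comment table contributes nothing. A sketch of the increment branch without it:

```sql
-- Sketch: same effect as the current increment branch, one join fewer.
update community_aggregates ca
set comments = comments + 1
from post p
where p.id = NEW.post_id
  and ca.community_id = p.community_id;
```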

 

Is .moderators supposed to be on the GetCommunity() result? I can't seem to find it in the lemmy_server api_tests context. All I'm getting is languages and community_view.

EDIT: Wait, so there is both a "CommunityResponse" object and a "GetCommunityResponse" object? What call do I use to get a GetCommunityResponse object?

https://github.com/LemmyNet/lemmy-js-client/blob/2aa12c04a312ae4ae235f6d97c86a61f58c67494/src/types/GetCommunityResponse.ts#L7

 

In the create community / edit community code there is a SiteLanguage::read call with no site_id; should that call pass site_id = 1?

For reference, on my production instance my site_language table has 198460 rows and my site table has 1503 rows. Average of 132 languages per site. counts: https://lemmyadmin.bulletintree.com/query/pgcounts?output=table

 

A general description of the proposed change and reasoning behind it is on GitHub: https://github.com/LemmyNet/lemmy/issues/3697

Linear execution of these massive changes to votes/comments/posts with concurrency awareness. Also adds a layer of social awareness, the impact on a community when a bunch of content is black-holed.

An entire site federation delete / dead server - also would fall under this umbrella of mass data change with a potential for new content ownership/etc.

 

We have an urgent performance fix to get finished. The SQL changes are fine, but it seems the Lemmy test code in Rust is defective. This test is failing after we fixed a faulty stored-procedure function in PostgreSQL: https://github.com/LemmyNet/lemmy/blob/13a866aeb0c24f20ed18ab40c0ea5616ef910676/crates/db_schema/src/aggregates/site_aggregates.rs#L157

The underlying Rust code needs to be enhanced to query the table with SELECT ... FROM site_aggregates WHERE site_id = 1; hard-coding 1 is fine, as that is always the local site in Lemmy.

Can you please detail all the code changes so that the read method takes an integer parameter for site_id field?

https://github.com/LemmyNet/lemmy/blob/13a866aeb0c24f20ed18ab40c0ea5616ef910676/crates/db_schema/src/aggregates/site_aggregates.rs#L10C7-L10C7

Right now the query has no WHERE clause, pulling the first row it gets. Thank you.
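For clarity, this is roughly the SQL the fixed read method should end up generating (column list elided):

```sql
-- site_id = 1 is always the local site row in Lemmy.
SELECT *
FROM site_aggregates
WHERE site_id = 1;
```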

 

person_aggregates is interesting because it is tracked for all known person accounts on the server, whereas site_aggregates does not track all known instances on the server.

Personally, I think lemmy-ui needs to be revised to clearly identify that a profile is from another instance and that the counts and listings of its posts and comments are incomplete. When viewing a user from another instance, you only see the comments, posts, and votes that your local instance has copies of. This will almost always under-represent a user from another instance.

PREMISE: since person_aggregate has a SQL UPDATE performed in real-time on every comment or post creation, I suggest we at least make that more useful. A timestamp of 'last_create', either generic to both post or comment, or individual to each type. I also think a last_login timestamp would be of great use - and the site_aggregates of activity could look at these person_aggregates timestamps instead of having to go analyze comments and posts per user on a scheduled job.
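A sketch of the proposed schema change (these columns are a suggestion, not part of Lemmy's current schema):

```sql
-- Proposed columns, not in Lemmy today:
ALTER TABLE person_aggregates
  ADD COLUMN last_create timestamptz,  -- last post or comment creation
  ADD COLUMN last_login  timestamptz;
```

The per-creation trigger UPDATE that already runs on person_aggregates could then piggyback `last_create = now()` at no extra write cost, since the row is being rewritten anyway.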

 

Over a short period of time, this is my incoming federation activity for new comments; pg_stat_statements output is shown. It is interesting to note these two INSERT statements on comments differ only in whether the language column gets its DEFAULT value. Also note the average execution time is way higher (4.3 vs. 1.28) when the language value is set; I assume that's due to INDEX updates on the column? Or possibly a TRIGGER?

About half of the comments coming in from other servers have default value.

WRITES are heavy, even if it is an INDEX that has to be revised. So INSERT and UPDATE statements are important to scrutinize.

 

Given how frequent these records are created, every vote by a user, I think it is important to study and review how it works.

The current design of lemmy_server 0.18.3 is to issue a SQL DELETE before (almost?) every INSERT of a new vote. The INSERT already has an UPDATE clause on it.

This is one of the few places in Lemmy that a SQL DELETE statement actually takes place. We have to be careful triggers are not firing multiple times, such as decreasing the vote to then immediately have it increase with the INSERT statement that comes later.

For insert of a comment, Lemmy doesn't seem to routinely run a DELETE before the INSERT. So why was this design chosen for votes? Likely the reason is that a user can "undo" a vote and have the record of them ever voting removed from the database. Is that the actual behavior in testing?

pg_stat_statements from an instance doing almost entirely incoming federation activity of post/comments from other instances:

  • DELETE FROM "comment_like" WHERE (("comment_like"."comment_id" = $1) AND ("comment_like"."person_id" = $2)) executed 14736 times, with 607 matching records.

  • INSERT INTO "comment_like" ("person_id", "comment_id", "post_id", "score") VALUES ($1, $2, $3, $4) ON CONFLICT ("comment_id", "person_id") DO UPDATE SET "person_id" = $5, "comment_id" = $6, "post_id" = $7, "score" = $8 RETURNING "comment_like"."id", "comment_like"."person_id", "comment_like"."comment_id", "comment_like"."post_id", "comment_like"."score", "comment_like"."published" executed 15883 times - each time transacting.

  • update comment_aggregates ca set score = score + NEW.score, upvotes = case when NEW.score = 1 then upvotes + 1 else upvotes end, downvotes = case when NEW.score = -1 then downvotes + 1 else downvotes end where ca.comment_id = NEW.comment_id (TRIGGER FUNCTION update) executed 15692 times.

  • update person_aggregates ua set comment_score = comment_score + NEW.score from comment c where ua.person_id = c.creator_id and c.id = NEW.comment_id (TRIGGER FUNCTION update), same execution count as the previous.

There is some understanding to be gained from those execution counts not being equal.
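One possible direction, sketched only: reserve the DELETE for the "undo vote" path and let the upsert handle new and changed votes, so triggers aren't fired for a decrement immediately followed by an increment on every routine vote:

```sql
-- Sketch: update only the score on conflict instead of rewriting the
-- whole row; DELETE would then run only when a user explicitly un-votes.
INSERT INTO comment_like (person_id, comment_id, post_id, score)
VALUES ($1, $2, $3, $4)
ON CONFLICT (comment_id, person_id)
DO UPDATE SET score = EXCLUDED.score;
```

Whether this preserves every current behavior (e.g. the `published` timestamp of a changed vote) would need testing.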

 

a lemmysever .social / LemmyFanatic / post / xxxxx

Same with comments

 

Details here: https://github.com/LemmyNet/lemmy/issues/3165

This will VASTLY decrease the server load of I/O for PostgreSQL, as this mistaken code is doing writes of ~1700 rows (each known Lemmy instance in the database) on every single comment & post creation. This creates record-locking issues given it is writes, which are harsh on the system. Once this is fixed, some site operators will be able to downgrade their hardware! ;)
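A sketch of the shape of the fix, assuming the trigger's UPDATE currently touches every site_aggregates row and that site_id = 1 is the local site:

```sql
-- Constrain the trigger's UPDATE to the single local row instead of
-- rewriting ~1700 rows (one per known instance) on every creation:
update site_aggregates sa
set comments = comments + 1
where sa.site_id = 1;
```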
