submitted 1 year ago by jerry@fedia.io to c/fedia@fedia.io

So here's the deal with kbin: kbin uses Symfony Messenger processes, which are roughly equivalent to Sidekiq in Mastodon.

After I moved fedia from the docker hosted environment to a bare metal instance, I had all manner of database issues - the dump and reload didn't work well, creating many duplicate records. That caused the messenger services to die, and the queue of activitypub records to process grew huge. Restarting the messenger service worked, but it would never finish working through the backlog, so I increased the number of messenger workers to 16. That kept the queue nice and clean.

HOWEVER, it appears that running multiple messenger processes creates race conditions: image IDs get created and assigned to different entity records (like posts), but no actual image record is ever created. When kbin goes to draw a page, it runs a complex query to pull magazine info, post info, comment info, user info, and all of their respective images. Those records LOOK like they have an image, but there is no actual image, so kbin says 💩 I ain't working and gives the wonderful 500 error.

Setting the messenger services back to 1 seems to at least not be making the problem worse, but now I have to go find all the broken database record linkages.
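
For anyone wondering what hunting for those broken linkages might look like, here is a minimal sketch. It is not kbin's real schema or the actual queries that were run: the `post` and `image` tables, the `image_id` column, and the `find_dangling_image_refs` helper are all made up for illustration, and it uses an in-memory SQLite database as a stand-in for the real one. The idea is just a LEFT JOIN that keeps the rows whose image ID points at nothing.

```python
import sqlite3


def find_dangling_image_refs(conn, table, fk_column="image_id"):
    """Return ids of rows in `table` whose image reference has no matching image row.

    `table` and `fk_column` are hypothetical names; they are interpolated into
    the SQL, so only pass trusted, hard-coded values.
    """
    query = f"""
        SELECT t.id
        FROM {table} AS t
        LEFT JOIN image AS i ON i.id = t.{fk_column}
        WHERE t.{fk_column} IS NOT NULL   -- the row claims to have an image
          AND i.id IS NULL                -- but no such image record exists
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(query)]


# Tiny in-memory stand-in for the real database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, file_path TEXT);
    CREATE TABLE post  (id INTEGER PRIMARY KEY, body TEXT, image_id INTEGER);
    INSERT INTO image (id, file_path) VALUES (1, 'a.jpg');
    INSERT INTO post (id, body, image_id) VALUES
        (10, 'has a real image', 1),
        (11, 'points at a missing image', 99),
        (12, 'no image at all', NULL);
""")

dangling = find_dangling_image_refs(conn, "post")
print(dangling)  # [11]
```

The same check would have to be repeated for every table that can reference an image (magazines, comments, users, and so on), which is part of why this kind of cleanup takes a while.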

top 6 comments
[-] Australis13@fedia.io 2 points 1 year ago

Ouch! Thanks for all your hard work on this. As somebody who has one foot in the IT world, I can empathise with the difficulty of managing parallel processes.

[-] photography@fedia.io 1 points 1 year ago

What a mess, thanks Jerry for digging into this.

The magazine I created (photography) seems to be suffering from many 500 errors, which makes sense since it has several image posts.
When it works, the images appear to be there, but it has also gotten more 500s than Indianapolis in May.

Though at the moment seems to be stable.

[-] jerry@fedia.io 2 points 1 year ago

Have you seen any errors over the past 3 hours?

[-] chris@fedia.io 1 points 1 year ago

Mine was stable when you first posted this, but after maybe an hour I started getting 500s again. Currently if I try to look at subscriptions it throws that error, other areas seem ok at the moment.

[-] jerry@fedia.io 1 points 1 year ago

There are definitely still a few magazines or threads or messages or posts or users or all of the above that are hosed up. I’ve been running mad SQL statements to try to hunt them down and I’m now into the long tail of issues.

[-] troed@fedia.io 1 points 1 year ago

Thanks for all the work you've put in! It has seemed to work fine for a while now.

this post was submitted on 22 Jun 2023
29 points (100.0% liked)
