jerry

joined 2 years ago
 

My apologies for how long the site was down. Just about everything that could go wrong did, but I think we are working now...

 

In an effort to reduce costs due to the exchange rate and declining donations, I’m going to be consolidating Fedia.io onto another existing server so I can decommission and return the server that it is running on. I’ll be turning off unauthenticated access to content again because that drastically reduces the bandwidth/compute load that Fedia uses. I am not sure on timing yet - likely around 2PM ET/6PM UTC. I’ll plan for an hour downtime but I’m hoping it should only be a few minutes if I do things correctly.

[–] jerry@fedia.io 1 points 3 weeks ago

No kicking on my side. I’ll see if there was anything going on in the logs yesterday.

[–] jerry@fedia.io 4 points 1 month ago

@BeAware@mementomori.social fedia.io was being swamped with crawlers from thousands of IPs causing the site to grind to a halt and periodically crash. I had to limit access to only logged in users while I try to sort out a better way to manage all those crawlers.

@Blaze@lemmy.dbzer0.com

 

I have some time to babysit the server now and so reenabled anonymous access. I've also removed the prior ASN blocks, but may add those back in as needed based on various AI datacenter crawling.

[–] jerry@fedia.io 4 points 1 month ago (1 children)

I understand. I have tried hard to make fedia.io work - it’s been far and away the most challenging app I’ve managed (note: the problems are all legacy kbin issues, the mbin team has been nothing but amazing). I am stuck in a difficult position - the site isn’t useful if I keep it locked down like it is now, and the site is super slow/requires constant attention if I make it open. I’ll have to assess my options and decide what the future for fedia is.

[–] jerry@fedia.io 2 points 1 month ago (1 children)

Apologies for the delay, but this is fixed now

[–] jerry@fedia.io 2 points 1 month ago

Ohh - that is possible. I will check when I get back to my computer.

[–] jerry@fedia.io 10 points 1 month ago

I will add that to the donation page

[–] jerry@fedia.io 5 points 1 month ago

You and the mbin team continue to amaze me. Thank you so much!

[–] jerry@fedia.io 10 points 1 month ago (2 children)

It’s an application level ddos. Blocking anonymous access helped a bunch, but I am still getting about 5-10 login requests per second from hundreds of different IPs

[–] jerry@fedia.io 10 points 1 month ago

Thanks. Just trying to give people some alternatives

[–] jerry@fedia.io 10 points 1 month ago

We think it’s a CSRF prevention measure in the PHP Symfony library that creates a lot of database calls.

[–] jerry@fedia.io 7 points 1 month ago (1 children)

Not really. We have to accept incoming connections from thousands of other fediverse instances that would be blocked by that.

 

Hi all. Fedia.io has for a long time been subject to ddos attacks, including many that are "accidental", caused by myriad scrapers constantly hammering the site. I gave up on trying to play whack-a-mole with blocking them based on IP address (they do not honor robots.txt and do not use a conspicuous user agent string) since I was inadvertently blocking some legitimate users. So, I've restricted access to the content of fedia.io to only those that are logged in. That will mean we don't show up in search engines and whatnot, which for some will be considered a good thing and will likely cause others to leave.

There is a remaining problem related to the login form. Calls to the login page are breathtakingly expensive, computationally speaking, and so I also have a script that monitors unusual numbers of calls to that form and blocks at the firewall any offenders. I strongly suspect I'm catching some legitimate users with this too, and so I continue to try to tune it, but it's maddening, y'all.
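
For anyone curious what that kind of monitoring can look like, here is a minimal sketch of a sliding-window counter per IP. It is not the script running on fedia.io: the class name, the function name, the threshold of 20 hits, and the 60 second window are all made up for illustration, and the actual firewall blocking step is left out entirely.

```python
from collections import defaultdict, deque


class LoginRateMonitor:
    """Hypothetical sliding-window counter of login-page hits per IP."""

    def __init__(self, max_hits=20, window_seconds=60.0):
        # Both defaults are illustrative assumptions, not fedia.io's settings.
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one hit; return True if this IP now exceeds the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop hits that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()
        return len(hits) > self.max_hits


def find_offenders(events, max_hits=20, window_seconds=60.0):
    """Return the set of IPs that exceed the limit.

    `events` is an iterable of (ip, timestamp) pairs in time order,
    for example parsed from a web server access log.
    """
    monitor = LoginRateMonitor(max_hits, window_seconds)
    offenders = set()
    for ip, timestamp in events:
        if monitor.record(ip, timestamp):
            offenders.add(ip)
    return offenders
```

The tuning problem shows up directly in the two parameters: a lower `max_hits` or a longer window catches more scrapers, but also more legitimate users who simply retry a login a few times.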

These issues have been causing performance problems for everyone (despite the fedia.io app running on a dedicated 96 core, 256GB server with nvme disks), and the site became unavailable for certain people who accidentally tripped various thresholds. I'm hoping most of this is resolved now.

Thanks for the patience.

 

My apologies for the recent spate of problems. I think I’ve narrowed the problem down to the /m/fediverse and /m/random magazines. For some reason, mbin is generating an enormous amount of outbound delivery messages for these two magazines. I first tried removing the hashtags from /m/fediverse, but that was only a quick fix. So I deleted the magazine. (Note: the notifications appear to be related to the “microblog” function, and were originating from accounts on lots of mastodon instances, so I think there is a bug somewhere.)

I noticed /m/random doing something similar. I have removed all the subscribers from that magazine to try to reduce the number of notifications it is sending. I don’t know if that will help - I have a feeling the instance can’t keep up with that happening in both random and fediverse.

Anyhow, the queues are draining fast now. I purged about 600,000 queued delivery messages that (based on a random sample) all appeared to be associated with fediverse and random. That should let the rest of whatever is backed up get moving again, and hopefully stay moving.

 

The following instances will be offline briefly on Saturday, December 14 from 9am ET / 2pm UTC for approximately 10 minutes: infosec.exchange, infosec.town, infosec.pub, pixel.infosec.exchange, books.infosec.exchange, matrix/element.infosec.exchange, relay.infosec.exchange, meetup.infosec.exchange, video.infosec.exchange, infosec.press, infosec.place, fedia.io, fedia.social, elk.infosec.exchange, infosec.space, convo.casa

The servers supporting these instances require a reboot. The Dell servers these instances run on take a very long time to boot, so I am estimating 10 minutes of downtime. It could be more, could be less.

We use live patches to minimize the reboots needed for patching; however, Ubuntu only provides livepatch support for a year, which is how long most of these systems have been running.

 

It’s been a long day. I will fix it when I am back in front of a computer. It might be a few hours. My apologies.

 

I have sort of given up on fixing the problem, and will instead work on auto-detecting and auto-recovering when the problem happens.
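
As a rough illustration of the detect-and-recover idea (the post does not say how detection or recovery will actually be done), one step of a watchdog could look like the sketch below. The function name, the `probe` and `restart` callables, and the threshold of three consecutive failures are all hypothetical.

```python
def watchdog_step(probe, restart, failures, threshold=3):
    """Run one health check and return the updated failure count.

    `probe` is any callable returning True when the site looks healthy,
    `restart` is any callable that performs the recovery action, and
    `failures` is the count of consecutive failed checks so far.
    All three are placeholders for whatever the real setup uses.
    """
    try:
        healthy = bool(probe())
    except Exception:
        # A probe that raises (timeout, connection refused) counts as a failure.
        healthy = False

    if healthy:
        return 0

    failures += 1
    if failures >= threshold:
        # Recover only after several failures in a row, so one slow
        # response does not trigger a restart.
        restart()
        return 0
    return failures
```

A scheduler (cron, a systemd timer, or a simple loop with a sleep) would call this periodically and carry the returned count into the next call.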

 

I just saw this: https://every.to/p/the-disappearance-of-an-internet-domain

I have no idea if it's real, but if it is, that will be most unfortunate

 

After I resolved the federation issue, I had to clean up a few things and so the site may have been unavailable for a bit. I'm done fussing with it and will keep an eye on it to make sure things are working.

IF YOU SEE PROBLEMS - please let me know. As far as I know, I've fixed all of the federation and error 500 issues we've had, so please don't assume it's just more of the same if you see them.

Thanks for your patience.

 

Fedia.io is sort of like the Ship of Theseus right now - I literally replaced nearly everything trying to get it back working.

The problem ended up being a silent out-of-memory error that php-fpm was running into. I had to increase the memory limit to about 10x what the docs require to get it to work, but once I did that, it worked great.

I was only able to sort this out after @bentigorlich recommended I move the site to debug mode (which requires me to lock everyone else out). Once I did that, it started giving some useful errors.

My apologies for the amount of time it took to fix this. I learned a lot about php today.

 

Hi all. As some of you have reported, outbound federation to at least some other instances is broken from fedia.io. At the moment, I don't know why and I don't have any leads as there are no logs or other indications of what is going wrong, but I am working on it.
