algernon

joined 2 years ago
[–] algernon@lemmy.ml 1 points 7 hours ago

I can do that, probably over the weekend.

[–] algernon@lemmy.ml 3 points 22 hours ago (2 children)

It's for self-hosters, I'm afraid. The idea behind it is that your webserver/reverse proxy forwards all GET and HEAD requests to it; it runs some heuristics and either returns a page filled with garbage for your webserver/reverse proxy to serve, or a 421 (Misdirected Request) status code, at which point your webserver/reverse proxy can serve the real content. So to run it, you need to be able to make it play nice with your webserver or reverse proxy, which pretty much means you'll have to self-host, yeah.
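A rough sketch of that request flow in Python, to make the "garbage or 421" contract concrete. Everything here is an assumption for illustration: the `Request` type, the `handle` function and the user-agent list are hypothetical, and iocaine's real heuristics are far more involved than a substring check.

```python
from dataclasses import dataclass

MISDIRECTED_REQUEST = 421  # signals the proxy to serve the real content


# Hypothetical heuristic input; a real setup would forward far more context.
@dataclass
class Request:
    method: str
    path: str
    user_agent: str


# Illustrative user-agent tokens only; not iocaine's actual rules.
CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider")


def looks_like_crawler(req: Request) -> bool:
    """Toy heuristic: flag requests whose User-Agent names a known crawler."""
    return any(token in req.user_agent for token in CRAWLER_TOKENS)


def garbage_page(path: str) -> str:
    """Stand-in for a generated page of garbage."""
    return f"<html><body>lorem nonsense about {path}</body></html>"


def handle(req: Request) -> tuple[int, str]:
    """Return (status, body) for the reverse proxy to act on."""
    if req.method not in ("GET", "HEAD"):
        # Only GET and HEAD are meant to be forwarded; let anything else through.
        return MISDIRECTED_REQUEST, ""
    if looks_like_crawler(req):
        return 200, garbage_page(req.path)
    return MISDIRECTED_REQUEST, ""
```

The proxy side only has to treat a 421 as "fall through to the real content" and serve any other response as-is.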

The garbage part is important, because that's how iocaine serves poisoned URLs (URLs that contain an identifiable substring). If any of the crawlers come back and manage to get through all the other heuristics iocaine puts in front of them, they'll still get caught as soon as they land on a poisoned URL.
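A minimal sketch of the poisoned-URL idea, assuming a fixed marker substring. The marker `qx7-` and both functions are hypothetical; iocaine's real scheme may look nothing like this.

```python
import random

# Hypothetical marker substring; iocaine's real scheme may differ entirely.
POISON_MARKER = "qx7-"


def poisoned_links(page_path: str, count: int = 5) -> list[str]:
    """Generate links for a garbage page, each carrying the marker.

    Seeding with the page path keeps the maze stable: the same page
    always links to the same poisoned URLs.
    """
    rng = random.Random(page_path)
    return [f"/{POISON_MARKER}{rng.getrandbits(48):012x}/" for _ in range(count)]


def is_poisoned(path: str) -> bool:
    """A request for a marked path most likely came from following a garbage page."""
    return POISON_MARKER in path
```

Since the marked links only ever appear on garbage pages, a later request for one of them identifies the client as a crawler, no matter how convincingly it otherwise looks like a browser.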

[–] algernon@lemmy.ml 1 points 1 day ago (1 children)

I've been drooling over the open tab ever since the Kickstarter went up. I have many keyboards from my firmware hacking days, but one with a keywell is missing from my collection. Sadly, I can't afford it right now, but it'll stay on my wishlist, and whenever I can afford one, I will get one.

I'm not sure it would replace my Keyboardio Model 100 (the thumb key on that is just so incredibly useful), but it's the closest thing I've seen so far.

Good luck!

[–] algernon@lemmy.ml 9 points 1 day ago (4 children)

The best I can do against AI is prevent their crawlers from accessing my work. I've built iocaine for this purpose, and it's happily serving an infinite maze of poison to all the crawlers, and they ain't getting through. A number of other people and organizations also use it, and that gives me warm, fuzzy feelings.

I also wrote about surviving the crawlers, and helped a couple of friends have a li'l fun with them. That's actually quite fun.

I also helped our twins wean their teacher off ChatGPT: she's otherwise a great teacher, but she'd been using ChatGPT to make their homework. It had so many mistakes and typos! So they started quietly correcting the exercises with a red pen, and handing the homework in with the questions corrected too. Other kids saw that and started doing the same. Then it spread to other classes. Two months later, none of the teachers use ChatGPT (or any other "AI") anymore, and word that these tools are laughably crap spread to households that otherwise wouldn't have been aware. A new kid came to town: "you guys know about ChatGPT?!" He stopped talking about it by the end of the week.

Sometimes being openly, vocally very AI-hostile pays off. So I'm going to continue doing that.

[–] algernon@lemmy.ml 0 points 5 days ago

Way back in 2015, I assigned the copyright on all my past and future contributions to Debian-owned projects (including any packaging work, etc.) to the SFC, because our values aligned at the time and it made sense. Following this post of theirs, I sent them an email asking them to reassign the copyright back to me going forward.

I have not contributed to Debian in years, and the chance of me doing so in the foreseeable future is slim, but I could not stand by and watch without sending a message.

I will continue to tell anyone who dares come close to my projects with an LLM to fuck off, and will ban them from any spaces I have control over. "AI" companies have caused immeasurable harm to FLOSS projects, not shunning them is ceding ground to them. And when you let them in, sooner or later, only they will remain.

I'll stick to people, thankyouverymuch.

[–] algernon@lemmy.ml 1 points 1 week ago

I'm using a setup similar to what you had in mind: a small €4/month VPS as my front, with scrapers taken care of by iocaine (it both blocks them and automatically firewalls off the worst offenders). That means over 90% of the HTTP(S) traffic never makes it past the VPS, greatly reducing the traffic into my home network. My actual servers sit behind a WireGuard tunnel.

It does not protect against a non-HTTP DDoS, but that wasn't part of my threat model to begin with. My VPS provider (Hetzner) offers DDoS protection even for €4/month servers. It doesn't cover the scraper kind of DDoS, but it does cover other kinds. Luckily, I have not been the target of any, so I have no idea whether it works reliably.

Against the scrapers, a VPS + bot defense + WireGuard works like a charm. Can recommend.

[–] algernon@lemmy.ml 3 points 1 week ago

Depends on what kind of DDoS OP wants to defend against. Defending against an AI crawler DDoS is entirely possible with a tiny VPS. I've been doing that for the past ~1.5 years on a €4/month CX23 Hetzner VPS.

[–] algernon@lemmy.ml 3 points 1 week ago

I wouldn't contribute back, and would switch to (or write) an alternative ASAP.

There are cases where I have to use AI-tainted dependencies (though none of the AI-tainted dependencies I currently use across my projects are fully vibe coded; they're merely tainted), but if I have to patch one? That's gonna be a fork or a rewrite, and there's no chance in hell I'm contributing back.

[–] algernon@lemmy.ml 2 points 1 week ago

Complain first, malicious compliance after, job seeking next, then move on to a better company. If the C-suite has a very strong hard-on for AI, skip the first two. Once the bubble pops, many of these companies that mandated AI will pop too - leave the ship before that happens.

Been there, done that, there are jobs without AI requirements, and increasingly more that forbid AI. It's not easy, but it is doable.

[–] algernon@lemmy.ml 13 points 1 week ago (4 children)

No, it isn't. In my opinion, using LLMs/"AI" for anything is unethical, and unacceptable.

That includes "open-source" models too, because they're all trained by scraping the internet, and many of them (especially Qwen) try very hard to get around any and all attempts at blocking them. Not only do they respect neither /robots.txt nor X-Robots-Tag headers, Qwen (and many other models) collect training data using residential proxies and by faking real browsers, to get around crawler defenses.
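For reference, "respecting /robots.txt" just means checking the rules before fetching, which Python's standard library can do out of the box. A minimal sketch; the robots.txt content, user agents and URL below are illustrative assumptions. (X-Robots-Tag is a separate, per-response HTTP header and isn't covered here.)

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out one crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler checks before every fetch."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

A compliant crawler calling `allowed("GPTBot", ...)` gets `False` and moves on; the crawlers described above skip this step entirely, or disguise themselves so the rules never match.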

For receipts, see here for example: Alibaba sent over a million requests my way in a single day. That's already a lot, but I'm firewalling these crawlers off for 12 hours after the first hit. It would have been a lot more without the firewall (before it, I often had 60-70 million requests/day from Alibaba alone). Here's how it looked before the firewall. Look at the "Rule hit distribution" panel. That near-constant 200 req/sec "asn" line is almost entirely Alibaba. So is much of the ~300 req/sec "faked-browser" line, and I suspect that at least half of the "generated-url" wave, which tops out at ~800 req/sec, is also Alibaba through residential proxies.
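A minimal in-memory sketch of "block for 12 hours after the first hit". The `TimedBlocklist` class and its methods are hypothetical; in a real deployment this job would fall to the firewall itself (for example, an nftables set whose elements carry a timeout).

```python
import time
from typing import Callable

BLOCK_SECONDS = 12 * 60 * 60  # 12 hours


class TimedBlocklist:
    """Block an IP for a fixed period after its first flagged request.

    Hypothetical in-memory stand-in for a firewall set with element timeouts.
    """

    def __init__(self, ttl: float = BLOCK_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._expires: dict[str, float] = {}

    def flag(self, ip: str) -> None:
        # Start a block only if none is active; hits during a block don't extend it.
        if not self.is_blocked(ip):
            self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip: str) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]  # block has run out
            return False
        return True
```

The injectable `clock` is only there so the expiry logic can be tested without waiting twelve hours.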

These crawlers are DDoSing the entire internet, and we have to come up with stupid defenses to keep ourselves online. By using any of these models, you're enabling them. Don't do that.

If you want to learn a new programming language, keep its docs open, find small projects written in the language, find well-documented libraries, packages, etc., and explore those. That's far more accurate than any LLM, and you're not supporting the AI bubble and the relentless DDoS. You can even shove those resources into a personal search engine and query that. Hister is a decent option for that, for example.

[–] algernon@lemmy.ml 10 points 1 week ago

A simple but guilty pleasure of mine is reading Hacker News story titles (titles only, most of the time) and tooting sarcastic reactions to them. It's very therapeutic. For easier browsing, I made a read-only, HN-like website for said toots of mine.

[–] algernon@lemmy.ml 1 points 4 weeks ago

It can stop them nowadays, by firewalling some of the crawlers off. The reason it doesn't stop them by default is that it serves them poisoned URLs, which it can later identify if the crawlers come back riding a headless Chrome. But once they do that and hit a poisoned URL, there's little reason to let them wander the endless maze any further: serve one request, and block the IP.
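A minimal sketch of "serve one request, and block the IP", assuming poisoned URLs carry a fixed marker substring. The marker and the `PoisonTrap` class are hypothetical; the real thing pushes the IP into a firewall rather than checking it in application code.

```python
# Hypothetical marker substring identifying poisoned URLs.
POISON_MARKER = "qx7-"


class PoisonTrap:
    """Serve a caught crawler exactly one more request, then block its IP."""

    def __init__(self) -> None:
        self.blocked: set[str] = set()

    def handle(self, ip: str, path: str) -> tuple[int, str]:
        if ip in self.blocked:
            # With a firewall in place, these requests would never arrive.
            return 403, ""
        if POISON_MARKER in path:
            # Caught on a poisoned URL: answer this one, block everything after.
            self.blocked.add(ip)
            return 200, "<html><body>more garbage</body></html>"
        # Not caught here; let the proxy serve the real content.
        return 421, ""
```

Blocking at the firewall rather than answering with a 403 is what actually saves resources: the connection is dropped before any HTTP handling happens.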

I've been running that on my own infra, and my daily number of requests went down from ~50+ million to... 2 million.
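For scale: taking the lower bound of 50 million requests a day, dropping to 2 million works out to a reduction of at least 96%:

$$1 - \frac{2 \times 10^{6}}{50 \times 10^{6}} = 1 - 0.04 = 0.96$$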
