this post was submitted on 10 May 2026
The Smol Web
Community for the appreciation and sharing of links, resources, and culture of: the smol web / small web / ~(w)~ / the indie web / or even the non-www internet (gemini, gopher, etc).
Back of a napkin definition, subject to change: if it's internet accessible and is maintained by a person, especially for non-commercial aims, then I would consider it smol. There are, however, much stricter definitions.
Definitions
- https://smolweb.org/
- https://indieweb.org/
- https://smallweb.page/
- https://wikipedia.org/wiki/IndieWeb
- https://cheapskatesguide.org/articles/small-internet.html
founded 1 year ago
I need to join more communities, because I'm noticing these anti-scraper questions way too late.
I'd like to direct your attention to iocaine. It's somewhat similar to Anubis in the sense that it sits between your reverse proxy and the real content, but unlike Anubis, it does not use proof of work. It exploits the fact that most of the scrapers are incredibly dumb, and can be trivially detected:
- Firefox/ or Chrome/ in the user agent, but sent no sec-fetch-mode header? Pretty much guaranteed to be a crawler, with few exceptions (eg, Googlebot, Bingbot - but I'd classify those as hostile crawlers too)
Serve garbage or a static page with poisoned URLs to these, and you got rid of 90%+ of the bots. Why the poisoned URLs? Because when they come back riding headless chromes, they usually crawl URLs the dumb bots collected. If you poison those URLs in a way that you can identify them trivially, you can block the headless chromes too, which you wouldn't be able to detect otherwise. Whether they come through residential proxies or not, as long as their queue is collected by the dumb bots, you can catch them.
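To make the idea concrete, here is a rough Python sketch of the two pieces described above: the header check, and poisoned URLs that can be recognized later. This is not how iocaine itself is implemented or configured. The `/maze/` prefix, the `SECRET` key, and all function names are made up for illustration, and signing the URL with an HMAC is just one possible way to make poisoned links trivially identifiable.

```python
import hashlib
import hmac
import secrets

SECRET = b"change-me"     # hypothetical signing key, not an iocaine setting
POISON_PREFIX = "/maze/"  # hypothetical URL prefix for poisoned links


def looks_like_dumb_bot(headers: dict[str, str]) -> bool:
    """Claims to be Firefox or Chrome, but sent no Sec-Fetch-Mode header."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")
    claims_browser = "Firefox/" in ua or "Chrome/" in ua
    return claims_browser and "sec-fetch-mode" not in h


def _sign(nonce: str, secret: bytes) -> str:
    # Truncated HMAC-SHA256 of the nonce, used as a recognizable tag.
    return hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()[:16]


def make_poisoned_path(secret: bytes = SECRET) -> str:
    """Build a unique URL path that only this server can recognize later."""
    nonce = secrets.token_hex(8)
    return f"{POISON_PREFIX}{nonce}-{_sign(nonce, secret)}"


def is_poisoned_path(path: str, secret: bytes = SECRET) -> bool:
    """True if the path carries a tag that was signed with our secret."""
    if not path.startswith(POISON_PREFIX):
        return False
    nonce, sep, tag = path[len(POISON_PREFIX):].partition("-")
    if not sep:
        return False
    return hmac.compare_digest(tag.encode(), _sign(nonce, secret).encode())
```

Usage would be something like: when `looks_like_dumb_bot(request_headers)` is true, answer with a page whose links all come from `make_poisoned_path()`. Any later request for which `is_poisoned_path(path)` is true came from a queue that a dumb bot collected, no matter how browser-like the client looks. The sketch makes no exception for Googlebot or Bingbot.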
On top of this, to reduce the load on your servers, iocaine can also block requests. It can be configured to serve garbage & poisoned URLs to the dumb bots, and then firewall anything that hits a poisoned URL.
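The garbage-then-firewall flow can be sketched roughly like this, again in Python and again not iocaine's actual code or configuration. `TrapPolicy`, `Verdict`, and the `/maze/` prefix are hypothetical names, the poisoned-URL check is reduced to a simple prefix match, and an in-memory set stands in for a real firewall.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "pass"        # forward to the real content
    GARBAGE = "garbage"  # serve junk plus poisoned URLs
    BLOCK = "block"      # drop the request


@dataclass
class TrapPolicy:
    poison_prefix: str = "/maze/"  # hypothetical marker for poisoned URLs
    blocked: set[str] = field(default_factory=set)  # stand-in for a firewall

    def decide(self, client_ip: str, path: str, headers: dict[str, str]) -> Verdict:
        # Already firewalled: nothing else matters.
        if client_ip in self.blocked:
            return Verdict.BLOCK
        # Anything that hits a poisoned URL gets firewalled from now on,
        # even if it looks like a perfectly normal browser.
        if path.startswith(self.poison_prefix):
            self.blocked.add(client_ip)
            return Verdict.BLOCK
        # Dumb bot: claims Firefox/Chrome but sent no Sec-Fetch-Mode header.
        h = {k.lower(): v for k, v in headers.items()}
        ua = h.get("user-agent", "")
        if ("Firefox/" in ua or "Chrome/" in ua) and "sec-fetch-mode" not in h:
            return Verdict.GARBAGE
        return Verdict.PASS
```

The ordering is the point of the sketch: the poisoned-URL check runs before the header check, so a headless Chrome that sends every expected header still gets blocked once it follows a link only the garbage page ever contained.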
The false positive rate is surprisingly low.