submitted 1 week ago* (last edited 4 days ago) by xXPoisonFoxXx@sh.itjust.works to c/piracy@lemmy.dbzer0.com

I was able to get a list of the most recent anime from Aniwave by scraping all 411 pages archived in the Wayback Machine, using this Reddit thread Goofhey made: https://old.reddit.com/r/animepiracy/comments/1f2xbg7/archived_aniwaves_12000_anime_pages_on_wayback/. Back in March I built a web scraper using Python requests and Beautiful Soup and got a list of all of Aniwave's current anime sorted in alphabetical order. I compared that list to what Goofhey had most recently saved in the Wayback Machine and discovered that some anime were missing. I guess it's because the pages Goofhey saved were sorted by recently updated, and since that ordering is constantly changing, some anime were excluded. I think I got all or most of them by combining both lists. Then, using a Disqus scraper I made, I fed it links from the combined list and downloaded the comments. I tested the scraper on various sites (myasiantv, gogoanime, aniwave); it can most likely work on most websites that use Disqus with a bit of tweaking.
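The list comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual scraper: the function names `normalize` and `merge_title_lists` and the sample titles are hypothetical, and it assumes titles only differ by case and whitespace between the two sources.

```python
def normalize(title: str) -> str:
    """Collapse whitespace and case so the same title matches across lists."""
    return " ".join(title.split()).casefold()


def merge_title_lists(alphabetical, recent):
    """Combine two title lists (hypothetical helper, not the author's code).

    Returns (merged, missing_from_recent):
      merged               every distinct title from both lists, sorted
      missing_from_recent  titles in `alphabetical` absent from `recent`
    """
    seen = {}
    for title in list(alphabetical) + list(recent):
        # Keep the first spelling encountered for each normalized key.
        seen.setdefault(normalize(title), " ".join(title.split()))

    recent_keys = {normalize(t) for t in recent}
    missing_keys = {normalize(t) for t in alphabetical} - recent_keys

    merged = sorted(seen.values(), key=str.casefold)
    missing = sorted((seen[k] for k in missing_keys), key=str.casefold)
    return merged, missing


# Made-up sample data, only to show the shape of the result.
alphabetical = ["Bleach", "Naruto", "One Piece"]
recent = ["one  piece", "Attack on Titan"]
merged, missing = merge_title_lists(alphabetical, recent)
print(merged)
print(missing)
```

A set difference like this only catches exact matches after normalization; titles that were renamed between snapshots would still need a manual check.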

I also managed to get all of Gogoanime's old comments from before 2021, going all the way back to 2014/2015. Something interesting I found is that a few copycat websites (6anime, gogoanimes) still have all of Gogoanime's old comments from before 2021.


Most commented pages on each site, sorted from the most (Aniwave) to the fewest (Anitaku) comments:

Aniwave(9anime): Attack on Titan The Final Season Part 3 Episode 1

Gogoanime Old comments: Yuri on Ice Category page

Anitaku(Gogoanime): Kimetsu no Yaiba Yuukaku Hen Episode 10

Folders were compressed into tarballs with zstd level 9 compression:

Aniwave(9anime): Uncompressed: 23.7 GiB, Compressed: 1.4 GiB

Gogoanime: Uncompressed: 16.4 GiB, Compressed: 769.5 MiB

Anitaku(Gogoanime): Uncompressed: 7.2 GiB, Compressed: 326.7 MiB
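For a sense of how well the comment data compresses, the sizes above can be turned into ratios. This is a small sketch using only the rounded figures from this post; the helper name `compression_ratio` is made up for illustration, and since the inputs are rounded the results are approximate.

```python
MIB_PER_GIB = 1024  # the post mixes GiB and MiB, so convert everything to MiB

# (uncompressed MiB, compressed MiB), taken from the figures listed above
sizes_mib = {
    "Aniwave(9anime)": (23.7 * MIB_PER_GIB, 1.4 * MIB_PER_GIB),
    "Gogoanime": (16.4 * MIB_PER_GIB, 769.5),
    "Anitaku(Gogoanime)": (7.2 * MIB_PER_GIB, 326.7),
}


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """How many times smaller the archive is than the original folder."""
    return uncompressed / compressed


for name, (raw, packed) in sizes_mib.items():
    ratio = compression_ratio(raw, packed)
    saved_pct = 100 * (1 - packed / raw)
    print(f"{name}: about {ratio:.1f}x smaller ({saved_pct:.1f}% saved)")
```

With these rounded inputs the Aniwave archive comes out at roughly 17x smaller, and the two Gogoanime archives at roughly 22x.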

DOWNLOADS:

Aniwave(9anime) Comments: https://archive.org/details/aniwave-comments.tar

Anitaku(Gogoanime) March 2024: https://archive.org/details/anitaku-feb-2024-comments.tar

Gogoanime Comments Before 2021: https://archive.org/details/gogoanimes-comments-archive-prior-2021.tar

EDIT: I replaced all the MEGA links with archive.org links and removed all images to reduce file size.

[-] hexagonwin 1 points 1 week ago

Awesome. May I ask which Disqus scraper you used?

[-] xXPoisonFoxXx@sh.itjust.works 1 points 4 days ago

It's my own custom scraper that I have been slowly working on since January.

this post was submitted on 22 Nov 2024
32 points (97.1% liked)
