I will grab this torrent when I get home and make it a permanent seed, alongside the one outing nazis in Patriot Front.
196
Community Rules
You must post before you leave
Be nice. Assume others have good intent (within reason).
Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.
Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.
Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".
Bigotry is not allowed. This includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Ableism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.
Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.
Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.
Avoid AI generated content.
Avoid misinformation.
Avoid incomprehensible posts.
No threats or personal attacks.
No spam.
Moderator Guidelines
- Don’t be mean to users. Be gentle or neutral.
- Most moderator actions which have a modlog message should include your username.
- When in doubt about whether or not a user is problematic, send them a DM.
- Don’t waste time debating/arguing with problematic users.
- Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
- Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
- Ask the other mods for advice when things get complicated.
- Share everything you do in the mod matrix, both so several mods aren't unknowingly handling the same issues, but also so you can receive feedback on what you intend to do.
- Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
- Don’t perform too much moderation in the comments, except if you want a verdict to be public or to ask people to dial a convo down/stop. Single comment warnings are okay.
- Send users concise DMs about verdicts about them, such as bans etc, except in cases where it is clear we don’t want them at all, such as obvious transphobes. No need to notify someone they haven’t been banned of course.
- Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
- First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
- Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
- No large decisions or actions without community input (e.g. polls or meta posts).
- Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
- Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.
Shit good idea, didn't even know you could do this.
What else should we seed? I've got a homelab and am eager to put some storage to use for something like this.
🚨 BIG NEWS Y'ALL! 🚨
Someone just saved ALL the CDC's public data before it could disappear! 🦅
What's the Deal?
Some mystery hero downloaded everything from the CDC's website (that's 98 GIGABYTES of health info!) and uploaded it to the Internet Archive on Jan 28th. Think of it like making a backup copy of your phone before it breaks!
Why Should You Care?
- This is YOUR health data - stuff about vaccines, diseases, and public health that your tax dollars paid for! 🏥
- Once this info is gone from CDC's website, it could be really hard for your doctor to get important updates
- Researchers need this to keep studying ways to keep Americans healthy 💪
What's Next?
Smart folks at places like Harvard are making sure this data stays safe by keeping copies. It's like having multiple backups of your family photos - can't be too careful!
Remember folks: Knowledge is power, and someone just made sure we didn't lose a whole bunch of it! 🎯
#SaveTheData #PublicHealth #AmericanRight2Know
Source: Internet Archive upload by anonymous user on Jan 28, 2025 Post by Ed Summers (@edsu@social.coop) - Feb 3, 2025
it's weird that I learned of this through this community and not a security or health community. something to look into tomorrow
As a reminder, AI generated content is against the rules in this community—see the sidebar. I appreciate your instinct to bring some quality content to this space, but let’s please keep in mind that genuine interaction with diverse voices is what makes this community beautiful. :)
My reasoning:
- You have personally admitted to writing AI comments in the past: https://sh.itjust.works/comment/16482371
- Heavy use of markdown headings, bullets, and section dividers is a common pattern in LLM output
- Use of “it’s like” or “it’s about” phrases as the conclusion to a paragraph is very common in LLMs like ChatGPT
- Verbatim replication of content from my original post, which is common in LLM output and strongly suggests an LLM was instructed to create something based on the text of the original post
- Use of 🎯 emoji does not match context
- “100% AI generated” response on multiple AI detection websites (GPTZero, Quillbot)
Any single one of these facts would not lead me to comment, but with all of it combined it makes a pretty strong case. Thank you for your contribution to this community but please let’s keep it genuine in the future! We love and appreciate the real you :)
We are screwed if the Internet Archive goes down, right?
Seems like a huge point of failure for one entity.
Agreed, though I think the biggest issue is just scale. It’s over 100 petabytes of data. That's not outside the realm of what big cloud providers could mirror, but they don’t really give a shit. For the community to do it would take a significant distributed software solution: custom software just to work out how to split the archive into a sharded set of community mirrors, with different people each mirroring individual pieces. Not impossible, but as far as I know, nobody’s taken up the mantle yet.
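To make the "sharded set of community mirrors" idea a bit more concrete, here's a rough sketch of one way to decide who mirrors what. It uses rendezvous (highest-random-weight) hashing, which is my own pick for illustration and not something the Internet Archive or any existing project is known to use. The mirror names and item IDs are made up.

```python
import hashlib


def score(mirror: str, item: str) -> int:
    """Deterministic pseudo-random weight for a (mirror, item) pair."""
    digest = hashlib.sha256(f"{mirror}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(item: str, mirrors: list[str], copies: int = 3) -> list[str]:
    """Pick the `copies` mirrors with the highest score for this item."""
    ranked = sorted(mirrors, key=lambda m: score(m, item), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    # Hypothetical volunteer mirrors and item identifiers.
    mirrors = ["alice-nas", "bob-homelab", "carol-vps", "dave-pi", "erin-server"]
    for item in ["item-0001", "item-0002", "item-0003"]:
        print(item, "->", assign(item, mirrors))
```

The nice property is that nobody needs a central table: anyone with the same mirror list computes the same answer, and a mirror dropping out only reshuffles the items it was actually holding.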
So about 104,857,600 GB? You'd need 105,000 people with 1 TB each to save that. Or...
Assuming you bought 30 TB SSDs, you'd need about 3,500 of those, costing €80 each.
That'd be €280k, but let's round it to €300k.
If every person spent €960 (or €80 per month), then each person could get 12 of those SSDs. You'd need about 290 people to do that (3,500 / 12).
Should be doable if crowdfunded by a community, or if you had some big donor. Then you'd need to connect it.
Looking at diskprices.com, lowest prices for storage are around $8/TB (used) or $15/TB (new). I didn't look too hard, but a 30TB SSD for $80 (~$2.7/TB) seems wrong?
100K TB * $15/TB = $1.5 million
Assuming 100PB is the amount of data, we'd also need redundancy. Idk what best practices would be, but I'll say 3ish copies, so 300PB total.
So a grand total of ~$5 million.
Which is crazy cheap, all things considered. Like, it would be no problem for a single rich person to handle that.
Hell, subsidize/give away cheap little computers that you just plug power and an Ethernet cable into. Raspberry pi + 4TB drive ($60) + casing would be like... $100? Though I guess you'd need 75K of them, and the cost per TB is pretty bad.
This guy is 20TB for $280: https://a.co/d/17UOtFi
If we stick with $40 of overhead for rpi etc, that's $320 for 20TB ($16/TB), and we'd need 300PB/(20TB/unit) = 15K units. And at $320 each, all in would be $4.8 million.
The software seems to exist for connecting them all... So idk seems like it would be absolutely feasible? Would be interested to learn if I'm missing a major cost.
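If anyone wants to poke at these numbers, here's a quick sketch of the same back-of-the-envelope math. The prices and the 3x redundancy are just the guesses from this thread, and it ignores power, bandwidth, and drive failures, so treat it as a rough estimate only.

```python
import math

ARCHIVE_TB = 100_000  # ~100 PB, counted as decimal terabytes
COPIES = 3            # "3ish copies" of redundancy, a guess from the thread
RAW_TB = ARCHIVE_TB * COPIES


def units_needed(total_tb: float, unit_tb: float) -> int:
    """Number of drives or boxes needed, rounded up."""
    return math.ceil(total_tb / unit_tb)


def total_cost(total_tb: float, unit_tb: float, unit_price: float) -> float:
    """Total price for enough units to hold total_tb."""
    return units_needed(total_tb, unit_tb) * unit_price


if __name__ == "__main__":
    # Bare drives at $15/TB (new), modeled as 1 TB units.
    print("bare drives:", total_cost(RAW_TB, 1, 15))
    # Raspberry Pi + 4 TB drive + case, roughly $100 per box.
    print("4 TB boxes: ", units_needed(RAW_TB, 4), "units,", total_cost(RAW_TB, 4, 100))
    # 20 TB drive ($280) + $40 overhead = $320 per box.
    print("20 TB boxes:", units_needed(RAW_TB, 20), "units,", total_cost(RAW_TB, 20, 320))
```

Swapping in your own drive size and price per unit makes it easy to see how much the per-box overhead matters for the small 4 TB option.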
For the 30 TB SSD I looked at sites like Luntek.
HexOS has a plan for shared encrypted data. With the simplicity of installation and management it could take off in the mainstream as personal NAS boxes gain popularity, but it's still in early development.
Interplanetary File System can do it
Get ready to Donate to their legal defense fund
It is long overdue for the Internet Archive to move to the EU or Switzerland or something.
Yep.
I wish they could also implement a decentralised hosting protocol, though I know that technology is currently in its infancy.
you’re right and you should say it but it makes me sad
As long as money still means something after Elon is through with the Treasury...
hi spujb. Only 98gb? I can mirror that 🤷‍♀️
e: https://kate.fail/cdc_2025_01_28/archive.org/download/20250128-cdc-datasets/
I suggest also mirroring on https://academictorrents.com/
posted the link, i think there are a few files missing, not sure why. but the folder reads as 95GB
sry i dont know what that is but once i have all the data ill post a link here. im hosting in france and i am also outside the us so i will not take down the data at tronald dumps request tyvm.
Use his original last name. ~~Drumph~~ Drumpf. It pisses him off as much as being told that he has baby hands.
His father or grandfather changed it.
I'm gonna download it when I get home and put on a few USBs. They won't be connected to any device and will be stored in safes.
Can't remote wipe data that's not connected.
The more backups of important information we have the better.
How would you recommend someone go about archiving important parts of the IA? Just external drives?
The Internet Archive is, and I really want to emphasize this, Fucking Huge. If you want to help archive it, every upload has an associated torrent you can download and help seed. Torrenting itself isn't illegal, only torrenting illegal stuff like copyrighted movies. You can buy a relatively cheap refurbished HDD of whatever size you want, set up qBittorrent, and torrent the uploads that you want to make sure are available even if the Internet Archive has to take them down or has a critical data loss failure.
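If you'd rather script it than click around, here's a minimal sketch for building the torrent link for a given upload. The `<identifier>_archive.torrent` URL pattern is an assumption based on how archive.org usually names its auto-generated torrents, and not every item is guaranteed to have one, so check the item's download page if the link 404s. The identifier in the example is the one from the mirror link posted in this thread.

```python
from urllib.parse import quote


def ia_torrent_url(identifier: str) -> str:
    """Build the (assumed) torrent URL for an Internet Archive item."""
    safe_id = quote(identifier, safe="")
    return f"https://archive.org/download/{safe_id}/{safe_id}_archive.torrent"


if __name__ == "__main__":
    # Identifier taken from the mirror URL posted in this thread.
    print(ia_torrent_url("20250128-cdc-datasets"))
```

Paste the printed link into qBittorrent (or any client that accepts torrent URLs) and leave it seeding.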
Thank you so much for the advice! I want to preserve important documents like the Bill of Rights and the Constitution, as well as sexual education material, especially stuff pertaining to women and reproductive health. Also banned books. Things the fascists are trying to purge and things that are important to me.
In the case of books, Anna's Archive is looking for help seeding their enormous collection of books and research papers. Consider reading that page and helping them as well!
I know what you're talking about is important and a necessary comment but something about your comment hit me hard. It's just so absolutely insane that it has to be said/done.
Ikr? It's wild that all of this is happening.
If anyone is looking for something specific to preserve, consider Our Bodies, Ourselves. It's a seminal feminist work that seeks to educate women on their bodies. It's extremely comprehensive, thicker than most textbooks.
well and truly based
Because the feds didn't already have it out for IA.
Good thing they’re based far from the US in… oh.
Okay, given how things are going, do we know if the Internet Archive has a backup plan for when these fucks attack it in earnest?
Was this previously public data? Not illegal to download and torrent, right?
from the linked page
Excludes corrupt datasets and data not publicly accessible.
it sounds like it's only stuff that was already publicly available tho
Some of the publicly available data is disappearing under the new administration. Most notably information about COVID, long COVID, vaccines, and bird flu is disappearing. Presumably, this data dump contains the missing data.
Importantly they are also removing all mentions of climate change. I imagine they'll be deleting data on that front as well.
key word was
Inb4 it gets DDoS'd again