this post was submitted on 31 Jan 2026
279 points (99.6% liked)

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 6 years ago

Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Contains only 49 GB of 180 GB. Multiple reports of downloads being cut off by the DOJ server at offset 48995762176.
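For anyone retrying, that cutoff offset is exactly where an HTTP Range request can pick back up, if the server ever honors ranges again. A minimal stdlib sketch; the URL and local path you pass are your own, nothing here is from the DOJ:

```python
import os
import urllib.request

def range_header(start):
    """HTTP Range header asking the server to resume from byte offset `start`."""
    return {"Range": f"bytes={start}-"}

def resume_download(url, dest, chunk=1 << 20):
    """Append to a partial download, resuming from the file's current size.
    A 206 reply means the server honored the range; a 200 with a nonzero
    offset means it restarted from byte 0, so bail instead of corrupting dest."""
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(start))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        if start and resp.status != 206:
            raise RuntimeError("server ignored the Range header")
        for block in iter(lambda: resp.read(chunk), b""):
            out.write(block)
    return os.path.getsize(dest)
```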

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
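Before seeding, the posted digests can be checked locally. A sketch using Python's hashlib, streaming in chunks so a multi-GB zip never has to fit in memory; the path you pass in is your own download of Data Set 12:

```python
import hashlib

# Digests as posted above for Data Set 12.
EXPECTED = {
    "sha1": "20f804ab55687c957fd249cd0d417d5fe7438281",
    "md5": "b1206186332bb1af021e86d68468f9fe",
    "sha256": "b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2",
}

def file_digest(path, algo="sha1", chunk=1 << 20):
    """Stream the file through the named hash instead of loading it into RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify(path, expected=EXPECTED):
    """Return {algo: True/False} for every digest listed in the post."""
    return {algo: file_digest(path, algo) == want for algo, want in expected.items()}
```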


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files

top 50 comments
[–] o_derr889@lemmy.world 4 points 3 hours ago (1 children)

Here is the download link for a text file that has all the original URLs: https://wormhole.app/PpjJ3P#SFfAOKm1bnCyi-h2YroRyA The link will only last for 24 hours.

[–] acelee1012@lemmy.world 1 points 1 hour ago (1 children)

I have never made a torrent file before, so feel free to correct me if it doesn't work. Here is the magnet link for this as a torrent file, so it's up for more than an hour: magnet:?xt=urn:btih:694535d1e3879e899a53647769f1975276723db7&xt=urn:btmh:12207cf818f0f0110ca5e44614f2c65e016eca2fe7bc569810f9fb25e80ff608fc9b&dn=DOJ%20Epstein%20file%20urls.txt&xl=81991719&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
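For anyone wary of pasting that into a client blind, a magnet URI can be inspected with the standard library alone; parse_qs percent-decodes the fields for us:

```python
from urllib.parse import urlparse, parse_qs

def parse_magnet(uri):
    """Split a magnet URI into its infohashes, display name, size, and trackers."""
    q = parse_qs(urlparse(uri).query)
    return {
        "infohashes": [xt.split(":")[-1] for xt in q.get("xt", [])],
        "name": q.get("dn", [""])[0],
        "size_bytes": int(q["xl"][0]) if "xl" in q else None,
        "trackers": q.get("tr", []),
    }
```

Run on the magnet above, this reports the btih infohash, the name DOJ Epstein file urls.txt, and a size of 81,991,719 bytes.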

[–] jandrew13@lemmy.world 1 points 1 hour ago (1 children)

What does this contain? Anything new?

[–] Wild_Cow_5769@lemmy.world 1 points 19 minutes ago

It's a file list, not the actual files, though.

[–] acelee1012@lemmy.world 3 points 14 hours ago

Has anyone made a Dataset 9 and 10 torrent file without the files in it that the NYT reported as potentially CSAM?

[–] activeinvestigator@lemmy.world 3 points 15 hours ago (3 children)

Do people here have the partial dataset 9, or are you all missing the entire set? There is a magnet link floating around for ~100 GB of it, the one removed in the OP.

I am trying to figure out exactly how many files dataset 9 is supposed to have in it. Before the zip file went dark, I was able to download about 2 GB of it. This was today, so it may not be the original zip file from Jan 30th. At the head of the zip is an index file, VOL00009.OPT; you don't need the full download to read it. The index says there are 531,307 PDFs, while the 100 GB torrent has 531,256, so it's missing 51 PDFs. I checked the 51 file names and they no longer exist as individual files on the DOJ website either. I'm assuming these are the CSAM.

Note that the 3M number of released documents != 3M PDFs; each PDF page is counted as a "document". Dataset 9 contains 1,223,757 documents, and according to the index we are missing only 51 documents, none of them multi-page. In total, I have 2,731,789 documents from datasets 1-12, short of the 3M number. The index I got was also not missing any document ranges.

It's curious that the zip file had an extra 80 GB when only 51 documents are missing. I'm currently scraping links from the DOJ webpage to double-check the filenames.
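The index counting described above can be reproduced in a few lines. This assumes VOL00009.OPT follows the standard Opticon load-file layout (Bates number, volume, image path, document-break flag "Y", box, folder, page count), one row per Bates-numbered page:

```python
import csv

def opt_stats(lines):
    """Tally an Opticon (.OPT) image load file: rows are pages, a 'Y' in the
    fourth field marks the first page of a document, field 3 is the PDF path.
    (Standard Opticon layout is assumed for this release.)"""
    pages = 0
    doc_breaks = 0
    paths = set()
    for row in csv.reader(lines):
        if not row:
            continue
        pages += 1
        paths.add(row[2])
        if len(row) > 3 and row[3].strip().upper() == "Y":
            doc_breaks += 1
    return {"pages": pages, "doc_breaks": doc_breaks, "unique_pdfs": len(paths)}

# Usage against the real index:
# with open("VOL00009.OPT", newline="") as f:
#     print(opt_stats(f))
```

Fed the full index, the row count should match the 1,223,757 pages and the unique-path count the 531,307 PDFs cited above.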

[–] Arthas@lemmy.world 2 points 11 hours ago* (last edited 11 hours ago)

I used AI to analyze the ~36 GB I was able to download before they erased the zip file from the server.

Complete Volume Analysis

  Based on the OPT metadata file, here's what VOL00009 was supposed to contain:

  Full Volume Specifications

  - Total Bates-numbered pages: 1,223,757 pages
  - Total unique PDF files: 531,307 individual PDFs
  - Bates number range: EFTA00039025 to EFTA01262781
  - Subdirectory structure: IMAGES\0001\ through IMAGES\0532\ (532 folders)
  - Expected size: ~180 GB (based on your download info)

  What You Actually Got

  - PDF files received: 90,982 files
  - Subdirectories: 91 folders (0001 through ~0091)
  - Current size: 37 GB
  - Percentage received: ~17% of the files (91 out of 532 folders)

  The Math

  Expected:  531,307 PDF files / 180 GB / 532 folders
  Received:   90,982 PDF files /  37 GB /  91 folders
  Missing:   440,325 PDF files / 143 GB / 441 folders

  ★ Insight ─────────────────────────────────────
  You got approximately the first 17% of the volume before the server deleted it. The good news is that the DAT/OPT index files are complete, so you have a full manifest of what should be there. This means:
  - You know exactly which documents are missing (folders 0092-0532)

I haven't looked into downloading the partials from archive.org yet to see if I have any useful files that archive.org doesn't have yet from dataset 9.

[–] Wild_Cow_5769@lemmy.world 2 points 14 hours ago

That's pretty cool...

Can you send me a DM of the 51? If I come across one and it isn't some sketchy porn, I'll let you know.

[–] GorillaCall@lemmy.world 1 points 14 hours ago

I have heard it's 186 GB.

[–] BWint@lemmy.world 2 points 14 hours ago

The BBC is now reporting that "thousands" of documents have been removed because the DOJ improperly redacted information that can be used to identify the victims: https://www.bbc.com/news/articles/cn0k65pnxjxo

[–] Wild_Cow_5769@lemmy.world 3 points 18 hours ago

DM @wild_cow_5769:matrix.org if someone has a group working on finding the dataset.

There are billions of people on earth. Someone downloaded dataset 9 before the link was taken down. We just have to find them :)

[–] jandrew13@lemmy.world 5 points 21 hours ago (2 children)

Holy shit

The entire Court Records and FOIA page is completely gone too! Fuckers!

[–] jandrew13@lemmy.world 2 points 19 hours ago

I have a scraper running on web.archive.org pulling all previously posted Court Records and FOIA material (docs, audio, etc.) from Jan 30th.
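That kind of Wayback scrape usually goes through the CDX API, which lists every capture under a URL prefix. A sketch; the justice.gov prefix you pass is whatever section you are recovering:

```python
import json
import urllib.parse
import urllib.request

CDX = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_prefix, from_ts="20260130"):
    """Build a Wayback CDX API query: one row per capture under the prefix,
    de-duplicated by content digest, JSON output."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",
        "from": from_ts,
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "collapse": "digest",
    }
    return CDX + "?" + urllib.parse.urlencode(params)

def list_captures(url_prefix):
    """Fetch and decode the capture list (first row of the JSON is the header)."""
    with urllib.request.urlopen(cdx_query_url(url_prefix)) as resp:
        rows = json.load(resp)
    header, *data = rows or [[]]
    return [dict(zip(header, row)) for row in data]
```

Each returned dict gives a timestamp and original URL you can then fetch as https://web.archive.org/web/TIMESTAMP/ORIGINAL.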

[–] Wild_Cow_5769@lemmy.world 1 points 19 hours ago

I told you…

We need dataset 9…

[–] Wild_Cow_5769@lemmy.world 3 points 19 hours ago (2 children)

Someone mentioned a matrix group. Can they DM me an invite? I want to help. Thx

[–] kutt@lemmy.world 1 points 8 hours ago

Count me in!

[–] jandrew13@lemmy.world 2 points 18 hours ago
[–] TavernerAqua@lemmy.world 1 points 17 hours ago* (last edited 17 hours ago) (2 children)

In regard to Dataset 9, it's currently being shared on Dread (forum).

I have no idea if it's legit or not, and I don't care to find out after reading about what's in it from the NYT.

[–] jandrew13@lemmy.world 1 points 16 hours ago* (last edited 16 hours ago) (1 children)

This dude on pastebin posted the file tree of his Epstein Ubuntu env. I have high confidence in whatever lives in his DataSet9Complete.zip file haha

[–] Wild_Cow_5769@lemmy.world 1 points 9 hours ago

No doubt. High confidence…. :)

[–] jandrew13@lemmy.world 4 points 23 hours ago (2 children)

While I feel hopeful that we will be able to reconstruct the archive and create some sort of baseline that can be put back out there, I also can't stop thinking about the "and then what" aspect here. We've seen our elected officials do nothing with this info over and over again, and I'm worried this is going to repeat itself.

I'm fully open to input on this, but I think having a group path forward is useful here. These are the things I believe we can do to move the needle.

Right Now:

  1. Create a clean Data Archive for each of the known datasets (01-12). Something that is actually organized and accessible.
  2. Create a working Archive Directory containing an "itemized" reference list (SQL DB?) of the full Data Archive, with each document listed as a row with certain metadata. Imagining a GitHub repo that we can all contribute to as we work: file number, dir. location, file type (image, legal record, flight log, email, video, etc.), and file status (Redacted bool, Missing bool, Flagged bool).
  3. Infill any MISSING records where possible.
  4. Extract images out of the .pdf format, break out the "Multi-File" PDFs, and rename images/docs by file number. (I made a quick script that does this reliably well.)
  5. Determine which files were left in as CSAM and "redact" them ourselves, removing any liability on our part.
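Step 2's itemized reference list maps naturally onto a single SQLite table. A sketch of one possible schema; the column names follow the bullet above, everything else is an assumption:

```python
import sqlite3

# Columns mirror the fields proposed in step 2.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    file_number  TEXT PRIMARY KEY,  -- e.g. a Bates-style number
    dir_location TEXT NOT NULL,     -- path inside the clean Data Archive
    file_type    TEXT,              -- image, legal record, flight log, email, video, ...
    redacted     INTEGER NOT NULL DEFAULT 0,
    missing      INTEGER NOT NULL DEFAULT 0,
    flagged      INTEGER NOT NULL DEFAULT 0
);
"""

def open_directory(path=":memory:"):
    """Open (or create) the Archive Directory database."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```

A single .db file (or the SQL dump of it) also versions cleanly in a GitHub repo.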

What's Next: Once we have the Archive and Archive Directory, we can begin safely and confidently walking through the Directory as a group effort and filling in as many files/blanks as possible.

  1. Identify and de-redact all documents with garbage redactions (remember the copy/paste DOJ blunders from December), and identify poorly positioned redaction bars to uncover obfuscated names.
  2. LABELING! If we could start adding labels to each document in the form of tags that contain individuals, emails, locations, businesses, this would make it MUCH easier for people to "connect the dots".
  3. Event timeline... This will be hard, but if we can apply a timeline ID to each document, we can put the archive in order of events.
  4. Create some method for visualizing the timeline, searching, or making connections with labels.
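A crude first pass at the labeling idea, just to bootstrap the tag lists; both regexes are assumptions and real labeling would want an actual entity extractor:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Two capitalized words in a row as a rough name candidate -- a placeholder
# heuristic, not real NER.
NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def extract_tags(text):
    """First-pass label candidates for one document: emails plus possible names."""
    return {
        "emails": sorted(set(EMAIL.findall(text))),
        "name_candidates": sorted(set(NAME.findall(text))),
    }
```

Candidates would still need human review before becoming tags, but it gives reviewers something to confirm rather than a blank field.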

We may not be detectives, legislators, or lawmen, but we are sleuth nerds, and the best thing we can do is get this data into a place that allows others to push for justice and put an end to this crap once and for all. It's lofty, I know, but enough is enough. ...Thoughts?

[–] PeoplesElbow@lemmy.world 2 points 15 hours ago

We definitely need a crowdsourced method for going through all the files. I am currently building a solo Cytoscape tool to try making an affiliation graph. Expanding this into a community tool, with authorization so only whitelisted individuals can work on it, is beyond my scope; I can't volunteer to build such an important tool alone, but I am happy to help build it. I can convert my existing tool into a prototype if anyone wants to collaborate with me on it. I am an amateur, but I will spend all the Cursor credits on this.
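For the affiliation graph, Cytoscape.js only needs an elements object of nodes and edges, so the data side can stay tiny. A sketch assuming documents have already been reduced to (name, name) co-occurrence pairs; that pair format is an assumption:

```python
def affiliation_elements(pairs):
    """Turn (name, name) co-occurrence pairs into the nodes/edges 'elements'
    structure that Cytoscape.js loads directly: one node per unique name,
    one edge per pair."""
    nodes, edges = {}, []
    for a, b in pairs:
        for name in (a, b):
            nodes.setdefault(name, {"data": {"id": name}})
        edges.append({"data": {"source": a, "target": b}})
    return {"nodes": list(nodes.values()), "edges": edges}
```

Serialized with json.dumps, the result can be dropped straight into a Cytoscape.js `elements` option.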

[–] Wild_Cow_5769@lemmy.world 3 points 22 hours ago* (last edited 22 hours ago) (1 children)

GFD….

My 2 cents. As a father of only daughters…

If we don’t weed out this sick behavior as a society we never will.

My thoughts are enough is enough.

Once the files are gone there is little to zero chance they are ever public again...

You expect me to believe that "oh shit, we messed up" was an accident?

It's the perfect excuse... so no one looks at the files.

That’s my 2 cents.

[–] jandrew13@lemmy.world 1 points 1 hour ago

I've been thinking a lot about this whole thing. I don't want to be worried or fearful here - we have done nothing wrong! Anything we have archived was provided to us directly by them in the first place. There are whispers all over the internet, random torrents being passed around, conspiracies, etc., but what are we actually doing other than freaking ourselves out (myself at least) and going viral with an endless stream of "OMG LOOK AT THIS FILE" videos/posts.

I vote to remove any of the 'concerning' files and backfill with blank placeholder PDFs with justification, then collect everything we have so far, create file hashes, and put out a clean + stable archive of everything we have: a safe, indexed archive. We wipe away any concerns and can proceed methodically through the trail of documents, resulting in an obvious and accessible collection of evidence. From there we can actually start organizing to create a tool that can be used to crowdsource tagging, timestamping, and parsing the data. I'm a developer and am happy to offer my skillset.
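The "create file hashes" step could be as simple as a manifest walk over the cleaned tree; a sketch, not the poster's tooling:

```python
import hashlib
import os

def build_manifest(root, chunk=1 << 20):
    """Walk an archive tree and map each relative path to its SHA-256,
    so anyone can re-verify a release byte for byte."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for block in iter(lambda: f.read(chunk), b""):
                    h.update(block)
            manifest[os.path.relpath(full, root)] = h.hexdigest()
    return manifest
```

Publishing the manifest alongside the archive lets anyone diff their copy against the canonical release, including the blank placeholder PDFs.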

Taking a step back - it's fun to do the "digital sleuth" thing for a while, but then what? We have the files... (mostly)... Great. We all have our own lives, jobs, and families, and taking actual time to dig into this and produce a real solution that can actually make a difference is a pretty big ask. That said, this feels like a moment where we can finally make an actual difference, and I think it's worth committing to. If any of you are interested in helping beyond archival, please lmk.

I just downloaded matrix, but I'm new to this, so I'm not sure how that all works. Happy to link up via discord, matrix, email, or whatever.
