this post was submitted on 05 Feb 2026
521 points (98.9% liked)

A new tool searches your LinkedIn connections for people who are mentioned in the Epstein files, in case you, understandably, don't want anything to do with them on the already deranged social network.

404 Media tested the tool, called EpsteIn (a mashup of Epstein and LinkedIn), and it appears to work.

“I found myself wondering whether anyone had mapped Epstein's network in the style of LinkedIn—how many people are 1st/2nd/3rd degree connections of Jeffrey Epstein?” Christopher Finke, the creator of the tool, told 404 Media in an email. “Smarter programmers than me have already built tools to visualize that, but I couldn't find anything that would show the overlap between my network and his.”

[–] Sterile_Technique@lemmy.world 4 points 11 hours ago (2 children)

Is there a tool that crunches the entirety of the documents and sorts the individual words by frequency? For example, doing it the stupid way (semi-manually), I copied OP's article into Word and replaced every space with a line break to turn the entire article into a one-word-per-line list, then plugged that into Excel, sorted it alphabetically, and manually counted and deleted the repeats. Then I sorted those to put the most frequent words on top.

This reduced the 525-word article to a list of 284 unique words. If I added another article, the list would only grow by the words in the second article that didn't appear in the first, so as more and more articles are added, each one contributes fewer and fewer new entries. Do this to thousands of pages of documents like the Epstein files and you could condense dozens of pages' worth of just the word "the" down to a single entry, making the whole set much easier to skim for highlights. Like, if the word 'velociraptor' were randomly hidden in the article, most readers would skim right past it; but in the list below it would stand out like a sore thumb, prompting a targeted search of the full document for context. It would be even better if we could flag words as not interesting and, like, click to knock "the," "of," "and," etc. off the list.

...maybe a project for someone who actually knows what they're doing... my skills hit a brick wall after things like 'find and replace' in Word, but you get the gist.

Word used: # found:
The 37
Of 16
And 14
To 14
Epstein 11
In 11
Tool 9
A 8
I 8
Files 7
But 5
For 5
Is 5
Linkedin 5
Many 5
On 5
That 5
With 5
404 4
Also 4
An 4
Connections 4
Found 4
Media 4
Not 4
People 4
All 3
Anything 3
Are 3
As 3
Him 3
It 3
My 3
Network 3
Them 3
Were 3
Who 3
Already 2
Appears 2
Case 2
Common 2
Con 2
Def 2
Documents 2
DOJ 2
Dump 2
Each 2
Excerpts 2
Find 2
Finke 2
Founder 2
From 2
How 2
Jeffrey 2
Me 2
Mentioned 2
Moss 2
Name 2
Names 2
Obviously 2
Other 2
Overlap 2
Page 2
Positives 2
Repository 2
Said 2
Search 2
Their 2
This 2
Up 2
Vincenzo 2
Work 2
Your 2
5 1
22 1
35 1
1st 1
2nd 1
3rd 1
Acknowledges 1
Across 1
Adam 1
Added 1
After 1
Although 1
Anyone 1
Api 1
Appearance 1
Approached 1
Attended 1
Audio 1
Away 1
Badges 1
Based 1
Be 1
Because 1
Behind 1
Between 1
Brin 1
Built 1
Called 1
Can 1
Chose 1
Christopher 1
Company 1
Conference 1
Contained 1
Contains 1
Context 1
Could 1
Couldn't 1
Court 1
Covered 1
Co-Worker 1
Creator 1
Days 1
Deep 1
Degree 1
Department 1
Deranged 1
Did 1
Didn’t 1
Do 1
Document 1
Does 1
Don’t 1
Down 1
Duggan 1
Easily 1
Elites 1
Email 1
Epstein's 1
Far 1
First 1
Free 1
Fully 1
Ghislaine 1
Girls 1
Github 1
Gut 1
Hacker 1
Hacking 1
Had 1
Have 1
He 1
His 1
Hits 1
Images 1
Incidental 1
Included 1
Inclusion 1
Initial 1
Introduce 1
Investigations 1
Involvement 1
Iozzo 1
Jeff 1
Just 1
Justice’s 1
Keep 1
Know 1
Known 1
Larry 1
Last 1
Likely 1
Links 1
Lot 1
Made 1
Make 1
Mapped 1
Mash 1
Massive 1
Matching 1
Material 1
Maxwell 1
May 1
Mean 1
Mention 1
Mentions 1
Million 1
Moss’s 1
Multiple 1
Musk’s 1
Myself 1
Necessarily 1
Nefarious 1
Never 1
New 1
No 1
Nude 1
Number 1
Off 1
Offered 1
Only 1
Or 1
Original 1
Others 1
Output 1
Pages 1
Paid 1
Patrick 1
Peter 1
Photos 1
Pointed 1
Position 1
Post 1
Previous 1
Produce 1
Programmers 1
Publicly 1
Published 1
Purposefully 1
Reads 1
Realize 1
Recordings 1
Reddit 1
Related 1
Released 1
Relevance 1
Report 1
Reported 1
Result’s 1
Review 1
S 1
Saw 1
Scenes 1
Searched 1
Searches 1
Sergey 1
Show 1
Shows 1
Smarter 1
Social 1
Some 1
Stay 1
Stuff 1
Style 1
Suppose 1
Surprising 1
Taking 1
Tech 1
Tested 1
Than 1
Thankfully 1
There 1
These 1
Thiel 1
Those 1
Told 1
Tools 1
Total 1
Touch 1
Tried 1
Trusting 1
Understandably 1
Unredacted 1
Upload 1
Verify 1
Very 1
Videos 1
Visualize 1
Want 1
Warn 1
Way 1
We 1
Wealth 1
Website 1
Week 1
Well 1
Went 1
Where 1
Whether 1
Wikipedia 1
Wild 1
Wired 1
Women 1
Wondering 1
Would 1
Wrote 1
You 1
Zero 1
[–] thanks_shakey_snake@lemmy.ca 2 points 6 hours ago

Seriously, if you're motivated enough to do this, you should give programming a try. Python, Ruby, or JavaScript are all ideal for this kind of thing, and you can solve a problem like this in a few lines of code... just look up "word frequency in Python" (or whatever language you pick) for examples.
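
For a taste, here's roughly what it looks like in Python, standard library only. Treat it as a sketch: the filename and the stop-word list are placeholders to adapt, not anything official.

```python
# Rough word-frequency sketch: read a file, split into words, count,
# and print the most common first, like the list above but automatic.
import re
from collections import Counter

# Placeholder stop words; extend this to knock "the", "of", "and" off the list.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "for", "with"}

with open("epstein_files.txt", encoding="utf-8") as f:  # placeholder filename
    words = re.findall(r"[a-z0-9']+", f.read().lower())

counts = Counter(w for w in words if w not in STOP_WORDS)

for word, n in counts.most_common(50):  # top 50; adjust to taste
    print(f"{word}\t{n}")
```

Run it over a whole folder of files instead of one and you get exactly the "each new article adds fewer new words" effect you described.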

If you want to see what the next level of this kind of analysis looks like, watch a few videos about how Elasticsearch works... not so much so you can USE Elasticsearch (although you can, it's free), but just to get a sense of how they approach problems like this. Like, imagine that instead of just counting word occurrences, you kept track of WHERE in the text each word was: you could still count occurrences, but you could also pull up the surrounding text and do a bunch of other interesting things.
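
To make that concrete, here's a toy version of the position-tracking idea. Real Elasticsearch does vastly more (analysis, scoring, sharding), so this is just the gist:

```python
# Toy positional index: map each word to the character offsets where it
# occurs, so you can still count hits AND jump back to the surrounding text.
import re
from collections import defaultdict

text = "the tool searched the files and the tool found matches"
index = defaultdict(list)

for m in re.finditer(r"\w+", text.lower()):
    index[m.group()].append(m.start())

print(len(index["the"]))       # plain occurrence count: 3
for pos in index["tool"]:      # context around each hit
    print(text[max(0, pos - 10):pos + 14])
```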

[–] mlg@lemmy.world 4 points 10 hours ago

There's probably a nice shell pipeline that does what you want lol. cat + awk unique count + sort

I'm just forgetting if there's an easy way to keep the line numbers or filename so you can easily go back to the full page reference.
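
Something like this should get close (rough sketch, filenames made up). The counting pipeline throws the line numbers away by design, but plain old grep -rn covers the jump-back-to-the-source part:

```sh
# Lowercase, split to one word per line, then count and rank by frequency.
tr '[:upper:]' '[:lower:]' < files.txt | tr -cs "[:alnum:]'" '\n' | sort | uniq -c | sort -rn

# To find where a word lives, grep keeps the filename and line number:
grep -rni 'velociraptor' epstein_files/
```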