this post was submitted on 26 Feb 2026
35 points (94.9% liked)

Privacy

9096 readers
79 users here now

A community for Lemmy users interested in privacy

Rules:

  1. Be civil
  2. No spam posting
  3. Keep posts on-topic
  4. No trolling

founded 2 years ago

They were able to de-anonymize posters from collections of tens of thousands of users, tested on scrapes from Hacker News, LinkedIn, and Reddit. Not good from a privacy perspective.

all 14 comments
[–] AmbitiousProcess@piefed.social 12 points 3 days ago* (last edited 3 days ago)

I mean, makes sense to me.

We already reveal a lot of small details in comments. I mentally note it every time I do, and I know for sure someone could probably get pretty close to pinning down at least the general area I live in from just my post history here, let alone any other social media accounts.

Even with all their other flaws, LLMs are fairly good data parsers, specifically when it comes to taking unstructured data (e.g. "In SF we've got...") and turning it into structured data (e.g. city_of_residence: San Francisco), so it's not surprising you could use this to build a dossier of someone's info and cross-match it with other databases.

Nothing humans couldn't do before, and nothing intelligence agencies and data brokers don't already have technology to do, but LLMs will make this a lot more accessible to anyone since it requires less specialization, custom text filters, stuff like that.
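[Editor's sketch] The unstructured-to-structured step described above can be illustrated with a toy stand-in for the LLM: a real pipeline would prompt a model to emit a JSON schema, but a hypothetical keyword table shows the same output shape. The patterns and field names here are invented for illustration.

```python
import re

# Hypothetical hint table standing in for an LLM's extraction step.
PLACE_HINTS = {
    r"\bin SF\b": "San Francisco",
    r"\bin NYC\b": "New York City",
}

def extract_profile(comment: str) -> dict:
    """Map free-text comment details onto structured profile fields."""
    profile = {}
    for pattern, city in PLACE_HINTS.items():
        if re.search(pattern, comment, re.IGNORECASE):
            profile["city_of_residence"] = city
    return profile

print(extract_profile("In SF we've got fog all summer."))
# -> {'city_of_residence': 'San Francisco'}
```

Once comments are reduced to structured fields like this, cross-matching against other databases is an ordinary join.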

[–] HelloRoot@lemy.lol 9 points 4 days ago (1 children)

time to stop writing on social media

[–] Mac@mander.xyz 15 points 4 days ago (1 children)

location: "social media"
likes: "writing"
desires: "to stop"

Got your number, pal.

[–] HelloRoot@lemy.lol 2 points 3 days ago* (last edited 3 days ago) (1 children)

You honestly probably could get my number that way. It's public on the internet together with more words from me (for work).

I think it's time to delete my lemmy...

[–] Mac@mander.xyz 2 points 3 days ago

Few of us are truly anonymous.

[–] Zak@lemmy.world 2 points 3 days ago* (last edited 3 days ago)

Some years ago, I made a thing that could determine whether two different player characters in an online game with global chat were likely to be the same person by using a classification algorithm on their public chat. The popular text classification algorithms at the time didn't work very well for that use case, but I came up with one that did. It was fun and useful that my internet friends and I could know who we were dealing with when they thought they were being sneaky.

I read that DARPA was offering grants for exactly that kind of work, and thought up several ideas for commercializing the technology. Then I did exactly none of that because privacy is good and accelerating the availability of de-anonymization technology is bad.
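[Editor's sketch] The commenter doesn't describe their algorithm, but a common baseline for this kind of authorship matching is character n-gram frequencies compared with cosine similarity; the sketch below is that generic approach, not the poster's method.

```python
from collections import Counter
import math

def ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts, a standard stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def same_author_score(chat_a: str, chat_b: str) -> float:
    """Higher means the two chat logs look more alike stylistically."""
    return cosine(ngrams(chat_a), ngrams(chat_b))
```

With enough chat per character, scores between two accounts run by the same person tend to stand out against the background distribution.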

[–] hector@lemmy.today 3 points 3 days ago

Researchers showed long ago that combining two or more datasets let them effortlessly de-anonymize data with computer programs. That has been known for at least two decades, back when it was maddening and worrisome but not yet an existential threat to representative democracy and our wellbeing, before malign forces weaponized the unprecedented mass of data now collected, in ways we don't even know about yet.

Plenty of people with anonymous social media accounts have also been identified by groups seeking to punish them for comments, typically ones critical of Israel. They were supposedly identified from their posts, but it could just as likely be that the US government or the social media company leaked the information. Either way, combining all the information you have let slip narrows it down considerably once a computer sorts through it.
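[Editor's sketch] The dataset-combination attack described above is classically demonstrated by joining an "anonymized" dataset with a public one on shared quasi-identifiers (the well-known ZIP + birth date + sex combination). All records below are invented.

```python
# Invented example records: one "anonymized" dataset, one public roster.
anonymized = [
    {"zip": "02139", "dob": "1945-07-31", "sex": "F", "diagnosis": "flu"},
]
public_roster = [
    {"name": "J. Doe", "zip": "02139", "dob": "1945-07-31", "sex": "F"},
]

# Quasi-identifiers shared by both datasets.
KEYS = ("zip", "dob", "sex")

def link(records, roster):
    """Re-identify records by joining on the quasi-identifier columns."""
    matches = []
    for r in records:
        for p in roster:
            if all(r[k] == p[k] for k in KEYS):
                matches.append({**p, **r})
    return matches

print(link(anonymized, public_roster))
# -> one match, attaching J. Doe's name to the "anonymous" diagnosis
```

Neither dataset contains enough to identify anyone on its own; the join is what does the damage.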

[–] VibeSurgeon@piefed.social 1 points 4 days ago (1 children)

The only defense would be to post only a single comment from each username, so that putting multiple points of data together becomes impossible.

[–] solrize@lemmy.ml 5 points 3 days ago

Again, 4chan was ahead of its time.

https://wakaba.c3.cx/shii/shiichan