submitted 6 days ago (last edited 6 days ago) by Nexy to c/fuck_ai@lemmy.world

Bluesky may have said it won't use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for "machine learning research". It is already a very popular dataset, and your data may have been scraped into it.

Without paywall

ladicius@lemmy.world 2 points 6 days ago

Is that a problem for a proper scraper? Give the machine a list of domains and some hints about the relevant protocols, and then the computer runs until the end of the list.

this post was submitted on 27 Nov 2024
44 points (95.8% liked)

Fuck AI

1346 readers

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 8 months ago