submitted 4 months ago by ArcticDagger@feddit.dk to c/science@lemmy.world
[-] metaStatic@kbin.earth 33 points 4 months ago

we have to be very careful about what ends up in our training data

Don't worry, the big tech companies took a snapshot of the internet before it was poisoned, so they can easily profit from LLMs without allowing competitors into the market. That's who "we" is, right?

[-] WhatAmLemmy@lemmy.world 19 points 4 months ago* (last edited 4 months ago)

It's impossible for any of them to have taken a sufficient snapshot. A snapshot of all unique data on the clearnet would probably be on the scale of hundreds to thousands of exabytes, which is (apparently) more storage than any single cloud provider operates.
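To put that scale claim in perspective, here's a rough back-of-envelope sketch. The snapshot size and drive capacity are illustrative assumptions (a mid-range value from the "hundreds to thousands of exabytes" estimate above, and a commodity 20 TB hard drive), not measured figures:

```python
# Back-of-envelope: how many commodity hard drives would a
# full clearnet snapshot need? All inputs are rough assumptions.
EB = 10**18  # bytes per exabyte
TB = 10**12  # bytes per terabyte

snapshot_bytes = 500 * EB   # assumed mid-range of "hundreds to thousands of EB"
drive_bytes = 20 * TB       # one 20 TB hard drive

drives_needed = snapshot_bytes // drive_bytes
print(f"{drives_needed:,} drives")  # prints "25,000,000 drives"
```

Twenty-five million drives for a single copy, before any redundancy or the compute needed to process it, which is the commenter's point about why no one archived everything.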

And that's before the prohibitively expensive cost of processing all that data for any single model.

The reality is that, like what we've done to the natural world, they're polluting and corrupting the internet without taking a sufficient snapshot — and just like the natural world, everything that's lost is lost FOREVER... all in the name of short-term profit!

[-] veganpizza69@lemmy.world 2 points 4 months ago

The retroactive enclosure of the digital commons.

this post was submitted on 26 Jul 2024
230 points (96.7% liked)
