submitted 8 months ago by L4s@lemmy.world to c/technology@lemmy.world

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

[-] Exatron@lemmy.world 2 points 8 months ago

The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn't include taking entire works, shoving them in a box, and shaking it until something you want comes out.

[-] S410@lemmy.ml -1 points 8 months ago

Except for all the cases when humans do exactly that.

A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don't count?

Then there are things like sayings, which are entire phrases that only really work if they're repeated verbatim. You sure can deliver the same idea using different words, but it's not the same saying at that point.

To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it's composed by John Williams)

Sometimes the artists don't even realize they're doing exactly that, so we end up with "subconscious plagiarism" cases, e.g. Bright Tunes Music v. Harrisongs Music.

Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, I, or even current machine learning systems. And for that they're praised.

[-] Exatron@lemmy.world 2 points 8 months ago

Except they literally don't. Human memory doesn't retain an exact copy of things. Very good isn't the same as exactly. And human beings can't grab everything they see and instantly use it.

[-] S410@lemmy.ml 0 points 8 months ago* (last edited 8 months ago)

Machine learning doesn't retain an exact copy either. Just how on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain "exact copies" of everything? If "AI" could function as a compression algorithm, it'd definitely be used as one. But it can't, so it isn't.
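
To put rough numbers on the size argument, here's a quick back-of-the-envelope sketch. The dataset size, model size, and item count are all made-up figures for illustration, not measurements of any real model:

```python
# Hypothetical figures chosen for illustration; not measurements of any real model.
TRAINING_SET_BYTES = 5 * 10**12   # assume 5 TB of training data
MODEL_BYTES = 4 * 10**9           # assume a 4 GB model file
NUM_TRAINING_ITEMS = 2 * 10**9    # assume 2 billion training items


def required_compression_ratio(dataset_bytes: int, model_bytes: int) -> float:
    """Ratio the model would need to achieve to hold every training item verbatim."""
    return dataset_bytes / model_bytes


def model_bytes_per_item(model_bytes: int, num_items: int) -> float:
    """Average model capacity available per training item, in bytes."""
    return model_bytes / num_items


ratio = required_compression_ratio(TRAINING_SET_BYTES, MODEL_BYTES)
per_item = model_bytes_per_item(MODEL_BYTES, NUM_TRAINING_ITEMS)

print(f"Required lossless ratio: {ratio:.0f}:1")
print(f"Model bytes per training item: {per_item:.1f}")
```

Under those assumed numbers the model would have a couple of bytes per training item on average. An average like this doesn't rule out a model memorizing a handful of heavily repeated items. It only shows that it can't be holding everything.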

Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an "exact" re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

Here's an image from an article claiming that machine learning image generators plagiarize things.

However, if you take a second to look at the image, you'll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren't one-to-one copies. It doesn't take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they've seen a couple dozen or hundred times, they'd most likely produce something remarkably similar in accuracy. Very similar to what machine learning image generators are capable of at the moment.


this post was submitted on 09 Jan 2024
527 points (98.2% liked)
