submitted 1 year ago by possum@lemmy.ml to c/lemmy@lemmy.ml

Since Reddit content being used to train AI was part of what triggered their Dumb Actions™️, is there a way to deal with this on Lemmy? If there's a way to license API access or the content itself under, say, LGPL to prevent commercial AI from using it that would be awesome. With the way ActivityPub works I'm not sure if that's possible though.

top 6 comments
[-] chobeat@lemmy.ml 10 points 1 year ago

Right now the whole model of generative AI, and LLMs in general, is built on the assumption that training a machine learning model is not a problem for licenses, copyright, and whatever. Obviously this is leading to huge legal battles, and until their outcome is clear and a new legal practice or specific regulations are established in the EU and USA, there's no point discussing licenses.

Also, licenses don't prevent anything; they are not magic. If small or big AI companies feel safe violating these licenses, or simply profit enough to pay the fines, they will keep doing it. It's the same with FOSS licenses: most small companies violate licenses, and unless you have whistleblowers, you never find out. Even then, the legal path is very long. Only big corporations scared of humongous lawsuits really care about it, but small startups? Small consultancies? They don't care. Licenses are just a sign that says "STOP! Or go on, I'm a license, not a cop"

[-] nachtigall@feddit.de 6 points 1 year ago

That is actually quite an interesting question. What is the license of the content posted to Lemmy? Would it be legal to share posts? Or to use code posted here in proprietary projects? Do people retain full copyright, thus making sharing illegal? Can an instance define a standard license for content in its legal terms (like Stack Overflow does)?

Finally, who would enforce the license?

Also, I don't think people that scrape training data care about all of this.

[-] simple@kbin.social 4 points 1 year ago

The licensing doesn't matter; most AI models are trained on proprietary and copyrighted data. There's still a lot of talk in governments about whether this is legal or not, but at this point the cat's out of the bag, and I doubt we'll regress back to using smaller amounts of data.

[-] gredo@lemmy.world 3 points 1 year ago

In Europe, lawmakers are currently trying to pass a law requiring AI systems to disclose their sources when the result is based on copyrighted source material. See https://www.reuters.com/technology/eu-lawmakers-committee-reaches-deal-artificial-intelligence-act-2023-04-27/

[-] simple@kbin.social 2 points 1 year ago* (last edited 1 year ago)

Aside from the fact that I don't think this law will pass, I doubt it'll be effective at all. Companies will just move AI training to countries where it is legal. The most the EU can do right now is play whack-a-mole and start blocking AIs that don't meet its requirements, but at that point people will just host mirrors or use a VPN. It's just not enforceable, and the EU knows that, which is why they're so stressed out trying to figure out a reasonable law regarding AI.

[-] gredo@lemmy.world 2 points 1 year ago

Yeah, I think so too.

this post was submitted on 10 Jun 2023
14 points (100.0% liked)
