[–] altkey@lemmy.dbzer0.com 4 points 1 week ago (5 children)

Your comment made me think of the LLM pipeline this way (as if it could have started out legal):

  1. Shit goes in: above some volume, sourcing material should by default be treated as commercial use, not personal use. The two are already clearly differentiated in licenses, pricing, fees, etc.
  2. Shit goes out: the strictest license of anything in the dataset governs how the output can be used. If we can't discern whether X was in the mix, we can't say it wasn't, and therefore assume it's there (a sketch follows this list).
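A minimal sketch of rule 2 in Python, purely illustrative: the license tiers and their restrictiveness ordering are my own assumptions for the example, not anything settled in law.

```python
# Illustrative sketch of rule 2: the output inherits the strictest
# license found anywhere in the training set. The license tiers and
# their ordering are assumptions made up for this example.
from enum import IntEnum

class License(IntEnum):
    PUBLIC_DOMAIN = 0
    PERMISSIVE = 1          # e.g. MIT-style
    COPYLEFT = 2            # e.g. GPL-style / share-alike
    ALL_RIGHTS_RESERVED = 3

def output_license(dataset_licenses: list[License]) -> License:
    # Anything we can't rule out of the mix is assumed to be in it,
    # so unknown provenance should be passed in as ALL_RIGHTS_RESERVED.
    return max(dataset_licenses, default=License.ALL_RIGHTS_RESERVED)

print(output_license([License.PUBLIC_DOMAIN, License.COPYLEFT]))
# -> License.COPYLEFT: the whole output is bound by the copyleft terms
```

The `default` carries the second half of the rule: an empty or unverifiable provenance record collapses to the most restrictive case.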

To claim X is not in the dataset, the LLM owner's dataset should be open, except for parts specifically closed by contract obligations with a data miner or broker. Open and closed parts alike, run with the same parameters, should produce the same hash sums for the datasets and for the resulting weights as the original training process itself. If the open parts don't contain said piece of work, the responsibility is on the data providers, and the closed parts get inspected by an unaffiliated party together with the LLM's owner. Brokers there are interested in showing it's not on them, and there should be a safeguard against swiftly deleting the evidence - thus the initial trade deal is fixed by some hash once again.
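A minimal sketch of what those dataset hash sums could look like, again my own illustration with hypothetical names and paths: per-file digests folded into one fingerprint, which the owner, the broker, and an auditor can each recompute without the closed parts ever being published.

```python
# Illustrative dataset fingerprint: hash every file, sort the per-file
# digests so file ordering doesn't matter, then hash the sorted list
# into a single manifest digest. All names and paths are hypothetical.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def dataset_fingerprint(root: Path) -> str:
    digests = sorted(file_digest(p) for p in root.rglob("*") if p.is_file())
    return hashlib.sha256("\n".join(digests).encode()).hexdigest()

def matches_published(root: Path, published_fingerprint: str) -> bool:
    # An auditor recomputes the fingerprint over the broker's copy;
    # a mismatch means it is not the dataset that was sold.
    return dataset_fingerprint(root) == published_fingerprint
```

Fingerprinting the data is the easy half; getting identical weights out of two training runs assumes bit-exact reproducible training, which is a hard problem in its own right (GPU nondeterminism, data ordering), so that part of the scheme is doing a lot of work.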

A broker holding someone's pirated work can't knowingly keep selling the same dataset unless the problematic pieces are deleted. The resulting model can continue learning on additional material, but then a complete relearning should be done on the new, updated datasets; otherwise it's a crime.

Failure to provide hashes or other signatures verifying that the datasets are the same shifts the blame onto the LLM's owner. Producing and sharing them in an open, observable manner, and keeping more of one's data pool public, grants one the right to run it as a business and shields against possible lawsuits.

Data brokers may choose not to disclose their datasets to the public, but then all direct for-profit piracy charges are on them, not the LLM owner, provided the latter didn't obtain said content themselves but purchased it from another party.

It got longer than I thought.

[–] HobbitFoot@thelemmy.club 3 points 1 week ago (2 children)

Except that humans are allowed to make some derivative works under current copyright law. The bar has been degraded to the point where even reaction videos have some defense as derivative works.

If a reaction video is a derivative work, why can't an AI trained on that work also count?

[–] ParadoxSeahorse@lemmy.world 2 points 1 week ago (1 children)

“Derivative” is less questionable than “work”.

E.g., AI-generated imagery is not copyrightable for the most part; legally it's closer to plagiarism than to art?

[–] HobbitFoot@thelemmy.club 1 points 1 week ago

"Derivative" describes what happened to the copyrighted work, not what slop was churned out from it.

If the plagiarism is far enough from the original work, it isn't protected by the original copyright.
