Make illegally trained LLMs public domain as punishment
(www.theregister.com)
It won't really do anything though. The model itself is whatever. The training tools, data and resulting generations of weights are where the meat is. Unless you can prove they are using unlicensed data from those three pieces, open sourcing it is kind of moot.
What we need is legislation to stop it from happening in perpetuity. Maybe just ONE civil case win to make them think twice about training on unlicensed data, but they'll drag that out for years until people go broke fighting, or stop giving a shit.
They pulled off a very public, out-in-the-open data heist and got away with it. Stopping it from continuing to happen is the only way to win here.
But wouldn't making it open source, and then having it not function properly without the data, prove that it is using a huge amount of unlicensed data?
Probably not "burden of proof in a court of law" prove though.
In civil matters, the burden of proof is usually just a preponderance of the evidence, not beyond a reasonable doubt. In other words, to win a lawsuit, you only need to have more compelling evidence than the other side.
But you still have to have EVIDENCE. Not derivative evidence. The output of a model could be argued to be hearsay because it's not direct evidence of originating content, it's derivative.
You'd have to have somebody backtrack through generations of model data to even find snippets of something that counts as copyrighted material, or a human actually saying "Yes, we definitely trained on unlicensed data".
So, I'm not commenting on anything but the legal system here, but it's absolutely the case that you can win a lawsuit on purely circumstantial evidence if the defense is unable to produce a compelling alternative set of circumstances that could lead to the same outcome.