Can coding agents relicense open source through a “clean room” implementation of code?
(simonwillison.net)
In theory: Yes, future works are not yet part of the training data.
In practice: It takes months or years for an open source project (or any new technology) to take off and be considered valuable.
The other argument relies on said tech organization doing the right thing and spending the resources to train its own model (years and 100+ million) instead of simply folding the cost of a lawsuit and any resulting fine into its cost/benefit analysis. I'm not aware of any tech organization with the means to do so that would.
Again, while this may be mostly true today, it doesn't consider how the technology will evolve. Models are only going to keep getting cheaper to produce. Even if training is prohibitively expensive today (and I'm not convinced that's universally the case), it won't stay that way: model training is essentially guaranteed to get dramatically cheaper in the coming years due to hardware advancements. Burying our heads in the sand now isn't going to help anything.