GNU and the AI reimplementations
19 points by amontalenti
Moreover, this time the imbalance of force is in the right direction: big corporations always had the ability to spend obscene amounts of money in order to copy systems,
But now they can do it for a negligible amount of money. Whereas previously buying a commercial license for otherwise copylefted software was a reasonable choice to avoid spending said "obscene amounts of money", now we've killed that business model. The FOSS funding situation was already horrible pre-AI; yet maintainers gotta eat nonetheless.
Now, small groups of individuals can do the same to big companies' software systems:
I assume most maintainers would prefer to be paid in money, which can be exchanged for food and shelter, rather than getting paid in the ability to do "clean-room" rewrites of proprietary software.
by the way, I got clickbaited :( when I saw this on the IRC channel I thought this would be an actual response from the GNU project. oh well
I think it is dangerous that people like Antirez, who we look up to, write this kind of article without a clear disclaimer at the top: "this is an opinion, I am not a copyright expert or lawyer".
Because, and this is a fact, most of this is opinions and wishful thinking about a topic with a largely unclear legal status right now. There is barely any jurisprudence to refer to. This story is also incomplete because copyright law is very complicated and it is not just copying: there are also derivative works, and there is context and ownership and a whole thing that happens on the side of the LLM providers, etc. Nothing about this is simple or remotely similar to what we did decades ago when GNU was born.
Hi st3fan. What you write is not how jurisprudence works. For things that never happened before, it's a grey area, like: is it fair use or not to train an LLM on XYZ? But this is different. Unless a new law is made, the old copyright law applies perfectly when you create code with LLMs. Does it violate copyright law? Then it is a problem. Otherwise it is not. If you invent a new gun, the old laws still apply if you kill somebody: this is a trivialization, but that's how it works. The grey areas are for new things. Copyright law perfectly describes whether some code is in violation of some other code or not.
We aren't lawyers, but I see no reason why one creative work, e.g. an image, is different from another creative work, e.g. code. SCOTUS declined to look at Thaler v. Perlmutter. LLM generated work is not copyrightable. Vibed code is in the public domain.
That's a big claim you're making there. Have you seen any commentary from genuine legal experts that agrees with your claim there?
Given that billions of dollars of software has been created by serious (brand name) companies using AI assisted programming tools over the past 24 months I would expect there to be way more credible commentary on this than I've seen so far.
A huge difference is that in Thaler v. Perlmutter, Thaler listed the AI itself as the work's sole author. He later tried to claim that he was the actual author because he made the AI, but the court said "no, that's not what you said on the application, you don't get to change your mind now".
I think the broader point is still that you aren't a lawyer (most of us here aren't), and whether it's well-trodden territory or novel, our legal opinions are less informed and less useful because of that.
Domain expertise doesn't generalize to other domains.
Given how poor a track record actual lawyers have at predicting the outcomes of copyright cases, maybe this is just gatekeeping.
Rewriting proprietary software to make a copyleft version is good. Rewriting a copyleft version to bypass rights of users is bad.
The article's whole gotcha is based on misunderstanding of GNU. They don't care about copyright per se, they care about people having freedom to control their software. Licenses are merely a tool, and also one that evidently has stopped working.
Tanenbaum protested about the architecture (in the famous exchange), not about copyright infringement. So we could reasonably assume Tanenbaum considered rewrites fair.
Only if you take as given Tanenbaum believed Linux to be a rewrite of minix. And I’m pretty sure he did not.
No mention of Thaler v. Perlmutter misses the most important part of the story: vibe coded source code has no copyright in USA.
The Thaler vs. Perlmutter opinion is very readable, I encourage people to read the first few pages: https://media.cadc.uscourts.gov/opinions/docs/2025/03/23-5233.pdf
It addresses a much narrower claim: whether AI can be legally considered the sole author of a work of art for the purpose of copyright. It explicitly does not opine on whether Thaler would have been granted the copyright if he listed himself.
I find it a bit difficult to point at that case because it is very different. It is a case where someone tried to copyright a completely new work of art (a picture). That copyright application was rejected by the copyright office, which basically said "AI cannot be an author under copyright law". (This is what Thaler tried to do: he tried to make his program, the Creativity Machine, the owner of that copyright.) And Perlmutter in this case is the person representing the copyright office, not the owner of the original work. AFAIK there is no original work for that specific case. (Their "A Recent Entrance to Paradise" visual artwork.)
It is of course relevant for GenAI in general, but I think that is where the similarities end. "Reimplementations of software" is about... software. Where there is an original and a reimplementation. And a reimplementation is usually not a completely new original work: there may be API similarities for compatibility, as with chardet, or pieces of code that map 1:1 to another work.
I'm sure Thaler v. Perlmutter is relevant in some way but we haven't had a real case about software yet .. so it is really unclear what would happen there.
I think focusing on Thaler v Perlmutter is a bit of a false goal to hunt down, because ultimately the appellate courts and the SC both basically said the guidance of the Copyright Office was correct. If we presume this extends to their entire guidance on AI works (found here: https://www.copyright.gov/ai/ai_policy_guidance.pdf ) then AI works are only copyrighted when the human behind the AI was the actual author of the work and used AI to give it form. If the AI is merely following an automatic process, there would be no copyright.
So if we simply tell an AI to implement a cleanroom rewrite (with two teams of AI for the two sides of the cleanroom), it would be a purely mechanical process that forms no basis for copyright.
Moreover, this time the imbalance of force is in the right direction: (...) Now, small groups of individuals can do the same to big companies' software systems: they can compete on ideas now that a synthetic workforce is cheaper for many.
Well, this omits the elephant in the room: no small group of individuals has created one of these frontier LLMs so far. Only big corporations with access to truckloads of data and hardware are currently creating and operating them. These corporations have total control over these new tools. The bleeding-edge programming tools are not open source anymore.
If you tried to build one of those yourself, let alone secure the hardware and energy, I'd bet you'd end up in jail long before collecting all the copyrighted corpus these big corporations collected.
I'd argue the power imbalance is now even worse than before.
The other thing that is relevant here, in my opinion, is the slow erosion in the popularity of the GPL in favor of less restrictive licenses like Apache/MIT. This is relevant because as tools are reimplemented (either the old-fashioned way or the new-fashioned way), they tend to not adopt the GPL. This is troubling for anyone who is a proponent of the GPL because the GPL is only relevant when there is a certain critical mass of GPL-licensed products.
Is the GPL relevant if any software can be generated on the spot? The point of the GPL is to democratise software, to take away exclusivity from Big Corpos. Do we still need the GPL if pretty much any software can be made at any time? And also under a permissive license. That is, while not GPL, the code is still available if you want it for some reason, and you can still use it.
I think software, specifically source code, is in general becoming less relevant. It is now an option to roll your own instead of using open source or closed source or a proprietary OS-provided library, business-licensed software, etc. You can ask an AI to build you software, either unique or following some specification.
Crazy story as an example:
I was looking at https://github.com/openai/symphony which is a project OpenAI did in Elixir. Their README literally says: "if you do not like our implementation in Elixir then run our 2100 line SPEC.md past your coding agent and ask it to implement this project in your preferred language"
This just blows my mind. We've gone very rapidly to a situation where software can now be a natural language Specification that you feed into a program and as a result a program rolls out of it. Do I now own that? Can I put any license on that? Is that now 100% mine without worries? (IANAL but the answer is most likely yes?)
Software as we know it is not dead yet, but you can see where things are heading.
I think the GPL is even more relevant, not less. The point of the GPL license is to protect the source code in such a way that users can continue to access it if the source code gets modified and distributed. In a world where AI code generators are prevalent and the barriers to making code changes plummet, I think a GPL proponent would be even more concerned that the code changes are accessible to everyone.
This was/is one of my first thoughts as well. I would like to use strong copyleft licenses more, because putting something out there for people to enjoy/experience and getting it ripped off from a LLM to serve to users for a subscription, without attribution, feels bad.
Do we still need GPL if pretty much any software can be made at any time?
It's easy to imagine a future where the purveyors of GenAI gate access to certain features behind higher prices.
Generate a cute flyer for a birthday party? Free.
Generate fan-art? $10/month for 100 pieces.
Generate software? $200/month, because you need to pay it to keep up with the competition.
Don't mistake the current all-you-can-eat buffet as anything other than VC-funded loss-leading to entrench GenAI in all parts of society, in expectation of collecting rent once they succeed.
GPL licenses started losing mindshare long before LLMs appeared. There's always been a robust counternarrative (chiefly from the BSD camp) against the views of Stallman/FSF on how to best organize non-restrictive software licensing.
I don't know the exact numbers, but I would wager that before the widespread adoption of LLMs, the ratio of non-GPL licenses on places like Github was maybe 80%. Of course, some projects are more "foundational" than others so just looking at raw project counts is misleading.
I personally think that taking the code that somebody wrote, feeding it to Claude code, and asking it to create a rewrite of it for purposes of avoiding a copyleft license is tempting, but a bad idea.