The FSF considers large language models

22 points by runxiyu


dzwdz

[snip] asked whether the FSF is working on a new version of the GNU General Public License — a GPLv4 — that takes LLM-generated code into account.

Like, allows it? IANAL, but I think LLMs trained on GPL code are already in pretty clear violation of that license - just like LLMs trained on MIT code, MirBSD code, etc.

All these licenses require attribution, which LLMs notably don't provide, even when they output training data verbatim (as the article itself acknowledges).

If the point isn't to allow LLM training, I don't really see what there is to gain by explicitly disallowing it. That just seems like a stance that the attribution clause isn't enough - whereas, if the FSF does care about this stuff, maybe they should just double down on attribution instead?

There is also, of course, the question of copyright infringements in code produced by LLMs, usually in the form of training data leaking into the model's output. Prompting an LLM for output "in the style of" some producer may be more likely to cause that to happen. [snip] suggested that LLM-generated code should be submitted with the prompt used to create it so that the potential for copyright infringement can be evaluated by others.

Key words: "more likely".

The example I've been using recently is this guy vibe coding an exact copy of someone's existing shader. I don't think the prompt said anything about it being "in the style of" anyone. Any time you generate and use a nontrivial amount of code with an LLM, there's a risk of plagiarizing an existing work. I suppose the question is how high a risk you're willing to accept - but for me the very obvious answer is "I don't accept any of this risk".

I think it's a bit weird for the FSF in particular to be taking a different stance here?

A member of the audience pointed out that the line between LLMs and assistive (accessibility) technology can be blurry, and that any outright ban of the former can end up blocking developers needing assistive technology, which nobody wants to do.

Oh hey, this shit again - no, it isn't, and the line is very clear. I have yet to see any assistive technology spit out a copyrighted work without explicitly being asked to. That is not what this discussion is about.