On the use of LLM assistants for kernel development

12 points by gnyeki


dzwdz

I’ve clicked through to The Linux Foundation’s generative-AI guidance.

If any pre-existing copyrighted materials (including pre-existing open source code) authored or owned by third parties are included in the AI tool’s output, prior to contributing such output to the project, the Contributor should confirm […]

I would ask how the contributor is supposed to confirm that – are they expected to be aware of all open source code in existence? To be fair, the guidance does offer an answer:

[…] some tools provide a feature that suppresses responses that are similar to third party materials in the AI tool’s output, or a feature that flags similarity between […] materials owned by third parties and the AI tool’s output and provides information about the licensing terms that apply to such third party materials.

This seems very naive. Such features will only catch the most blatant similarities. For the most part, AI doesn’t output training data verbatim – the output is similar, but not an exact replica (see: that one IEEE article). Will this detect variable names being changed to fit the surrounding code? Coding conventions translated to match the context? Output in a different programming language than the original?

What if the AI plagiarizes multiple codebases at once, taking pieces from each? I think this is still very problematic. The guidance pretends the problem is solved, while it very much is not (and might never be?).


On a completely unrelated note, have you seen the list of corporate members of The Linux Foundation? Meta and Microsoft are platinum members, Google is a gold member, and there are probably more AI companies that I’m missing.

This is a very obvious conflict of interest. Their guidance won’t ever paint AI tools in a negative light, such as acknowledging the risk of accidental plagiarism. They won’t ever say that The Linux Foundation doesn’t condone the use of AI tools.

I hope the kernel maintainers won’t fall for this. This feels like another attempt at manufacturing consent for LLMs.