Contributor Poker and Zig's AI Ban
69 points by kristoff
I expected this to connect the first half of the essay (investing in contributors) to the second half (banning LLMs) more explicitly:
Feedback given to people submitting purely LLM-authored PRs is ~useless once that PR is done: you might improve that PR, but it isn't going to make any difference for later contributions. The LLM isn't learning from the feedback, and the person submitting the PR usually lacks the context (and frequently the desire) to internalize it for future work. To use the metaphor of the article, effort spent on feedback for new contributors is worth it when contributing is an iterated game, but because LLMs don't learn from feedback, they turn it into a one-shot game, which changes the payoff drastically.
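The payoff shift can be made concrete with a back-of-envelope model. All numbers below are hypothetical, chosen only to illustrate the iterated-vs-one-shot asymmetry the comment describes:

```python
# Illustrative model (all numbers made up): detailed feedback costs the
# maintainer `feedback_cost` per PR; it improves the current PR by
# `per_pr_gain`, and also every future PR from a contributor who actually
# internalizes the lesson. An LLM-authored drive-by has no future PRs
# that benefit.

def payoff(feedback_cost, per_pr_gain, future_prs):
    """Net value of giving detailed feedback: the gain applies to this
    PR plus however many future PRs benefit from the lesson."""
    return per_pr_gain * (1 + future_prs) - feedback_cost

# Human contributor who sticks around for ~10 more PRs: iterated game.
iterated = payoff(feedback_cost=5, per_pr_gain=2, future_prs=10)
# LLM-authored drive-by: one-shot game, nothing carries forward.
one_shot = payoff(feedback_cost=5, per_pr_gain=2, future_prs=0)

print(iterated)  # 17: the feedback pays for itself many times over
print(one_shot)  # -3: the feedback costs more than it returns
```

Under any numbers where feedback costs more than its one-time gain, the sign of the payoff flips the moment the game stops being iterated, which is exactly the change the comment is pointing at.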
I appreciate the betting on human contributors. Many places seem to be shutting out people in favor of letting LLMs do the work instead, so it's refreshing to see the opposite happen here. I'm confident this approach will produce more experienced maintainers and prove an excellent long-term investment.
I'm glad you published this post showing the reasoning for the ban. It will save people from getting the wrong idea.
I guess this is in response to the post by Bun about their fork of Zig to speed up debug builds: https://xcancel.com/bunjavascript/status/2048427636414923250; they've said this was LLM-written and they can't upstream it per policy. My understanding from some of the people who've reviewed this code is that it is pretty bad and wouldn't make the cut to be upstreamed anyway 🤷‍♂️
So while one could in theory be a valid contributor that makes use of LLMs, from the perspective of contributor poker it’s simply irrational for us to bet on LLM users while there’s a huge pool of other contributors that don’t present this risk factor.
This reminds me of when companies think that they can get better candidates by making candidates jump through more hoops. "We have 1,000 applicants, so let's ask them to explain their high school grades to us, even if they graduated 20 years ago. The lazy candidates will give up, leaving only the best ones."
The developers least interested in jumping through hoops are the ones that make the most valuable contributions. If you tried to get Jeff Dean to collaborate with you and told him that he's not allowed to ever use LLMs, he'd happily go off and work on the 10,000 other projects that would gladly take a Jeff Dean, LLM-assisted or not.
It comes down to conflicting assumptions about where the top performers are. I believe the top performers are overwhelmingly using LLMs, and it sounds like the Zig team believes that's not true.
Unfortunately the reality of LLM-based contributions has been mostly negative for us, from an increase in background noise due to worthless drive-by PRs full of hallucinations (that wouldn't even compile, let alone pass CI), to insane 10,000-line first-time PRs.
This is a bad argument for banning LLMs. These PRs will continue to flow in regardless of what rules you put in place because the contributors generating these PRs don't read the rules. It's like making it illegal to own a ski mask because you notice that many criminals use ski masks to hide their faces.
It comes down to conflicting assumptions about where the top performers are. I believe the top performers are overwhelmingly using LLMs, and it sounds like the Zig team believes that's not true.
That's a legitimate opinion to hold, but our experience with triaging PRs pre-LLM ban does not match your assumption at all.
These PRs will continue to flow in regardless of what rules you put in place because the contributors generating these PRs don't read the rules.
That's a shortsighted argument. We don't expect the policy to automagically make LLM contributions disappear, but instead:
- users are warned upfront about the consequences of using LLMs, which is in general good practice whenever banning is involved (i.e. users who read the rules should not be surprised when their behavior leads to a ban).
- users who have good intentions and who like to use LLMs know not to waste time (and money!) on generating code that we don't intend to merge anyway.
There is literally nothing to gain by not informing users of how we operate wrt LLM-assisted PRs.
That's a legitimate opinion to hold, but our experience with triaging PRs pre-LLM ban does not match your assumption at all.
I don't understand how you'd be able to tell.
You'd be able to identify low-value contributors who use LLMs irresponsibly, but you can't identify someone who's both competent and uses LLMs effectively.
For example, if Mitchell Hashimoto didn't disclose his LLM usage, would you have assumed he's a high-performing non-LLM contributor?
These PRs will continue to flow in regardless of what rules you put in place because the contributors generating these PRs don't read the rules.
That's a shortsighted argument. We don't expect the policy to automagically make LLM contributions disappear, but instead:
- users are warned upfront about the consequences of using LLMs, which is in general good practice whenever banning is involved (i.e. users who read the rules should not be surprised when their behavior leads to a ban).
- users who have good intentions and who like to use LLMs know not to waste time (and money!) on generating code that we don't intend to merge anyway.
There is literally nothing to gain by not informing users of how we operate wrt LLM-assisted PRs.
That's not the argument I'm making.
I'm saying that the LLM ban filters out potential high-value contributors who use LLMs and does not effectively filter the behaviors you described (hallucinated APIs, hopelessly broken code). By banning LLMs, you lose high-value contributors who like LLMs, but you'll continue to receive negative value contributions from people who ignore contribution guidelines.
I don't understand your mental model of a user who contributes a 10 KLOC PR that doesn't compile but who would not have submitted it if you'd had a rule against LLMs. You already have rules against the behaviors you described and the PRs come anyway, so why is an extra rule going to stop them?
Wouldn't duplicitousness (i.e. a "high performer" concealing their LLM usage) be a disqualifying character trait all on its own?
Prior to the LLM ban, there's nothing duplicitous about using LLMs and not disclosing it. I don't disclose my whole hardware and software stack as part of every open source contribution I make.
And after it, there is. If you use an LLM to contribute, your contributions aren't welcome. If you use an LLM to contribute and lie about it, you're an asshole. I'm not involved in the Zig community, so I can't speak for them, but if you were discovered, I imagine you'd be unwelcome there as well.
For whatever reason, when people say "I don't want you to spit on my sandwich", nobody goes "but I could do it so you'd never find out!". When they say "I don't want you to spit in my code", everyone who isn't contributing seems to come out of the woodwork to volunteer to secretly spit in the code.
I don't want to work with high-performing slop merchants, so I'm happy their talents are desired elsewhere. At least half of the happiness I derive from open source, if not more, comes from my interactions with the people I collaborate with.
The developers least interested in jumping through hoops are the ones that make the most valuable contributions.
This is probably true, but relies on a flawed analogy. The Zig team is not requiring anyone to jump through hoops: they're selecting from among all possible contributors the subset which they believe might be the most valuable to the health of the project in the long term. For you, perhaps because you place individual reputation above other qualities, those contributors would be the LLM-equipped Jeff Deans of the world; for Zig, clearly, they're the people who signal their willingness to put in their own time and effort for the good of the whole.
It comes down to conflicting assumptions about where the top performers are. I believe the top performers are overwhelmingly using LLMs, and it sounds like the Zig team believes that's not true.
You can believe that, of course, based on your own experiences and personal bubble. I, for the same reasons, believe the exact opposite. Your argument also implicitly relies on an assumption we have little evidence for but mounting evidence against: that "top performers", however you define them, remain top performers after starting to use LLMs. This is far from clear and, so far, the limited data we have (together with classic cognitive psychology findings; see "Ironies of Automation") suggests otherwise.
This reminds me of when companies think that they can get better candidates by making candidates jump through more hoops. "We have 1,000 applicants, so let's ask them to explain their high school grades to us, even if they graduated 20 years ago. The lazy candidates will give up, leaving only the best ones."
On the /r/ProgrammingLanguages subreddit we've been dealing with a lot of AI slop in recent months. To combat it we've tried a bunch of different things, the most recent being that posts referring to GitHub are automatically filtered (i.e. basically hidden) and the author is notified that they need to copy-paste a standard phrase into a comment. Essentially a dumbed-down Reddit Turing test.
On the surface that may seem pointless: surely the LLM users will just do that and lie about it? And indeed I've seen this sentiment (essentially "you can't stop LLM users") in other places.
In reality this approach has so far proven remarkably effective in spite of how stupid it is. Why? Because many LLM users are extremely lazy; so much so, in fact, that just copy-pasting a phrase is apparently too much to ask.
What I'm ultimately saying is that while it may seem LLM users can't be stopped unless somebody comes up with a magical oracle, in reality you can filter out, say, 80% of the noise by applying a few rules that seem dumb at first but turn out to be surprisingly effective.
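The filter described above can be sketched in a few lines. The passphrase and function names here are hypothetical stand-ins, not the subreddit's actual configuration, which isn't given in the thread:

```python
# Hypothetical sketch of the /r/ProgrammingLanguages-style filter described
# above; the exact phrase the subreddit uses is not stated, so this one is
# made up for illustration.
PASSPHRASE = "I confirm this project is my own work"

def should_filter(post_body: str, author_comments: list[str]) -> bool:
    """Hide GitHub-linking posts until the author has copy-pasted the
    standard phrase into a comment: a dumbed-down Turing test."""
    links_github = "github.com" in post_body.lower()
    attested = any(PASSPHRASE in c for c in author_comments)
    return links_github and not attested

# A fresh GitHub post with no attestation gets filtered...
print(should_filter("My new language: https://github.com/me/my-lang", []))
# ...and stops being filtered once the author posts the phrase.
print(should_filter("My new language: https://github.com/me/my-lang", [PASSPHRASE]))
```

The point being made is that even a check this trivially bypassable works in practice, because the same laziness that produces slop also skips the attestation step.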
I think this argument depends on your time horizon. Is the set of developers-who-make-the-most-valuable-contributions static? How did they become such developers? Surely Jeff Dean didn't get to where he is by having AI do all the work for him.
We're in an interesting moment of time where we have a large pool of developers who definitely learned how to develop without AI, but that pool is going to shrink relative to the total over time. For the moment, maybe you're right (or maybe not) that the best developers are all using LLMs. But assuming our industry lasts beyond this moment, having at least some kind of policy that clearly invests in training seems smart.
I believe the top performers are overwhelmingly using LLMs, and it sounds like the Zig team believes that's not true.
This depends on what you mean by "top performer". I believe this is true in many projects, but only among contributors who already have deep familiarity with the codebase. I think this is almost always false for people who are not already deeply familiar with a codebase: a "top performing" new contributor is one who is making small, targeted improvements, and these are the kinds of changes for which having LLMs author code is the least helpful. The kinds of changes where LLMs have the largest positive impacts (such as big refactorings or changes which require lots of boilerplate) are precisely those which are least suited for new contributors.