Building Pi with Pi

19 points by guillego

pyfisch

The author claims that only 17% of issues from non-approved individuals are reopened, and implies the rest are useless and/or slop. This doesn't capture the process used by badlogic to triage and work on the issues. (edit: According to the published contribution rules you shouldn't submit a pull request unless you are allow-listed. Other pull requests aren't reviewed. This is to say that these are submitted by humans or clankers who didn't read the rules. It should be expected that fewer than 10% of those are merged, since they aren't even looked at.)

Take the 5 issues I reported. None were reopened, yet they are the reason for two improvements to Pi.

#3469 is about missing documentation for one feature of the extension interface. Fixed on the same day.
#3459 is about broken Markdown rendering for exported sessions. Reply by badlogic "there's no fix for this". I know a fix, so I open a new issue (the maintainer encourages this, as he only reads each issue once). Issue #3484 describes a fix, which badlogic uses the same day to fix it in main.
#3567 is a feature request for a llama.cpp provider, discussion continues in another issue.
#3464 is a report for a minor bug/surprising behavior, no response.

By my count 60% of the issues I opened directly resulted in a change to Pi. If you follow the issue tracker, you will notice many issues with an "inprogress" label; they were addressed, yet they don't show up in the statistic of reopened issues.
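
The gap between the two tallies can be sketched with the five issues above. This is only an illustration: it assumes the three issues counted as "resulted in a change" are #3469, #3459 and #3484 (the reading that matches the 60% figure), and the field names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    number: int
    reopened: bool  # what the reopen statistic counts
    acted_on: bool  # assumption: the issue led to a change in Pi


# The five issues from the comment; none of them was reopened.
ISSUES = [
    Issue(3469, reopened=False, acted_on=True),   # docs fixed the same day
    Issue(3459, reopened=False, acted_on=True),   # led to the fix in #3484
    Issue(3484, reopened=False, acted_on=True),   # fix landed in main
    Issue(3567, reopened=False, acted_on=False),  # discussion moved elsewhere
    Issue(3464, reopened=False, acted_on=False),  # no response
]


def rate(issues, predicate):
    """Fraction of issues for which predicate(issue) is true."""
    if not issues:
        return 0.0
    return sum(1 for issue in issues if predicate(issue)) / len(issues)


reopen_rate = rate(ISSUES, lambda issue: issue.reopened)
acted_on_rate = rate(ISSUES, lambda issue: issue.acted_on)
print(f"reopened: {reopen_rate:.0%}, acted on: {acted_on_rate:.0%}")
```

With these assumptions the reopen rate is 0% while the acted-on rate is 60%, which is why the two numbers should not be read as the same thing.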

In my opinion Armin Ronacher undersells the contributions of the community to Pi. Mario Zechner (badlogic) prefers issues over pull requests, because using a clanker to write code based on a good issue is easier than reviewing contributed changes. This is one reason bugs are fixed in days (which is amazing), but the contributions of the community show up less in the git history than in other open source projects. I find it a bit sad that Earendil chooses to characterize the community reported issues as primarily slop and a problem to be managed, while missing a large segment of issues moving the project forward.

mediremi

Your link uses @me for author - here's a fixed link that uses your username: https://github.com/earendil-works/pi/issues?q=is%3Aissue%20state%3Aclosed%20author%3A%40pyfisch

mitsuhiko

Armin here

The author claims that only 17% of issues from non-approved individuals are reopened, and implies the rest are useless and/or slop.

I intentionally did not draw a line from the reopen rate to slop because of the process being the one it is. For instance, one of the consequences of auto-closing issues is that there are now sometimes duplicates, because people do not look for closed issues. Likewise, there are some issues that are being addressed despite never being reopened because they are papercuts. I did not assume that number to be particularly high, but I will re-run my script and update the tally.

I find it a bit sad that Earendil chooses to characterize the community reported issues as primarily slop and a problem to be managed, while missing a large segment of issues moving the project forward.

I don't think we have made any particular choice here. We're spending an enormous amount of time reproducing and dealing with all kinds of issues at the moment and the volume is hard to work with, in parts because of the quality of the reports and pull requests, and that post is specifically scoped to working with that inflow of issues.

but the contributions of the community show up less in the git history than in other open source projects

I don't think that is true, at least I don't think Pi is particularly out of the ordinary here.

//EDIT: I re-ran the analysis. If we count issues that were acted on but not reopened, the number rises from 17% to 25%.
- pyfisch
  
  I intentionally did not draw a line from the reopen rate to slop because of the process being the one it is.
  
  Thanks for clarifying that this is not intentional. I understood it as implied because after the reopening rate is stated the next sentence is "Many of the issues and PRs are complete slop and in some cases the humans did not even realize that they created them."
  
  I re-ran the analysis. If we count issues that were acted on but not reopened, the number rises from 17% to 25%.
  
  You mentioned duplicate issues before. For a long time Pi had an OSS-weekend/refactoring where issues were auto-closed and an automated message was added telling people to create a new (duplicate) issue after the weekend. Should these issues be counted? Ideally users wouldn't send issues during the weekend, and ideally issue creation would be blocked. This likely doesn't have a major impact, but it illustrates how hard this is to measure. I don't know if 25% is a good percentage of issues acted on for an open-source project.
  
  the volume is hard to work with, in parts because of the quality of the reports and pull requests
  
  I don't understand the part about pull requests specifically: according to the contribution rules they aren't looked at unless pre-approved. If the closed PRs bother you, GitHub allows restricting them. Badlogic complained in the past that this didn't work for personal repos the way he wanted, but now the Pi repo is part of an organization.
  
  That said Pi is an amazing piece of software and your work at Earendil reproducing and working on issues is very much appreciated, it is exemplary for an open source project.
  - mitsuhiko
    
    I don't understand the part about pull requests specifically: according to the contribution rules they aren't looked at unless pre-approved.
    
    We make no guarantee that we're going to look at a PR, but I can tell you from personal experience that if there is a legitimate issue and someone sent a PR up, I might still look at it.
    
    It's complicated and I wish we had a better solution. However, I'm a huge believer in applying back pressure, and right now that's the best thing on the table. Restricting pull requests is not dramatically different from closing them, except it has the potential disadvantage that clankers then just go and make issues instead.
- mtlynch
  
  Take the 5 issues I reported.
  
  Heads up: Your list only includes 4 issues and your link points to 0 issues.
  - pyfisch
    
    Thanks. Fixed the link. There are 4 bullet points but 5 issues mentioned.
- joshka
  
  We use a custom slash command called /is, which specifically has this instruction in it:
  
  Do not trust analysis written in the issue. Independently verify behavior and derive your own analysis from the code and execution path.
  
  Unfortunately, it does not fully work, because when humans first throw their issue through the clanker wringer, their clanker expands scope almost immediately.
  
  https://github.com/earendil-works/pi/blob/fc8a1559017f1e581cfa971aa3cef11a507a4975/.pi/prompts/is.md?plain=1#L14
  
  Ignore any root cause analysis in the issue (likely wrong)
  
  https://github.com/earendil-works/pi/blob/fc8a1559017f1e581cfa971aa3cef11a507a4975/.pi/prompts/is.md?plain=1#L20
  
  Do not trust implementation proposals in the issue without verification
  
  I think this is partly a context-shaping problem: by the time /is runs, the context is already poisoned. Telling the model to ignore the issue’s analysis feels a bit like telling it not to think about pink elephants. A better approach might be two-step: first force the clanker to rewrite the issue into only observed, expected, actual, and exact repro/logs, then start a fresh investigation from that reduced prompt.
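  
  A minimal sketch of the second half of that two-step idea, under stated assumptions: the dict passed to `reduce_issue` stands in for the output of the first step (the clanker's rewrite of the issue), and every name here (`ReducedIssue`, `reduce_issue`, `investigation_prompt`, the field names and the sample issue) is hypothetical and not part of Pi.
  
  ```python
  from dataclasses import dataclass
  
  # Only these fields survive into the fresh investigation context.
  FACT_FIELDS = ("observed", "expected", "actual", "repro")
  
  
  @dataclass(frozen=True)
  class ReducedIssue:
      observed: str
      expected: str
      actual: str
      repro: str
  
  
  def reduce_issue(rewritten: dict[str, str]) -> ReducedIssue:
      """Keep the fact fields and drop everything else (root cause, proposed fix)."""
      missing = [f for f in FACT_FIELDS if not rewritten.get(f, "").strip()]
      if missing:
          raise ValueError(f"issue is missing fact fields: {missing}")
      return ReducedIssue(**{f: rewritten[f].strip() for f in FACT_FIELDS})
  
  
  def investigation_prompt(issue: ReducedIssue) -> str:
      """Build the prompt for a fresh session from the reduced issue only."""
      return "\n".join(
          f"{name.capitalize()}: {getattr(issue, name)}" for name in FACT_FIELDS
      )
  
  
  # Hypothetical rewritten issue; the last two keys are the speculation to discard.
  rewritten = {
      "observed": "The command exits with code 1.",
      "expected": "The command exits with code 0.",
      "actual": "Exit code 1 and an empty output file.",
      "repro": "Run the command twice in the same directory.",
      "root_cause": "Probably a race in the file writer.",
      "proposed_fix": "Add a lock around the write.",
  }
  prompt = investigation_prompt(reduce_issue(rewritten))
  print(prompt)
  ```
  
  The point of the design is that the investigating session never sees the guessed root cause or the proposed fix at all, instead of being told to ignore them.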
  
  More broadly, it seems like the current AGENTS.md and GitHub issue templates are not specific enough to reliably force clankers into that shape. If clanker-generated issue slop is a known failure mode, those are the obvious choke points for pushing back on it: require a strict facts-vs-hypotheses split, ban root-cause guesses in the initial report, and tell the clanker exactly what a good issue must contain. Right now the project documents that clankers are a problem and already has the means to instruct them better, but does not seem to have fully taken that step yet.
  
  The invariants point feels related. If clankers keep making the code more locally flexible in ways that violate global design constraints, that usually means those invariants and their rationale are not encoded locally enough in the code, docs, and types. Looking at Pi’s session code, the compaction side does explain some of its semantic rules locally, but the active session runtime is much less clear about the intended strictness policy, and in a few places the local comments, behavior, and tests actually make permissiveness look intentional.
  
  Historically, the clankers I have used have also been quite bad at writing and preserving the sort of local documentation and commentary that would help with this, which means they keep having to re-derive context from code alone. A lot of writing software that remains maintainable under clanker-driven changes is taking principles that currently live in maintainers’ heads and making them visible where local edits happen, ideally in forms that are hard to violate rather than just prose.