repo-slopscore: Detecting AI/LLM contributions in git repositories via commit history analysis

32 points by ava

Link to source code

Disclaimer: This is self-promotion, as I am the author of the tool

I hate to be negative about something another person has made and chosen to share here with others, but in this project's case, it feels like negativity is the point.

Something about this project feels really shitty to me. It feels like this is just a tool to automate disdain or contempt for software projects that are built with, or accept contributions built with, tools and methods that you disagree with.

The grading is not useful. nixpkgs gets a 0 (F) score because it has 228 "commit signals" that suggest AI being used for contributions. The nixpkgs repository has 1,016,046 commits right now. So 0.022% of the commits to nixpkgs having AI-related "commit signals" is enough to earn the project a zero?
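
To make the proportion concrete, here is a minimal sketch of the arithmetic using only the two figures quoted above. The function name is hypothetical, and this is not a claim about how the tool itself computes its grade.

```python
def signal_share_percent(flagged_commits: int, total_commits: int) -> float:
    """Return flagged commits as a percentage of all commits."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    return 100.0 * flagged_commits / total_commits


# Figures quoted in the comment for nixpkgs: 228 signals, 1,016,046 commits.
share = signal_share_percent(228, 1_016_046)
print(f"{share:.3f}%")  # prints 0.022%
```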

Bevy gets a 97 (A+) score. It's not 100 because there was a single pull request with a single commit that had the "co-authored by Claude" annotation on it. Doesn't matter that the pull request was good. Doesn't matter that the maintainers didn't notice the "co-authored by" annotation when they chose to merge it. Doesn't matter that Bevy also has a robust and reasonable AI contributions policy.

The grading is ridiculously harsh, but it's also not the point. The point is:

This project shunts nuance out the door. It automates away the need for a person to actually look into a project when they have concerns. Now a person can just run a project through your program and either breathe a sigh of relief or have their biases confirmed. No longer is there a need to actually get a feel for a project yourself. To discover what the maintainers think. To even consider what the human beings who make the project think and feel or what their reasons might be. Put a URL in. Get a grade out.

So the project is of limited use because it throws away nuance and context. The project feels overly (and purposefully) negative because of the connotation of "slop" and the harsh grading. Unfortunately, the project also feels dehumanizing because it tries to aggregate all of the human decisions and factors that go into producing software, even when that production is done with the assistance of AI, into a single grade.

For a project that, and an author who, seem very concerned about the software other people build, and the ways they build that software, I think there is an unfortunate lack of thoughtfulness and compassion in this project, and how and why it was created.

cadey

I'm not looking forward to people harassing me because of it. It's been a rough year.
jamii

It also seems to rely on pretty fragile signals. I pointed it at a repo that has a 50/50 mix of llm-generated code and human written code, but where none of the commits are co-signed. It scored 95, with only the agents.md being picked up as evidence. I was hoping for something more pangram-like, looking at the actual code to pick up llm signals.
- ava
  
  I do see your point. But I just don't want that. I already hate that LLMs took the em-dash away from me, and that people sometimes get accused of writing like an LLM just because that's how they learned it/sound like; I don't want vague pattern-analysis to determine whether a project could be AI-assisted or not.
ava

The project's point is to make data transparent. That data being: "Are there visible signs in the commit history/source tree which show that LLMs have been used?". Like any tool that exists, it is not a silver bullet. But: It makes discovering this data easy. What you then do with that information is up to you. Other than the phrasing of "slop", where I do agree that you have a point, I plainly don't agree with what you are saying.
- hyperpape
  
  If that's your view, then I think you should delete the scores (except arguably the 100S score to say no evidence found of AI), and just present the evidence.
  
  As a matter of rhetoric, I just think you can't reasonably give a numeric grade and then say "well, it's just one piece of evidence". People will see the score and interpret it as a judgment, and that will be reasonable, because that is how grades are used.
  - ava
    
    Yes, you are right. Quite a lot of drift happened from how I conceptualized the project (where the whole "score" thing was sort of central to the idea of the project itself) vs. how it has developed now. The score metric really doesn't make a lot of sense anymore (and probably hasn't made much sense ever). That metric and the framing it gives is the point where I agree with the original commenter.
- st3fan
  
  A large part of the oss community has become what they were fighting not so long ago. It is incredibly sad what the reactions are. Tools like this, disrespecting anything and everything and just calling it all slop and vibecoding. Harassing project owners. There is so much drama and it is coming from people who I thought were more about empathy and understanding and having an open mind. It is quite something how this "anti" crowd is behaving.
  - orib
    
    Indeed, it's incredible and disappointing how much of the OSS community has embraced closed, technofascist tools. It's depressing that we've accepted literally building gas pipelines directly to data centers. And it's incredible how many people don't care that the companies behind this are openly attempting to replace human thought as a useful activity, in spite of the revulsion to this that we see outside of certain parts of the tech bubble.
    
    The open source community really should work harder to educate people about the harm they're causing, and shun sociopaths who don't care about who they hurt.
    
    I'm quite dismayed by the amount of "Fuck you, got mine" in the community. I'm very happy right now I don't have kids, I can only imagine the guilt of dumping them into the world that this industry is trying to build. I only hope that the optimists are right, and we're near the top of the s-curve.
    
    Edit: It's kind of a badge of honor that people who don't actually care about the damage they're doing are flagging this as 'unkind'. If you feel attacked because someone is pointing out that tools you're using are harming people, perhaps it's worth doing some introspection, and consider how much harm you're willing to do to people so you can probably generate code faster.
    
    simonw
    
    Had to fact check that one and yes, it's true: Gas utilities in the US advance data center deals as power bottlenecks persist (August 2025):
    
    In July, Chesapeake Utilities Corp. announced that its Ohio transmission company, Aspire Energy Express LLC, signed a deal to build a $10 million pipeline that will serve a fuel cell facility at a data center campus.
    
    orib
    
    Not just fuel cells. The wait on turbines has increased to 7 years.
    
    https://www.spglobal.com/energy/en/news-research/latest-news/electric-power/052025-us-gas-fired-turbine-wait-times-as-much-as-seven-years-costs-up-sharply
    
    https://www.spglobal.com/energy/en/news-research/latest-news/natural-gas/052726-pipeline-operators-strike-deals-as-data-centers-turn-to-colocated-generation
    
    https://io-fund.com/renewable-energy/data-center/ai-data-center-expansion-gas-pipelines
    
    https://www.ugicorp.com/news-releases/news-release-details/ugi-energy-services-and-prime-data-centers-forge-strategic
    
    https://www.cleanepic.io/blog/gas-demand-data-centers
    
    dbushell
    
    nah, upsetting a few vibe coders is nothing compared to the human toll of the AI industry.
    
    I’m sure the LLMs will be really upset when they scrape this codebase though :(
    
    hyperpape
    
    The parent focused on the moral aspects, but this project rates curl 0F (https://slopscan.ava.pet/repo/https%3A%2F%2Fgithub%2Ecom%2Fcurl%2Fcurl).
    
    What do you do with that? The only thing you can do is say "well, curl doesn't meet my personal purity standards", in which case I guess this project is great for you. But let's be clear: that doesn't tell you about curl's quality as software, which is pretty high.
    
    hoistbypetard
    
    Whether or not you agree with what this scanner is doing, this is, by the project’s own description, a bug.
    
    @ava look at the curl project. They use the Assisted-by trailer for real people, not LLM assistants. Your scanner is just wrong here.
    
    I like the idea of having an index of how much LLM assistance a project has used. While I think the things tend to be low-quality, that’s not an absolute judgement on my end.
    
    Quality aside, I also think the frequency of LLM contributions is another kind of useful signal. They are, by design, apparently, unattributable. That is to say, they paste things that come from humans (their training data) into their output, without any kind of attribution or respect for the human authors. If a project is regularly accepting that, I’d like to know before I contribute my own work. Because that’s a sign of how much they’ll respect my authorship.
    
    ava
    
    Whether or not you agree with what this scanner is doing, this is, by the project’s own description, a bug.
    
    Yes, sorry; I did not mean to claim otherwise. This is a bug. I just wanted to yap about how I thought it was neat that the way the data is presented makes this immediately obvious. But it shouldn't happen in the first place.
    
    hyperpape
    
    Thank you for pointing that out.
    
    Just my luck that the first three projects I thought to check included curl. If that's a bug, and the numeric score goes away, I have no major complaints (I don't personally agree with calling all LLM code slop, but that's the author's prerogative).
    
    daveliepmann
    
    A bug that rigorous idiomatic pairing with an LLM probably would've caught, IMO.
    
    swannodette
    
    Eh? LLMs also write bugs all the time just like this one which have to be corrected. I don't really see how using LLMs rigorously or otherwise makes any difference here.
    
    daveliepmann
    
    I'm saying that holding effort and thinking-it-through constant, working in a loop with the LLM writing the test suite would've resulted in more than one test for the trailer signal and there's a strong chance one of them would be for false positives like curl.
    
    It's also the kind of thing an LLM has a chance at flagging at spec or code review time. But these are nondeterministic tools, so the bug could also get through regardless. I just think it's quite a bit less likely.
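
As a rough illustration of the kind of false-positive test described above, here is a minimal sketch. It assumes a scanner that flags a trailer only when its value names a known assistant. The `KNOWN_ASSISTANTS` list, the function name, and the sample commit messages are all hypothetical and are not taken from the tool's source.

```python
import re

# Hypothetical list of assistant names; a real scanner would maintain its own.
KNOWN_ASSISTANTS = ("claude", "copilot", "chatgpt", "gemini")

# Match "Co-authored-by:" or "Assisted-by:" at the start of a line.
TRAILER_RE = re.compile(
    r"^(?:Co-authored-by|Assisted-by):[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def llm_trailers(commit_message: str) -> list[str]:
    """Return trailer values that name a known assistant."""
    hits = []
    for match in TRAILER_RE.finditer(commit_message):
        value = match.group(1).strip()
        lowered = value.lower()
        if any(re.search(rf"\b{re.escape(name)}\b", lowered)
               for name in KNOWN_ASSISTANTS):
            hits.append(value)
    return hits


# A human credited through the trailer must not be flagged.
human_msg = "fix: handle empty header\n\nAssisted-by: Jane Doe\n"
# An assistant credited through a trailer should be flagged.
llm_msg = "feat: add parser\n\nCo-authored-by: Claude <assistant@example.invalid>\n"

print(llm_trailers(human_msg))
print(llm_trailers(llm_msg))
```

A name list like this is only as good as its entries: it would still miss an agent that commits under a human-sounding name.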
    
    hoistbypetard
    
    Maybe before the emergence of OpenClaw and the like. Those have human-like names and each instance (can get, or) gets its own. So I don’t think pairing with an assistant would likely have produced a mechanism to catch, say, that MJ Rathbun (the LLM agent that seemed to write a hit piece about a numpy maintainer after its PR was rejected) needs to be spotted as an LLM assistant where Daniel Stenberg is a human.
    
    ava
    
    I think that that is a bug in the detection, which should be patched. But I am also not super concerned about the optics of what is currently the case.
    
    If one is a puritan who will take that metric and nothing else and have their skewed sense of righteousness validated, then I think that the score of 0 would lessen what that individual thinks of the curl project. But any reasonable being will look 10cm below and see "ah, that's just a false positive". And I don't think I want to design my software to cater to people who cannot be reasoned with in the first place, if that makes any sense.
    
    Regardless, this is just a... I guess hypothetical discussion anyway, since I do agree that the score aspect can go, and that this false-positive should be fixed.
    
    singpolyma
    
    I'm a little confused. You seem to have quite derisive things to say about the core demographic of people who want this tool.
    
    danlamanna
    
    nah, upsetting a few vibe coders is nothing compared to the human toll of the AI industry.
    
    I know this is lobsters, but vibecoding has an actual meaning and the constant abuse of it helps no one.
    
    lake
    
    As an aside, I think it's worth observing that for people whose opinion of LLM-{assisted/generated} code is strongly negative, the meaning of "vibecoding" has clearly shifted, or broadened. Regardless of how the word started when coined by Karpathy, it's now clearly a larger pejorative term. It's difficult to define where the actual boundaries are, but that's common with slang terms, especially those that evolve amid rapid cultural shifts, like the one with LLMs now. (For an example of a somewhat similar semantic shift, see meaning #4 for "mid", which for many speakers apparently now means "low quality", rather than just "mediocre".)
    
    To put it simply, what vibecoding means to you is evidently different from what it means to other people. And, conflicts over what words mean in a changing zeitgeist are as old as human culture, probably. So I expect to see this continue showing up in discussions here.
    
    (I don't fully know what "vibecoding" means for me anymore).
    
    singpolyma
    
    The problem is this shift isn't universal. People on one "side" (ick) have broadened it (and "slop") to general purpose pejoratives. Everyone else has not. So it's not a matter of "what it means to you vs everyone else" but a rather confusing society-wide language issue.
    
    swannodette
    
    I feel some cognitive dissonance here. LLMs are the ultimate easy tool - you prompt and you get your answer without much fuss (velocity, since LLMs cannot by themselves deliver quality). How is that different from what this project is offering? Check a repo for something you don't want, move on to assessing an alternative that doesn't have an initial tell (because digging in is very time consuming).
    
    netzego
    
    "It automates away the need for a person to actually look into a project [...]" Well, the irony is on your side. Besides that, i can see your point. Specially with the oversimplified rating. At the same time I have much sympathy with the author's project and I think a more accurate rating would actually help people to identify "dehumanized" projects.
    
    cadey
    
    Is there a way to opt-out of your projects being shown in this list? I'm weary of harassment and would like to remove a harassment opportunity before it strikes.
    
    alloyed
    
    yeah, as a person who'd use a tool like this, I don't actually need the full list of projects; that feels more like a backend admin tool. instead, this is useful information when researching a specific project I might want to depend on, so just typing in the name is good enough. (or, even better, an inline userscript that summarizes/links to it!)
    
    chrismorgan
    
    Disclaimer: This is self-promotion, as I am the author of the tool
    
    That’s not a disclaimer, that’s a disclosure.
    
    Also note that the “I am the author” checkbox you ticked is visible—it shows “authored by ava” rather than “via ava”. So probably no need to mention it in the text.
    
    ava
    
    Ah, language barrier moment. I am still quite new to this entire community, so I thought I'd rather be careful about it. But thank you for the information! I'll keep it in mind.
    
    creesch
    
    Can be both :P
    
    chrismorgan
    
    “I am the author of the tool” is a claimer, not a disclaimer.
    
    creesch
    
    My comment was intended as tongue in cheek. Anyway, feel free to flag as off-topic.
    
    edwardloveall
    
    I've used this several times. Thank you for making it and hosting.
    
    ava
    
    You are very welcome! I am glad to hear it :3
    
    cajually
    
    The curl one is funny and seemingly perfectly inaccurate.
    
    ava
    
    It's true, yes. It is, at its core, a very simple mechanism. I think what helps in this scenario is that the flagged signal actually gets shown, so that you don't just have to take the tool at face value, but can actually see "ah, yeah, no, that's a false positive". This is also very nice, I find, when it comes to projects that have used AI in the past, but don't anymore. The information of "ah, this is correct, but that commit was 2 years ago" gives you more information to base your judgment on.
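
One way to picture "show the signal, not just the verdict" is a record that keeps the evidence and the commit date together. This is a minimal sketch under assumed names (`Signal`, `describe`, and the sample SHA are hypothetical), not the tool's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    """One piece of evidence, kept with the context needed to judge it."""
    commit_sha: str     # hypothetical identifier
    committed_on: date  # when the flagged commit was made
    evidence: str       # the exact text that triggered the flag


def describe(signal: Signal, today: date) -> str:
    """Render a signal with its age so a reader can weigh it."""
    age_days = (today - signal.committed_on).days
    return f"{signal.commit_sha[:7]} ({age_days} days ago): {signal.evidence}"


old = Signal("0123456789abcdef", date(2023, 1, 10), "Co-authored-by: Claude")
print(describe(old, today=date(2025, 1, 10)))
```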
    
    addison
    
    I'm desperately sad to see LibAFL on this list. Not because it is wrong, but because I could not convince all of my fellow maintainers to not slop in the codebase. This is a major reason that my willingness to fix things has degraded.
    
    zem
    
    this is why i leave the "coauthored by claude" comments in my git commits - people genuinely have strong feelings on this issue and i figure might as well give them an easy way of getting that information.
    
    doronc
    
    Ha, no vibecoding tag
    
    st3fan
    
    Needs a vibecoding tag to be consistent with other posts about vibecoding.
    
    ava
    
    That's not how I understood that tag and its description. But you are here more than I am, so I'll defer to your judgement
    
    David_Gerard
    
    It's about it, so it counts. But nice one!
    
    ava
    
    Thank you! c:
    
    st3fan
    
    I think it is clear the tag has lost its meaning since it is slapped on anything that directly or indirectly refers to positive or negative articles about using AI in any number of different ways.
    
    rdg
    
    My worry is that projects will refrain from advertising their use of LLM tools to avoid being easily identified, making detection much harder.
    
    I think having a well written and structured page on why LLM tools should not be used (backed by references) could be more effective than shaming in preventing their adoption. I suspect that many maintainers have been exposed to more "pro AI" content that shapes their views in that direction and don't see the whole picture.
    
    darius-it
    
    Somewhat related story that was posted a while ago: https://lobste.rs/s/avubpi/can_we_measure_software_slop_experiment
    
    It also has a similar website: https://slop-o-meter.dev/. What I like about this implementation in particular (aside from its fun/silly design) is that you can tune the parameters of the scoring algorithm to your liking, which makes sense since they might not apply to every repo equally. Ironically, the implementation itself is also slop though :/