Can We Measure Software Slop? An Experiment
4 points by pscanf
I like the idea but the algorithm utterly fails for my repositories.
Ashet OS got a slop score of 3.3/5, even though the first larger AI-engineered part was only merged last week. The rest of the codebase is just a lot of work, with huge commits (merge requests typically in the ±10 kloc range, and individual commits around +1 kloc).
Seems like I'm working like a robot 🤖
zig-args even got a score of 4.7, but has never seen AI at all.
kristall got a 0.3, which is fairly accurate (no AI used there either).
For blade it works accurately, as in "I'm just supervising the test suite, not the code itself." I personally wouldn't count that as slop, though, since I take care that the test suite is tight and the code coverage is high.
Yeah, it's really hit and miss, to the point where even for the plausible-looking ones you wonder "but is it actually saying something, or is it just chance?"
Also it penalizes big commits with few signals attached, which are very common for solo-dev side projects. I was considering adding a heuristic like "if the project has only one main committer, increase the attention value of each commit" (roughly the sketch below), though it might then overestimate other types of projects. I should figure out a way to "profile" a repo and apply different heuristics based on that.
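Something along these lines, maybe. A rough sketch only: the threshold and boost values are placeholders, and I'm assuming the scoring pipeline has a per-commit attention multiplier to hook into:

```python
# Hypothetical sketch: profile the repo by committer dominance and, for
# solo-dev repos, boost each commit's attention so large low-signal
# commits aren't automatically penalized. Values are illustrative.
from collections import Counter

def attention_multiplier(commit_authors: list[str],
                         dominance_threshold: float = 0.9,
                         boost: float = 1.5) -> float:
    """Return a per-commit attention multiplier based on the repo profile."""
    if not commit_authors:
        return 1.0
    counts = Counter(commit_authors)
    # Share of all commits made by the single most active committer.
    top_share = counts.most_common(1)[0][1] / len(commit_authors)
    return boost if top_share >= dominance_threshold else 1.0
```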
So now you have some good data sources to measure against.
For my repos, the heuristic is fun: "Is the commit prefixed with an AI model name?"
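A trivial version of that check could look like this (the prefix list is just an example, not the actual model names in my history):

```python
# Hypothetical sketch: flag commits whose subject line starts with an
# AI model name. Prefixes below are assumptions for illustration.
import subprocess

AI_MODEL_PREFIXES = ("claude", "gpt", "gemini", "codex")

def ai_prefixed_commits(repo_path: str) -> list[str]:
    """Return SHAs of commits whose subject starts with an AI model name."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%H %s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        sha
        for sha, _, subject in (line.partition(" ") for line in log.splitlines())
        if subject.lower().startswith(AI_MODEL_PREFIXES)
    ]
```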
The algorithm may or may not be accurate (I think some of the repos I submitted had agent-written code), but I'm mostly here to mention that I love the slop-pouring animation. I'm curious who drew/animated it.
Thanks! And it's AI slop, of course! Though it took a fair bit of prompting Google's Veo. Before I got it fully to the state I wanted, I realized I had racked up a 50€ token bill and decided it was plenty good for what I needed.