An (almost) catastrophic OpenZFS bug and the humans that made it (and Rust is here too)
99 points by bal-e
I’m a self-proclaimed Zig fan, but this seems a little disingenuous to me. @andrewrk uses const prolifically where the original code mutated variables in place.
If he had written
var cols = vdrz.vd_original_width;
// ...
cols = vdrz.get_logical_width(txg);
and
var psize = (asize >> ashift);
psize -= nparity * (divCeil(psize, cols) catch unreachable);
psize <<= ashift;
which is a more direct translation of the original code, then the Zig compiler wouldn’t have caught either mistake. Surely a fairer way to put it is that @andrewrk’s preferred style, and perhaps the style of idiomatic Zig, caught the error, not the language itself.
If the original code had been written in the same way, using const everywhere, then clang/gcc would’ve caught the same error with -Wunused-variable.
I was just going to say… This blog post wouldn’t have even been made had it failed. Not a good signal.
I’m confused by this criticism.
If you develop a tool X that does everything tool Y does and has extra features, why wouldn’t you share results when you find a new thing that tool X does better? There’s no reason for anyone to publish thousands of articles about the ways that Zig and C have similar behavior.
There’s no reason for anyone to publish thousands of articles about the ways that Zig and C have similar behavior.
If C shows behaviour which is considered defective and zig is found to have the same defect, it absolutely makes sense to publish about it as an opportunity for reflection and finding ways to mitigate or fix the defect.
It’s definitely a lot more interesting than changing the problem into one C compilers could also solve and then taking a victory lap.
It’s definitely a lot more interesting than changing the problem into one C compilers could also solve and then taking a victory lap.
I get that criticism.
But @LenFalken seems to be saying that we should be suspicious of Zig because Andrew wouldn’t have published this blog post had he found the problem to be identical in Zig, which is a different critique that I don’t understand.
I always write Zig code to minimize unnecessary use of variables. Why would I make an exception for this snippet?
I don’t disagree and don’t think that’s a bad style. In fact most of my Zig code ends up looking the same. But the article opens with questioning whether Zig would have caught the bug. I’d argue that Zig didn’t catch it, but your adherence to that particular style of writing code did it.
I think the article is a solid argument in favor of maximizing the use of const, but doesn’t really make a case for Zig by itself.
Well, I hadn’t considered unused assignments to variables until this lobste.rs thread, so I’m glad I made the post :-) Will be looking to add that to the language now.
Nice! I wonder if that could eventually be the precursor to some form of linear typing too. I’ve read that Hare is looking at exploring that space.
I think it’s pretty common for compilers to do the proper liveness analysis for unused variable warnings. Here’s an example where the liveness analysis has to compute a least fixed point to determine that x is dead:
void f(int n) {
int x = 0;
for (int i = 0; i < n; i++) x++;
}
And here’s clang -Wall:
<source>:2:9: warning: variable 'x' set but not used [-Wunused-but-set-variable]
2 | int x = 0;
| ^
If you refcount uses to calculate liveness (which is probably most people’s first idea) you end up computing the greatest fixed point instead, which is less precise when the dead variable’s dataflow graph has cycles. There was a fun GC paper a while back pointing out that RC and GC compute greatest and least fixed points respectively, and RC famously cannot collect cycles. The situation with static analysis of liveness is analogous. Paul Khuong has an old blog post with the observation that most people’s first instinct (“most people” meaning people who don’t work on compilers or static analysis) when solving a problem often leads to the suboptimal fixed point: https://pvk.ca/posts/62/
After more experimentation it appears that clang -Wall does not report a dead variable if it’s used as an rvalue expression:
void f(int n){
int x = 0;
x = x + 1;
}
That’s not what I’d expect as a programmer! It’s silly that x++ or x += 1 (x is only used as an lvalue) vs x = x + 1 (x is used as an rvalue) makes any difference to the warnings when there’s no relevant difference in semantics or programmer intent. Someone on Mastodon tested it with clang-analyzer and it didn’t catch this either.
It’s silly that x++ or x += 1 (x is only used as an lvalue) vs x = x + 1 (x is used as an rvalue) makes any difference to the warnings when there’s no relevant difference in semantics or programmer intent.
They actually have subtly different semantics.
x = x + 1 is two separate ops: a binary add, and a “simple” assignment. x + 1 has to be evaluated first, and only then is the assignment x = x' resolved. In particular, that can mean two separate accesses to wherever x is stored.
Meanwhile, x += 1 is a “compound” assignment. It’s required to have the same result as x = x + 1, but it is a single op, and x will only be evaluated once.
x++ is not an assignment at all; postfix ++ is an operator that returns the old value of x, and the change in the stored value is a side-effect which doesn’t have to be resolved immediately.
The difference for our purposes is that x++ and x += 1 are “indivisible”: the compiler can say with confidence that if you never observe the result, then it wouldn’t have mattered if it didn’t do it.
For x = x + 1 though, the issue is that x + 1 may have side effects that affect the execution of the program. Less for a simple integer on the stack, but it’s really dependent on the type and location in memory of x. x might be a deref for example (s->f), or its type might be volatile.
So to be able to say it’s definitely not used, the compiler has to understand more about the nature of x, which means a more complex analysis.
I don’t really know how hard it would be to produce something useful, but possibly if you tried to narrow it to just cases we can be sure about, there wouldn’t be many opportunities left! I’m guessing that compiler authors probably know what they’re doing, and having to opt-in to a different tool that is slower and less reliable but more expansive is a better tradeoff.
Incidentally, I’ve had a lovely afternoon playing with Clang’s frontend:
void f(int n) {
int x = 0;
x++;
}
$ clang -Xclang -ast-dump -fsyntax-only -Wall f.c
f.c:2:9: warning: variable 'x' set but not used [-Wunused-but-set-variable]
2 | int x = 0;
| ^
TranslationUnitDecl 0x55b3cfa1ca58 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x55b3cfa83cd0 <f.c:1:1, line:4:1> line:1:6 f 'void (int)'
|-ParmVarDecl 0x55b3cfa83c00 <col:8, col:12> col:12 n 'int'
`-CompoundStmt 0x55b3cfa83eb8 <col:15, line:4:1>
|-DeclStmt 0x55b3cfa83e68 <line:2:5, col:14>
| `-VarDecl 0x55b3cfa83de0 <col:5, col:13> col:9 used x 'int' cinit
| `-IntegerLiteral 0x55b3cfa83e48 <col:13> 'int' 0
`-UnaryOperator 0x55b3cfa83ea0 <line:3:5, col:6> 'int' postfix '++'
`-DeclRefExpr 0x55b3cfa83e80 <col:5> 'int' lvalue Var 0x55b3cfa83de0 'x' 'int'
1 warning generated.
void f(int n) {
int x = 0;
x += 1;
}
$ clang -Xclang -ast-dump -fsyntax-only -Wall f.c
f.c:2:9: warning: variable 'x' set but not used [-Wunused-but-set-variable]
2 | int x = 0;
| ^
TranslationUnitDecl 0x5641b13ada58 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x5641b1414cd0 <f.c:1:1, line:4:1> line:1:6 f 'void (int)'
|-ParmVarDecl 0x5641b1414c00 <col:8, col:12> col:12 n 'int'
`-CompoundStmt 0x5641b1414ef0 <col:15, line:4:1>
|-DeclStmt 0x5641b1414e68 <line:2:5, col:14>
| `-VarDecl 0x5641b1414de0 <col:5, col:13> col:9 used x 'int' cinit
| `-IntegerLiteral 0x5641b1414e48 <col:13> 'int' 0
`-CompoundAssignOperator 0x5641b1414ec0 <line:3:5, col:10> 'int' '+=' ComputeLHSTy='int' ComputeResultTy='int'
|-DeclRefExpr 0x5641b1414e80 <col:5> 'int' lvalue Var 0x5641b1414de0 'x' 'int'
`-IntegerLiteral 0x5641b1414ea0 <col:10> 'int' 1
1 warning generated.
void f(int n) {
int x = 0;
x = x + 1;
}
$ clang -Xclang -ast-dump -fsyntax-only -Wall f.c
TranslationUnitDecl 0x55ca390e1a58 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x55ca39148cd0 <f.c:1:1, line:4:1> line:1:6 f 'void (int)'
|-ParmVarDecl 0x55ca39148c00 <col:8, col:12> col:12 n 'int'
`-CompoundStmt 0x55ca39148f38 <col:15, line:4:1>
|-DeclStmt 0x55ca39148e68 <line:2:5, col:14>
| `-VarDecl 0x55ca39148de0 <col:5, col:13> col:9 used x 'int' cinit
| `-IntegerLiteral 0x55ca39148e48 <col:13> 'int' 0
`-BinaryOperator 0x55ca39148f18 <line:3:5, col:13> 'int' '='
|-DeclRefExpr 0x55ca39148e80 <col:5> 'int' lvalue Var 0x55ca39148de0 'x' 'int'
`-BinaryOperator 0x55ca39148ef8 <col:9, col:13> 'int' '+'
|-ImplicitCastExpr 0x55ca39148ee0 <col:9> 'int' <LValueToRValue>
| `-DeclRefExpr 0x55ca39148ea0 <col:9> 'int' lvalue Var 0x55ca39148de0 'x' 'int'
`-IntegerLiteral 0x55ca39148ec0 <col:13> 'int' 1
void f(int n) {
int x = 0;
(*(&x))++;
}
$ clang -Xclang -ast-dump -fsyntax-only -Wall f.c
TranslationUnitDecl 0x559ca0d73a58 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x559ca0ddacd0 <f.c:1:1, line:4:1> line:1:6 f 'void (int)'
|-ParmVarDecl 0x559ca0ddac00 <col:8, col:12> col:12 n 'int'
`-CompoundStmt 0x559ca0ddaf58 <col:15, line:4:1>
|-DeclStmt 0x559ca0ddae68 <line:2:5, col:14>
| `-VarDecl 0x559ca0ddade0 <col:5, col:13> col:9 used x 'int' cinit
| `-IntegerLiteral 0x559ca0ddae48 <col:13> 'int' 0
`-UnaryOperator 0x559ca0ddaf40 <line:3:5, col:12> 'int' postfix '++'
`-ParenExpr 0x559ca0ddaf20 <col:5, col:11> 'int' lvalue
`-UnaryOperator 0x559ca0ddaf08 <col:6, col:10> 'int' lvalue prefix '*' cannot overflow
`-ParenExpr 0x559ca0ddaee8 <col:7, col:10> 'int *'
`-UnaryOperator 0x559ca0ddaed0 <col:8, col:9> 'int *' prefix '&' cannot overflow
`-DeclRefExpr 0x559ca0ddae80 <col:9> 'int' lvalue Var 0x559ca0ddade0 'x' 'int'
Less for a simple integer on the stack, but it’s really dependent on the type and location in memory of x. x might be a deref for example (s->f), or its type might be volatile. So to be able to say it’s definitely not used, the compiler has to understand more about the nature of x, which means a more complex analysis.
All of those semantic subtleties already have to be addressed in an optimizing compiler (as evidenced from the -O1 disassembly on these examples) to perform sound optimizations. I agree about the increased complexity; the compile-time overhead on -O0 (which people expect to be fast) of always having this on would be one argument against having it on by default, but that argument is less convincing for clang-analyzer (and I checked and clang-analyzer does have some kind of liveness analysis). In Clang’s case I think the more likely explanation is that it didn’t have an IR (ClangIR) until very recently, so it had to do everything with the AST. Despite Clang’s tight coupling to LLVM, using LLVM’s IR to drive this kind of analysis wouldn’t be reasonable.
Sure. I was just pushing back on “these are semantically equivalent”. They aren’t in all cases, and if you’re only doing semantic analysis then you can miss them. I think that’s broadly why C compilers don’t see it.
And yes, clang-analyzer (and most analyzers) will do data flow and control flow analysis, which is where dead stores are found.
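For example, a sketch of the kind of case the analyzer handles (deadcode.DeadStores is among the checkers clang --analyze enables by default; the exact diagnostic text may vary):
/* size.c -- a dead store that -Wall misses; try: clang --analyze size.c */
unsigned long f(unsigned long asize) {
    unsigned long psize = asize << 1;
    psize <<= 1;
    psize *= asize + 3;   /* this stored value is never read afterwards */
    return asize;         /* bug: should be psize */
}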
Does Zig have something like the clang static analyser that can catch the mistake as written in the original style? Arguably it’s a bug in gcc and clang that they need a separate tool for such a simple analysis: it should be a built-in warning like it is in Rust.
If the original code had been written in the same way, using const everywhere, then clang/gcc would’ve caught the same error with -Wunused-variable.
It should be noted that Zig will catch the error by default when you compile Andrew’s version, no need for any optional flag.
Starting to wish I’d never said anything…
I hope you’re not taking either of these articles as attacks on you.
Your writeup was sensible and informative. It’s leading people to think more about tooling and static analysis, even if the discussion is getting a bit sarcastic.
Thanks for your writeup and the work you do on OpenZFS!
That’s very kind! Thank you for asking!
No, I haven’t been taking anything too personally, and definitely not as attacks. Honestly, I’ve been very impressed with the quality of the discussion here in particular, and I’ve learned quite a bit too.
You do grow a certain kind of thick skin working on OpenZFS, as there’s plenty of people around quite happy to tell you that you are stupid/unserious/an existential threat to all of open source. It’s nothing we haven’t heard before, though I admit that “not using -Wunused” is a novel justification for it (even though (a) we do and (b) it doesn’t work here anyway). There was a bit of that on a couple of well-known tech discussion places that I do not visit as a rule because they upset me too much, so I was quite happy to close those tabs as soon as I opened them!
(also a lot of LLM sales, which is like, ok maybe but also there are other ways to crack a nut besides bringing the weight of the entire moon down on it).
My comment up-thread was partly a wry eyeroll that we’re still talking about this, and also mild discomfort at something I kicked off now being used to dunk on someone else, even if it did make me chuckle a bit too :) That’s ok though, we all contain multitudes or something.
Anyway, back to the work. Cheers :)
The one thing I’m learning from it all is that a consistent pattern of const vars on my heap is probably a good thing.
This is something that bothered me a lot when I started to learn Rust initially, but the more I got into the language, the more I came to appreciate having const by default and mutability only as an opt-in.
I’m honestly a bit annoyed at both this and the Zig post. The author of the original article mentioned that finding this in C is doable, but also noted, both in the article and the comments, that because this is an old code base it requires a substantial amount of time to knock down the linting error backlog. I don’t understand the point of smugly pointing out how your favorite tool catches an unused var.
IMHO the Zig article was the annoying one, for the reason you mention and for “cheating” by transforming the code just enough to let the Zig toolchain find the bug.
This article is just poking fun at the Zig article (even reusing its layout), not presuming to be a counterpoint to the original article. Cheeky, but deserved.
we could describe those two types of sizes as being separate distinct things
struct PhysicalSize(u64);
Maybe just typing out the name would also have helped? Screens are wider these days and lots of people can touch type, so I don’t really see the point anymore of psize. It just looks lazy to me.
Yeah, longer variable names would have prevented this. Humans don’t parse characters with silicon, we read words with meat. Maybe cryptic abbreviations in “must not fail” code are best avoided?
Maybe, maybe not. I get that they look like cryptic abbreviations, but they’re also terms of art that you have to know to work in this codebase. Does that make them more or less likely to be glossed over? I for one suspect that if I saw allocated_size spelled out longform I would still skim past, but maybe thinking “oh yeah, that’s the new name for asize that robn forced us to change to”.
Fair enough. If it’s used everywhere and the whole team understands it, I can see the benefit of abbreviating it. Are there short names that differ by more than one character? If you have superficially similar entities in your codebase with subtle but crucial differences, maybe you could make the names (and the type definitions if possible) more obviously different.
I have an aviation background and I agree emphatically with your assertion that “just get good” is not a reasonable answer. If a competent aircrew has a disaster or a near miss, “pilot error” is never the whole cause. There’s a deeper issue somewhere that needs mitigation. Any system that involves humans but can’t tolerate human mistakes is guaranteed to fail.
Fair enough. If it’s used everywhere and the whole team understands it, I can see the benefit of abbreviating it
There are three implicit assumptions here:
- the abbreviation is used consistently everywhere,
- the whole team understands it, and
- everyone understands it to mean the same thing.
If all of these are true, I can see the benefit of using abbreviations. The fact that I believe none of them to be true is why the coding conventions for CHERIoT explicitly disallow abbreviations. People with different backgrounds often interpret them differently. We’ve had cases where people with a security, systems, or hardware background all have different preconceived notions of what an acronym means.
On top of that, longer names tend to make it easier to have good autocomplete (and even vim has good C/C++ autocomplete with clangd these days). Especially if you use the prefix for context, so typing the prefix will find all of the things in a given context quickly.
People with different backgrounds often interpret them differently. We’ve had cases where people with a security, systems, or hardware background all have different preconceived notions of what an acronym means.
On the other hand, a name consisting of full English words rarely captures the semantics of the variable that it refers to. People still have different preconceived notions, but a longer name can make people more likely to trust their preconceived notions.
I have an aviation background and I agree emphatically with your assertion that “just get good” is not a reasonable answer. If a competent aircrew has a disaster or a near miss, “pilot error” is never the whole cause. There’s a deeper issue somewhere that needs mitigation. Any system that involves humans but can’t tolerate human mistakes is guaranteed to fail.
I wasn’t sure last night when I wrote the post what my point was, but over the course of the day I’ve realised that what I was trying to get at was exactly what you’ve just described here. Thanks :)
You probably know already but “they write the right stuff” is an essay about this sort of mindset in software engineering, it talks about the practices of the shuttle software group, and the blame and feedback cycle over “the process”.
While it’s likely way too much for the average software project, one of the things I found interesting is the feedback cycle over basically any bug which makes it through the process, to try and see if the process can be improved to catch that bug in the future. Anything which makes it through is considered a systemic issue.
Thanks, it’s been a long time since I’ve thought about that essay.
When I used to work as an SRE, we had a kind of soft rule that any time something new or weird came up, if for whatever reason we couldn’t understand what it was and why it happened and fix it properly, we should never just shrug and say “oh well, that was weird, whatcha gonna do”. We should do something - record some logs, screenshot a load graph, write a monitor to bring some oddly high metric to the fore, write a script to correlate a sequence of interesting events. The idea is that we don’t want a repeat performance to be the “same” issue - we want to be able to use it to build our understanding of the previous time. The worst thing is a problem occurs a second time and adds nothing to your understanding of the first time it happened.
It’s not quite the same, but I still try to apply that principle now while writing code: do one thing that will help the next time it comes up. In this case it’s been to mess with tools a little, but also to write something about it. Generating some discussion wasn’t quite part of my plan, but that’s going to help too (not least, some small ideas for getting more static analysis into our regular build cycle). “The journey of a thousand leagues” etc.
I have an aviation background
About 15 years ago or so, a friend of mine was helping to replace an instrument on a Cessna airplane and was amazed that you were only allowed to use one type of wire, which only came in one color. There were no red wires for hot, black for ground, or other colors for signals. Just white. It made the job much more difficult, but that was just how it was in aviation, it seems.
Heh, that wire is expensive and fantastic. Having spools of it in different colours would be eye-wateringly expensive. One of the cool things though is the white insulation usually has a titanium oxide in it and can be really cleanly and permanently laser-labelled directly into the insulation for the whole length of the wire. It’s pretty beautiful but I definitely know that feeling of shock and awe the first time you see an all-white wiring harness :D
I asked about different colors, and I was told that no, it had to be white—different colors were not allowed. At the time, I suspected it was because no one wanted to pay for alternative colored wire to be certified.
I’m sure there are psycholinguistic studies on how much likelier it is to confuse two things that look and/or sound very similar. Maybe we should look it up and can make this thread more scientific, but I don’t think it’s controversial to say that similar looking things are easier for the brain to mix up.
I don’t think @rbuchberger is suggesting that your reflexive but conscious behaviour would change, rather that the subconscious would simply have more redundancy and your brain would just be less likely to mix them up, despite expending the same brainpower (skimming).
This, and the compounding effect of cognitive fatigue induced by staring at a screen the whole day.
FWIW, you don’t necessarily have to go all the way to allocated_size. Maybe alloc_sz and phys_sz, though?
The issue is not the length of the names. The problem is that the two words are written mostly the same. At least the difference is at the start of the word. Otherwise it would be even harder for a human to distinguish between the two.
The words in question are psize and asize. They are 5 characters long each. 4 are the same and 1 is different. The ‘signal-to-noise’ ratio is dismal. The names emphasize what the things have in common (they are sizes) instead of emphasizing how they differ (physical vs allocated). Note that even with ‘descriptive’ names one can end up with words that are mostly the same.
Following a similar style, physsz and allocsz would be an improvement imho. For those that prefer shorter names, the following has some wise remarks: https://nsl.com/papers/style.pdf
Following a similar style physsz and allocsz would be an improvement
I think what many commenters here are missing is that the words “asize”, “lsize” and “psize” are terms of art in ZFS.
Changing them to “physsz” and “allocsz” (in such a way that doesn’t introduce an impedance mismatch between this code and what the rest of the ZFS community speaks) would entail rewriting all of the codebase, all of the documentation and lore, and all of the ZFS operators’ collective knowledge.
I think what many commenters here are missing is that the words “asize”, “lsize” and “psize” are terms of art in ZFS.
We do understand they are technical jargon. And I do understand the importance of using the same language (“ubiquitous language” for buzzword compliance). Other FS related areas use similarly bad names, like atime, mtime, etc. I still think it is a bad name.
I understand that changing existing practice/conventions is a huge lift, which is why it is seldom done, even outside the context of software development (e.g. we still say that current flows in the direction opposite to the electrons because the convention was established before we knew what electrons were).
The goal of my first comment was to argue that it was a bad name not because it was short, but because they chose words that are mostly written the same. This is something that happens a lot more with longer variable names. And it still hampers readability.
Following a similar style physsz and allocsz would be an improvement imho.
All other things being equal, I strongly agree.
One size doesn’t fit at all, sadly. My ND brain actually struggles with anything that’s too verbose. It still seems mad that in 2025 we’re still manipulating code as text and not structured data that would allow people to work with it according to their own needs/preferences (eg aliasing identifiers to be longer or shorter.)
Oh I completely agree - overly long names are just as unreadable as overly short ones. In that case I think the solution is to narrow the surrounding scope/context so the name itself doesn’t have to carry all the specificity.
I don’t know if it would make it more or less likely to happen. It’s certainly not obviously better, I think.
Those three size types are used heavily throughout the codebase, as well as baked into the on-disk format - they’re fundamental to understanding how everything works. We’d be changing it everywhere, so it would end up looking the same as everything else, which is pretty much perfect conditions for the brain glossing over it.
Which really goes back to the point I was trying to make, which is that humans are bad at noticing details a lot of the time, even when they’re trying really hard to. We can shuffle things around a little bit, maybe, but I’ve never seen anything to suggest that we can ever overcome it, and our collective body of work seems to back that up.
Isn’t this bug trivially caught by unit testing?
The bug is that it simply returns one of its arguments, instead of doing any calculation. That will make basically any reasonable test fail.
I think the bigger problem is that there is not a culture of testing in C.
The type system solution may work sometimes, but it doesn’t catch the case where the answer is off by 1, off by a power of 2, overflows, underflows, or an infinite number of other possible bugs.
Write unit tests!! Especially for pure functions. They are so easy to test.
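For a genuinely pure function, such a test really can be a few lines. A minimal sketch, with a stand-in for the real conversion math (none of this is the actual OpenZFS function):
#include <assert.h>
#include <stdint.h>

/* stand-in: round asize down to a multiple of the sector size */
static uint64_t asize_to_psize(uint64_t asize, uint64_t ashift) {
    return (asize >> ashift) << ashift;
}

int main(void) {
    assert(asize_to_psize(8192, 9) == 8192);
    /* this one fails if the function just returns its argument */
    assert(asize_to_psize(8193, 9) == 8192);
    return 0;
}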
It’s testable, but I don’t think it’s trivial.
On the function itself, it’s not actually pure. In general, asize and psize are not correlated at all - psize is an amount of data, but asize is basically opaque to the specific driver in use. The units are not convertible at all; in fact, asize doesn’t even have to be a “size” - it could be an identifier or a handle for something that the vdev uses to get that data back later.
This function I think is a good example of why things in C can be challenging to test. In this case, the result is extremely dependent on the overall vdev state (number, size and layout of disks) and also the history of that state (txg can basically be thought of as a timestamp). To test it thoroughly, you effectively have to mock the vdev_t, and doing that is complicated because layouts are dependent on all sorts of weird things, and it’s not uncommon to end up with a pretty long string of dependencies required.
And we do do this in some places. Consider the raidz_test tool, which is basically a unit test for the raidz math. It works by building up different layout “maps” and then blasting data into it and making sure both the columns and the data within them are generated correctly. But, since in the real world the “map” is entirely based on the incoming user IO, it builds up fake IO objects as inputs to the map generation function.
I think some of those things point towards API mis-design, but some of that is just C things too. Structs are very porous, and the language really nudges you towards passing whole structs around rather than just the values you need inside them. Sometimes it’s hard to change that too - in some contexts, it can have a measurable effect on performance (copies, register spills) or can blow out your stack. Sometimes it’s not (here’s one I did last week).
And there’s another dozen annoyances, obstacles and other stuff that we’d need to clear first, but I’ve yapped enough. Ultimately, it’s all doable, and we should, and I’m slowly picking things off as I can. It’s just a long way off.
That is all fair … but I would note that I have basically never seen a Rust project WITHOUT unit tests.
There is a strong culture of testing there, so if you were to switch to Rust, I imagine the testing would catch a lot of your bugs.
It’s definitely not “abandon C” or “rewrite it in Rust”; those aren’t practical.
I also think there’s a bit of a blind spot in the way Rust is sold to C programmers
So yes, the Rust type system is something you can’t get without rewriting all your code in a different language, which as you say isn’t practical in every case.
But you CAN get the testing in C, and IMO that is more than half the story!
I think the main inhibition in C projects is usually the build system – it should be set up to build and run multiple test binaries (in parallel), and it should also have ASAN, UBSAN variants, and more
I wrote my own little build system on top of Ninja to accomplish that for my C++ code. CMake definitely has support for tests, but I honestly don’t remember if it needs to rebuild for ASAN/UBSAN, which is my pet peeve (e.g. variant builds should not invalidate each other).
I also run my tests in various preprocessor modes like -D GC_EVERY_ALLOC, which is hugely helpful. This also seems inhibited by typical build systems. (I’m sure it is possible, but IME most C projects that use makefiles have a lot of friction around these things.)
(And granted, I am talking about user space code, not kernel code. But I imagine that the same thing applies in Rust – you run your tests as normal user space binaries, and you can also do that in C, with enough effort.)
The function in question is qualified as static, which means any tests would have to reside in the same file, and would themselves have to be global in nature. Or the function would have to be made global, which defeats the purpose of using static to hide an implementation detail. I would contend that with C, “unit tests” are not as easy to write as in other languages, due to the nature of C compilation and linking.
Wouldn’t the following work?
/* file: foobar.c */
static uint64_t
vdev_raidz_asize_to_psize(vdev_t *vd, uint64_t asize, uint64_t txg) { ... }
#ifdef WE_R_DOIN_UNIT_TESTIN
#include "foobar_utests.c"
#endif
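For illustration, the hypothetical foobar_utests.c could then contain something like this (a sketch; none of the names or the asserted property come from OpenZFS):
/* file: foobar_utests.c -- textually included by foobar.c when
   WE_R_DOIN_UNIT_TESTIN is defined, so it can see the static function */
#include <assert.h>

void run_foobar_tests(vdev_t *vd, uint64_t txg) {
    /* illustrative property only; a real test would build a vdev_t
       with a known layout and assert exact expected values */
    assert(vdev_raidz_asize_to_psize(vd, 8192, txg) <= 8192);
}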
Have you worked on (or seen) a C project that used this approach?
Yes, although reversed. There are several tests in BIND where the test program #includes the library implementation. For example
https://gitlab.isc.org/isc-projects/bind9/-/blob/main/tests/isc/hashmap_test.c
That’s a thing that can easily be changed. I compare these two things:
- drop static, and write unit tests that catch bugs
- keep static, and have bugs that could have been caught
For me the choice is extremely easy.
It is a bit of a tax, but ultimately I also view C as only one part of the system I’m working on, and the other parts are written in languages that are easier to test (which is not the case for this kernel code, admittedly).
I think the bigger inhibition in C projects is the build system, which is not always set up for multiple test binaries.
Would wrapper types like this avoid the bug in C too?
typedef struct { uint64_t size; } asize;
typedef struct { uint64_t size; } psize;
Maybe, at the cost of a good amount of readability, and still needing some programmer discipline.
The problem is that they’re now compound types, not scalar, so they have different rules for conversions, layout, etc.
Say we have:
typedef struct { uint64_t a_size; } asize_t;
typedef struct { uint64_t p_size; } psize_t;
Initialising them becomes awkward:
psize_t psize = 4096; // error: invalid initializer
psize_t psize = (psize_t)4096; // error: conversion to non-scalar type requested
psize_t psize = { 4096 }; // ok.
And we can’t initialise them on the fly (initialisers are not expressions):
psize_t asize_to_psize(asize_t a);
psize_t psize = asize_to_psize({ 8192 }); // error: expected expression before ‘{’ token
Or return them either:
psize_t asize_to_psize(asize_t a) {
return { 8192 }; // error: expected expression before ‘{’ token
}
They’re hard to work with too, since we no longer get math ops:
asize_t asize = { 8192 };
psize_t psize = { asize << 1 }; // error: invalid operands to binary << (have ‘asize_t’ and ‘int’)
asize_t asize1 = { 8192 };
asize_t asize2 = { 4096 };
asize_t total = asize1 + asize2; // error: invalid operands to binary + (have ‘asize_t’ and ‘asize_t’)
return (asize1 == asize2); // error: invalid operands to binary == (have ‘asize_t’ and ‘asize_t’)
Still, they can’t be accidentally used in place of each other:
asize_t asize = { 8192 };
psize_t psize = { asize.a_size << 1 };
return (asize == psize); // error: invalid operands to binary == (have ‘asize_t’ and ‘psize_t’)
psize_t asize_to_psize(asize_t a) {
psize_t p = { a.a_size >> 1 };
return a; // error: incompatible types when returning type ‘asize_t’ but ‘psize_t’ was expected
}
In reality though, there’d be macros to help with all this, but they are difficult to write and maintain, and still require discipline to use. Would it be worth it to avoid this shape of bug? Hard to say. Maybe, but in deciding these things I imagine myself trying to persuade my colleagues that this is better, and right now I don’t feel great about my odds.
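As a rough sketch of what such macros might look like (hypothetical names): C99 compound literals help here, since unlike bare initialiser lists they are expressions, so they can appear in arguments and returns:
#include <stdint.h>

typedef struct { uint64_t a_size; } asize_t;
typedef struct { uint64_t p_size; } psize_t;

#define ASIZE(x)        ((asize_t){ .a_size = (x) })
#define PSIZE(x)        ((psize_t){ .p_size = (x) })
#define ASIZE_ADD(x, y) ASIZE((x).a_size + (y).a_size)

psize_t asize_to_psize(asize_t a) {
    return PSIZE(a.a_size >> 1);      /* ok: compound literal in a return */
}

asize_t grow(asize_t a) {
    return ASIZE_ADD(a, ASIZE(4096)); /* math via explicit unwrap/rewrap */
}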
Even a plain typedef, as in ‘typedef uint64_t asize’, helps. I have worked on projects where no raw types were allowed. It really helped with good static analysis tools (FlexeLint) that would then complain whenever you mixed types.
A typedef is just an alias; as far as the compiler is concerned they’re the same thing. So the only thing a typedef would offer is a hint to the developer, a hint which was already in the names and which both the author and reviewer missed.
You can take the original code, add the missing bits so it compiles, and replace the types of psize and asize by your typedefs, and it will still compile just fine with no warning (even at Wall).
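A tiny illustration of that (hypothetical names):
#include <stdint.h>

typedef uint64_t my_asize;  /* just aliases for uint64_t... */
typedef uint64_t my_psize;  /* ...so these are the same type */

/* compiles clean even with -Wall -Wextra: returning the "wrong"
   kind of size is invisible to the compiler */
my_psize asize_to_psize(my_asize a) {
    return a;
}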
It really helped with good static analysis tools (FlexeLint) that would then complain whenever you mixed types.
So now you need additional tooling which specifically decides to treat typedefs as newtypes.
Unfortunately in this case, typedefs are merely alternate names for an underlying type. typedef uint64_t asize and typedef uint64_t psize just get you two more ways to spell uint64_t. Hell, depending on your environment, uint64_t is probably itself a typedef for unsigned long long or something.
Yes, however due to C’s lack of conveniences around those this can get quite unwieldy (in this case it’s fine).
Here, even without any sort of type shenanigans, a compiler (or linter) able to warn about unused stores would tell you about psize (and cols) being dead stores.
And that’s one of the reasons I find some languages’ refusal to compile on unused variables so frustrating: they’ll bust your balls about that during dev, but they’ll completely miss unused stores, even though every issue which can be signaled by an unused variable is as or more likely to be signaled by an unused store.
For real. The common C static analysers usually report it ok. Do you know any more about why C compilers generally don’t? I’m assuming either it’s actually expensive to work out, or it’s not reliable enough. I would turn that warning on instantly and deal with any false positives, but maybe there’s a better reason?
I have no idea. I tried checking clang’s bug tracker but found nothing for diagnostics, just dead store elimination.
Might just be that nobody has really proposed it (/ put the effort in). After all, they do have “unreliable” diagnostics in -Wextra/-Weverything, so even if dead-store detection was not 100% reliable, that should not be a blocker.
This is exactly the kind of bug that terrifies me as well.
For example, how about a primary key on a Django model. Is an Employee.id an int? Sorta, but not really. It’s an EmployeeID, and that’s an important distinction.
Which operations should be defined on it? Surely not all of the int operations. It doesn’t make sense to add two EmployeeIDs together. Equality against other EmployeeIDs? Yep. Comparing an EmployeeID to a ProductID should be undefined. And you could make the case that gt/lt and sorting is an implementation detail. But when everything is an int, we lose all of that meaning.
I’m sure people have more enlightened thoughts in this area. :)
You can do the same thing as described in the post. E.g. in one of my projects I have separate newtypes for all the primary keys:
#[derive(sqlx::Type)]
#[sqlx(transparent)]
pub struct BookmarkId(i64);
#[derive(Debug, FromRow)]
pub struct Bookmark {
pub id: BookmarkId,
pub user_id: UserId,
pub url: String,
pub title: String,
pub author: Option<String>,
pub notes: Option<String>,
pub read: bool,
pub public: bool,
pub created_at: Timestamp,
pub updated_at: Timestamp,
}
sqlx(transparent) makes it so that it can be used directly in queries without having to get the inner value out, but in the Rust code all the ids are a separate type.
And you don’t even need high level libraries for it, raw rusqlite will do it fine, you just have to implement a few traits by hand (or macro) and you’re good.
Typed ids is a nice pattern (if just a subset of solving primitive obsession).
You can also do #[repr(transparent)] to give it the same memory layout as the single tuple inner type.
Python has typing.NewType, which gets you partially there at the type checker level (e.g. mypy). It prevents passing an EmployeeID to a function accepting a ProductID, and it requires these ids to be constructed explicitly:
EmployeeID = typing.NewType("EmployeeID", int)
ProductID = typing.NewType("ProductID", int)
foo = EmployeeID(12)
…but it’s not a panacea, b/c the values will behave as regular integers at runtime, e.g. they compare equal and can be mixed in arithmetic operations:
$ python
>>> EmployeeID(123) == ProductID(123)
True
>>> EmployeeID(1) + ProductID(2)
3
What’s up with the first cols assignment? It’s unused, no?
Good spot :) Yeah, another thing a compiler won’t tell you about, though it will quietly optimise it away, so it’s no big deal. Unfortunately there’s loads of this stuff throughout the OpenZFS codebase like this. Static analysis finds a good chunk of it (including that one), but it takes time to go through and clean up, and alas we’re very short on time.
As it happens though, when I have a few spare minutes and the mood takes, I do sometimes take a swing at tidying up things like this. Current pile of patches is robn/zfs#tidy-your-room, and this particular one is already sorted in robn/zfs@288782ae01.
so it’s no big deal
The problem isn’t performance, the problem is that this likely points towards a bug introduced during a refactor.
In this case, the behaviour was changed a little, but the initialiser wasn’t. There is no bug there, just untidiness. And it’s true that untidiness makes it harder to spot new messes, so I definitely agree it’s a deal, but then we’re back to programmer time being scarce. We do what we can!
Curious. I can use -Wall -Wextra in GCC/clang and get an unused warning [1], but I wonder if adding those options now (if they aren’t being used) will generate too much noise (I always compile C code with them, plus some extra ones that aren’t covered by those two).
[1] I even added such a warning to an assembler I wrote, so it’s not exactly rocket science these days.
I tried it on godbolt; neither GCC (15.1, -Wall, -Wextra) nor clang (20.1, -Wall, -Weverything) flags the unused store. And since psize is updated in place it has at least one read, which satisfies both compilers (this also works on Go, and I assume Zig).
If it’s a single-step transform, -Wunused-variable will catch it:
uint64_t asize_to_psize(uint64_t asize) {
uint64_t psize = asize << 1;
return asize;
}
$ gcc -Wall -Wextra -c size.c
size.c: In function ‘asize_to_psize’:
size.c:5:14: warning: unused variable ‘psize’ [-Wunused-variable]
5 | uint64_t psize = asize << 1;
| ^~~~~
If it has multiple steps, it will not:
uint64_t asize_to_psize(uint64_t asize) {
uint64_t psize = asize << 1;
psize <<= 1;
psize *= asize + 3;
return asize;
}
$ gcc -Wall -Wextra -c size.c
$
(fwiw, gcc version 12.2.0 (Debian 12.2.0-14+deb12u1)).
That’s correct behaviour by the documentation, which specifically says “unused aside from its declaration”. My (rough) understanding is that once it’s “stored” (in a register or whatever), it’s no longer “unused” by this definition.
I don’t think “stored” really matters, just that the variable itself is read once at the language level (the entire thing will be SSA’d long before it gets lowered to assembly after all). Any read of the variable will satisfy the “variable is used” predicate and suppress the warning (/ error in Go and Zig; in the latter the language server will even update your code with _ = var to avoid the compiler complaining, an exercise in insanity if I’ve ever seen one).
Yeah, another thing a compiler won’t tell you about
Some compilers do, fwiw (though maybe not C compilers).
Well, C is implied here :)
It’s making me think tho: rustc does flag it, c2rust exists, and even though the code it produces has its issues maybe it surfaces this issue? Because it’s not part of the C semantics so c2rust should have no reason to work around it.
edit: it kinda works but c2rust allows dead_code and unused_assignments by default, so you need to remove those.
haha! Despite being roasted, I’m happy to see this post.
When I worked at Backtrace, I attempted to write all my C code this way, but the CTO insisted on putting all the locals at the top of a function as variables. This really frustrated me. So it pleases me to see the culture shifting with posts like this.
However… I have to call out the post a little bit for not showing the command used to compile the C code, because by default, this is not caught:
[nix-shell:~/tmp]$ clang -c bug.c
[nix-shell:~/tmp]$ gcc -c bug.c
[nix-shell:~/tmp]$ clang --version
clang version 20.1.5
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/kfav6wkyglcpgrajyfq0zdgfqlm3cwky-clang-20.1.5/bin
[nix-shell:~/tmp]$ gcc --version
gcc (GCC) 14.2.1 20250322
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I have to call out the post a little bit for not showing the command used to compile the C code
He sort of does because the flag is in the warning output, but I agree it could be clearer (and more true to mirroring your post):
t.c:30:12: warning: unused variable 'psize' [-Wunused-variable]
^^^^^^^^^^^^^^^^^
I can reproduce it with gcc:
$ gcc -c -Wunused-variable t.c
t.c: In function ‘vdev_raidz_asize_to_psize’:
t.c:30:15: warning: unused variable ‘psize’ [-Wunused-variable]
30 | const u64 psize = parity_adjusted << ashift;
| ^~~~~
$ gcc --version | head -n 1
gcc (GCC) 14.3.0
Oh that’s also a great point. It was a warning only, returning exit code 0. So in a larger project this would have very likely scrolled by with nobody noticing it.
Every large C codebase I’ve worked on uses -Werror, at least where it existed (introduced in .. GCC 4 maybe?). If OpenZFS hadn’t, enabling it and dealing with the fallout would have been one of the first things I did when I arrived.
I am always wary when I see even a single warning when compiling that doesn’t stop the build. It’s not always the project’s fault; build options, compiler versions, libc differences, all sorts of things can change something for the compiler. But to not have -Werror on? If it’s anything important, I go and find out why, in case it speaks to some deeper issue with the development process that I might not want to bet on.
Spoiler: The bug would be easier to catch in Zig because the Zig compiler flags unused variables.
I’m surprised that OpenZFS development happens without static analysis for unused variables, as I thought that was a pretty easy thing to check for. I wonder how many other bugs fall out if you run more static analysis on the ZFS codebase.
Except the bug would not have been caught in Zig without writing the code differently, because the variables were not unused.
And had the code been written differently, in such a way that Zig would have caught it, the C compilers could have caught it as well. The original post even mentions -Wunused.
Compilers don’t have to track unused variables, they can track unused assignments to the variables:
warning: value assigned to `psize` is never read
|
| psize <<= ashift;
| ^^^^^
|
= help: maybe it is overwritten before being read?
= note: `#[warn(unused_assignments)]` on by default
Yes I know that, I was in the original thread where this sort of things was being discussed. However that is not what the most widespread C compilers (GCC and Clang) do, nor is it what the Zig compiler does.
The fact that I can’t turn this off is why I don’t like writing zig programs.
The fact that other people can is why I don’t like running C programs.
I just started toying with zig recently and it’s really such an enormous pain. I have no idea why debug builds have to deal with this nonsense - it makes development iterations so frustrating.
It’s fascinating – do you find rust pleasant?
I’ve found the few times I’ve written rust extremely painful, but enjoyed writing zig a lot.
This article says
There’s no “cheap” annotation (eg -Wunused) that will notice it.
and goes on to suggest that in Rust you might catch this by using newtyped return values. Well, if you port this function exactly as-is to Rust, without doing any type annotations or adding any compiler flags, Rust will already catch this, saying “warning: value assigned to psize is never read”. Similarly it will also point out that cols is written to twice in a row. I really wonder why C compilers don’t do that out of the box.
This is precisely the kind of bug that makes me use C++ instead of C for any systems programming. Not being able to differentiate different quantities is just terrifying for a language used for the most privileged parts of the system.
In C++ I find myself using enum classes with an explicit underlying type for ad-hoc newtype integers. When interop with C is required, the single field struct mentioned above is my poison of choice. I rarely promote them to full-fledged classes with methods and conversions.
You aren’t wrong. I wonder if it’s worth me looking back towards C++ a bit. I wrote quite a lot of it in 2009-2012. I don’t know that I got good at it, but I liked it ok (and without that experience probably wouldn’t have appreciated what Rust was trying to do when it came along). Might be a path of somewhat less resistance for getting a little more rigor into some of the bits of OpenZFS that could really benefit from it.
There’s a much easier migration path to C++ from a codebase that is C, since there is a large common subset of C and C++ that makes it easy to incrementally rewrite C as C++. For completely new projects, I think Rust might be a better option. Making Rust work with ownership models that were not originally designed with Rust in mind seems to be challenging but that’s not an issue for clean-slate codebases.
I did hit some issues using C++ in kernel modules for FreeBSD because FreeBSD kernel modules (.ko files) are really just .o files: they don’t get a proper link step and so COMDATs are not resolved, which meant that there were then things left in them that the kernel loader didn’t know what to do with. At some point I want to fix this, but it’s probably a blocker for a lot of C++ constructs in OpenZFS (inline functions, templates).
I have to explicitly call this out. This “a one-character fix” is a stellar example of how to write a PR/commit message. Kudos!
Thanks. I try hard on those :)
Do you feel any type of way about the response? Unless there was communication in some other channel, it just looks like an “approved” with no comment. This is the sort of thing I would expect a bit of a postmortem after, but maybe not.
We did talk about it a little internally at Klara, and the original author of that PR (and my colleague) did use the opportunity to run a new set of tests based on my settings (the “aggressive allocator fragmentation” I mentioned in the post), and found some math bugs too (#17490). So in that sense, we took the opportunity to double-check.
In a proper (and for-profit!) product organisation, I think it’d be worth a pause to check if we need to update our processes, or down tools and build out some new test infrastructure, or whatever. In a volunteer- & contract-driven open-source project though, that sort of time and space just doesn’t exist.
So yes, I do sigh a little bit, because these are solved problems in the abstract. But no, I think the response is fine, because there really isn’t much to say. There’s nothing new here that we didn’t already know. We know C is tricky, and we know our testing regime is not as good as we would like. But, it is pretty good, we improve it a little bit all the time, and we try to build “general maintenance and targets of opportunity” into everything we do. Sometimes you just have to play the long game.
I have vivid memories of being in calls back in the day, and you spending more time on a commit message than the change itself.
It really taught me how valuable a commit message can be, and to be kind to your future self when all the context has fallen out of your brain.
speaking of bugs, their code blocks have a common one (lines in the same code block are different font sizes from one another)
i see this on a lot of people’s blogs, specifically on webkit on ios, and i have no idea why
Curious. This is just whatever Hugo produces with its default code highlighter, so I’m guessing that might be why you see it around more often. I’ve never seen it before, but I’m not a frequent iOS user. I’ll borrow a phone off the kid and give it a try, and see if there’s anything obvious I can do about it. Thanks for the heads up!
It happened on my website and I generate HTML from markdown via Pandoc. Can’t remember how I figured out the fix but I found this in my style.css:
/* Pandoc's default value of display:inline-block shows up as too large on iOS Safari. */
pre > code.sourceCode > span { display:inline; }
I was really hoping for something akin to Haskell’s newtype: a type that is representationally the same as some other type, has no runtime overhead, but is nominally distinct. Does Zig have such an idiom? It seems that this might not be possible, as a consequence of “all structs are anonymous” (as the language reference puts it).
You just create a named struct with a single member. Or an enum wrapper (https://ziglang.org/devlog/2024/#2024-11-04).
Every struct and enum in Zig makes a new type.
Cool, thanks.
BTW, that devlog page says anonymous struct types have been removed. Does that mean the statement “all structs are anonymous” at https://ziglang.org/documentation/master/#Struct-Naming is no longer correct?
No.
I see the confusion here. There used to be an additional weird type in the type system that isn’t even worth explaining, but it’s gone now.
Please help me understand: why would two distinct size types be desirable? (Rather in general than in this case, to avoid this bug.) I can see how tooling that detects unused assignments can help. I can see how having distinct handle types over void * is good. I can see how wrapping a DB primary key into a separate type is useful (as mentioned in the comments). And, of course, I can see how having variable names with Levenshtein distance > 1 is preferable.
However the distinction between physical and allocated sizes is so thin, isn’t it? Both represent an amount of bytes on a device. Both, I believe, can be reasonably summed up and compared. And yet there is an argument to have them as separate types.
Similarly, bondolo in the comments mentioned working on projects where no raw types were allowed. At first glance it seems somewhat overkill and dystopian to me. So what am I missing?
However the distinction between physical and allocated sizes is so thin, isn’t it? Both represent an amount of bytes on a device. Both, I believe, can be reasonably summed up and compared. And yet there is an argument to have them as separate types.
The answer is actually in your description. They do indeed both represent “an amount of bytes on a device”. But, they don’t represent an amount of bytes on the same device (or don’t have to). Or, more abstractly, they represent an amount of bytes, yes, but how to interpret them depends on the context.
If we only pass around the raw “number of bytes” value (as a uint64_t), then when it comes time to interpret that value, that “interpretation context” has to come from elsewhere. Usually, that means the programmer just has to get it right. If they get it wrong, we lose the spacecraft, or feel quite uncomfortable in our 104℃ bath.
But, if we can somehow combine the raw value with the context we need to interpret it and pass them around together, then our tools can use this extra knowledge to make sure we always use the value correctly.
If the tool we have is types, then we can encode the context into the type, and the type-checking facility (say, the Rust compiler) can make sure we use it properly. It doesn’t have to be done with types as such; in a Perl or a Python maybe I could serialise it as a two-element array ['A', 8192], and then my “type-checker” would be something that checks that size[0] == 'A'. Or whatever, there’s tons of methods.
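A rough C analogue of that runtime-checked approach might look like this (a sketch; all names are illustrative):
#include <assert.h>
#include <stdint.h>

enum size_kind { SIZE_ALLOCATED, SIZE_PHYSICAL };

/* the raw byte count travels together with its interpretation context */
struct sized {
    enum size_kind kind;
    uint64_t bytes;
};

static uint64_t as_physical(struct sized s) {
    assert(s.kind == SIZE_PHYSICAL);  /* the runtime "type check" */
    return s.bytes;
}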
The nice thing about lots of languages that aren’t C is you can still arrange for math operations to work on the value part, so you only need to worry about the context when it comes time to use it in a different context. My silly Rust version isn’t quite the whole thing, probably it .. implements Deref? or Add and Sub and stuff. I dunno, my Rust isn’t very strong. There’ll be a crate. It’s not the point.
That was my thinking anyway.
My silly Rust version isn’t quite the whole thing, probably it .. implements Deref? or Add and Sub and stuff.
Implementing arithmetic operations probably makes sense if these quantities are used for computations (also Eq/Ord). Deref is less likely, especially for numbers; in my experience From/Into is a better way to implement “cast to primitive” if it’s needed.
Both, I believe, can be reasonably summed up and compared.
Can they? In a way that has a meaningful result the code regularly needs? That seems like a huge assumption to me, and I would easily believe someone actually involved in the project asserting the opposite, as the article seemingly does. If you work with both size types close to each other regularly, but almost never want to have them interact, forcing the developer to be very explicit when they do that seems like a fairly obvious measure to avoid mistakes.
Now, after reading robn’s newer comments that shed more light onto the specific details of the fundamental differences between asize and psize, I can see why having two distinct types would be better. They are two very different amounts of bytes in the domain of OpenZFS.
It’s such a shame typedef is merely an alias in C )-:
On a general note (as I meant it to be), isn’t bondolo’s mentioned “no raw types allowed” approach too much? Or: when is it not too much?
On a general note (as I meant it to be), isn’t bondolo’s mentioned “no raw types allowed” approach too much? Or: when is it not too much?
If you’ve got tools that can understand them as something different, then it’d certainly be worth a look. Without that, I think typedefs to fundamental types are probably only worthwhile when they have a clear, focused meaning as something that really shouldn’t be thought of as a quantity. Anything that gets a lot of math probably shouldn’t be, because there’s other stuff the C programmer needs to think about when doing math (overflow, signedness) that gets harder if the underlying type is not visible.
(I have not thought hard about this, but that’s my initial reaction anyway).
I agree with you. I think type wrappers for integers can be useful, but they’re very often shown in small examples to suggest they’re an obvious net win that will prevent bugs. In reality their usefulness is inversely proportional to how much casting back and forth you’ll have to do. At best that’s very little and the values really do just flow through the program with minimal interactions. Maybe that’s the case for asize, I don’t know. But at worst casts are everywhere and you’ve just made the program more verbose for nothing. If your eyes glaze over the wrong word they can also glaze over a cast to the wrong word.
Yep, this is really important, and is why I’d be loath to do it in C, which really doesn’t have the tools to say “I want to use this like a plain old number and just be really strict on the conversions”. If we must rely on reading to catch some kinds of bugs, then anything we do has to be more readable, not less.
In any language with operator overloading, delegation, transparent dereferencing, etc.? No brainer.
I wondered how well an LLM would catch this issue vs static analysis or switching to a high powered type system. It did surprisingly well. Sharing just the code snippet and asking to “Review this: ”, all the LLMs I tried immediately noticed the mistake and labeled it a critical issue.
But as this was part of a larger patch, using it for a full review would cause the context to be larger, leading to the response having more noise or generally just performing poorly. As mentioned, static analysis tools have a lot of false positives and this is something I’ve noticed can be an issue using LLMs for reviewing code as well. So instead of the code snippet, I shared the entire patch. Fortunately, Gemini 2.5 Pro and Claude 4 Sonnet both still noticed the “Critical issue” and boldly stated it must be fixed before moving on. o4-mini-high didn’t seem to catch it, and o3 did notice the issue but put it alongside a bunch of others which may or may not be real problems, so I think it flunked.
These tools are evolving and while many choose not to use them for development, it might not be the worst idea to run a hand-coded patch through a quick analysis to see if it can spot an issue the 3 super experienced developers missed.
This was my first thought as well, so I did the same experiment before reading the comments.
You don’t even need to go to top-tier models like 2.5 Pro or 4 Sonnet; even Gemini 2.5 Flash Lite spots the bug when given the original diff with no additional context. (Yes, I turned off grounding with Google Search.)