Is Rust faster than C?
80 points by steveklabnik
Good post, agreed that there’s no inherent reason for either language to be faster. It’s all project specific.
To add another “social factor” – to the extent that C codebases may be faster as they stand today, it’s often just because they’re much older codebases and have been optimized over a long time. And C has the advantage there just by virtue of being around a long time. Language matters, but so does however many millions of engineer-hours have gone into a project!
The meme that old codebases are more optimized (or “battle-tested” to be free of bugs) really annoys me. It depends on whether someone has taken the time to optimize performance. If not, then even extremely popular and old C libraries can be slow.
My first Rust project (encoding_rs) was/is a lot faster than glibc’s iconv, which makes an architectural design choice that is fundamentally slow.
The meme that old codebases are more optimized (or “battle-tested” to be free of bugs) really annoys me.
To some extent, it was shown that old code has fewer bugs.
I know, but it’s not correct to think that old code becomes bug-free by just letting it sit there. If everyone assumes that someone must already have fuzzed it, the first person to actually fuzz it may well find bugs in supposedly “battle-tested” code.
This is a fair point! I mostly live in the HPC and scientific computing spaces, where there are a lot of long-lived highly-optimized codebases. So I was mostly thinking of those, not things like glibc. :)
Thank you for encoding_rs! I couldn’t have implemented TextDecoder for https://jsr.io/@easrng/safe-whatwg without it
It’s not just that there are old C codebases that people have had longer to optimize; many old C codebases were written to run on significantly more constrained hardware. It may have been worth adding implementation complexity to old C code to make it work at the time, and that complexity results in lower resource consumption today, but a Rust (or C) programmer today targeting more powerful hardware will choose simplicity over squeezing out the last bit of performance.
On the flip side, old C code may be designed and optimized to run on single-core machines, while modern Rust (or C) code may be explicitly architected to take advantage of multiple cores.
many old C codebases were written to run on significantly more constrained hardware.
OTOH every time I interact with one of these older C libraries they end up using linked lists and I always end up getting 10x+ performance improvement by re-implementing the thing in C++ with some trivial to use vector / flat_map / …
This is, in my opinion, the biggest advantage of C++ or Rust for performance. It is trivial to write code that is abstract over data structures. When you discover that the data structure that you picked has good performance for what you expected the workload to be but bad performance for what the workload actually is, you can change it easily. In C, implementation details of the data structure tend to leak unless you’re really careful (and provide macros that wrap other macros and are incredibly painful to debug).
In one project, I had a carefully written generic concurrent hash table in C (so many macros!). It was sufficiently hard to implement that I used it in a bunch of places. After some profiling, I realised that concurrent insertion was rarely necessary: most places where it was used were already holding a lock. After moving it over to C++, I tried replacing the custom hash table with a few off-the-shelf implementations and a lock. The new code was more maintainable (hey, look, no hand-rolled hash table made of macros!) and performed better (on the microbenchmarks that actually hit these paths: it turned out to make no difference in normal use, so all of that original complexity was to no benefit).
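The "swap the data structure without touching call sites" point can be sketched in Rust too. This is a hypothetical minimal example (the `KvStore` trait and `tally` function are made up for illustration, not from the thread): the routine is written against a tiny interface, and both std maps satisfy it, so changing the container after profiling is a one-line change at the construction site.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical minimal key-value interface; both std maps satisfy it.
trait KvStore {
    fn put(&mut self, k: String, v: u32);
    fn get(&self, k: &str) -> Option<u32>;
}

impl KvStore for HashMap<String, u32> {
    fn put(&mut self, k: String, v: u32) { self.insert(k, v); }
    fn get(&self, k: &str) -> Option<u32> { self.get(k).copied() }
}

impl KvStore for BTreeMap<String, u32> {
    fn put(&mut self, k: String, v: u32) { self.insert(k, v); }
    fn get(&self, k: &str) -> Option<u32> { self.get(k).copied() }
}

// Written once against the trait; swapping the backing map never touches this.
fn tally(store: &mut impl KvStore, words: &[&str]) {
    for w in words {
        let n = KvStore::get(store, w).unwrap_or(0);
        store.put(w.to_string(), n + 1);
    }
}

fn main() {
    // Only this line changes when profiling says to switch containers.
    let mut hash: HashMap<String, u32> = HashMap::new();
    tally(&mut hash, &["a", "b", "a"]);
    assert_eq!(KvStore::get(&hash, "a"), Some(2));

    let mut btree: BTreeMap<String, u32> = BTreeMap::new();
    tally(&mut btree, &["a", "b", "a"]);
    assert_eq!(KvStore::get(&btree, "a"), Some(2));
}
```

In C, the same swap typically means rewriting every call site (or a layer of macros); here the implementation detail doesn’t leak.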
It’s not just an issue of multiple cores.
Classical algorithm analysis teaches things under assumptions that haven’t been true for a long time: because of caches and the like, the raw number of memory accesses alone doesn’t matter, but locality, predictable access patterns, etc., do.
Similar effects apply between more recent hardware generations.
Something that makes sense on tiny hardware without deep cache hierarchies and such doesn’t necessarily work optimally on newer/bigger hardware.
There’s a big confounding factor here, because two language properties are often in direct tension.
This is why late-bound languages often do better in large-scale projects than you’d expect from benchmark results. If your requirements (either from users or from underlying hardware) are changing, it’s far more useful to have a language that encourages building abstractions because they let you start building for one set of constraints and then optimise later. When your requirements have become more static, you often want low-level control to be able to microoptimise.
Thanks!
And yeah, this is a good point too: the amount of time we can spend on a problem is part of that engineering discussion. Sometimes micro-optimizing things isn’t the best use of time, and deciding whether to spend time on that versus making more macro changes is always tough.
there’s no inherent reason for either language to be faster. It’s all project specific.
The rules of a language can change what a compiler can safely do given the “same” code. Often you can annotate code in one way or another to get some of that safety back, but at the end of the day a language with “more” semantic granularity will have more leeway to give you free optimizations, on top of language features making it easier to write code that’s easy for the compiler to optimize [0].
A more expressive type story means that you’re not grabbing as many “void*“s to use for your generic data structures, and now your compiler has even more to work with in terms of simplifying things. Same story for almost all observable effects and any guarantees you can rely on.
The bounds check point in the article is valuable, because if your codebase is filled with iteration rather than array access, you just don’t need your bounds checks!
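As a minimal sketch of that bounds-check point (function names here are made up for illustration): with an indexed loop each access is a candidate for a bounds check unless the optimizer can prove the index is in range, while the iterator version cannot go out of bounds by construction, so there is nothing for a check to guard.

```rust
// Indexed loop: each `data[i]` may carry a bounds check unless the
// optimizer can prove `i < data.len()` on every iteration.
fn sum_indexed(data: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

// Iterator version: no index, so no bounds check to elide in the first place.
fn sum_iter(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    let v: Vec<u64> = (1..=10).collect();
    assert_eq!(sum_indexed(&v), 55);
    assert_eq!(sum_iter(&v), 55);
}
```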
It feels to me obvious that if a compiler has more information then it can do more stuff. Not guaranteed, but if I’m trying to do a thing and in one implementation I have a bunch of invariants, and in another I don’t… more information feels like it’s going to get us further.
The trickier thing is, of course, the comparison of “like for like” when people want evidence. If you start off with some C code and then reverse engineer some Rust code that compiles to the same thing, that’s “like for like” in some sense. If you write the Rust and C version of some algorithm and both implementations “look the same”, that’s “like for like” in some other sense.
If I write a for loop over an array in C, and use an iterator in Rust to implement the same algorithm, is that “like for like”? If my C streaming is done by hand but in Rust it goes through some trait, is that like for like? I’m generally of the opinion that “how would a user of this language naturally implement this algorithm?” is a decent target. But even things like “I’ll use this library that happens to be hyperoptimized already” leave the gates open to a lot of bad-faith interpretation.
[0]: A classic example, in my head canon: Haskell making it “easy” (a bit in the eye of the beholder) to do stream-based computation means that you can enforce that important bits of your codebase are written in an easy-to-SIMD way, and beat out a “straightforward” C implementation. The core point being that it would be hard to provide an equivalent abstraction in C! So it’s less work to get to something more performant. Lots of asterisks apply in this worldview, though (there are many debates to be had about relying on a sufficiently advanced compiler to get good results): https://www.cs.tufts.edu/comp/150FP/archive/simon-peyton-jones/haskell-beats-C.pdf
Similarly, the kinds of projects you end up writing in Rust and C differ. C motivates you to keep your codebase small, whereas Rust gives you the tools to not fall apart at larger scales.
I highly doubt anyone would be willing to write our $WORK project in C, but it works just fine in Rust. There is a class of programs where Go/C#/Java would be the default choices and Rust can be made to work too, but choosing C would be suicide.
I generally find Rust utilities end up growing in scale and size, and C projects staying small but getting polished to perfection. The C tool ends up faster and the Rust tool has too many dependencies to get to that kind of polish, but the more you want the tool to do, the less likely you are to find one in C.
I understand that this is not the point of the article, but one thing that is not mentioned that really excites me about using stronger type systems for low-level programming is that stronger type systems can enable stronger optimizations. The canonical example of this is how Rust added so many noalias attributes that the resulting LLVM IR actually crashed LLVM (this has been fixed for years, but you see my point). Another example is specialization, which isn’t available yet but may yet be: it could let you selectively optimize the implementations for certain data types, like those that can fit into a SIMD register. There are so many possibilities for strong type systems to improve performance without sacrificing the APIs that we provide people, and that is what really excites me about Rust!
Exactly, I was expecting a mention of aliasing restrictions enabling optimizations that are not possible in C as easily.
stronger type systems can enable stronger optimizations
I think this is what Fortran is all about too, and why it was so good at producing high-performing vectorized code easily. If a compiler has fewer options (and thus less guessing), it has room to make optimizations based on guarantees it knows are true. I’m glad there’s a general shift back toward that kind of “more-restrictions” model.
It’s hard to separate the subtle difference between what’s theoretically possible, and what people do in practice.
For example, C doesn’t have easily accessible hash maps, so implementations tend to default to linear searches until that becomes a problem.
I don’t see such problems in Rust. The hashmaps are easily available, and often less work than writing a substitute.
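For concreteness, a sketch of what “less work than writing a substitute” looks like: counting occurrences via the std `HashMap` entry API, which in C often starts life as a linear scan over an array of structs. (The `word_counts` function name is made up for illustration.)

```rust
use std::collections::HashMap;

// Counting occurrences: the hash map is the path of least resistance in Rust.
fn word_counts<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for w in words {
        // `entry` fetches or inserts in one lookup.
        *counts.entry(*w).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts(&["list", "map", "list"]);
    assert_eq!(counts["list"], 2);
    assert_eq!(counts["map"], 1);
}
```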
Another case is multi-threading. Rust programs can sprinkle parallel iterators here and there, and advertise being ✨blazingly fast✨. In C, there needs to be a good reason to deal with multi-threading. Many C libraries don’t even clearly document their multi-threading compatibility.
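The “sprinkle parallelism” experience usually means rayon’s `par_iter`, which is roughly a one-line change. As a dependency-free sketch of the same idea using only std scoped threads (Rust 1.63+; the `parallel_sum` function is made up for illustration):

```rust
use std::thread;

// Split a sum across worker threads over slice chunks. With the rayon crate
// this whole function collapses to `data.par_iter().sum()`.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    let chunk = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Scoped threads may borrow `data` directly; no Arc needed.
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let v: Vec<u64> = (1..=100).collect();
    assert_eq!(parallel_sum(&v, 4), 5050);
}
```

The compiler checks the data-sharing here (no mutable aliasing across threads), which is exactly the “good reason” hurdle that C code has to clear by careful documentation instead.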
Bryan Cantrill has talked about Rust performance with a perspective that was new and interesting to me. The example was faster in Rust because it turns out to be much easier to use a faster algorithm in this case. https://youtu.be/HgtRAbE1nBM?si=Qk-exSPtzuPzDlZK&t=2450
Given that I’m Bryan’s direct report, I obviously find this interesting too!
He also wrote this experience up as a blog post:
“Falling in love with Rust” is the post that made me take seriously the idea that it might be worth trying to get a full-time position working in Rust when I graduated, and I’m really grateful for that. I hope your post can do that for someone, too.
Another big factor is that Rust has convenient hash maps and dynamically growing arrays in the std lib while in C there’s a strong temptation to hand roll some shitty linked list
It’s also a small performance trap, since the default HashMap hasher is slow due to hash-flooding DoS concerns, though that’s easy to swap.
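“Easy to swap” here is a type-level change: the map API stays the same, only the `BuildHasher` parameter changes. A sketch with a hand-rolled FNV-1a hasher (in practice you’d grab a crate like fxhash or ahash; this toy hasher is just for illustration and, like the default-hasher trade-off being discussed, is not flooding-resistant):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a hasher: fast for small keys, but not DoS-resistant.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self { Fnv1a(0xcbf29ce484222325) } // FNV offset basis
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

// Swapping the hasher is confined to this alias; call sites are unchanged.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut m: FastMap<&str, u32> = FastMap::default();
    m.insert("hot-path-key", 1);
    assert_eq!(m["hot-path-key"], 1);
}
```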
Narrator: no it is not
Seriously though if you spend a little time looking at Rust disassembly on nontrivial examples it becomes very obvious. Rust has:
There’s a reason the Rust solutions don’t win on highload.fun
I’d argue that:
A: Rust is faster than C in the “what design choices are people willing to push on their future selves outside the context of benchmark racing?” sense.
B: C is faster than Rust in the “default” style for each language because C encourages you to “assume the happy path without verifying” more often.
B I agree with. But A is false in my experience, where trivial new features can require large refactorings because they necessitate new borrowing schemes.
I’m not talking about what it takes to make the compiler not complain when refactoring, I’m talking about things more in the vein of Stylo (Firefox’s parallel CSS engine) or the pervasive “when in doubt, make a copy” attitude to string processing in C++, where people either aren’t willing to take the risk or aren’t successful when the compiler doesn’t have their back.
See also this:
And it’d be easy to be like “Well, that’s just B-Trees! …not AVL trees.” Yeah, but… the reason I could use a B-Tree and not an AVL tree, is because of that composability of Rust.
A B-Tree in C is gnarly. A B-Tree is just… I mean talk about intrusive. It’s VERY intrusive. It’s very kind of up in everything’s underwear. And someone, of course, on the Internet’s like “Well, you should go to use this B-Tree implementation.” You look at the B-Tree implementation and are like “This is rocky! And if there’s a memory corruption issue in here, I do not want to have to go find it.”
I would still use an AVL tree in C even though I know I’m giving up some small amount of performance, but in Rust, I get to use a B-Tree.
– Bryan Cantrill @ https://youtu.be/HgtRAbE1nBM?t=2450
zero init by default
Where does this come from? Rust doesn’t have default initialization. Read buffers that the caller inits explicitly, maybe?
I was imprecise, should have said all variables require init by default.
I’m still not sure how this becomes a performance issue (again unless you’re talking about cases like read buffers)- Rust merely requires init before use based on dataflow analysis. Surely the C code you’re comparing with doesn’t read from uninitialized variables to a noticeable degree?
No, safe Rust requires you to provide a value at construction time, which is not based on dataflow analysis. For example if you make a very large array, you must spend the time to zero init the entire thing even if you always write to elements before reading them, unless you are lucky and the optimizer can tell what you’re doing.
What I’m saying is that construction time is not tied to declaration time. I’ve been asking if you were talking about the large array case from the start, because that’s really the only situation where this is an issue, and I was trying to understand if you were seeing that or something else.
I mean, it’s more and more of an issue the greater the number of objects you have, and an array is just a means to in effect make a lot of objects at once. I’ve also run into it implementing a memory allocator for small (~32 byte) objects where you could naively dismiss zeroing just 32 bytes as cheap, but when there are lots of allocations happening it slows down the benchmark considerably. The effect can be worse than you might expect too because it’s not just the cost to do the write, that extra code can bloat the function and prevent inlining, and then the lack of inlining can prevent any other optimization.
If you have a lot of separate objects not in something like an array, there is no need to initialize them eagerly.
If you are writing an allocator, generally the responsibility lies on the caller to initialize it, so e.g. zeroing it shouldn’t need to happen.
I’m still not seeing it- where in this “lots of small objects” example would C reasonably just leave the objects uninitialized?
If you have a lot of separate objects not in something like an array, there is no need to initialize them eagerly.
Not sure what you mean by this; in safe Rust there is no way to initialize them lazily, AFAIK. Unless you mean something like wrapping in Option<>, but that increases the size of your data structure unless you get lucky with niche optimization, so the cure is worse than the disease for perf (fewer of your objects fit in cache, so you pay a >10x penalty to access memory when you miss).
If you are writing an allocator, generally the responsibility lies on the caller to initialize it, so e.g. zeroing it shouldn’t need to happen.
Yes, that’s right, but safe Rust gives you no way to express that. Any backing buffer you create will get zero-initialized. You can’t, for example, implement a Vec that doesn’t zero-init its allocation without unsafe.
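A sketch of the trade-off being discussed (function names made up for illustration): `vec![0; n]` pays for zeroing the whole buffer up front, while the unsafe escape hatch allocates with `Vec::with_capacity` (which does not initialize) and writes every element through a raw pointer before reassembling the Vec.

```rust
// The safe route: the whole allocation is zeroed before you see it.
fn zeroed_buffer(n: usize) -> Vec<u8> {
    vec![0u8; n]
}

// The unsafe escape hatch: allocate without initializing, fill, reassemble.
fn filled_buffer(n: usize) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(n); // allocates, does not zero
    let ptr = buf.as_mut_ptr();
    let cap = buf.capacity();
    std::mem::forget(buf); // hand ownership of the allocation to `ptr`
    unsafe {
        // SAFETY: every element in 0..n is written before the Vec is rebuilt.
        for i in 0..n {
            ptr.add(i).write((i % 251) as u8);
        }
        Vec::from_raw_parts(ptr, n, cap)
    }
}

fn main() {
    assert_eq!(zeroed_buffer(4), vec![0, 0, 0, 0]);
    assert_eq!(filled_buffer(4), vec![0, 1, 2, 3]);
}
```

Whether the zeroing in the safe version actually survives to the final binary depends on the optimizer, which is the “unless you are lucky” caveat from upthread.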
I’m still not seeing it- where in this “lots of small objects” example would C reasonably just leave the objects uninitialized
C lets you leave anything you want uninitialized. If you’re frequently instantiating structs with a field that doesn’t need to be initialized most of the time, you don’t pay the penalty.
When I think about Rust vs C performance it largely comes down to two factors:
I think these are mostly insignificant in general.
“Our Rust-based rav1d decoder is currently about 5% slower than the C-based dav1d decoder” - https://www.memorysafety.org/blog/rav1d-perf-bounty/
If you want to be faster than C, then the only real lever is aliasing. Rust doesn’t stand out as far as I know. The borrow checker comes close: in Rust you often know that you are the only one who may modify through a reference. What the type system does not provide is the information that nobody except you has access at all. That would enable more compiler optimizations.
That means the discussion is just about which language makes it easier to write fast programs. That is not about the language itself but about the ecosystem, like libraries.
Rust doesn’t stand out as far as I know.
Rust puts the equivalent of restrict on every reference that doesn’t contain an UnsafeCell, so it is doing a bit more than you assume here, and that is in the type system.
I also think that this is a prominent example, but I think there are also more things than just this, you just really have to get into the abstract machine semantics to talk about it. Rust is also ahead here on provenance, for example.
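A toy illustration of that noalias point (not from the thread; the function is made up). With two `&mut` parameters the compiler knows the references cannot alias, roughly what `restrict` promises in C, except that in C the promise is unchecked while here the borrow checker enforces it at every call site:

```rust
// Because `a` and `b` are both `&mut`, they are guaranteed disjoint, so the
// compiler may keep `*a` in a register across the writes to `b` instead of
// reloading it. The C equivalent needs `uint64_t *restrict a, *restrict b`.
fn add_twice(a: &mut u64, b: &mut u64) -> u64 {
    *b += *a; // cannot have changed *a: the references never alias
    *b += *a; // so this reload can be folded away
    *b
}

fn main() {
    let (mut a, mut b) = (3u64, 10u64);
    assert_eq!(add_twice(&mut a, &mut b), 16);
}
```

Trying to call `add_twice(&mut x, &mut x)` simply doesn’t compile, which is why the noalias attribute is sound to emit everywhere.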
“Both can inline assembly and therefore are equally fast” is a useless argument. Sure, technically it’s true, but when people are discussing the speed of a language implementation (in good faith), they’re not talking about a thin wrapper on a bunch of hand optimized assembly.
That may be clear to you but probably not to everyone. I think it is useful to point out the (modern-day) absurdity of a statement like “x86 assembly is faster than C,” to clarify how to think about this question. We could get (pretty much) the same machine code out of C, Rust, and x86 assembly, but not with the same amount of effort. So it’s the effort that’s worth talking about, as the rest of Steve’s post does.
I’m curious about what you thought the purpose of that section of this article was about? Do you think it is to actually make the claim (which you’ve stated would be bad faith) that because they both support assembly then the answer is yes?
One reason it’s good to look at degenerate examples like this when making an argument is to understand the constraints of the question better. The assembly perspective is not a good answer, not because of “bad faith”, but because it’s uninteresting. It doesn’t inform us, or provide insight into how we might use an answer that “Rust is as fast as C” in the real world. When you consider the entire article, you could stop there, but instead it moves on to see what’s there at a higher level.
By looking at the most degenerate answer like this, you also start with a good framing to allow less absurd interpretations to be framed as similarly uninteresting answers (or interesting only with a given perspective).
I think this was a pretty good approach to answering this in general. Far from the “useless argument” that you suggest.
I don’t claim to be or know your parent, and I’m curious too, but an old friend of mine once said
Programmers read blog posts like a compiler reads source code: they get to the first thing they disagree with, stop reading, and complain.
Even if that’s not what your parent did, in the aggregate, some percentage of people do do this, which is worth taking into consideration when you read comments on the internet.
(Also: thank you for deeply comprehending the point of my structure! That’s exactly what I was going for.)
That quote is a great observation. I’m sure I’ve been guilty of that many times in my life :D. Thanks for sharing it.
I originally wrote my reply purely as a “well actually”, but figured I’d seek to understand the parent’s reply a bit more to give them an out before giving my view point.
Technically, thank God, it is possible to match C’s speed in Rust. However, you get so much hate from Rust zealots (see for example https://steveklabnik.com/writing/a-sad-day-for-rust) that I’d better stay with C++, Boost, and Qt and enjoy my life.