Revisiting Loop Recognition in C++... in Rust

11 points by camblomquist

kornel

If a straight translation from the C++ were made while trying to stay in Safe Rust, most of the pointers would be replaced with references

Oh no, that assumption gets people stuck fighting the borrow checker.

Rust’s references are not general-purpose pointers. They have a much more specialized role of giving only a temporary access to objects, and by definition are incapable of storing data.

Rust has other ways of referring to objects and storing data “by reference”. What each pointer translates to is “it depends”, and unfortunately the details are very important. It may be a Box, or may require Rc or Arc (even where C++ wouldn’t use shared_ptr), or could even be non-pointer and store and pass an object by value.

camblomquist

You’re totally right and I think that at the very least, I should’ve claimed a naïve translation rather than a straight translation. The point was to show that the approach wouldn’t work as a means of justifying a rethinking of the layout that goes against the intent of following the pseudocode to the letter. As both you and I said, it’s a fight with the borrow checker and the user is going to lose this one. As for the actual replacement, I may have had tunnel vision on what replaced pointers. I knew the index as a pointer approach was common enough and was pretty quick to dismiss alternatives like Rc because they don’t play nicely with cyclic graphs.

masklinn

Rust is not garbage collected, opting for compile-time verification of memory lifetime through the borrow checker.

The two clauses are largely unrelated. Rust is not garbage collected, memory management is automated via RAII. For this specific context the borrow checker “just” verifies that a reference does not outlive the referent (that there are no dangling pointers).

Rust has pointers but deferefencing these pointers is limited to unsafe contexts.

Rust has pointers and dereferencing those pointers is just fine: references, box, rc, … are all pointers. Raw pointers are unsafe to dereference.

Rust requires statements be terminated with ;.

Not quite:

An expression that consists of only a block expression or control flow expression, if used in a context where a statement is permitted, can omit the trailing semicolon.

–

Lines not terminated with ; are considered expressions, returning their value.

Expressions are expressions, and thus have a value. An expression followed by a semicolon is an expression statement. I’m unsure there’s anything in rust which cares about lines.

BenjaminRi

Rust is not garbage collected, memory management is automated via RAII. For this specific context the borrow checker “just” verifies that a reference does not outlive the referent

Yes, in particular, Rust and the borrow-checker do not guarantee the absence of memory leaks. I keep seeing a lot of confusion about this. While the RAII-style memory management usually frees everything once it drops out of scope, you can just leak values in completely safe Rust, e.g. with std::mem::forget:

Takes ownership and “forgets” about the value without running its destructor.

Any resources the value manages, such as heap memory or a file handle, will linger forever in an unreachable state. However, it does not guarantee that pointers to this memory will remain valid.

dataangel

The Rust Release build handily beats the C++ in both execution time and memory footprint.

Having done a fair number of these comparisons my two most likely explanations for why are:

Rust has better stdlib containers (BTreeMap/HashMap vs map/unordered_map). Whether this is idiomatic for C++ is debatable because the first thing many people do on a new project is bring in replacements for those.
The author measured wrong

There are no significant sources of overhead at the language level in C++ compared to Rust other than C++ coroutines requiring heap allocation (shouldn’t be used for a graph algorithm benchmark), the theoretical Rust advantage from having better aliasing due to mut being like restrict rarely shows up, and Rust has several idioms that add overhead that would not be in the C++ version (RefCell, extra clone and Rc calls to please borrow checker).

camblomquist

I think defaults matter. If the C++ Standard requires that std::map be a red-black tree, people are going to reach for a red-black tree whether they realize that or not. I don’t think it’s fair to tell users “Yeah the language is great as long as you actively avoid what comes with it.” The first draft went into that in detail but there wasn’t a good place to put it and I wasn’t really willing to put in the effort for a modernized C++ benchmark or one that pulled in something like abseil. It’s worth pointing out that in the original multi-language-bench repository, most of the other contenders reached for a hash map and maybe I should’ve considered that instead. It’s unfortunate that their paper came out right before std::unordered_map was a thing.

As for measuring wrong, the code is available for inspection. The benchmark is the same for both C++ and Rust and the outputs of each executable are the same save for formatting differences that would be in favor of C++. I can’t guarantee that the tricks used by the original loop tester app in C++ managed to work in the Rust version but I also don’t believe that would have accounted for as much a difference as we see here. Still, I could be wrong and probably should’ve checked the assembly but that’s a bit beyond my abilities.
- fanf
  
  I expected the 32 bit indexes to be a win (being half the size of pointers), but in fact they weren’t! I guess the data structures dominate.