Was it really a Billion Dollar Mistake?
36 points by RaphGL
However, when you need to initialize each value of that slice, you are now turning an O(1) problem into an O(N) problem. And it can get worse if each field in the struct also needs its own form of construction.
It's still O(N) for the OS layer to memset to zero. O(2*N) is still O(N). Also, you can just make the types you want to zero-initialize optional. Then you still get the computer to do what you want, and your type system reflects reality.
I'm obviously biased, but I don't sit around all day wondering if optionals were really such a good idea. I don't think about it at all because they're just so plainly a win-win to everyone who touches them. Odin is not 1.0 yet. Be brave! You can still change it!
Zeroing might be more efficient on some systems. E.g. arm64 has a DC ZVA instruction.
And the OS might give you a page that's not actually allocated and will be zeroed on the first page fault when writing to it.
Which shouldn't really affect the decision what to use, but there are slight performance differences.
+1, and it's the same for x86. I think when it comes to ERMSB for rep stos*, the stores issued from the backend are at cache-line granularity, and multiple stores are merged into a single 64B write (assuming 64B cache lines), saturating the LFBs.
Interestingly enough, even if CPUID has ERMSB+FSRM set, the performance might still be slower than expected. I noticed this when testing memset on AMD Zen 3. This seems to have been fixed on more recent AMD CPUs (they mention string ops in the Software Optimization Guide for the AMD Zen5 Microarchitecture).
I would assume that rep stos is more generic and can fill with any word. At least in my testing, memset has no difference between 0 and 1 on x86-64 linux (which also uses rep stosb), but memset 0 is faster on darwin-arm64.
Sometimes zero is special on x86 too https://travisdowns.github.io/blog/2020/05/13/intel-zero-opt.html#a-very-simple-loop
At least in my testing, memset has no difference between 0 and 1 on x86-64 linux (which also uses rep stosb), but memset 0 is faster on darwin-arm64.
Interesting... this thread made me realize that what I was observing in my interpreter in Rust could actually be a Rust standard library phenomenon rather than an OS one.
A year or two ago I had an interpreter where initializing the heap with some byte pattern (IIRC 0xABCD or similar) was significantly slower than zero-initializing it when allocating a few GBs of heap. This was Rust code running on x64 Linux. The heap initialization code looked like vec![0xabcdabcd; size], versus vec![0; size] for the zero initialization.
I've always assumed that there must be some CPU instructions or similar for zero initialization, which is why it's faster, but I wonder if this could be a Rust-level thing. Maybe the std allocator zeroes allocated pages ahead of time, so zero initialization becomes a no-op?
I think there are two distinct phenomena happening in your case:
First, every newly allocated memory page will be zeroed by Linux when faulted in. That means for vec![0; size] there's often nothing to do in user space, but for vec![0xabcdabcd; size] you initialize the heap twice: once when it's zero-filled in kernel space, and once in user space when you're writing 0xabcdabcd.
The other detail is that when requesting memory via mmap, the physical memory is usually not allocated unless you attempt to write to it (think CoW). Instead, all pages initially just point to a dedicated "zero page". That means vec![0; size] might return almost instantly, and the cost of initialization is paid when you actually modify the memory. You can use flags like MAP_POPULATE to tune this behavior, but allocators generally don't prefault memory.
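A minimal Rust sketch of the effect, assuming x86-64 Linux with the default allocator (exact timings will vary; the zero case typically returns quickly because the allocator can hand back untouched, kernel-zeroed pages):

    use std::time::Instant;

    fn main() {
        const N: usize = 1 << 30; // 1 GiB

        // Zero fill: the allocator can request already-zeroed pages,
        // so nothing needs to be written in user space up front.
        let t = Instant::now();
        let zeroed = vec![0u8; N];
        println!("vec![0u8; N]    {:?}", t.elapsed());

        // Pattern fill: the kernel still zeroes each page on first touch,
        // and then user space overwrites it with the pattern.
        let t = Instant::now();
        let patterned = vec![0xABu8; N];
        println!("vec![0xABu8; N] {:?}", t.elapsed());

        // Touch both so the allocations aren't optimized away.
        println!("{} {}", zeroed[N - 1], patterned[N - 1]);
    }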
AMD has a vendor-specific CLZERO instruction which on some uarches can have more throughput than rep stosb or NT stores. Haven't tested on recent epycs yet.
The IBM 3090 had special instructions that would instruct the memory controller to very quickly clear at least entire cache lines without reading them. The OS had special mechanisms to provide whole pages of zeroed memory; I'm not sure if that used an additional mechanism or just the previous one.
I am going to argue that in practice, most invalid memory addresses are not null in those languages.
I argue this misses the point. Null safety is not about protecting from most invalid memory accesses. It's about having a possibility of having clarity that some pointer cannot be null. In my opinion, it's a useful piece of compiler-enforced documentation, and it makes me wonder less.
Odin does have Maybe(^T), and it's fine that it exists, but it's actually rarely needed in practice.
It's not very useful in a language where there is no way to express a non-nullable pointer. No wonder.
The architectures that arise from this mindset tend to be based around things like:
- Thousands/millions of new/delete or malloc/free
- Data structures full of pointers which point to other things
- RAII
- “Smart” pointers and reference counting
Zig has non-nullable pointers and explicitly rejects RAII, smart pointers, and hidden allocations. It allows for data-oriented patterns. Non-nullable types don't force a particular mindset in this regard, I think.
A major footgun of nulls not discussed here is that if your type system doesn't differentiate nullable types then your APIs can't either. What does Map.get return? If your type system can encode this idea of "maybe it's not there", then your API becomes a more exact inscription of the behavior of your program.
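For a concrete illustration of that last point, here is how it looks in Rust, where HashMap::get returns Option and absence can't be silently ignored:

    use std::collections::HashMap;

    fn main() {
        let mut ages: HashMap<&str, u32> = HashMap::new();
        ages.insert("alice", 33);

        // The return type Option<&u32> encodes "maybe it's not there"
        // directly in the API, so the caller must handle both cases.
        match ages.get("bob") {
            Some(age) => println!("bob is {age}"),
            None => println!("no entry for bob"),
        }
    }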
I do not think the nil pointer problem is as much of an empirical problem in practice. I know a lot of people want to “prove” things whenever they can at compile-time, but I still have to test a lot of code in the first place, and for this very very specific example, it is a trivial one to catch.
Can we compare Sentries? I do think it's an issue that needs to be solved at the language level, because NPEs keep slipping through the cracks in the Java we deploy, while Kotlin never has that issue (unless it's using an ill-specified Java API, i.e. one that returns null without @Nullable). We want to "prove things" at compile time so we don't have to pay for the Sentry quota being full and for the shitty UX of an error in prod. Static analysis is how you prevent that (yes, some exists for Java regarding nullable types, and yes, I do need to enforce it, but damn is that a lot of code to update). I'm not quite sure what Mr. Bill is intending here, but damn do NPEs/segfaults harm the usability of much of the software everyone is using.
n.b. In many ways, this individual-element mindset is a hell of a lot more costly than worrying about the “Billion Dollar Mistake” of this article. The performance losses alone are probably wasting BILLIONS PER DAY as an industry.
I think the "grouped element mindset" thing is a neat phrasing of an important idea that many programmers need to learn, but I wish the prescriptions for some of the major issues plaguing software were better than "arena allocator good". Please just make your language more complex so the engineers using it don't have to waste time debugging things that can be caught at comptime.
In Python, we would frequently see a string operation fail on a None and get logged in Sentry, but in Go, you can't confuse string and *string, so I haven't seen that happen. In theory, we should be seeing null pointer exceptions for objects in Go, but in practice it doesn't happen much. I think nullable strings in particular are a design mistake (or at least there should be an option to make them non-nullable), but nullable objects are less likely to lead to real bugs.
It would be nice though to have options for more numeric behaviors: saturate on overflow, panic on overflow, use big int on overflow, etc. I think there are a lot of latent bugs around numeric behaviors that just never get caught because they don't trigger often enough.
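For comparison, Rust already exposes these choices as separate per-call methods; a small sketch:

    fn main() {
        let x: u8 = 250;

        assert_eq!(x.checked_add(10), None);       // report the overflow
        assert_eq!(x.saturating_add(10), u8::MAX); // clamp at the limit
        assert_eq!(x.wrapping_add(10), 4);         // wrap modulo 256

        let (value, overflowed) = x.overflowing_add(10);
        assert_eq!((value, overflowed), (4, true));

        // Plain `x + 10` panics in debug builds and wraps in release
        // builds, unless overflow-checks is toggled in the profile.
    }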
a lot of latent bugs around numeric behaviors that just never get caught
Every time I see an ArithmeticException from division by 0, I am reminded that Pony makes dividing by 0 equal 0. It probably costs a branch at every usage site, and you could build a formula for the tipping point of when the error reporting costs outweigh the CPU time spent branching. I'm a fan of these "safe and costly by default" with noisy operators for unsafe functions.
Pony makes dividing by 0 equal 0
That seems cursed. I've read the link but it doesn't fully explain why they make it so ("Baffling right? ... From a practical standpoint, it is very much not", but then they don't explain how it's not baffling from a practical standpoint).
Are all exceptions checked in Pony? I can't think of another reason why you wouldn't want n / 0 to throw.
It makes more sense in the context of error handling and partial functions in Pony.
It’s also common to have n / 0 == 0 in proof assistants so Pony is in good company.
It's also how newer ISAs define integer division by zero. On ARM64 and RISC-V signed and unsigned division never trap and in particular dividing by 0 yields 0. Incidentally, the fact that division by 0 (and signed division overflow) traps on x86 (and some other ISAs) isn't much help in languages like Rust where division by 0 is defined to panic even in non-checked builds. Unless you want to take a totalitarian approach where the language runtime owns certain signal handlers (which isn't appropriate for a language like Rust) it doesn't seem practical to transform signals into unwinding panics.
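For reference, the checked forms in Rust turn both failure cases into a value rather than a panic; a minimal sketch:

    fn main() {
        let n: i32 = 7;

        // Dividing by a runtime zero panics in Rust; the checked
        // form returns None instead.
        assert_eq!(n.checked_div(0), None);
        assert_eq!(n.checked_div(2), Some(3));

        // The signed-overflow case (MIN / -1) is also caught.
        assert_eq!(i32::MIN.checked_div(-1), None);
    }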
Unless you want to take a totalitarian approach where the language runtime owns certain signal handlers (which isn't appropriate for a language like Rust) it doesn't seem practical to transform signals into unwinding panics.
For completeness, Rust does intercept stack overflows with signal handlers on Unix (and vectored exception handlers on Windows): https://github.com/rust-lang/rust/blob/0d162b25edd5bf0dba9a22e83b614f1113e90474/library/std/src/sys/pal/unix/stack_overflow.rs#L76. It's still not an ideal situation (e.g. see the comments) but at least it's a fatal error and doesn't require panic-style resumption and stack unwinding. You'd also need much more extensive metadata to determine if a faulting div is in Rust code compared to the per-thread stack guard page.
But in Go, you often need to return the empty string in failure conditions, which is similarly bad to returning nil. Sure, the program won’t crash if mistakenly accessed, but it’s easy to introduce logic bugs.
I wish that forty years ago all the two's complement integer types had reserved the 100…0 bit pattern (IntMin) for "undefined". Having one more negative number than positive number in the integer set causes subtle errors, and conversely, it would be better if integers could signal null without needing to reserve more bytes.
Better would have been (using the 8-bit value as an example) -128. On 2s-complement machines, you can't negate this value (it remains -128). I like to think of this value as ±∞.
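For what it's worth, Rust demonstrates both points today, just with zero as the reserved pattern instead of IntMin; a small sketch:

    use std::num::NonZeroI8;

    fn main() {
        // The asymmetric minimum can't be negated without wrapping.
        assert_eq!((-128i8).checked_neg(), None);
        assert_eq!((-128i8).wrapping_neg(), -128);

        // Banning one bit pattern buys a "null" for free:
        // Option<NonZeroI8> still fits in a single byte.
        assert_eq!(std::mem::size_of::<Option<NonZeroI8>>(), 1);
    }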
That's exactly what she said.
40 years ago there wasn't even universal agreement about 8-bit bytes and the CDC series used one's complement well into the 1980s. In addition, your proposal would have made it extremely difficult to chain 8-bit operations into 16 or 32 bit operations.
Besides, you don't need to allocate this to hardware as your software compiler is more than capable of handling this.
Standard ML had what you wanted back in the early 80s. It never caught on. And I would argue precisely because of these kinds of features. If you couldn't port your programming language using the native assembler and some keyboard time, it was effectively dead on arrival. BASIC could be ported. C could be ported. So could Pascal. Schemes were notoriously bootstrappable. Ada ... not so much. ML ... not so much.
I don't think the correlation between porting and later popularity was coincidence.
40 years is too recent. 8-bit bytes and two's complement were the standard for new designs by the 1970s, with very few exceptions. There were some large systems with word-oriented addressing that continued being sold and used for a few more decades, but they were designs from the 1960s. The exceptions were tiny embedded designs with strange widths, such as the PIC.
40 years ago there wasn't even universal agreement about 8-bit bytes and the CDC series used one's complement well into the 1980s.
Yes. That's when the consensus for 8-bit bytes and two's complement was still solidifying, so it was the last moment we could have standardized something else. Now it's pretty much too late.
Yes, it was. The author completely misses the point, and that is kind of proof that it is indeed pretty much an unnecessary source of bugs.
If you take a Java codebase and replace all nullable occurrences with a maybe monad, then you are insisting on incurring the exact same problem, just with fancier syntax. I'll concede that all nullables would become explicit. But if you end up with hundreds of them all over the place, then a maybe monad is hardly helpful.
The point is that we rarely need nullable references. Most people don't get this and that is why it's a huge problem. The article just assumes that "all nullable pointers" are just there as a fact of life. This is already missing the point. Most have no reason to be nullable.
In decades of programming, I would say virtually all null pointer errors I have encountered were unnecessary. If you write a function that does a mathematical operation over two numbers, why would you pass it a null? If you accept a list or array, why accept null? There is no need for null rather than an empty container. Same for strings.
Yeah, I figured that during my current project.
null is still useful, but as an opt-in:
foo { }
foo(x="") { }
foo(x="y") { }
where I have a type ?[]const u8 (Zig optional string) which can be null (absent), "" (empty) or non-empty.
But I actually only need this for inheritance processing during DOM construction.
As most attributes also don't allow empty values, the empty string already marks the absence of an attribute.
So yes: I fully agree that most stuff has no reason to be null.
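A rough Rust analogue of the three attribute states above, with Option<&str> standing in for ?[]const u8:

    fn main() {
        let absent: Option<&str> = None;    // foo { }
        let empty: Option<&str> = Some(""); // foo(x="") { }
        let set: Option<&str> = Some("y");  // foo(x="y") { }

        for attr in [absent, empty, set] {
            match attr {
                None => println!("attribute not present"),
                Some("") => println!("attribute present but empty"),
                Some(v) => println!("attribute = {v}"),
            }
        }
    }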
The whole argument here is that zero initialization is fast or I missed something? If so, who the hell cares! :D
The whole C mindset is basically being penny-wise and pound-foolish, focusing so much on some irrelevant, minute performance wins while ignoring the bigger picture. In the real world, Rust software is continuously replacing legacy C and C++ projects while showing much better performance, because Rust makes concurrency and higher-order composability easier and more robust. You can glue together a couple of highly optimized Rust dependencies, add your logic, and get performant, robust, easy-to-maintain software that is easy to keep correct in the presence of external contributions, and so on.
Yes, data-oriented architectures, cache locality, and the "Grouped-Element Mindset" are all great, but you can do all that without zero initialization and null pointers just fine. The zero-initialization optimization here is a negligible detail. And in real software, written and maintained by multiple people, combining multiple components, etc., any gains from zero initialization are going to be lost on the unavoidable redundant null-pointer checks all over the codebase.
In Rust you can also abstract over zeroable types with something like the Zeroable marker trait without forcing everything to be zeroable. And you could have a transparent wrapper Zeroed<T> to certify that the T value is zeroed and use that to provide safe bulk transmutes/casts to and from zeroed memory with no runtime overhead, e.g. if you want to initialize directly from mmap-zeroed memory or allocate_zeroed; I've done similar things in my own Rust projects.
I think there are real advantages to making everything zeroable, but you do have to give up a lot in return.
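A minimal sketch of that Zeroable/Zeroed idea, using the names from the comment above (the bytemuck crate provides a production-grade version of the marker-trait half):

    /// Marker for types whose all-zero bit pattern is a valid value.
    /// Safety: implementors must actually guarantee that property.
    unsafe trait Zeroable: Sized {}

    unsafe impl Zeroable for u64 {}
    unsafe impl Zeroable for [u64; 4] {}

    /// Transparent certificate that the wrapped value is all zeroes,
    /// so casts to/from zeroed memory need no runtime check.
    #[repr(transparent)]
    struct Zeroed<T: Zeroable>(T);

    impl<T: Zeroable> Zeroed<T> {
        fn new() -> Self {
            // Sound because T promises the zero pattern is valid.
            Zeroed(unsafe { std::mem::zeroed() })
        }
        fn into_inner(self) -> T {
            self.0
        }
    }

    fn main() {
        let counters: Zeroed<[u64; 4]> = Zeroed::new();
        assert_eq!(counters.into_inner(), [0u64; 4]);
    }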
The whole argument here is that zero initialization is fast or I missed something? If so, who the hell cares! :D
I found this ironic because the author kind of accuses the proponents of option-like features of focusing too much on details.
The problem wasn't "null". The problem was not having 1) slices and 2) Optional and then overloading both of those onto "null".
Having slices means that you have an explicit length rather than trying to use some idiotic value as a sentinel (e.g. null or NUL). Having Optional means that you get language support for "undefined" rather than trying to shoehorn that in with "null" (even if the compiler reduces that to "null" internally).
And, as a bonus, if you have a language with tagged unions (e.g. sum types), then you even avoid accidentally using some field as a null pointer when you shouldn't because you dereferenced a type incorrectly.
After you filter off those usages of "null", there's ... almost nothing left. Pretty much the only reason left to use "null" at that point is C interop.
However, I'll defend the elders. Slices would have roughly doubled the size of pointers back when memory was scarce. An Optional data type would have significantly increased the complexity of the compiler at a point where memory, disk and CPU were expensive.
We of the 21st century take nigh-infinite, nigh-instantaneous compute resources for granted without much thought.
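A small Rust sketch of that slices-plus-Optional combination, including the "compiler reduces it to null internally" part:

    fn main() {
        // A slice carries an explicit length; no sentinel needed.
        let bytes: &[u8] = b"hello";
        assert_eq!(bytes.len(), 5);

        // first() returns Option<&u8>: absence is a value you must check.
        if let Some(b) = bytes.first() {
            println!("first byte: {b}");
        }

        // Internally None is represented as a null pointer, so the
        // Option costs nothing over a raw nullable pointer.
        assert_eq!(
            std::mem::size_of::<Option<&u8>>(),
            std::mem::size_of::<*const u8>()
        );
    }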
This post is incredibly condescending for what amounts to an aesthetic preference. I strongly prefer mandatory null checking because NPEs in production are very expensive both for the people operating the software and for the customers who rely on the software working. That’s the nature of every domain I’ve worked in and has nothing to do with me not evolving in my “software journey”. If the cost of NPEs is lower in your domain, godspeed. I’m not here to tell you you’re wrong, but there’s no need to insult people in domains with different requirements.
The first problem was that ALGOL W didn't have an explicit dereference operator (i.e. ^ in Pascal or * in C), so there had to be some special way of checking that a reference was (intentionally) undefined.
The second problem is that later languages (and the hardware and OSes that formed their platform) had limited capability of tracking invalid references/pointers. ALGOL W was heavily oriented towards automatically-managed references, but once BCPL, Pascal and C introduced manual allocation things became a nightmare. Of course, Hoare himself commented (in an ACM lecture?) on the desirability of /never/ disabling runtime checks; the billion-dollar mistake was made by the people who didn't listen to him.
The third problem was that in later languages it was possible for an inexperienced programmer to manipulate references/pointers in invalid ways.
There has to be some way of checking whether two references (rather than their referents) are identical to handle circular lists etc., and there has to be some way of checking whether one is intentionally undefined. Since there isn't a ^ operator which can be omitted to force this, limited use of ==nil is about as good as anything.
If you want to make pointers not have a nil state by default, this requires one of two possibilities: requiring the programmer to test every pointer on use, or assuming pointers cannot be nil.
Swift has its cake and eats it too on this by having types be potentially nullable, in a way that works more like a SQL nullable column than an ADT. T? is like T | null ... this reduces the amount of infrastructure needed to avoid the null problem by default, while giving an easy out for people in messier situations.
I find it good for languages looking for an easy out here. I prefer other languages that happen to have fuller ADT support, but it's a lot of work to make it work nicely.
? in Swift is just sugar for Optional which is a sum type.
https://developer.apple.com/documentation/swift/optional
It can be nested unlike the TypeScript-like union type syntax you've written.
T? and T | null are exactly the same as Option<T>. It generally gives you some syntactic sugar for .unwrap(), but there's no rule that the latter can't have that sugar. I find that the explicit monad composes better in the type system (Option<Option<T>> is useful sometimes).
Is it exactly the same? Can you express Option<Option<T>>? (you might think this is purely academic, but it crops up when composing types - imagine a dictionary with value Option<T>, ie. a key can be present with a None value. What type does get return? Is there a way to differentiate missing key from key present with none value without needing to call a separate contains_key function?)
sure you can. It is T??.
And by that you mean T | null | null, which would mean "no you can't", unless I miss something?
Nope! That's incorrect. It is a disjoint sum rather than a union, as in T + null + null. You can test this out yourself:
func some<X>(_ x: X) -> X? { x }   // wraps a value in an Optional
func none<X>() -> X? { nil }       // an Optional holding nothing
let nilnil: String?? = none()          // the outer Optional is nil
let nonNilNil: String?? = some(none()) // outer is present, inner is nil
print("Are \(String(describing: nilnil)) and \(String(describing: nonNilNil)) equal? \(nilnil == nonNilNil)")
The output is:
Are nil and Optional(nil) equal? false
So, as this very simple test obviously verifies, you can express Option<Option<T>>. That's because T?? is literally syntactic sugar for Optional<Optional<T>> in Swift, which you are free to write if you don't like writing T??.
If you had clicked through to the documentation that was linked by typesanitizer three hours before your comment, this would have been apparent: https://developer.apple.com/documentation/swift/optional. That documentation explains that ? is syntactic sugar for an Optional type.
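The dictionary case raised upthread works the same way in Rust, since get() simply nests the two layers:

    use std::collections::HashMap;

    fn main() {
        let mut settings: HashMap<&str, Option<u32>> = HashMap::new();
        settings.insert("timeout", None); // key present, value explicitly unset

        // get() returns Option<&Option<u32>>: the outer layer answers
        // "is the key there?", the inner one "was a value set?".
        assert_eq!(settings.get("timeout"), Some(&None));
        assert_eq!(settings.get("retries"), None);
    }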
The analogy I like to use is complaining that French cannot do the English equivalent of Iambic Pentameter.
Hey, I'm French! I had no idea we could use stress syllables to do poetry!
But damn my ignorance, of course we can!
Hot take: Python 3 was a billion dollar "mistake".
The companies where I worked at the time must have spent millions worth of engineering effort on it and they are not even Python shops. Pretty much every software engineer has burned some time on it one way or another, not to mention the engineering support teams that have spent months on it.
I have no doubt that this has added up to billions globally.
It didn't affect us at all, besides having to read the gripes. We ignored it for five years, started kicking the tires for a couple more, and then started porting to 3.6 for improved dicts and f-strings.
If you used 2.7 idioms and logging early there was little to do besides rename imports, which could be automated.
The big exception was if your project was doing a lot of character encoding. That would be a good argument for a potentially costly rewrite.
Perhaps it was, but also consider it a multi trillion dollar feature. Consider the value of software produced by languages that feature this “mistake”.