Memory Safety for Skeptics
18 points by steveklabnik
It's interesting to me how this has shifted. When Rust first appeared the main things to me were: no manual malloc, no manual free. Aka no memory leaks or use after free.
These days Rust considers memory leaks not to be part of "memory safety", but all kinds of other stuff (thread safety, for example) gets included.
Absence of memory leaks is a liveness property, not a safety one.
That’s interesting. Drawing the distinction in the sense of finite traces not being able to illustrate a memory leak?
Yes. If a finite prefix of the computation fails to free an object, then, well, you might simply not have reached the instruction that frees it yet.
Type checkers for Turing-complete languages usually have no business trying to establish liveness properties.
Yes; however, there is a stronger safety property that also implies practical leak freedom: "Allocation X is freed before lifetime 'a ends". (edit: or, more precisely, "lifetime 'a does not end while X is still alive.")
Notoriously, Rust has affine types, not linear types, so it cannot prove these either (hence the Leakpocalypse), but there are type systems for systems languages that can.
In a linearly typed, but Turing-complete language, what's preventing me from writing this?
let x = whatever;
loop {} // loop forever
drop(x); // necessary because now values must be explicitly consumed
Linear types + guaranteed termination/productivity is what you need to establish the absence of memory leaks.
Nothing prevents this, hence me saying "practical leak freedom". Linear types allow you to prove that e.g. in a server, memory allocated to handle a specific request does not outlive the request handler. loop {} throws a wrench in everything and leaks not only memory but also an entire thread.
Liveness is a safety property, not a security one.
If your program crashes due to OOM, and people die, the program was not safe.
I'm using the technical definitions of safety and liveness used in the verification of concurrent algorithms.
Neither safety nor liveness is a single property; each is a whole class of properties. The only property that's simultaneously a safety and a liveness property is the trivially true property that all algorithms vacuously satisfy.
These days Rust considers memory leaks not to be part of "memory safety"
This has been the case for ten years now; it was shortly before 1.0 that this was figured out.
Yes, I'm sure that's true. I'm at that terrible age when things that seem an age ago in internet years are still "wait, what happened?" for me sometimes. I'm beginning to understand how my grandma feels.
I never found that part interesting, because I'd already been using unique_ptr (Box) and shared_ptr (Arc) in C++ for ages. Probably the most interesting part was the fact that an immutable reference to a Box actually prevents you from getting a mutable reference to anything within the Box. With unique_ptr<Thing> const&, the pointer itself is immutable, but Thing is not! How misleading and useless const in C++ can be sometimes.
The article uses this definition of memory safety:
There is, however, a rough consensus among practitioners of what kinds of program behaviors are memory unsafe. That's a good place to start.
My favorite short definition comes from Michael Hicks, an academic who works on programming languages:
"[A] program execution is memory safe so long as a particular list of bad things, called memory-access errors, never occur:
• Buffer overflow
• Null pointer dereference
• Use after free
• Use of uninitialized memory
• Illegal free (of an already freed pointer, or a non-malloc-ed pointer)"
Why is null pointer dereference in there? Doesn't that have a well defined behavior in modern operating systems and runtimes of throwing a segfault? The other ones, I agree, would be unsafe since they can cause arbitrary memory corruption, but null pointer dereference by itself can't, right?
@ekuber gave you a good answer already, but here's a slightly different version, my take:
There are two things going on here, and you're conflating them:
"well defined behavior in modern operating systems and runtimes of throwing a segfault" only happens if you're in an environment that protects against this from happening. But you're not always in those environments.
Secondly, at the language level, it is UB. This means that if you write code that tries to dereference a null pointer, you may never even get to the "we try to access address 0" part of things. The program is malformed, and you may get that access at 0, you may get no access at all, you may get a variety of behaviors. So even within those contexts, a segfault is a good secondary line of defense, but does not prevent the problem.
To make a (probably bad) analogy: segfaults are like airbags. Are they good in a crash? Yes. But they're not perfect, and by the time you've gotten there, something has already gone wrong. Ideally you don't crash into anything in the first place.
A typical concrete example of how a segfault is not guaranteed is when you're accessing into some data structure through a null pointer. If you're lucky, you might only try to access a field a few bytes in, and you'll still hit the guard page. If you're unlucky, the offset might be much larger (consider a dynamic index into an array) and it could go anywhere in memory. The OS can't do anything about that case, but the language can.
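A rough C sketch of that unlucky case (all names here are invented for illustration, and the commented-out access is deliberately not executed, since it's UB):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct table {
    size_t   len;
    uint64_t entries[1u << 20];   /* large embedded array */
};

int main(void) {
    struct table *t = NULL;       /* e.g. a lookup that failed */
    size_t i = 0x40000000u;       /* attacker-influenced index */

    /* The access t->entries[i] would touch this address: */
    size_t effective = offsetof(struct table, entries) + i * sizeof(uint64_t);
    printf("would access address %#zx\n", effective);   /* roughly 8 GiB in */

    /* (void)t->entries[i];  <- UB: on a 64-bit target this lands far
     * beyond any guard region the OS reserves around address 0, so a
     * segfault is not guaranteed. */
    (void)t;
    return 0;
}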
The OS can't do anything about that case, but the language can.
The OS can, by reserving a really large VM space starting at 0x0 as a guard region. For example, macOS reserves a 4 GiB guard region: https://stackoverflow.com/questions/23188378/mach-header-64bit-and-pagezero-segment-64bit
They said "consider a dynamic index into an array", which is not limited to 4GiB on a 64-bit system if the attacker discovers a way to inject an arbitrary integer.
Plus, special-casing null returns doesn't fix the more general problem of conflating "no value" with "value of 0".
(To mitigate it more generally, the OS would basically have to put every single array in its own VM page, followed by huge runs of guard pages, and push developers to not amortize the cost of calling malloc... which has its own efficiency problems beyond that.)
...plus, that still doesn't address things like walking off the end of an array embedded in a non-final position in a struct.
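Here's a hedged sketch of that case (field names invented): an out-of-bounds write that never leaves the enclosing object, so page protection - and any bounds check done at allocation granularity - has nothing to catch.

#include <stdio.h>

struct request {
    char name[8];    /* embedded buffer, not the last field */
    int  is_admin;   /* lives right after it, inside the same object */
};

int main(void) {
    struct request r = { .name = "", .is_admin = 0 };
    char *p = r.name;

    /* Writing 12 bytes into an 8-byte field: bytes 8..11 land on
     * is_admin. Formally UB, but every byte stays inside `r`, so
     * nothing at the OS (or allocation-granularity) level notices. */
    for (int i = 0; i < 12; i++) {
        p[i] = 'A';
    }

    printf("is_admin = %#x\n", r.is_admin);   /* typically 0x41414141 */
    return 0;
}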
Fundamentally, this is all the same sort of thing that you get from stuff like mod_security, where you're trying to guess at the program's intent after the fact rather than just properly getting the relevant metadata at the appropriate place and then plumbing it down to whatever layer is best suited to enforcing it.
(Have you ever tried to discuss programming on a forum, or set a search for Macintosh System 7 CDs on eBay, only to get an Error 403 because the libc calls you're trying to discuss are common exploit vectors? I have.... Seriously. Try quoting the "Gotta Go Deeper" section of Let's Be Real About Dependencies on Phoronix or experimenting with crafting an eBay Saved Search using system and eBay's parenthesis-based OR syntax.)
only happens if you're in an environment that protects against this happening. But you're not always in those environments.
6 months ago I'd have argued that this was a hypothetical that one would be unlikely to encounter in reality. And then, 5 months ago, I discovered very painfully that the microcontroller I've been using for 3 years does, in fact, allow you to read from address 0. Or, in my case, address 12, because the field I was ultimately dereferencing wasn't the first field of the struct whose pointer was null. We hadn't encountered it before because we don't do any dynamic memory allocation at the application level and hadn't actually had any null-pointer issues until we discovered one inside the OS's network stack.
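For illustration, a minimal sketch of that shape of bug (the struct and field names are invented, not our actual code):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct netbuf {
    uint32_t flags;      /* offset 0 */
    uint32_t len;        /* offset 4 */
    uint32_t reserved;   /* offset 8 */
    uint32_t payload_id; /* offset 12 <- the field actually accessed */
};

int main(void) {
    struct netbuf *buf = NULL;   /* e.g. a lookup that silently failed */

    /* buf->payload_id would read from address 0 + offsetof(...) = 12.
     * On a bare-metal target with no memory protection, that read
     * returns whatever happens to live there instead of faulting.
     * (Not executed here, since it's UB.) */
    printf("effective address would be %zu\n",
           offsetof(struct netbuf, payload_id));
    (void)buf;
    return 0;
}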
"well defined behavior in modern operating systems and runtimes of throwing a segfault" only happens if you're in an environment that protects against this from happening. But you're not always in those environments
I don't see the point of discussing these things in extreme generality. Yeah, sure, if your execution environment lacks an MMU and has no way of trapping on null dereference (or rather, trapping on 0 + smallish offset), let's discuss that separately. But that's an extreme minority of use cases.
IMO the rationale for memory safety is that it prevents vulnerabilities and serious bugs. So if your language and runtime allow for null dereference without arbitrary memory corruption, it shouldn't be considered an unsafe memory access pattern.
Secondly, at the language level, it is UB.
That's mostly a C/C++ thing. Other languages like Java/Go* don't treat it as UB.
“Extreme generality” is the point of discussing language semantics. You don’t get to just ignore cases.
For the rest of it, I agree that there are other kinds of memory safe languages, I’m talking about Rust and C and C++ in my comment, sure. Which is also a reason to care about those environments.
That's mostly a C/C++ thing. Other languages like Java/Go* don't treat it as UB.
I'd suspect it's the other direction. Most languages with the concept (citation needed) match C/C++ - because most of them are using runtimes implemented in it, or compiling with LLVM, or so on.
And to go a step farther I'd argue that memory unsafety is a defining property*. That golang doesn't have "null" pointer dereference, it has "nil" pointer dereference. That Java (and other similar languages) are best understood by considering their "null" to be a different word from C's "null" that just happens to be a homonym.
* I'd actually accept two versions of this in the definition. One is "it's just undefined"; the other is "it's just accessing memory address 0 (plus offset as usual) and doing whatever happens when you do that". Both of which are memory unsafe when you don't actually intend to access memory address 0.
Null pointer dereference protection provided by an OS still means it is not a property of the language. You can argue that C-on-Linux is a different thing than C-on-microcontroller, the former being safer than the latter, but then you always need to make that caveat ("null pointer dereference is not a memory safety problem in C (on Windows/Linux/macOS)"). If it is a property of the language, you can be more succinct in your communication ("null pointer dereference is not possible in safe Rust"), and language-provided facilities for handling null more properly (like NonNull) also help: they not only ensure a crash (instead of always relying on the OS), but also give you a type-system reminder to handle that case more gracefully than a complete crash when possible.
There's also a dangerous variant of accessing a null pointer with an offset. C will happily do arithmetic on any pointer, and such gadgets can be used to access arbitrary memory addresses. (BTW, 0["hello"] is perfectly cromulent C.)
Doesn't that just yield the 'h' from "hello"?
Yeah, and there's no null pointer in sight. The weird commutative + hidden in [ ] isn't really relevant to memory safety.
No. That'd be "hello"[0].
0["hello"] will give you 104 by taking the raw ASCII value of the first character of the string and then calculating an offset that many bytes forward from 0.
"hello"[0] and 0["hello"] mean exactly the same thing because a[b] desugars to *(a+b) and pointer+integer addition is commutative.
There’s nothing in ["hello"] that causes "hello" to be dereferenced in the way you described.
Well, then I'll admit that I only ran a test program, with nothing but 20-year-old, one-semester-of-university C for theoretical knowledge.
(I've always avoided memory-unsafe languages for anything riskier than MS-DOS retro-hobby work out of a sense of responsibility.)
ssokolow@monolith-tng ~ % cat << EOF > test.c
#include <stdio.h>
int main(void) {
printf ("%d", 0["hello"]);
return 0;
}
EOF
ssokolow@monolith-tng ~ % gcc test.c
ssokolow@monolith-tng ~ % ./a.out
104
ssokolow@monolith-tng ~ % python3 -c "print(ord('h'))"
104
(That last line was checking my hypothesis since I haven't memorized ASCII and didn't feel like manually counting forward from a=97)
Well, "%d" is the format string to print an int, so no wonder you're getting a number out. At least, char is a numeric type smaller than int, and those get automatically upcast to int, so there is no undefined behaviour here.
Well, "%d" is the format string to print an int, so no wonder you're getting a number out.
Yes, because I wanted to see a numeric interpretation of what 0["hello"] was doing. The point is that ["hello"] produced the value of the first character, offset from 0, not the memory address of its first character, offset from 0.
Why is null pointer dereference in there? Doesn't that have a well defined behavior in modern operating systems and runtimes of throwing a segfault?
Because any argument which applies to buffer overflows also applies to interpreting the bit pattern which signifies a null as a pointer and attempting to dereference it. They're both spatial memory safety bugs.
To continue ekuber's point, the invention of CHERI chips did not cause buffer overflows to cease to be memory safety bugs just because it introduced a platform with a runtime mechanism to catch buffer overflows as reliably as null pointer dereferences. Thus, going in the opposite direction, null pointer dereferences didn't cease to be spatial memory safety bugs as memory-protected OSes took market share from OSes like DOS and use of bare-metal programming in embedded and game console spaces.
A language can only claim memory safety if it's specified to have that property on all platforms.
A language can only claim memory safety if it's specified to have that property on all platforms.
Surely you mean all supported platforms. It's meaningless to talk about platforms which most modern languages don't support anyway. Is there a Java or Go runtime that runs on DOS? If not, why do we care what happens in that hypothetical scenario?
To continue ekuber's point, the invention of CHERI chips did not cause buffer overflows to cease to be memory safety bugs just because it introduced a platform with a runtime mechanism to catch buffer overflows as reliably as null pointer dereferences.
It does, if you consider memory safety holistically for the entire system rather than just looking at the programming language spec in a vacuum.
Ok, let's frame this in a different way - would you agree that bounds checking is an important aspect of memory safety? If not, you have to throw away the concept of arbitrary indexing completely, right?
Something like CHERI is similar to bounds checking, except that instead of dealing with slices/ranges you're dealing with arbitrary pointers. So a system with CHERI would indeed be memory safe even with buffer overflows just like a language with bounds checking can be memory safe with respect to out of bounds indexing.
Surely you mean all supported platforms. It's meaningless to talk about platforms which most modern languages don't support anyway. Is there a Java or Go runtime that runs on DOS? If not, why do we care what happens in that hypothetical scenario?
No, I mean that the language spec, whatever form it takes, must require the compile-time or runtime checks which enforce memory safety as a prerequisite for an implementation being considered compliant.
For example, if a retro-hobbyist decides to revive the GCJ Java-to-machine-code compilation support from GCC 6 and below and port it to DOS to be part of DJGPP, then Java does not cease to be memory-safe if doing so introduces opportunities to invoke memory-unsafe behaviour... it just becomes a non-compliant implementation.
I used DOS as an example people are familiar with but, even if you point to bare-metal microcontrollers, of which there are many (eg. the bare-metal AVR and ARM microcontrollers in Arduino boards), whether a language is memory safe is a property of the language, not the platforms it runs on.
(And, in case you find it relevant, yes. Retrocomputing hobbyists are still writing new software that connects to the Internet for old platforms. See, for example, https://amendhub.com/ )
It does, if you consider memory safety holistically for the entire system rather than just looking at the programming language spec in a vacuum.
We're talking about why a null pointer dereference is memory unsafety. Holistic analysis is a red herring. Otherwise, you might as well declare the iPhone memory safe on the basis that it refuses to run apps not vetted by Apple. Not a very useful definition, I hope you'll agree.
Fundamentally, the only difference between a null pointer dereference and a use-after-free is the abstract concept of "how we got to this point".
Something like CHERI is similar to bounds checking, except that instead of dealing with slices/ranges you're dealing with arbitrary pointers. So a system with CHERI would indeed be memory safe even with buffer overflows just like a language with bounds checking can be memory safe with respect to out of bounds indexing.
Red herring or not, it's entirely possible for a memory-unsafe language to allocate such that CHERI sees a valid contiguous allocation which, unbeknownst to its checks, is actually subdivided further into an overflowable buffer followed by something else in a way that renders it exploitable. It's akin to how you can get the user-visible effects of a memory leak in a garbage-collected language even if the memory isn't technically unreachable. (I'm looking at you, stale event handlers in website JavaScript.)
No, I mean that the language spec, whatever form it takes, must require the compile-time or runtime checks which enforce memory safety as a prerequisite for an implementation being considered compliant.
So you're talking about language memory safety, which is fine, but programs can be memory safe, and even individual executions of programs can be memory safe, and CHERI ensures that all executions are memory safe. This seems to be so often forgotten or misunderstood. Earlier on you said that a null pointer dereference counted as a memory error:
Because any argument which applies to buffer overflows also applies to interpreting the bit pattern which signifies a null as a pointer and attempting to dereference it
But that isn't true, except on execution level (and even then depending on characteristics of the environment/platform). So you seem to me to be mixing up the different concepts of memory safety.
Holistic analysis is a red herring. Otherwise, you might as well declare the iPhone memory safe on the basis that it refuses to run apps not vetted by Apple.
That's just a non-sequitur. The claim wasn't that any holistic approach to program quality counts as "memory safe". It has to actually prevent memory errors.
and even individual executions of programs can be memory safe
This strikes me as an absurd use of the word, it's like saying "I safely drove down the highway at 200km/h with my eyes closed" because I didn't hit anything this time. Safety isn't about a singular result, it's about a guarantee about the result before you do the thing.
Can a program be memory safe absent tooling proving it so? Eh, you could probably claim that sufficiently close inspection of a singular program rendered it so. At any sort of scale the only things that are safe are the things that have been shown to be safe with a tool, though. Practically, this means safety is a property of the subset of the language used, for almost all memory safe programs (though heroic efforts have successfully been made to prove a handful of programs memory safe in languages not designed for it).
This strikes me as an absurd use of the word, it's like saying "I safely drove down the highway at 200km/h with my eyes closed" because I didn't hit anything this time. Safety isn't about a singular result, it's about a guarantee about the result before you do the thing.
But safety of a particular execution is about a single result, that's exactly what the borrowed definition in the post is referring to:
"[A] program execution is memory safe so long as a particular list of bad things, called memory-access errors, never occur:
Note "program execution is memory safe", not "program is memory safe" or "language is memory safe".
Can a program be memory safe absent tooling proving it so?
This is actually a ridiculous question. That things can be true without even being provable was itself proven, by Gödel I believe. Of course a program can be safe absent tooling which proves it so. The creation of tooling which proves a program memory safe does not magically transform said program from being unsafe to safe.
Eh, you could probably claim that sufficiently close inspection of a singular program rendered it so.
No, it's a property of a program that's irrespective of the existence of any proof (or "inspection", or whatever). When you prove a property you are showing that the property already held, you are not making it hold by means of the proof.
At any sort of scale the only things that are safe are the things that have been shown to be safe
That's nonsense.
Your interpretation would render "safe" and "correct" synonymous, but they aren't; they're different words. Safe doesn't mean nothing went wrong, or even nothing could have gone wrong; it means that you know nothing could go wrong. Driving down the highway with my eyes closed was not safe just because nothing bad happened this time.
Jumping off a cliff into a lake isn't safe, unless you know that the lake's deep enough that you won't hit the bottom hard. You render jumping off the cliff safe by checking the lake is deep, without modifying anything about the lake, the cliff, or the jump. Similarly, you render executing a program safe by checking it won't do anything harmful - and even if it would not do anything harmful, it was not safe until you checked.
Things can be true without being proven true, but not all properties are like that. The obvious example of such a property is the property of being proven true - it doesn't hold until you prove it even though the underlying object was true before being proven true. Safe is a property like proven, it is the property of being known to be not dangerous - and known is very close to a synonym for proven. Memory safe is a particular category of safety and it inherits that. It is about knowing that a set of bad things will not happen.
Edit: And to be clear, I disagree with the cited definition as well
Your interpretation would render "safe" and "correct" synonymous
I don't see where you're getting that from at all, sorry.
Safe doesn't mean nothing went wrong, or even nothing could have gone wrong
When talking about memory safety, that's precisely what it means (assuming by "went wrong" you mean "memory error occurred").
Jumping off a cliff into a lake isn't safe
Jumping off a cliff has nothing to do with memory safety.
You want the "safe" in "memory safe" to mean exactly what it does when free of that context. It doesn't, though.
Safe is a property like proven, it is the property of being known to be not dangerous
Again, you are conflating "safe" with "safe as in memory safe". They are not the same.
So you're talking about language memory safety, which is fine, but programs can be memory safe, and even individual executions of programs can be memory safe, and CHERI ensures that all executions are memory safe.
I’d frame it slightly differently. CHERI causes a program to stop executing before a memory safety bug causes a breach of confidentiality or integrity.
That is a useful property, in particular for dynamic error handling: your program is in a valid state when you enter an error handler.
That said, the definition that I like for memory safety is the one that we wrote for our 2023 MICRO paper on CHERIoT. We were aiming for a hardware venue and so had to express memory safety as a hardware property not a property of the language and that constraint made us think about what we were really doing.
We defined memory safety as a set of properties that one language implementation may wish to enforce in the presence of code generated by another (which may be simply an assembler). Framing memory safety as a language interoperability problem is how I've always thought about it (and why I started working on CHERI), but it wasn't something I'd properly written down until then.
but programs can be memory safe, and even individual executions of programs can be memory safe
The problem with this framing is that it cuts out latent hazards. In accident analysis and safety engineering in general, latent hazards are important because accidents happen when there are too many hazards and the safety controls fail to mitigate the hazards. Think of the swiss cheese model.
A better way to phrase it is: This program was written in an unsafe language but other safety controls were applied to compensate for the hazards of using an unsafe language. This execution of a program didn’t go wrong, because despite a number of latent hazards and missing safety controls, it luckily avoided an accident. This execution of the same program was exploited by a remote code execution attack because the latent hazards turned out to be a security vulnerability.
A better way to phrase it is:
I disagree. I think what you said and what I was saying are very different. I think you have misunderstood my point, which is one of terminology. There are established meanings for "memory safe" as applied to executions, programs, and languages. A program can be memory-safe even if it was written in an unsafe language and even if there weren't any "other safety controls" applied.
OK, I'm starting to get a sense of where the miscommunication likely lies:
So you're talking about language memory safety, which is fine, but programs can be memory safe, and even individual executions of programs can be memory safe
To me, "memory safe" on the scale of an individual program or an individual execution of a program is a purely an exercise in debating how many angels can dance on the head of a pin, because verifying that such a property holds true at a scale smaller than "language X on runtime/platform Y" is infeasible, given how real-world programmers use programming languages.
(Yes, NASA works miracles, but the lengths they go to in order to achieve their results are not practical for the rest of the programming world.)
As Dijkstra wrote in "The Humble Programmer", "Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence", which means that you need some kind of proof system (In Rust's case, embodied in the language semantics and type system) and some kind of underlying runtime which upholds the invariants which the proof system assumes.
and CHERI ensures that all executions are memory safe.
CHERI is sort of the edge of what I consider "usefully memory safe" because, while it does serve the role of implementing an extremely strong analogue to the memory safety guarantees provided by something like a bytecode VM, there are still limits to how much it can do in the presence of a language like C or C++ where programmers are used to reinventing bespoke data structures, and the language makes it easy for them to forget to narrow the protection guards as they subdivide allocations.
I anticipate that, were CHERI deployed universally, we would still see something analogous to ROP popping up as a direct result of that shortcoming, and a push for something similar to Rust because C and C++ just don't express the programmer's intent at a high enough level to rule that kind of exploit out.
But that isn't true, except on execution level (and even then depending on characteristics of the environment/platform). So you seem to me to be mixing up the different concepts of memory safety.
How do you divide the concepts? To me, they're inseparable, because memory safety is fundamentally about what happens at execution time.
That's just a non-sequitur. The claim wasn't that any holistic approach to program quality counts as "memory safe". It has to actually prevent memory errors.
Not at all. From the perspective of a compiler for a language like C, runtime prevention of memory unsafe behaviour and "you send your binary off to Apple to be audited, they will only sign your binary if their secret-sauce checks find no memory unsafety, and phones will only run signed binaries" are not qualitatively different.
They're both "The compiler generates the code and then it's somebody else's problem" solutions where the end user doesn't observe memory safety bugs.
To me, "memory safe" on the scale of an individual program or an individual execution of a program is a purely an exercise in debating how many angels can dance on the head of a pin, because verifying that such a property holds true at a scale smaller than "language X on runtime/platform Y" is infeasible, given how real-world programmers use programming languages.
I'm lost in your words there (what does "scale" mean when applied to a concept, and one with free variables no less), but it is most certainly feasible (not even particularly difficult) to verify that individual executions are memory safe. Your tone suggests that perhaps you think that memory safety of individual executions or programs doesn't matter. I suggest that it very much does, and that the NASA miracles you refer to are evidence of that.
As Dijkstra wrote
Dijkstra's point was that tests don't prove the absence of bugs, not that the absence of such proof is proof of the contrary.
How do you divide the concepts? To me, they're inseparable, because memory safety is fundamentally about what happens at execution time.
If you completely refuse to think about safety of individual programs and executions, then yes, you would have to conflate execution/program/language safety. But it is a conflation; they are obviously different things.
Edit: I think I've begun to see your point. Maybe what bothers me is there hasn't been an acknowledgement that a non-language solution (eg CHERI) can mediate the effect of a memory error to the point that "memory safe" in the language sense no longer matters. I take your point about subdivided allocations, but the talking point was about null pointer dereference, specifically. It's problematic that "memory error" hasn't been well defined, and I don't think the post helped much in this regard (in fact it's awful; the definition of what constitutes a "memory error" seems to be specific to a particular language or set of languages).
runtime prevention of memory unsafe behaviour and "you send your binary off to Apple to be audited, they will only sign your binary if their secret-sauce checks find no memory unsafety, and phones will only run signed binaries" are not qualitatively different
They certainly are. One has guarantees that the other does not, and the concept of memory safety is all about guarantees. (Edit: and in addition, one restricts what programs you can run where the other does not; I think that is the essential difference, in fact; calling a system "memory safe" because some third party such as Apple disallowed you from running not-memory-safe programs on it is clearly not going to gel with most people, and that's what you're relying on with this argument, but perhaps it does technically meet the criteria; so what?).
I'm lost in your words there (what does "scale" mean when applied to a concept, and one with free variables no less), but it is most certainly feasible (not even particularly difficult) to verify that individual executions are memory safe. Your tone suggests that perhaps you think that memory safety of individual executions or programs doesn't matter. I suggest that it very much does, and that the NASA miracles you refer to are evidence of that.
I'm saying that memory safety guarantees that hold true less broadly than "for all executions of all programs which compile/execute in language A on platform B" involve so much non-reusable effort to achieve validation, and have such niche applicability, that I believe talking about them is a case of "digestion burns more calories than it releases", unless you're actually working for one of those niche employers like NASA.
Dijkstra's point was that tests don't prove the absence of bugs, not that the absence of such proof is proof of the contrary.
My intent was that, if you don't have some kind of proof that eliminates a form of memory unsafety as broadly as a type system guarantee (be it compiled or something more like how Java interacts with the JVM), then whatever you have isn't meaningfully different from running the tests Dijkstra called out for being insufficient.
Edit: I think I've begun to see your point. Maybe what bothers me is there hasn't been an acknowledgement that a non-language solution (eg CHERI) can mediate the effect of a memory error to the point that "memory safe" in the language sense no longer matters.
I think that's where we differ. I believe that "mediate the effect of a memory error to the point that "memory safe" in the language sense no longer matters" is an empty set. ...that it's fundamentally no different from "do what I mean" programming, where the system has to read your mind to guarantee correct behaviour.
I don't believe C is specified tightly enough for there to exist a defensible Java+JVM relationship in C+CHERI, and that's just normal. If a language isn't designed to give enough information to the execution environment, then the execution environment's guess at the intended semantics can't render safe all programs the compiler will accept because of how higher-level logic bugs can introduce exploitable paths that are still presented as valid to CHERI or equivalent.
the definition of what constitutes a "memory error" seems to be specific to a particular language or set of languages
The definitions I've seen, which are used in places like the Wikipedia article, treat it as "accessing memory which shouldn't have been accessed" and use "spatial" and "temporal" as the top-level divisions. (i.e. Is the system allowing you to access an address you were never granted or allowing you to access a granted address before/after its window of validity?)
I'm saying that memory safety guarantees that hold true less broadly than "for all executions of all programs which compile/execute in language A on platform B" involve so much non-reusable effort
So what? It's still useful to be able to distinguish programs which have those guarantees from those that don't, for example, even in an abstract sense, and that's what "memory safety" means in the context of programs.
My intent was that, if you don't have some kind of proof that eliminates a form of memory unsafety as broadly as a type system guarantee (be it compiled or something more like how Java interacts with the JVM), then whatever you have isn't meaningfully different from running the tests Dijkstra called out for being insufficient.
Dijkstra was calling out tests as being insufficient to prove the absence of bugs including incorrect behaviour; he wasn't saying that if you didn't have any more than tests then your program necessarily had errors (and especially not that it necessarily had memory errors). He also wasn't suggesting that a term developed to differentiate between programs with and without a certain type of error should instead be used to differentiate between programs that were proven not to have that type of error versus those for which that wasn't proven; his focus there was on correctness of programs as a whole.
The definitions I've seen, which are used in places like the Wikipedia article, treat it as "accessing memory which shouldn't have been accessed" and use "spatial" and "temporal" as the top-level divisions
Sure, but when does "memory get accessed" (and how do you know it "shouldn't" have been accessed) at the language level? Looking at C in particular, there's absolutely no guarantee made by the language that a null pointer dereference, or an out-of-bounds array access or whatever, actually causes any memory access at all (example). We have to resort either to saying "well, it's UB so it could do that at least theoretically" or "well, the compiler will generate memory accesses for this code, we just know that" - the latter is only true in some cases, as my example shows (so does that mean other cases aren't memory errors? surely not), and if the former, don't we have to be concerned about the consequence of the error (otherwise, what's the point of the classification?), in which case things like CHERI surely should be considered relevant even if they are "external" to the language/compiler.
Hicks goes into the definition problem in some more detail (but still not enough, IMO); Wikipedia really doesn't do it justice (at least not last time I read the page).
I think that's where we differ. I believe that "mediate the effect of a memory error to the point that "memory safe" in the language sense no longer matters" is an empty set. ...that it's fundamentally no different from "do what I mean" programming, where the system has to read your mind to guarantee correct behaviour.
Memory safety is a concern because of the potential effect of memory errors. If you constrain those effects so that they no longer have the extreme potential for damage, that concern is resolved. No mind reading is required.
So what? It's still useful to be able to distinguish programs which have those guarantees from those that don't, for example, even in an abstract sense,
Useful in an academic context, perhaps, but of very niche practical use. That's why I referred to angels dancing on the head of a pin.
and that's what "memory safety" means in the context of programs.
Every discussion of memory safety I've seen has aligned with @gpm's stance on it, so I'm going to call [citation needed] here.
Dijkstra was calling out tests as being insufficient to prove the absence of bugs including incorrect behaviour; he wasn't saying that if you didn't have any more than tests then your program necessarily had errors (and especially not that it necessarily had memory errors). He also wasn't suggesting that a term developed to differentiate between programs with and without a certain type of error should instead be used to differentiate between programs that were proven not to have that type of error versus those for which that wasn't proven; his focus there was on correctness of programs as a whole.
Dijkstra was known for arguing for more use of formal methods to prove program correctness. Here's the larger context for that quote:
Argument three is based on the constructive approach to the problem of program correctness. Today a usual technique is to make a program and then to test it. But: program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence. The only effective way to raise the confidence level of a program significantly is to give a convincing proof of its correctness. But one should not first make the program and then prove its correctness, because then the requirement of providing the proof would only increase the poor programmer’s burden. On the contrary: the programmer should let correctness proof and program grow hand in hand. Argument three is essentially based on the following observation. If one first asks oneself what the structure of a convincing proof would be and, having found this, then constructs a program satisfying this proof’s requirements, then these correctness concerns turn out to be a very effective heuristic guidance. By definition this approach is only applicable when we restrict ourselves to intellectually manageable programs, but it provides us with effective means for finding a satisfactory one among these.
-- https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html
That aligns strongly with using languages with strong, expressive type systems like Rust to achieve memory safety instead of trying to retrofit it onto languages like C using mechanisms like CHERI.
Sure, but when does "memory get accessed" (and how do you know it "shouldn't" have been accessed) at the language level?
At the language level (and, by that, I mean what the programmer sees, not what the optimizer outputs), memory "gets accessed" when you read it (eg. dereferencing a pointer) or assign to it.
Knowing it "shouldn't" have been accessed is achieved through constructs such as runtime bounds checks or compile-time errors. (eg. Rust catching an attempt to use a reference to a moved value)
Looking at C in particular, there's absolutely no guarantee made by the language that a null pointer dereference, or an out-of-bounds array access or whatever, actually causes any memory access at all (example).
That's another reason CHERI is insufficient... because the optimizer may have completely obscured what limited information C gathers on the programmer's intent (within the context of valid-but-ugly tricks), obscuring things like overflowing from one field of a struct to another, or writing beyond the boundaries of a slice in some virtual machine's address space that hasn't had its CHERI-level bounds narrowed.
Memory safety is a concern because of the potential effect of memory errors. If you constrain those effects so that they no longer have the extreme potential for damage, that concern is resolved. No mind reading is required.
See my previous statement. If the code isn't integrating adequately with CHERI to re-tag pointers when it slices up allocations, then mind-reading is required to determine whether what CHERI "sees" is an exploit or an emulator.
Every discussion of memory safety I've seen has aligned with @gpm's stance on it, so I'm going to call [citation needed] here.
Read Hicks, linked from the original post, for a start.
Dijkstra was known for arguing for more use of formal methods to prove program correctness. Here's the larger context for that quote
Which confirms what I've already said about this quote.
Knowing it "shouldn't" have been accessed is achieved through constructs such as runtime bounds checks or compile-time errors. (eg. Rust catching an attempt to use a reference to a moved value)
By that definition, all memory accesses in C are not memory errors.
That's another reason CHERI is insufficient... because the optimizer may have completely obscured what limited information C gathers on the programmer's intent, obscuring things like overflowing from one field of a struct to another,
Alright, it might not catch all memory errors - if indeed that is an example of a memory error - but it certainly limits their effect.
For the purposes of this post, we are generally considering whether a program execution is memory safe or not. From this notion, we deem a program to be memory safe if all of its possible executions are memory safe, and a language to be memory safe if all possible programs in the language are memory safe.
OK, given this context, I can see how Hicks's choice of terminology works... but it also feels like he's intentionally defining a semi-nonstandard use of it for the context of his argument.
As @gpm said, this definition of "memory safe" allows "safe driving" to mean "I always drive defensively" (programming language), "I drove defensively" (this codebase), or "I didn't get in an accident" (this execution). That last jump is unacceptable.
When you're not Hicks, using it as shorthand for something like "subject to memory safety", saying a single execution is "memory safe" is equivalent to saying that a drunk driver who didn't cause an accident counts as a "safe driver". You're conflating "acted safely" and "arrived safely". Action and result.
He also cites two papers, only one of which is open access, for his definition of memory safety. I don't have time to read the whole open access one, but the section he appears to be referring to is as follows:
A. Memory corruption
Since the root cause of all vulnerabilities discussed in this systematization of knowledge paper is memory corruption, every exploit starts by triggering a memory error. The first two steps of an exploit in Figure 1 cover the initial memory error. The first step makes a pointer invalid, and the second one dereferences the pointer, thereby triggering the error. A pointer can become invalid by going out of the bounds of its pointed object or when the object gets deallocated. A pointer pointing to a deleted object is called a dangling pointer. Dereferencing an out-of-bounds pointer causes a so called spatial error, while dereferencing a dangling pointer causes a temporal error.
[...]
-- https://ieeexplore.ieee.org/ielx7/6547086/6547088/06547101.pdf
Different phrasing, but I see that as consistent with my definition.
Which confirms what I've already said about this quote.
From the beginning, I was quoting Dijkstra in the context of saying that it's not useful to apply the terminology "memory safe" to anything as narrow as a single execution of a program.
I will, however, accept that it can be argued that a single program can be said to be "memory safe" across all inputs. If nothing else, that's consistent with a compiler or interpreter being "a single program" that can be memory-safe across all inputs.
By that definition, all memory accesses in C are not memory errors.
No, by that definition, C cannot know which memory accesses are memory errors. All means of recognizing memory errors in C are imperfect (eg. not everything causes a segmentation fault) and external to what is defined as being C.
Alright, it might not catch all memory errors - if indeed that is an example of a memory error - but it certainly limits their effect.
Fair. I never intended to dispute that.
OK, given this context, I can see how Hicks's choice of terminology works... but it also feels like he's intentionally defining a semi-nonstandard use of it for the context of his argument.
I mean it's not like Hicks is the only one using this terminology. But I don't have any choice references to hand and I'm not digging them up just for this (you can do your own homework on this, or choose not to, as you see fit).
Different phrasing, but I see that as consistent with my definition.
Indeed, part of what I was saying before is that there is no consistent terminology. I was also trying to say, perhaps too obtusely, that it is actually (seemingly) very difficult to come up with a nice completely self-consistent terminology that meets all the criteria we would want it to.
From the beginning, I was quoting Dijkstra in the context of saying that it's not useful to apply the terminology "memory safe" to anything as narrow as a single execution of a program.
I was pointing out that Dijkstra doesn't say that. I know what you are getting at; but Dijkstra isn't actually supporting it. He's not talking about terminology.
No, by that definition, C cannot know which memory accesses are memory errors.
I was too obtuse. The point I was making is: how does anyone decide which memory accesses in a C program are memory errors, and which aren't?
[...] external to what is defined as being C
So, not part of the language after all - and so the original question (which was meant to be illustrative - I don't expect you to answer it) stands.
I was pointing out that Dijkstra doesn't say that. I know what you are getting at; but Dijkstra isn't actually supporting it. He's not talking about terminology.
Ahh, there's the miscommunication. I never said he was talking about terminology. I said that the class of problems we've dubbed "memory safety" is a problem domain that's subject to what he was talking about.
I was too obtuse. The point I was making is: how does anyone decide which memory accesses in a C program are memory errors, and which aren't?
That depends on what you mean by "decide".
For example, if you mean "declare/communicate", it's historically been done via human-readable documentation. However, one more automatic option is that Splint supports annotations which let you augment C's type system in the direction Rust goes.
So, not part of the language after all - and so the original question (which was meant to be illustrative - I don't expect you to answer it) stands.
In this case, one could argue that null pointer dereference bugs are part of the language because the programmer-facing side of it makes no effort to prevent conflation of NULL and a valid pointer to some usually-0 address.
Ahh, there's the miscommunication. I never said he was talking about terminology. I said that the class of problems we've dubbed "memory safety" is a problem domain that's subject to what he was talking about.
In the sense that it's a type of error and he's talking about errors, yes, but he's talking much more broadly about program correctness. The exact context you brought up the Dijkstra quote was in reply to:
So you're talking about language memory safety, which is fine, but programs can be memory safe, and even individual executions of programs can be memory safe
... which is very much about terminology, so, not trying to be unfair, but I don't accept any blame for any misunderstanding here. I'm now guessing you were just reinforcing the notion that you can't prove lack of errors by tests (or any means other than formal methods), but I never argued that point.
Edit: in fact, you yourself said:
From the beginning, I was quoting Dijkstra in the context of saying that it's not useful to apply the terminology "memory safe" to anything as narrow as a single execution of a program.
... You specifically said it is about terminology.
That depends on what you mean by "decide".
What I mean is: given a program, in some language(s) for which you know the semantics, and an execution trace (including all "memory accesses", values and so on), how would you determine which of those memory accesses constituted memory errors? (But again, I'm not really expecting an answer here - I was trying to point out that it is complex and thorny - I don't think existing definitions of "memory error" generally give a very good answer to it).
In this case, one could argue that null pointer dereference bugs are part of the language because the programmer-facing side of it makes no effort to prevent conflation of NULL and a valid pointer to some usually-0 address
But what makes them memory errors, specifically? What I was questioning was: is it purely defined by the language (and execution trace), or are there external factors? (again, rhetorical, mostly!).
I'm now guessing you were just reinforcing the notion that you can't prove lack of errors by tests (or any means other than formal methods), but I never argued that point.
*nod* I was building on my view that "safety" has no useful meaning in the context of a single, isolated execution trace for the same reason that "formally verifying" a single run of a program has no useful meaning.
It's like looking at a double slit experiment and asking "does this single photon, in isolation, produce an interference pattern?".
... You specifically said it is about terminology.
In the same sense that you can't meaningfully "formally verify" a single, isolated execution trace.
What I mean is: given a program, in some language(s) for which you know the semantics, and an execution trace (including all "memory accesses", values and so on), how would you determine which of those memory accesses constituted memory errors?
By comparing the memory access invariants embodied in the source code to the memory access behaviour exhibited by the execution trace.
For example, "This is an array indexing operator, but the execution trace is accessing memory outside the bounds of the array. Therefore, this is a memory error" or "this is a pointer to a struct, not an array of structs, but the offset added to it results in a pointer to a location beyond this allocation" or "this code is returning a pointer to something within the stack frame of the current function".
(Essentially the same invariants that are embodied in declarations that behaviour is undefined, giving optimizers license to then algebra the syntax tree around the assumption those eventualities will never occur.)
The problem with CHERI is that not all valid C code will express a narrowing of bounds consistent with the developer's intent, because C programs sometimes do tricks such as putting a struct on multiple intrusive data structures, each of which is embedded at a location other than the beginning or end of the struct.
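A small sketch of the kind of idiom I mean (invented names, not from any particular codebase): a list node embedded in the middle of a struct, with the container recovered by pointer arithmetic. If a pointer handed out for the node had its bounds narrowed to just the node, the container_of step below would reach outside those bounds, so C code like this either defeats the narrowing or stops working.

#include <stddef.h>
#include <stdio.h>

struct list_node {
    struct list_node *next;
};

struct session {
    int id;
    struct list_node by_user;    /* embedded node #1, not at offset 0 */
    struct list_node by_expiry;  /* embedded node #2 */
    int refcount;
};

/* Recover the enclosing struct from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void) {
    struct session s = { .id = 42, .refcount = 1 };
    struct list_node *n = &s.by_expiry;           /* what a list walk sees */
    struct session *back = container_of(n, struct session, by_expiry);
    printf("recovered id = %d\n", back->id);      /* prints 42 */
    return 0;
}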
It's akin to how assembly languages couldn't be retrofitted to have C's guarantee that you can't use goto to jump into a function, bypassing its beginning. Rust does for memory access what languages like C did for structured programming... resulting in much wailing and gnashing of teeth from the Mels of the world.
But what makes them memory errors, specifically? What I was questioning was: is it purely defined by the language (and execution trace), or are there external factors? (again, rhetorical, mostly!).
I'd say it's defined by the language in that the language semantics do not provide means to distinguish NULL from casting an arbitrary integer to a pointer beyond "better hope the programmer remembers to check it". As such, if interpreting an arbitrary number as a pointer and attempting to dereference it is a memory error (which is not in dispute), then so is dereferencing NULL.
In a language without what I'll call "low-level NULL" (eg. safe Rust), attempting to treat an Option::None as 0 and dereference it will be a compile-time error.
In C, you don't reliably get something like "error: incompatible NULL to pointer conversion".
(i.e. It's defined by the language in that the nonsensical low-level operation is the memory error and not all languages grant access to that operation.)
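To put the C side of that in a small, hedged sketch (the function and key names are invented):

#include <stdio.h>
#include <string.h>

/* Returns NULL when the key is unknown - but only this comment says
 * so; the char* return type carries no nullability information. */
static const char *find_config(const char *key) {
    return strcmp(key, "host") == 0 ? "localhost" : NULL;
}

int main(void) {
    const char *value = find_config("host");  /* happens to succeed */

    /* Nothing forces the NULL check: find_config("timeout") would
     * return NULL, the strlen below would then be UB, and the program
     * compiles without complaint either way. */
    printf("len = %zu\n", strlen(value));
    return 0;
}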
... You specifically said it is about terminology.
In the same sense that you can't meaningfully "formally verify" a single, isolated execution trace.
First: you can easily verify a single, isolated execution trace to be free of memory errors (assuming you have a reasonable handle on what classifies as a memory error).
Second: Dijkstra wasn't talking about individual execution traces*, so you can't take what he said as having some meaning about how we should apply terminology to them. Or even about how we should apply "memory safety" terminology at all. He's saying only "formal methods are required to prove correctness"; you are incorrectly extending that to "and so we should not have any terminology that means anything about the correctness of a program that is weaker than 'it has been formally verified to be free of errors'". It's not what Dijkstra was saying, and it's not even something that logically follows from what he was saying.
* Other than that he was effectively saying that a collection of individual traces can have properties not found in all possible traces, i.e. tests which show correct function do not show that the program as a whole always has correct function. But, again, he was not making a point on terminology.
By comparing the memory access invariants embodied in the source code to the memory access behaviour exhibited by the execution trace.
I tried to be clear, it wasn't a question that I expected you to answer. I anyway don't think your answer is satisfactory. The Hicks post has a discussion.
I'd say it's defined by the language in that the language semantics do not provide means to distinguish NULL from casting an arbitrary integer to a pointer beyond "better hope the programmer remembers to check it"
This definition would make null pointer dereference not a memory error if C didn't allow casting between pointers and integers.
First: you can easily verify a single, isolated execution trace to be free of memory errors (assuming you have a reasonable handle on what classifies as a memory error).
But you can't infer any useful amount of information about whether the program that produced it upholds any properties outside that trace (ideally, universally), so applying the term "memory safe" outside Hicks's clarifying context erodes the meaningfulness/usefulness of the term.
Second: Dijkstra wasn't talking about individual execution traces, so you can't take what he said as having some meaning about how we should apply terminology to them. Or even about how we should apply "memory safety" terminology at all. He's saying only "formal methods are required to prove correctness"; you are incorrectly extending that to "and so we should not have any terminology that means anything about the correctness of a program that is weaker than 'it has been formally verified to be free of errors'".
No, I'm saying that talking about individual execution traces is functionally equivalent to the tests Dijktra called hopelessly inadequate.
An execution trace can demonstrate the presence of a bug, or demonstrate that the program is capable of completing successfully without encountering one, but it's "hopelessly inadequate" for making any statements about other properties the program may have.
The Hicks post has a discussion.
I'll try to make time to give it a read later today.
This definition would make null pointer dereference not a memory error if C didn't allow casting between pointers and integers.
I don't see how. I was saying that null pointers and integer-to-pointer casts are two members of the set of "ways to construct a pointer that is simultaneously invalid and dereferenceable". Dereferencing a null pointer in C is a memory error because dereferencing an invalid pointer is a memory error. Dereferencing Option::None is not a memory error in Rust because the compiler won't let you interpret it as an invalid pointer and dereference it.
But you can't infer any useful amount of information about whether the program that produced it upholds any properties outside that trace (ideally, universally), so applying the term "memory safe" outside Hicks's clarifying context erodes the meaningfulness/usefulness of the term.
It means just what it always did when you apply it to contexts other than execution traces, nothing has been "eroded".
No, I'm saying that talking about individual execution traces is functionally equivalent to the tests Dijkstra called hopelessly inadequate.
An execution trace can demonstrate the presence of a bug, or demonstrate that the program is capable of completing successfully without encountering one, but it's "hopelessly inadequate" for making any statements about other properties the program may have.
He was specifically talking about them being "hopelessly inadequate" to prove the absence of bugs, not inadequate for showing properties in general.
I keep saying this and you keep overlooking it in your responses, but: Dijkstra wasn't saying anything at all about how we should describe bug-free programs. That tests alone are inadequate to prove a program is correct (or even just that it is free of memory errors) is something we can agree on, but you're extrapolating beyond that. I'm not interested in pounding this particular piece of pavement any further. Your claim that talking about "memory safe executions" somehow diminishes the meaning of "memory safe" is bogus, and circular; you're presuming that your own internal definition of "memory safe" is the one that matters.
I don't see how. I was saying that
You said:
I'd say it [what makes null pointer dereferences memory errors]'s defined by the language in that the language semantics do not provide means to distinguish NULL from casting an arbitrary integer to a pointer beyond "better hope the programmer remembers to check it"
I inserted the context from that point in the discussion in [ ] to make it clear. This definitely implies that casting integers to pointers is somehow an important part of what defines null pointer dereferences as memory errors. It's right there in the discussion above; this is what you said, and that was the context.
Dereferencing Option::None is not a memory error in Rust because the compiler won't let you interpret it as an invalid pointer and dereference it.
Neither will C; any attempt to dereference an invalid pointer has undefined behaviour. So if you call the latter a memory error but not the former, then what constitutes a memory error depends on the effect, or at least the possible effect, of the error. Which leads back to the point about CHERI.
Given that I had to dodge the maximum reply depth to post this, and that we seem to be going in circles as I try to identify where I need to refine my phrasing to get my point across, I think this is a good place to stop.
OK. One more reply because I think it'll clear things up a lot.
I already demonstrated that this is not necessarily true. The language doesn't in fact say anything about how a dereference should be converted to machine code.
You'll get undefined behaviour. What happens at the machine level is up in the air, and could be something akin to what you describe, but it's not guaranteed.
Obviously, when you invoke undefined behaviour, anything goes... so there are two possible cases:
The optimizer transforms it in some way... which could itself cause a memory error, because anything goes. This isn't really useful for talking about memory errors, because memory errors are exactly the kind of thing that tends to be left undefined for the sake of optimization potential. (See the sketch after these two cases.)
The optimizer doesn't transform it. There are entire classes of "C as high-level assembly" programmers who rage when that naive translation doesn't happen, which supports the position that UB passing invalid dereferences straight through to the hardware happens often enough to be worth talking about.
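To illustrate the first case, here is a minimal sketch (a fragment, not a full program) of the sort of transformation being described. Whether a given compiler actually does this depends on the compiler, the optimization level, and flags such as GCC's -fdelete-null-pointer-checks; it's a possibility the standard permits, not a guarantee.

/* Because *p already makes p == NULL undefined behaviour, an optimizer is
   allowed to assume p is non-null afterwards and delete the check below.
   The "C as high-level assembly" expectation is that both the load and the
   check survive as written; the standard doesn't promise that. */
int first_byte(const char *p) {
    int c = *p;        /* UB if p is NULL */
    if (p == NULL)     /* may be treated as unreachable and removed */
        return -1;
    return c;
}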
I'll leave it at that because, again, max reply depth.
Obviously, when you invoke undefined behaviour, anything goes
Yep, I made the same point myself earlier.
There are entire classes of "C as high-level assembly" programmers who rage when that naive translation doesn't happen, which supports the position that UB passing invalid dereferences straight through to the hardware happens often enough to be worth talking about
Not really relevant though, when trying to rigorously define what constitutes a memory error at the language level.
I'll leave it at that because, again, max reply depth.
Ok.
As such, if interpreting an arbitrary number as a pointer and attempting to dereference it is a memory error (which is not in dispute), then so is dereferencing NULL.
That doesn't follow, because NULL is not a result of casting an arbitrary number to a pointer. It requires a very specific number.
That doesn't follow, because NULL is not a result of casting an arbitrary number to a pointer. It requires a very specific number.
But that's irrelevant to the point. The memory error happens because the programming language doesn't treat it as "very specific" enough to prevent the conflation of NULL with an arbitrary number. When it becomes machine code, it's treated as an arbitrary number by the CPU, to the point where the only thing that catches it is that modern, mainstream OSes leave the page which happens to contain it unmapped.
If I take a null pointer, expecting an array, and index far enough into it in C, I'll erroneously synthesize a pointer to valid memory and successfully dereference an invalid pointer instead of segfaulting.
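A minimal sketch of that scenario (the names and the offset are made up for illustration). The indexing and the load are both undefined behaviour, so the outcome described is one plausible result on a typical flat-address-space target, not something the language guarantees:

#include <stddef.h>
#include <stdio.h>

/* No validity check on base: the type doesn't carry one. */
static unsigned char read_at(const unsigned char *base, size_t offset) {
    /* If base is NULL and offset is large, a straightforward compilation just
       computes 0 + offset and loads from that address. If that page happens
       to be mapped, the read "succeeds" through an invalid pointer instead of
       faulting; the compiler is equally free to do something else entirely,
       since this is undefined behaviour. */
    return base[offset];
}

int main(void) {
    unsigned char buf[16] = {42};
    printf("%u\n", read_at(buf, 0));  /* valid use */
    /* read_at(NULL, 0x7fff0000) is the scenario described above; it isn't
       executed here because its behaviour is undefined. */
    return 0;
}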
when it becomes machine code, it's treated as an arbitrary number by the CPU,
I already demonstrated that this is not necessarily true. The language doesn't in fact say anything about how a dereference should be converted to machine code.
If I take a null pointer, expecting an array, and index far enough into it in C, I'll erroneously synthesize a pointer to valid memory
You'll get undefined behaviour. What happens at the machine level is up in the air, and could be something akin to what you describe, but it's not guaranteed.
The article uses this definition of memory safety:
It is, frankly, a bad definition. Memory safety (or unsafety) is not about whether the things on that list occur, but rather about what the effect of their occurring might be (or, I suppose, to be accurate, the combination of both).
Null pointer dereference in C/C++ yields undefined behaviour, and that can lead to behaviour that's indistinguishable from a "classic" memory error, and so it counts as a memory error. In practice it often does result in a segmentation fault but that's certainly not guaranteed (regardless of operating system).
Null dereference in a language like Java, on the other hand, is perfectly safe, which goes to my point.
(Edit:) In terms of programs, as opposed to languages, memory safety means that no memory error occurs in any execution (where what counts as a memory error depends on the semantics of the language). People often get these confused, which is why you'll see (incorrect) claims that a particular program is "not memory safe" because it is written in a not-memory-safe language.
(Edit again): There are also memory safe executions of programs, where the programs themselves may not be memory safe (because other executions exist in which memory errors do occur). That's in fact what the borrowed definition is referring to:
"[A] program execution is memory safe so long as a particular list of bad things, called memory-access errors, never occur:
but the post author conflates it with program safety:
So, memory safety is when a program is guaranteed not to have those weaknesses.
... and then also with language safety.