Memory Safety Is …
15 points by bakaq
Memory safety is a property of an implementation.
I think this brings one of my long-standing issues back: Python is not memory safe. You can make it segfault in all kinds of ways, quite easily, by loading embedded bytecode, at runtime via ctypes, or by misusing the buffer interface (there is a well-known class of buffer-related use-after-free problems that is essentially inherent to the language, because it stems from the C API). That is not something you can realistically fix without breaking Python itself.
And yet people will confidently claim that Python is memory safe, or at least that it should be considered memory safe. In practice, it clearly is not: you can get it to segfault trivially, and you can get it to execute arbitrary native code just as easily. That makes the label hard to defend. At the same time, it should probably still be considered memory safe, because you have to go out of your way to make it crash. (And it wouldn't be memory safe by CompCert's definition mentioned in the article.)
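As a minimal sketch of the kind of thing meant here: on CPython, ctypes alone is enough to overwrite the payload of an "immutable" bytes object in place. The payload offset below is derived from CPython's object layout (header fields, then the byte data), so this is an implementation-specific assumption, not portable Python:

```python
import ctypes

# Sketch: mutate an "immutable" bytes object in place via ctypes.
# Assumes CPython; the payload offset is derived from the struct layout
# (bytes.__basicsize__ includes one byte of the inline data array).
s = b"hello"
payload = id(s) + bytes.__basicsize__ - 1  # address of the byte data
ctypes.memmove(payload, b"HELLO", len(s))
print(s)  # the supposedly immutable object has been rewritten in place
```

No crash, no error, and yet an invariant the language depends on is gone.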
This is why I find "memory safety" very difficult to define cleanly. The term is used in inconsistent and often emotional ways, depending on how people feel about a language rather than where a sensible technical boundary should be drawn. A good example here is also Go: some people argue it is not memory safe because it allows tearing, while others insist it is.
At this point, memory safety has become a hot-button topic with no widely accepted or precise definition, which I find genuinely fascinating.
I think the Python problem is well captured by the first page of https://storage.googleapis.com/gweb-research2023-media/pubtools/7665.pdf. Almost every language has an unsafe subset, and, if that subset is sufficiently clearly fenced off, that's fine (e.g., unsafe in Rust, Go, and Java). I'd say Python falls roughly into this category, because buffers and ctypes feel sufficiently niche in practice.
It's also useful to define a language that remains safe even in the presence of adversarial code (so, without an unsafe subset), but we might as well just say "JavaScript" as that's more or less the only such language in wide use.
The Go problem is much more interesting! It clearly is not "formally memory safe", but there also seems to be enough evidence to conclude that it is sufficiently safe in practice, modulo some yet-unknown future attacks? And then the question becomes "what is sufficient?" That's a fascinating question, because most discussion of unsafety is in the context of C and C++, but I'd argue that the biggest problem there is not unsafety per se, but rather the fact that the language is a psycho clown with a chainsaw which is out there to get you! There was this story recently:
And earlier today I watched https://youtu.be/zULU6Hhp42w?t=750 which hilariously mentions the same bug in passing --- a very senior person solving a hard problem got got by language trivia, not because of some genuinely tricky UB, but because the language is a field of rake-triggered footguns.
I am extremely curious what the security track record of non-TigerStyle Zig would be; that should give a much better view of the cost of memory unsafety, disaggregating it from the cost of dealing with a byzantine programming language.
I think the Python problem is well captured by the first page of https://storage.googleapis.com/gweb-research2023-media/pubtools/7665.pdf. Almost every language has an unsafe subset, and, if that subset is sufficiently clearly fenced off, that's fine (e.g., unsafe in Rust, Go, and Java). I'd say Python falls roughly into this category, because buffers and ctypes feel sufficiently niche in practice.
I'm not referring to ctypes but just Python not being particularly hardened. The buffer reuse bug is ages old for instance and it's an inherent limitation of the C interpreter that you can use to execute arbitrary code: https://gist.github.com/mitsuhiko/5e0ca65a9a8bf6e07f8edf6e20a41212 (only tested on my arm64 mac, might require some adjustments)
Yes, there is a ctypes import, but just because I want to get the address of libc. It otherwise does not use ctypes. But when you run it:
$ python id.py
[*] Executing 'id' via fake type tp_repr -> system()
uid=501(mitsuhiko) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),701(com.apple.sharepoint.group.1),702(com.apple.sharepoint.group.2),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),400(com.apple.access_remote_ae)
Look, i know it's technically a bug, but it's a bug that does not get fixed :)
//EDIT: and maybe to make this a bit more specific. In Python there is no real rule for where the language starts and where it ends. CPython behaves however it happens to be implemented; there is no real specification. The boundary is also hard to draw when you consider how the C ABI leaks even into Python code itself (e.g. the core IO system leaks it out).
As a result it is hard to argue about memory safety. I think the buffer interface is the best example, because the core Python buffer object is memory unsafe due to how the C API works. The buffer it backs is a C pointer whose lifetime is independent of the wrapper object. As a result you can trivially create cases of use-after-free. It's almost part of the object's contract.
And as can be demonstrated: you could create absurd cases where you can exploit it.
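For contrast, CPython does actively defend this invariant at the Python level: it refuses to resize a bytearray while a memoryview exports its buffer. The enforcement only exists on the Python side of the fence, though; a C extension holding the raw pointer gets no such protection, which is exactly where the use-after-free comes from. A minimal illustration:

```python
ba = bytearray(b"abc")
mv = memoryview(ba)  # exports ba's internal buffer (a raw C pointer)
try:
    ba.extend(b"def")  # resizing could reallocate the buffer under mv
except BufferError:
    print("resize refused while a view is exported")
mv.release()
ba.extend(b"def")  # fine once the export is gone
```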
That's horrifying. Do you know where this bug is tracked?
It's this 12 year old ticket: https://github.com/python/cpython/issues/60198
The issue has been there since Python 2.7, or whenever the API was introduced.
It's also useful to define a language that remains safe even in the presence of adversarial code (so, without an unsafe subset), but we might as well just say "JavaScript" as that's more or less the only such language in wide use.
Technically, Buffer.allocUnsafe (and new Buffer(number) in older versions) lets you read from uninitialized memory. This is arguably "memory unsafe" (because it can lead to sensitive data leaking).
That's a (Node.js) runtime extension, not part of the language itself. This is somewhat nitpicky since the Node runtime is so prevalent, but the same APIs of course do not exist in the browser.
Equivalently, Deno's FFI can be used to open any and all possible safety holes to the JavaScript runtime. But that again is only present in Deno and doesn't make browsers, Node.js, or even Deno running without FFI permissions memory unsafe by association.
As pointed out in the post, segfaults are not an example of memory unsafety; they are well defined to trap. I think your comment would be stronger, and less misleading to beginners, if you were more precise about the way in which Python is memory unsafe that you are trying to highlight.
Edit: https://gist.github.com/mitsuhiko/5e0ca65a9a8bf6e07f8edf6e20a41212 is a great example, thank you
One thing you can do is use CPython compiled with Fil-C. It works with dynamically loaded FFI native modules, as long as the libraries are also built with Fil-C.
With Fil-C via Nix you can even use the Nixpkgs Python packaging infrastructure; it defines the Fil-C ABI as a platform triplet, makes it available as a cross compilation target, and has a few examples of Fil-C specific overrides for Python native modules that need some extra configuration flags or source patches for Fil-C (disabling inline assembly, etc).
I got it to serve an ASGI web app using Uvicorn, SQLite, Cairo, msgspec, etc., all built for Fil-C automatically using pkgsFilc.python3.withPackages...
The key detail here is that on the level of abstract semantics, you simply cannot have undefined behavior. For the specification to be consistent, you need to explain what the abstract machine does in every case exhaustively, even if the answer is just "the AM gets stuck". If UB is the set complement of the defined behaviors, then it is defined just as precisely!
While it's true that at the abstract-machine level UB is "just another state", what this misses is that the semantics of the UB state are fundamentally different (in that they are lacking) compared to other, otherwise-similar states.
Yes, an implementation is required for this to be apparent, that doesn't mean that Cardelli's definition is without merit though. In his viewpoint, as I understand it, what distinguishes UB (an "untrapped error") from others is in how an implementation is permitted to handle it while still being considered to correctly supply the required semantics. That's summed up in his paper via:
It is useful to distinguish between two kinds of execution errors: the ones that cause the computation to stop immediately, and the ones that go unnoticed (for a while) and later cause arbitrary behavior. The former are called trapped errors, whereas the latter are untrapped errors
But, the required semantics are indeed a property of the language, not a property of the implementation:
A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages.
I'll note though that even Cardelli muddies the waters a little:
Untyped languages may enforce safety by performing run time checks
Obviously a "run time check" is something that an implementation does or enforces; it's not a concern of the language. I think Cardinelli means essentially that a language may require semantics that in general would necessitate run-time checks (for example, Java defines the behaviour of a null pointer dereference, which generally requires a runtime check - albeit one that may be assisted by the hardware/OS. Though, in certain cases, the check could be elided as it could be proved that the pointer could not be null, for example).
An implementation I of L is memory safe, if it cannot give rise to new, surprising behaviors, for arbitrary programs P (including buggy ones), except that crashing is allowed.
(This is the conclusion that the post arrives at.) I never like calling this "memory safe". If you want to take this as a definition of a "safe" translation (the author's I), be my guest. You can even extend such a definition to safety of the whole implementation (I + CSema). But it has nothing at all to do with memory. What do program behaviors have to do with memory in particular? We might as well call this "control safe", as in not taking any unplanned control flow paths. You may complain that an unintended behavior may not cross any dynamic control flow, but then it may not need to do any memory accesses either! And if you say that even CPU registers are memory, then any program accesses memory and the "memory" part of "memory safe" adds zero extra information.
As crappy as the Skeptic's definition is, at least it tries to capture the "memory" part.
What do program behaviors have to do with memory in particular?
The most common way to make a program diverge from the well-defined part of its language is by erroneous memory accesses.
There are vulnerabilities that start off as integer overflows but what elevates them from bugs to security incidents is using the wild integer to access memory without proper checks.
Sure, invalid memory accesses are a common source of security bugs, but they are not the only one (simple logic bugs like forgetting a check can create vulnerabilities too). Furthermore, nothing in "memory safety" claims to be about security specifically.
You could define "memory safety" to be about security, and then specifically about memory-related security bugs, but then how do you define the "memory-related" part precisely? Just "using memory" is not enough. I feel it's this part that the article is about.
Reminded of this article, where the author defined memory safety like this:
the goal of memory safety is to access data that is the type we expect.
Which itself is somewhat vague, but it's still a useful formulation.
From my perspective, "memory safety" mostly seems to be defined in a relative sense. If you can statically guarantee some property that the C compiler can't, then your language is more "memory safe" than C.
If you can statically guarantee some property that the C compiler can't, then your language is more "memory safe" than C
I find that unsatisfying; what if that property is something fairly insignificant? If I have a language A which guarantees that integer variables never obtain a value greater than 100, is it really "more memory safe" than another language B which only guarantees that they never exceed 200 (since A offers all the guarantees that B does, and more)? I don't think it can be reduced to something (quite) as abstract as this; you at least need to say something about what makes a property relevant to memory safety, and without that you're pretty much still at square one.
Memory safety was traditionally considered a binary property - a language is memory safe, or it isn't; a program is memory safe, or it isn't. While I now frequently see some claiming that it is actually a spectrum, I think considering memory safety as a binary property is more useful: we want to be able to say "this program is memory-safe, therefore we know it is definitively not (barring compiler or other external bugs) susceptible to exploits involving arbitrary code execution", for example.
There is an important distinction between languages like JavaScript, OCaml, and safe Rust, and languages like C, C++, and Pascal. In the first group, the language definition gives programs well-defined behavior (or a specified, predictable failure mode) for all executions. In the second group, executions that trigger undefined behavior are not governed by the language semantics: once undefined behavior occurs, “anything can happen” and the implementation may do literally anything.
In principle, you could imagine a C/C++/Pascal implementation that treats undefined behavior as a checked runtime error, detecting it promptly (via pervasive sanitization) and terminating in a predictable way. But that is not the usual compilation model or performance contract, and typical toolchains do not provide such guarantees by default.
Interestingly, assembly languages often have defined meaning: instructions have specified effects on registers and memory, and failure modes are typically architectural (traps, faults) rather than semantic “undefinedness” at the language level. Yet we still would not describe assembly as memory safe.
This suggests that “memory safety” is not the most illuminating axis here. A more useful lens may be: what reasoning principles does the language enforce, especially via its type system, and to what extent do those principles rule out classes of executions or make program behavior compositional and predictable.
Note that the present post is 90% sophistry in the style of Xeno
Not a big deal, but I think "Xeno" should be Zeno.