NASA’s 10 Rules for Safety-Critical Code
27 points by azhenley
I find it interesting that loops are allowed as long as they have bounds, but recursion is in general to be avoided. A similar restriction can be enforced for recursion as well, by ensuring that subcalls have smaller arguments, similar to how Agda works. Seems like what they actually want is guaranteed termination.
Only loops with fixed bounds are allowed, which is a syntactic property. You can do
for (0..100)
but you can’t do
var i: u32 = 0;
while (i < 100) : (i += 1)
If there were an equally visible syntactic way to mark bounded recursion, bounded recursion would have been allowed.
Being able to tell just from the syntax whether something terminates is easy to get right; anything more complicated, you can mess up.
I assume it’s about running out of stack, since you don’t know how much space you’ve got left there, and C doesn’t let you safely bail out if you overrun the stack.
They already have the “no heap allocations” rule, it makes sense to prevent you from accidentally using up the stack too.
Yes. The general term for this is Walther Recursion, for those interested.
I know that all of these sorts of rules (MISRA, etc.) are intended as guidelines and not hard rules, but I can’t help but think that the ‘no recursion’ rule is unusually punitive. A lot of algorithms that one could absolutely want to implement for some real-time safety-critical system are substantially easier to implement and debug when written in a recursive style. A lot of air will be produced talking about ‘stack overflows’, but implementing most of them iteratively invariably requires dynamic memory (or, at least, a large preallocated buffer) anyway.
It’s also deeply weird that recursion is banned in rule 1 (“complex flow control”) rather than rule 2 (“prevent runaway code”).
A lot of algorithms that one could absolutely want to implement for some real-time safety-critical system are substantially easier to implement and debug when written in a recursive style.
What algorithms are you thinking of? My experience is that a lot of code I see written using recursion would be easier to read and debug if written as a stack and a while loop. (Plus it’s usually more performant too!)
Many kinds of tree or graph traversal, divide-and-conquer style problems, dynamic programming, etc.
I think the post should have linked to the article itself, rather than to the Wikipedia page: The Power of 10: Rules for Developing Safety-Critical Code, Gerard J. Holzmann
The article itself provides additional rationale and context for each rule.
I think it’s also worth confronting the implication of “NASA’s… Rules for Safety-Critical Code.” NASA is probably the agency within the US federal government that has the most uniformly positive global reputation. The brand is just outstanding. But while I would never dare diminish their achievements and accomplishments, I would encourage people to take a realistic view of these large organisations and be cognizant of the many challenges they face. In that sense, it’s not clear to me that we necessarily owe that much importance to these rules, unless we have more information about what their application has actually achieved. The article indicates that these rules have found their way into the JPL institutional coding guidelines (and JPL is, indeed, one of the many important facilities that NASA sponsors), but it’s not clear whether these were actually adhered to as part of supporting any actual missions.
Gerard Holzmann talked about how these rules had been applied (https://youtu.be/16dQLBgOwbE) following the landing of the Curiosity rover in 2012.
Use a minimum of two runtime assertions per function.
Does runtime assertion mean something else here? Using the standard meaning, this would encourage adding code that crashes instead of adding code that handles errors gracefully.
There is definitely a trade-off between having more, louder crashes during development and avoiding crashes in production. But as a blanket rule, this one stood out compared to the others.
Following the citation, there’s a nicer, more detailed description:
Rule 5: The code’s assertion density should average to minimally two assertions per function. Assertions must be used to check for anomalous conditions that should never happen in real-life executions. Assertions must be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken such as returning an error condition to the caller of the function that executes the failing assertion. Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule.
In the cited document, each rule is also followed by a rationale explaining it.
When an assertion fails, an explicit recovery action must be taken such as returning an error condition to the caller
They’re just carrying on knowing their program is in an inconsistent state? I wonder if this sets some “tainted” flag that prevents it from doing anything too dangerous or something – or if that error is not meant to be recoverable, and they’re only propagating it upwards to die a bit more gracefully.
Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule.
So code which satisfied these rules can no longer satisfy them if you upgrade your tools? What are you supposed to do in this case?
Assertions that can be verified by a static checking tool are still valuable – they might not be statically verifiable anymore if the surrounding code changes, and they still communicate intent. That’s a bit of an odd rule, IMO.
Safety-critical systems favour crashing when error conditions are encountered (i.e: terminate immediately and reset). High-availability systems favour graceful handling when error conditions are encountered (i.e: continuing to function but in a degraded state). The different requirements tend to favour different strategies.
This is the kind of thing that rules speak about:
If any of those assert conditions turns out to be false, the process crashes, which is the right behavior in context.
Assertions are usually the invariants that you assume to hold. Since all your code relies on that assumption, it’s probably more dangerous to continue after you know that your assumptions don’t hold.
I believe they mean assert statements that don’t run in production builds. So you’re supposed to assert at least the pre- and post-conditions of every function so that they crash during testing.
It means exactly what you think it means: Crash the program if the condition is not met. Also in production. :)