Property-Based Testing in Practice
26 points by natfu
Property-based testing (PBT) is a testing methodology where users write executable formal specifications of software components and an automated harness checks these specifications against many automatically generated inputs. From its roots in the QuickCheck library in Haskell, PBT has made significant inroads in mainstream languages and industrial practice at companies such as Amazon, Volvo, and Stripe. As PBT extends its reach, it is important to understand how developers are using it in practice, where they see its strengths and weaknesses, and what innovations are needed to make it more effective.
Joe Cutler (one of the authors) presented a PBT talk at NYC Systems over the summer. It references this paper a bit, and was highly entertaining in person. :)
PBT is one of those things that can be really effective, but it is also hard to use well and easy to misuse. When misused, it can give you a false sense of confidence in your testing.
Their research opportunity 6 ("Improve tools for evaluating testing effectiveness.") would improve this; I'd love to work on this one day. It's very easy to write generators that just produce basically the same valid values, exercised the same way. For example, if you check the validity of a string argument, it might make sense to select between just two string values (one valid, one invalid) instead of using a generic string generator and wasting test time/iterations on many different valid and invalid strings.
(Then test your string validity check separately with random strings, if needed.)
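A minimal sketch of that idea in plain Python (all names here are invented for illustration, and a real library like Hypothesis would replace the hand-rolled loop): the generator picks from one known-valid and one known-invalid string, so iterations go to exercising how the argument is used rather than repeatedly re-testing the validity check.

```python
import random

def parse_username(s):
    # Toy system under test: usernames must be non-empty and alphanumeric.
    if not s or not s.isalnum():
        raise ValueError("invalid username")
    return s.lower()

# Focused generator: one representative valid and one invalid string,
# instead of arbitrary random strings that mostly hit the same paths.
def gen_username(rng):
    return rng.choice(["alice1", ""])

def prop_parse_is_total(s):
    # Property: parsing either succeeds with a lowercased result or
    # raises ValueError; it never fails in any other way.
    try:
        return parse_username(s) == s.lower()
    except ValueError:
        return True

rng = random.Random(0)
for _ in range(100):
    assert prop_parse_is_total(gen_username(rng))
```

The string-validity check itself can then get its own separate test with fully random strings, as the comment above suggests.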
The users here also seem to have side-stepped the issues with some of the older PBT libraries like Haskell's QuickCheck, which associates generators and shrinkers with types. That association doesn't make sense: most types are simply too general for what they're representing. It makes sense to test with everything that e.g. a function can take, but for good coverage you also want to test with only valid values, or with some arguments valid and others truly random, etc. This is related to the previous point: it's too easy to spend a lot of test time/iterations exercising the same code paths.
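A sketch of the alternative being described: pass generators to the harness explicitly instead of deriving them from types, so the same type can be exercised with different distributions per test. (The `forall` helper and generator names below are made up for illustration; libraries like Hypothesis work this way with strategies.)

```python
import random

def forall(gen, prop, runs=200, seed=0):
    # Check `prop` against `runs` inputs drawn from an explicit generator.
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)
        assert prop(x), f"property failed for {x!r}"

# Two generators for the same type (int): one fully random, one
# restricted to values that are valid for the function under test.
any_int = lambda rng: rng.randint(-10**6, 10**6)
valid_port = lambda rng: rng.randint(1, 65535)

forall(any_int, lambda n: n + 0 == n)          # totality-style check
forall(valid_port, lambda p: 1 <= p <= 65535)  # only valid inputs
```

Because generators are just values, one test can mix a valid generator for one argument with a truly random generator for another.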
I'd also love to see some code with checked properties, complex generators, etc. AFAIK there isn't a lot of open-source code taking PBT seriously and applying it to large software that can't afford to go wrong, or even to stop working with an exception/panic/etc.
Needing to create your own data generation process for valid inputs, plus good shrinking to avoid losing time on meaningless values, is definitely a huge hurdle. It can be great to spend some time thinking about it, because you'll need to think about your design and its valid inputs and outputs; in that way it resembles the thought process of TDD, imo. In many cases you might be better served identifying edge cases and rolling with that.
I know there are companies like Antithesis that test the entire system so it goes way beyond PBT. Finding the middle ground where PBT can be useful without setting up a whole mock environment is actually quite difficult!
IMO a lot of the benefit of property-based testing comes from the process of harnessing your test target so it can be run in a PBT-supporting way. At that point you can just use the property-checking function to quickly but manually write a lot of test cases covering all the functionality you are interested in. Turning test cases into data instead of code is wonderful. Writing a generator for that data can be quite difficult; I am reminded of Knuth's remarks in The Stanford GraphBase about the difficulty of generating interesting random graphs, leading him to painstakingly curate appropriate real-world datasets instead.
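A toy sketch of the "test cases as data" point (the run-length encoder and all names are invented for illustration): once the property is a plain function, hand-curated cases are just a list fed through the same checker a generator would drive.

```python
def encode(xs):
    # Toy run-length encoder under test.
    out = []
    for x in xs:
        if out and out[-1][0] == x:
            out[-1] = (x, out[-1][1] + 1)
        else:
            out.append((x, 1))
    return out

def decode(pairs):
    return [x for x, n in pairs for _ in range(n)]

def prop_roundtrip(xs):
    # Property: decoding an encoding reproduces the original list.
    return decode(encode(xs)) == xs

# Hand-curated test cases as data: the same property function that a
# generator would exercise, applied to a table of interesting inputs.
cases = [[], [1], [1, 1, 1], [1, 2, 1], list(range(5))]
for case in cases:
    assert prop_roundtrip(case), case
```

Adding a new test is then just appending to the table, and a generator can be bolted on later to feed the same property.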
Not an expert myself, but I know of this related paper: https://dl.acm.org/doi/pdf/10.1145/3764068
(disclaimer: I know of it because I've talked to the authors before)
Nice! I've posted about it before https://lobste.rs/s/tp8gzj/empirical_evaluation_property_based =) I definitely want to play around with hypothesis and show some ways it can be interesting. The field is really missing resources and entry points for people that might benefit from it.
Oh wow! Didn't see that. Yeah PBT is really cool and a surprisingly simple idea. Another interesting (although more in the weeds) paper on how to better mechanize generators, which are the more laborious part of PBT: https://pat-lafon.github.io/papers/cobb.pdf. (same disclaimer as before: I just happen to have met several people who are doing cool things with PBT)
Relatedly, there's no fundamental reason why you should be able to shrink a generated test input but not a hand-written one. When a test fails, we should be able to shrink the input in all cases.
We already do this kind of thing in compiler testing and debugging with tools like C-Vise and C-Reduce. I'm hoping to take this a step further in my own language by allowing shrinking with dependencies. It's actually quite easy to do once you have a shrinker: (1) you allow your compiler/interpreter to compile/run without the standard library (2) you implement a tool/flag that merges a package into a self-contained file with everything it uses (including imported things, including the standard library) (3) you feed the test case reducer the output of step (2).
(I'm using "shrinking" and "reducing" interchangeably here, but in the context of PBT you want to shrink while maintaining the properties of the input. Still, you should be able to apply the idea to test case reduction in general, not just with PBT.)
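A minimal sketch of shrinking that treats generated and hand-written inputs identically (a greedy delta-debugging-style list reducer, nowhere near a real reducer like C-Reduce; the "bug" predicate is hypothetical):

```python
def shrink(xs, fails):
    # Greedily delete chunks of the input while the failure still
    # reproduces, returning a smaller input for which `fails` is True.
    assert fails(xs)
    chunk = len(xs) // 2
    while chunk >= 1:
        i = 0
        while i < len(xs):
            candidate = xs[:i] + xs[i + chunk:]
            if fails(candidate):
                xs = candidate        # keep the smaller failing input
            else:
                i += chunk            # this chunk was needed; skip it
        chunk //= 2
    return xs

# Hypothetical bug: the code under test chokes whenever 7 appears.
fails = lambda xs: 7 in xs
print(shrink(list(range(20)), fails))  # → [7]
```

Nothing here cares whether the starting input came from a generator or from a hand-written test case, which is the point of the comment above: once a shrinker exists, any failing input can be minimized.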