A testing conundrum
7 points by ubernostrum
    from hypothesis import given

    # python_data is a Hypothesis strategy generating arbitrary Python
    # data structures, and Hasher is the class under test; both are
    # defined elsewhere in the project.
    @given(python_data, python_data)
    def test_two(data1, data2):
        h1 = Hasher()
        h1.update(data1)
        h2 = Hasher()
        h2.update(data2)
        if data1 == data2:
            # Equal inputs must hash to equal digests.
            assert h1.digest() == h2.digest()
        else:
            # Distinct inputs should hash to distinct digests.
            assert h1.digest() != h2.digest()
This is really inefficient. Instead, call assume(data1 == data2) at the start of the test to mark non-matching inputs as bad data. The test will not continue past that point, which saves the unnecessary computation.
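A minimal sketch of that suggestion, reusing python_data and Hasher from above (the test name is made up). Note that it only ever exercises the equal-digest branch:

    from hypothesis import assume, given

    @given(python_data, python_data)
    def test_equal_data_equal_digest(data1, data2):
        # assume() discards the example when the condition is false,
        # so only equal pairs reach the assertion below.
        assume(data1 == data2)
        h1 = Hasher()
        h1.update(data1)
        h2 = Hasher()
        h2.update(data2)
        assert h1.digest() == h2.digest()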
However, that doesn’t solve the problem. For that I would probably write a data-juggling function that can change the order of dicts inside nested data structures. That is complex and could itself contain bugs, but the chances that both functions contain the same bug are minimal. Maybe simplifying the test data to just sets and dicts first could also lead to some interesting investigation.
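A rough sketch of such a juggling function (shuffle_order is a hypothetical name; it leans on the fact that Python dicts compare equal regardless of key order but iterate in insertion order):

    import random

    def shuffle_order(data):
        # Rebuild dicts with their keys inserted in a random order.
        # The result still compares equal to the input but iterates
        # differently, which is exactly the case an order-insensitive
        # hasher has to get right.
        if isinstance(data, dict):
            keys = list(data)
            random.shuffle(keys)
            return {key: shuffle_order(data[key]) for key in keys}
        if isinstance(data, list):
            return [shuffle_order(item) for item in data]
        return data

Feeding data and shuffle_order(data) into two Hasher instances should then yield equal digests.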
Another idea would be to search for a third-party package that compares nested data structures exactly (and hope that it has been battle-tested and doesn’t contain bugs).
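DeepDiff is one fairly well-known candidate; a minimal sketch, assuming the deepdiff package is installed (an empty diff means no differences were found):

    from deepdiff import DeepDiff

    def deeply_equal(left, right):
        # DeepDiff returns a falsy (empty) result when the two
        # structures match exactly.
        return not DeepDiff(left, right)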
Maybe I don't understand what assume is for. I don't want Hypothesis to avoid data pairs that are unequal. I want to test that those pairs produce unequal hashes.
You could have one strategy that generates only equal pairs and another that generates only unequal pairs. That would make for a more balanced and efficient test. As written, your example can also pass even if no equal pair is ever generated. (But I have to admit my original comment was a little unclear on that part, so thanks for asking.) In machine learning this is called class imbalance.
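A sketch of that split, again reusing python_data and Hasher from above (the strategy and test names are made up):

    from hypothesis import given
    from hypothesis import strategies as st

    # Equal pairs: draw one value and use it twice.
    equal_pairs = python_data.map(lambda data: (data, data))

    # Unequal pairs: draw two values and filter out the rare equal ones.
    unequal_pairs = st.tuples(python_data, python_data).filter(
        lambda pair: pair[0] != pair[1]
    )

    @given(equal_pairs)
    def test_equal_pairs(pair):
        h1, h2 = Hasher(), Hasher()
        h1.update(pair[0])
        h2.update(pair[1])
        assert h1.digest() == h2.digest()

    @given(unequal_pairs)
    def test_unequal_pairs(pair):
        h1, h2 = Hasher(), Hasher()
        h1.update(pair[0])
        h2.update(pair[1])
        assert h1.digest() != h2.digest()

Combining equal_pairs with something like shuffle_order above would also cover pairs that are equal but ordered differently.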
I use a Go library called Rapid that is modeled after Hypothesis. That one lets you skip a generated example. Does Hypothesis have something like that, so you could filter out inputs you don’t care about?