Here I have to beg to differ, with an example provided by hypothesis itself:
    pair = ({-7, 1}, {-7, 1})

        @given(set_shuffled_pairs(st.sets(st.integers())))
        def test_iteration_order_permuted(pair):
            left, right = pair
    >       assert list(left) == list(right)
    E       assert [1, -7] == [-7, 1]
    E
    E         At index 0 diff: 1 != -7
    E         Use -v to get more diff
    E       Falsifying example: test_iteration_order_permuted(
    E           pair=({-7, 1}, {-7, 1}),
    E       )

    tests/test_contents.py:39: AssertionError
Here we have two equal sets that have different iteration order, as permuted by the set_shuffled_pairs in my original question.
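The same effect can be seen without Hypothesis. This is only a minimal sketch, assuming CPython, where -7 and 1 happen to collide in a small set's hash table, so insertion order can leak into iteration order. That is an implementation detail rather than a guarantee. The helper name same_elements is made up for illustration:

```python
# Two equal sets built from the same elements in opposite insertion order.
# On CPython the two values collide in the small hash table, so the order
# of insertion may show up in the iteration order (not guaranteed).
forward = set([-7, 1])
backward = set([1, -7])


def same_elements(left, right):
    """Compare two iterables of sortable items, ignoring their order."""
    return sorted(left) == sorted(right)


print(forward == backward)               # the sets compare equal
print(list(forward), list(backward))     # the lists may or may not match
print(same_elements(forward, backward))  # order-independent comparison
```

A test that compares sorted(left) with sorted(right), or the sets themselves, does not depend on which order the interpreter happens to produce.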
Of course, as you point out, this example might pass on a different pytest run while others fail. Since all it takes is calling list() on a set or writing a for loop over one, I suspect that many tests involving sets are order-dependent and therefore flaky from run to run.
It might be worth adding a note about setting PYTHONHASHSEED to improve reproducibility, perhaps at https://hypothesis.readthedocs.io/en/latest/details.html#making-random-code-deterministic. The hash seed doesn't seem to be something that can be changed within a running interpreter process, but tools like hypofuzz might be able to arrange for it to be set automatically. Hypothesis could perhaps also detect when PYTHONHASHSEED is not set and report that as a possible cause when examples from the database fail to reproduce. I'm not sure how often that report would be useful, though, since examples from the database normally stop reproducing because the bug was fixed.
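To make the PYTHONHASHSEED point concrete, here is a rough sketch of both ideas: pinning the seed by launching a fresh interpreter, and checking whether the current process was started with a fixed seed. The function names are hypothetical and nothing here is an existing Hypothesis or hypofuzz API. As far as I understand, the seed only randomizes hashes of str and bytes (integer hashes are not randomized), so the sketch uses a set of strings:

```python
import os
import subprocess
import sys


def set_order_in_child(seed):
    """Start a fresh interpreter with PYTHONHASHSEED=seed and return the
    iteration order it prints for a small set of strings."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(list({'a', 'b', 'c', 'd'}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def hash_seed_is_pinned():
    """Return True if PYTHONHASHSEED is set to a fixed integer, and False
    if it is unset or set to 'random'."""
    return os.environ.get("PYTHONHASHSEED", "").isdigit()


if __name__ == "__main__":
    # The same seed should give the same order in every child process.
    print(set_order_in_child(0), set_order_in_child(0))
    # Different seeds may or may not give different orders.
    print(set_order_in_child(1), set_order_in_child(2))
    print("seed pinned in this process:", hash_seed_is_pinned())
```

The seed has to be in the environment before the interpreter starts, which is why the sketch goes through a subprocess rather than assigning to os.environ in the running process.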
Thanks,
Anne