Dictionary/set order

Anne Archibald

Feb 29, 2024, 4:12:36 AM

to Hypothesis users

Hi,

Python's dictionaries are now guaranteed to preserve the order in which keys are added, but two dictionaries can still test equal while having different iteration orders. Sets don't have a consistent iteration order (it normally differs from run to run), and two sets with identical contents can likewise iterate in different orders. It would sometimes be useful to verify that the results of some operation (serialisation, say) are independent of iteration order for these objects. Does Hypothesis generate the same dictionaries/sets with different iteration orders (explicitly or incidentally)? Conversely, when Hypothesis re-runs a test (as with @example, or during shrinking or checking for flakiness), is it guaranteed that the input objects will have the same iteration order?
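To make the distinction concrete, here is a minimal illustration using only built-in types. The dict half is guaranteed by the language. The set half relies on a CPython implementation detail (two small integers that land in the same hash-table slot), so the orders are only printed, not asserted.

```python
# Two dicts with the same items, inserted in different orders.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}
assert a == b              # equality ignores insertion order
assert list(a) != list(b)  # iteration follows insertion order

# Two equal sets built from differently ordered lists.
s = set([-7, 1])
t = set([1, -7])
assert s == t
# On CPython these typically iterate in different orders; that is an
# implementation detail, not a documented guarantee.
print(list(s), list(t))
```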

I know that it is possible to generate pairs with (potentially) different iteration orders by applying permutations to a sorted, list-ified object (as below), so it is possible to test that one's code is independent of iteration order. But does Hypothesis attempt to explore this easily forgotten way in which common Python objects can differ?

Concretely, I am serialising sets and dicts to JSON, which will be kept in version control, and I want to ensure that the representation doesn't change unless the objects do. For sets this is easy:

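from hypothesis import strategies as st

@st.composite
def set_shuffled_pairs(draw, sets):
    # Draw a set, then build an equal set from a permutation of its elements.
    left = draw(sets)
    left_list = sorted(left)  # requires orderable elements
    right = set(draw(st.permutations(left_list)))
    return left, right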

Unfortunately I have complicated nested objects (examples generated with from_type) some of whose contents are dictionaries and sets, and which are serialised through dictionaries, and it will be a challenge to go through and randomise the orders of all the dictionaries and sets.
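One way this could be approached for nested data is a recursive helper that rebuilds every dict and set with a shuffled insertion order. The sketch below is only an illustration: the name `reorder` is hypothetical, it handles built-in containers only (objects produced by from_type would need their own traversal), and it takes a `random.Random`, which in a Hypothesis test could plausibly be drawn from `st.randoms()`.

```python
import random

def reorder(obj, rng):
    """Return a copy equal to obj, with dict and set insertion order shuffled.

    Hypothetical helper: only dict, set, frozenset, list and tuple are handled.
    """
    if isinstance(obj, dict):
        items = [(key, reorder(value, rng)) for key, value in obj.items()]
        rng.shuffle(items)
        return dict(items)
    if isinstance(obj, (set, frozenset)):
        elements = list(obj)
        rng.shuffle(elements)
        return type(obj)(elements)
    if isinstance(obj, list):
        return [reorder(value, rng) for value in obj]
    if isinstance(obj, tuple):
        # Plain tuples only; namedtuples would need special handling.
        return tuple(reorder(value, rng) for value in obj)
    return obj

nested = {"a": {3, 1, 2}, "b": [{"x": 1, "y": 2}], "c": (1, frozenset({4, 5}))}
shuffled = reorder(nested, random.Random(0))
assert shuffled == nested  # equal contents, possibly different iteration order
```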

Thanks,

Anne

Zac Hatfield Dodds

Mar 2, 2024, 4:37:34 AM

to Anne Archibald, Hypothesis users

This is moderately subtle, so I cross-posted it over to https://stackoverflow.com/questions/78091955/ and answered there :-)

Unfortunately I think that you'll want to explicitly test that "permuting iteration order of collections doesn't change the serialization", which does mean writing some code to apply the permutation. Although note that you _can't_ permute the iteration order of a set!
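As a rough sketch of what such a property might look like, the snippet below serialises two equal objects that were built in different orders and checks that the output matches. The serialiser `to_canonical_json` is a hypothetical stand-in, not anything from the original code; it assumes set elements are orderable and relies on `json.dumps` with `sort_keys=True`.

```python
import json

def to_canonical_json(obj):
    """Hypothetical serialiser: dict keys sorted, sets emitted as sorted lists."""
    def default(value):
        if isinstance(value, (set, frozenset)):
            return sorted(value)  # assumes the elements are orderable
        raise TypeError(f"cannot serialise {type(value).__name__}")
    return json.dumps(obj, sort_keys=True, default=default)

# Equal objects, built with different insertion orders.
first = {"b": set([3, 1, 2]), "a": 1}
second = {"a": 1, "b": set([2, 3, 1])}

assert first == second
assert to_canonical_json(first) == to_canonical_json(second)
```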

Best,

Zac

--
You received this message because you are subscribed to the Google Groups "Hypothesis users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hypothesis-use...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/hypothesis-users/c7741285-7437-4c38-9d9c-9c701f063f4an%40googlegroups.com.

Anne Archibald

Mar 4, 2024, 9:22:59 AM

to Zac Hatfield Dodds, Hypothesis users

On Sat, 2 Mar 2024 at 09:37, Zac Hatfield Dodds <zac.hatfi...@gmail.com> wrote:

> This is moderately subtle, so I cross-posted it over to https://stackoverflow.com/questions/78091955/ and answered there :-)

Thanks!

> Unfortunately I think that you'll want to explicitly test that "permuting iteration order of collections doesn't change the serialization", which does mean writing some code to apply the permutation. Although note that you _can't_ permute the iteration order of a set!

Here I have to beg to differ, with an example provided by hypothesis itself:

pair = ({-7, 1}, {-7, 1})

    @given(set_shuffled_pairs(st.sets(st.integers())))
    def test_iteration_order_permuted(pair):
        left, right = pair
>       assert list(left) == list(right)
E       assert [1, -7] == [-7, 1]
E
E         At index 0 diff: 1 != -7
E         Use -v to get more diff
E       Falsifying example: test_iteration_order_permuted(
E           pair=({-7, 1}, {-7, 1}),
E       )

tests/test_contents.py:39: AssertionError

Here we have two equal sets that have different iteration order, as permuted by the set_shuffled_pairs in my original question.

Of course, as you point out, on a different pytest run this example might pass and others might fail. Since triggering it is as simple as applying "list" or writing a for loop, I suspect that many tests involving sets suffer from order-dependence, and thus run-to-run flakiness.

It might be worth adding a note about setting PYTHONHASHSEED to improve reproducibility, perhaps in the section at https://hypothesis.readthedocs.io/en/latest/details.html#making-random-code-deterministic. It doesn't seem to be something that can be changed within an interpreter process, but tools like hypofuzz might be able to arrange for it to be set automatically. Possibly Hypothesis could detect when PYTHONHASHSEED is not set and report it as a possible cause of failure to reproduce examples from the database? I'm not sure when this report would be useful, since examples from the database normally fail to reproduce because the bug was fixed.
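For illustration, the effect of PYTHONHASHSEED can be observed by launching fresh interpreters with the variable set, since it is read at interpreter startup. This is a sketch only; the helper name `iteration_order` and the sample strings are made up for the example. Note that hash randomisation applies to str and bytes objects, so the demonstration uses a set of strings rather than integers.

```python
import os
import subprocess
import sys

SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta'}))"

def iteration_order(seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# The same seed gives the same order in every run.
assert iteration_order(0) == iteration_order(0)
# Different seeds may (but need not) give different orders.
print(iteration_order(0))
print(iteration_order(1))
```

In a shell, the equivalent for a test run would be to set the variable before the command, for example `PYTHONHASHSEED=0 pytest`.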

Thanks,

Anne
