Combining Hypothesis with Pytest's mark.parametrize()

Torsten Anders
Jul 12, 2023, 7:43:26 AM
to Hypothesis users
Hi,

I would like to run tests where I generate some (complex) input data with custom Hypothesis strategies, but additionally I need to systematically check all possible values of some other data. (Context: I want to test the behaviour of semi-automatically generated functions, and I want to explicitly test each of these functions.)

It seems I can write tests that combine Hypothesis strategies with the systematic checking of some data by combining Hypothesis' @given() decorator with the Pytest decorator @mark.parametrize(). However, my tests so far do not quite behave as I want them to. Below is a dummy test using such a combined input strategy, with simpler test data and no actual assertion.

import pytest
import hypothesis as hyp
import hypothesis.strategies as hst

@hyp.given(
    ys=hst.lists(hst.floats()),
    seed=hst.integers(min_value=1, max_value=10000))
@pytest.mark.parametrize("x", range(3))
@hyp.settings(max_examples=10)
def test_dummy(ys, seed, x):
    # No actual assertion, for simplicity
    assert True


When running this test with pytest -vvs, I get the logging output shown below this message. As you can see there, pytest systematically tries all the given values for x in order, and the values from Hypothesis are randomised, as intended.


OK, here finally are my questions/problems:

1. Is there a way to ensure that the test data from Hypothesis is more random? As you can see from the logging below, the data from Hypothesis contains many repeated value combinations (here, ys=[0.0], seed=1), and I see the same in my actual tests. I assume that there is no shrinking going on here, as none of these tests failed. Also, even if some shrinking were involved, I would still not expect exactly the same test inputs to be tried over and over.

2. Currently my test effectively loops over the data from mark.parametrize() in an outer loop, with the data from Hypothesis in an inner loop. Is it possible to do this the other way round? That would result in more efficient tests for me, because the values I give to mark.parametrize() already exist, whereas my custom Hypothesis strategies are more expensive to draw from.

I already tried swapping the order of the decorators and test function parameters, but surprisingly that has no effect.


# Test logging
$ pytest my/test/file.py -vvs
...
my/test/file.py::test_dummy[0] Trying example: test_dummy(
    ys=[], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[], seed=3068, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[-2.00001], seed=6794, x=0,
)
Trying example: test_dummy(
    ys=[], seed=1, x=0,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=0,
)
PASSED
my/test/file.py::test_dummy[1] Trying example: test_dummy(
    ys=[], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[-2.2250738585072014e-308, -2.925800531803837e-308, inf, nan, nan],
    seed=6971,
    x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
Trying example: test_dummy(
    ys=[0.5, -inf, 2.078615020738899e-106, nan], seed=3829, x=1,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=1,
)
PASSED
my/test/file.py::test_dummy[2] Trying example: test_dummy(
    ys=[], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[], seed=240, x=2,
)
Trying example: test_dummy(
    ys=[], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[], seed=1, x=2,
)
Trying example: test_dummy(
    ys=[0.0], seed=1, x=2,
)
PASSED
...

Torsten Anders
Jul 12, 2023, 1:07:14 PM
to Hypothesis users
> 1. Is there a way to ensure that the test data from Hypothesis is more random?
It turns out that this is general Hypothesis behaviour, unrelated to the combination with mark.parametrize(). The only way around it seems to be to greatly increase max_examples.
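For example, a minimal sketch of what I mean, reusing the strategy from my dummy test (the test name is just an example; the default max_examples is 100):

import hypothesis as hyp
import hypothesis.strategies as hst

# Raising max_examples well above the default of 100 makes Hypothesis
# draw many more inputs, which in practice gives more varied values.
@hyp.given(ys=hst.lists(hst.floats()))
@hyp.settings(max_examples=1000)
def test_dummy_more_examples(ys):
    assert isinstance(ys, list)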

> 2
For swapping the loop nesting I have no answer yet. Perhaps the seemingly fixed ordering I observed is required for proper shrinking, but that is just a guess.

Zac Hatfield Dodds
Aug 19, 2023, 10:53:22 PM
to Torsten Anders, Hypothesis users
Hi Torsten

The current 'loop order' is because from Hypothesis' perspective, each set of values from Pytest is a separate function call.  In principle it would be possible to interleave them (e.g. by manually using the fuzz_one_input hook), but in practice it would be very complex for very little gain.
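For reference, fuzz_one_input is exposed as an attribute on any @given-decorated test and takes a raw bytestring to drive a single test case. A minimal sketch of the hook on its own (the test here is just a placeholder, unrelated to your parametrize question):

from hypothesis import given, strategies as st

@given(st.integers())
def test_ints(n):
    assert isinstance(n, int)

# Runs the test once, with inputs parsed from the byte buffer;
# returns a canonical buffer if the input was valid, or None.
test_ints.hypothesis.fuzz_one_input(b"\x00" * 64)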

The obvious workaround would be to use a `for x in range(3):` loop inside your test function - it's a little less obvious which of those parameters was responsible if the test fails, but plausibly the performance gain is worth it for you.
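Something like this sketch, reusing the dummy test from earlier in the thread:

import hypothesis as hyp
import hypothesis.strategies as hst

@hyp.given(
    ys=hst.lists(hst.floats()),
    seed=hst.integers(min_value=1, max_value=10000))
def test_dummy(ys, seed):
    # Hypothesis now forms the outer loop; each drawn (ys, seed)
    # pair is checked against every value of x in the inner loop.
    for x in range(3):
        assert True  # the real assertion on (ys, seed, x) goes here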

Best,
Zac
