Some possibilities that come to mind:
- One thing to examine is whether the versions() strategy you specified can actually produce those examples. It looks like a user-defined strategy, so you may need to define it so that simpler examples - and the specific edge cases you care about - are possible outputs (a quick way to check this is sketched below).
- You can also try increasing `max_examples`.
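For example (a hypothetical sketch - the real versions() strategy isn't shown in this thread), hypothesis.find() will tell you whether the strategy can produce a given value at all, and settings() raises the example budget:

    from hypothesis import find, given, settings, strategies as st

    # Hypothetical stand-in for the user's versions() strategy.
    def versions():
        return st.builds(
            "{}.{}.{}".format,
            st.integers(min_value=0),
            st.integers(min_value=0),
            st.integers(min_value=0),
        )

    # find() raises NoSuchExample if the strategy can never produce a matching
    # value - a quick check that an edge case is reachable at all.
    print(find(versions(), lambda v: v == "0.0.0"))

    # Give the property more attempts per run:
    @settings(max_examples=5_000)
    @given(versions())
    def test_version_comparison(v):
        ...  # whatever property is actually under test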
Seeing the strategy definition as well would definitely be helpful - one perspective on this is "it seems that our versions() strategy doesn't generate '0', or perhaps other edge cases, often enough to find certain bugs".
That said, bugs which can only be triggered by a single exact value are inherently hard for randomized testing tools like Hypothesis - we upweight a lot of special cases and use various heuristics, but ultimately the 'right way' to find this kind of bug is with an SMT solver or similar (which is why we're working with the maintainers of CrossHair to support Z3 as a Hypothesis backend!).
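In the meantime, if you already know the exact value that matters, you can hard-code it rather than hope the random search stumbles onto it. A sketch, reusing the hypothetical versions() strategy from above:

    from hypothesis import example, given, strategies as st

    # @example() pins known-tricky inputs, so they run every time in addition
    # to the randomly generated examples.
    @example("0")
    @given(versions())  # versions() = the hypothetical strategy sketched earlier
    def test_version_comparison(v):
        ...

    # Or upweight special values inside the strategy itself:
    versions_with_edge_cases = st.sampled_from(["0", "0.0.0"]) | versions()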
Specific things I'd try here:
- Try to build knowledge of your edge cases into your strategies. Restricting the search space somewhat - e.g. major versions in [0..3] and minor + patch versions in [0..15] - makes it impossible to find bugs which only trigger outside that range, but more likely to find those inside it. Make that tradeoff case by case, where you know the interaction between components is what matters, and not otherwise: most missed bugs I see are caused by too-narrow strategies. (There's a sketch of this after the list.)
- Pick the version-to-compare-to from the generated list of versions, as in the sketch below. This enormously upweights the chance of a collision, or of otherwise hitting comparison edge cases.
- Pick an arbitrary (set of?) comparisons. This makes it less likely to find your known bug, but increases the surface area of other bugs you could find. If the list of comparisons is short, you could parametrize over it instead of drawing it with Hypothesis; alternatively, turn up max_examples.
- Use https://pypi.org/project/hypofuzz/ for coverage-feedback-guided search. Leaving this running for a few minutes, or overnight, routinely finds bugs that I hadn't otherwise found at all.
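Putting the first two of those together, a sketch (hypothetical names throughout - parse() stands in for whatever comparison code is really under test):

    from hypothesis import given, strategies as st

    # Hypothetical stand-in for the comparison logic under test.
    def parse(version):
        return tuple(int(part) for part in version.split("."))

    # Deliberately narrow ranges, so equal and near-equal versions come up often.
    small_versions = st.builds(
        "{}.{}.{}".format,
        st.integers(0, 3),   # major
        st.integers(0, 15),  # minor
        st.integers(0, 15),  # patch
    )

    @given(st.lists(small_versions, min_size=1), st.data())
    def test_ordering_is_consistent(versions, data):
        # Draw the version-to-compare-to from the generated list itself,
        # which makes collisions and ties far more likely.
        target = data.draw(st.sampled_from(versions))
        for v in versions:
            assert (parse(v) < parse(target)) == (not parse(v) >= parse(target))

A test written like this can also be handed to HypoFuzz - per its docs, `pip install hypofuzz` adds a `hypothesis fuzz` command that runs existing @given tests under coverage guidance.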
On Jan 19, 2024, at 09:05, Paul Moore <p.f....@gmail.com> wrote:
On Thu, 18 Jan 2024 at 04:51, Paul Zuradzki <paulzu...@gmail.com> wrote:
Specific things I'd try here:
- Try to build knowledge of your edge cases into your strategies. Restricting the search space somewhat - e.g. major versions in [0..3] and minor + patch versions in [0..15] - makes it impossible to find bugs which only trigger outside that range, but more likely to find those inside it. Make that tradeoff case by case, where you know the interaction between components is what matters, and not otherwise: most missed bugs I see are caused by too-narrow strategies.
That's interesting. I'd gone in the other direction - make the strategy general, so it checks the extreme cases I wouldn't otherwise think of: versions with 20+ components, versions with the incredibly obscure epoch component, and so on. But I think I had a naive view that the test would "cover" the space of possibilities, whereas in fact, because it only generates a set number of examples, it gives *less* complete coverage the bigger the space is.
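One way to get the best of both (a sketch, not something from the thread): mix a narrow strategy with a fully general one, so most of the example budget lands in the dense region while the extreme cases still appear occasionally.

    from hypothesis import strategies as st

    # Narrow region: tiny component ranges, so near-collisions are common.
    small_versions = st.builds(
        "{}.{}.{}".format,
        st.integers(0, 3), st.integers(0, 15), st.integers(0, 15),
    )

    # Broad region: long versions and the obscure epoch component (hypothetical
    # construction - adjust to however versions are really modelled).
    broad_versions = st.builds(
        lambda epoch, parts: (f"{epoch}!" if epoch else "") + ".".join(map(str, parts)),
        st.integers(0, 2),
        st.lists(st.integers(0, 1000), min_size=1, max_size=25),
    )

    # one_of() prefers earlier branches when shrinking, so failing examples
    # still minimise towards the simple region.
    mixed_versions = st.one_of(small_versions, broad_versions)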