Property-based "fuzzy" testing (Hypothesis) in Django


Elena Williams

Feb 21, 2020, 12:34:57 AM
to django-d...@googlegroups.com
Dear Django dev community,

More and more of the Python ecosystem is adopting property-based testing to find difficult and counter-intuitive edge cases, using the Hypothesis library: https://github.com/HypothesisWorks/hypothesis/tree/master/hypothesis-python

For example, testing with Hypothesis has found bugs in the Python language itself; apparently the JSON and regex implementations are even more interesting than it seems they should be. Zac Hatfield-Dodds (https://github.com/Zac-HD) has been invited to talk about Hypothesis at the language summit at PyCon this year and is again running a workshop about the library.

The context of this email is that I've been working with Zac to try to make sense of Hypothesis in a Django user context. The current documentation for this is here if anyone is curious: https://hypothesis.readthedocs.io/en/latest/django.html ... honestly I think it's pretty opaque for a normal Django dev, though it is very powerful and would replace nearly all the tests my friends and I have ever written, and moreover do a better job of it.

As the conversation continued about how testing works in a Django instance (turns out we use odd patterns and it's not how they thought it worked), it became a giant question mark whether and how Hypothesis testing would be implemented in the Django framework itself.

To briefly digress: he suggested I write an example for testing a straight model and I was baffled. This isn't something I've thought about for a long time but I did think about it once, and my (potentially incorrect) belief is: if I just made a vanilla Django model with vanilla fields and no methods (nothing else whatsoever, say no views nor functions nor anything at all) it would be redundant for me to write any tests. They wouldn't do anything -- there'd effectively be nothing to test. It would all have been brutally validated at the framework/database level.

What I asserted to Zac is that I consider this the responsibility of the framework -- I mean if Django field validation/saving is failing tests at the ORM level on an Oracle db in a certain environment or something, that might be a case a test written for a model would pick up, but I would never think this was a user concern. Sure, if I do something exotic I'll write my own tests, but I ... profoundly trust the ORM to validate vanilla model fields. Maybe this is old thinking or I'm misguided, I'd be fascinated to hear what others think and to follow this thread to the end, but this is a digression.

Suffice to say this does seem to point to somewhere that property-based testing would be appropriate at the framework level, not only at the user level.

I'm writing because perhaps we could talk about Django using "complex" testing like this, as there are areas where it might be appropriate for improving the robustness of the framework. It's particularly useful for improving robustness on encoding edge-cases, which are a perennial problem we care about.

For example, where data is being passed around it could perhaps be tested using a "fuzzy" strategy, for example here (I think, though I could be wrong): https://github.com/django/django/blob/master/django/test/client.py

I was going to make a ticket, but then thought to just email the list to have a chat instead.

All the very best to everyone and warm hellos to friends old and new,
---
Elena Williams
Github: elena

Elena Williams

Feb 21, 2020, 1:29:43 AM
to django-d...@googlegroups.com
(Goodness, we're always reserved about our own work.) Yabbering further with Zac after sending this email, it became evident I didn't include the link to my personal guide to Hypothesis testing for Django users that started all this. It's, er, quite different from the official docs and probably more friendly.

Threw it together earlier today, but I want to introduce why using these is a good idea and also work towards providing a guide for converting existing test suites:
https://github.com/elena/example-tests-django-hypothesis/

There's some gnarly json stuff in my own world that I'm going to be running at and making examples of once it's nutted through. Of course all feedback would be delightful.

There are also discussions with Zac and David about re-working the official docs, but we're coming from epistemologically different places, so gently gently.
---
Elena Williams
Github: elena

Adam Johnson

Feb 21, 2020, 4:22:20 AM
to django-d...@googlegroups.com
Hi Elena,

I like fuzz testing and how Hypothesis does it. I have used it on a couple of projects and it always found bugs. In fact it was my first Pycon that I sat with David and sprinted on adding it to a project :)

That said, I've always found it quite a leap from straightforward tests, being slower and non-deterministic. There is a hard balance to strike between writing fuzz tests and writing individual test cases to ensure edge cases keep being covered. And also we generally want tests to be less complex than the code they are testing - short functions with several parameters can easily need many lines of Hypothesis code to set up accurate testing.

Personally I tend to settle on using a little bit of fuzzing with random data from factory boy, but adding control to the random seed used in testing using pytest-randomly. I gave this approach a bit of a write up when I started using it in 2014: https://adamj.eu/tech/2014/09/03/factory-boy-fun/ .
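
A minimal sketch of the idea behind that approach (random test data with a controlled seed), using only the standard library's `random` module. This is not factory boy's or pytest-randomly's API; `make_user` and `make_users` are hypothetical helpers made up for illustration, and the field names are assumptions.

```python
import random


def make_user(rng):
    """Build a hypothetical user record from the given random generator."""
    return {
        "username": "user%d" % rng.randint(0, 9999),
        "age": rng.randint(18, 90),
        "is_staff": rng.random() < 0.1,
    }


def make_users(seed, count=3):
    """Return `count` random users; the same seed always gives the same users."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [make_user(rng) for _ in range(count)]


if __name__ == "__main__":
    # Re-running with the seed from a failing test run reproduces its data.
    print(make_users(seed=1234))
```

Logging the seed with each test run is what makes a random failure repeatable: feed the same seed back in and the same data comes out.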

As for Django's test suite: I don't believe there is any fuzz testing there. But it is being tested separately instead with Google's OSS-Fuzz project: https://github.com/google/oss-fuzz/tree/master/projects/django . This was set up by Guido Vranken who announced it in this mailing list thread: https://groups.google.com/forum/#!topic/django-developers/-WweB07YiVQ . Apparently it has caught a few bugs since implementation, and there's scope to expand it quite a lot: https://groups.google.com/d/msg/django-developers/PMtvmMlsjyw/_rOj_LzvCQAJ . I *think* if we were to increase our fuzz testing, we'd probably want to concentrate efforts there. Google also offer $ rewards to critical projects like Django for reaching certain levels of coverage, which is nice of them.

Thanks,

Adam

Elena Williams

Feb 24, 2020, 10:54:58 PM
to django-d...@googlegroups.com
Heya Adam,

Amazing! Thank you! You're correct that it's slower. 

I'd completely forgotten about FactoryBoy! though used it a fair bit a few years ago. Interesting to note that the original ruby project renamed itself to FactoryBot ("offensive or problematic" indeed). 

The conversation continued about Factory* libraries, and Zac agrees that these are useful, but has reservations that the examples they generate are fewer and more limited in range, e.g. omitting the empty string and control characters -- which matters for "real" bug finding, as these are usually bug gold. He also reckons that they have weaker support for reproducing failures.
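
To make the point about input range concrete, here is a hand-rolled sketch using only the standard library. It is a stand-in for the idea, not Hypothesis's API (Hypothesis additionally shrinks failing inputs and records them for replay). `ALPHABET`, `random_text`, `capitalise_first` and `find_failure` are all hypothetical names invented for this example.

```python
import random
import string

# Assumed alphabet: printable ASCII plus a few control characters.
ALPHABET = string.printable + "\x00\x1b\x7f"


def random_text(rng, max_len=8):
    """Return random text of length 0..max_len; length 0 is the empty string."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def capitalise_first(name):
    """Deliberately naive function under test: it breaks on the empty string."""
    return name[0].upper() + name[1:]


def find_failure(func, seed, runs=500):
    """Call func on seeded random text; return the first input that raises, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = random_text(rng)
        try:
            func(value)
        except Exception:
            return value
    return None


if __name__ == "__main__":
    print(repr(find_failure(capitalise_first, seed=0)))
```

A generator that only ever produced "realistic" names would never call `capitalise_first("")`. Because the empty string is in range here, a few hundred draws are all but certain to hit it, and the seed lets the same failing run be repeated.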

Long and the short is possibly that FactoryBoy could be improved and maybe there might be cases for Django projects to use Hypothesis too :) Whatever happens I'm going to keep plugging away at hypothesis examples, and maybe making the official docs, er, more penetrable. Personally I'm doing something nuts, for which FactoryBoy is not suitable, so this makes sense in my case.

It's great to hear about the OSS-Fuzz project. I've passed the details along. It's always nice to cross-pollinate, and I'm delighted Zac'll be packing for PyCon with this background under his belt. It seems likely more conversation will be happening there.

Thank you so much again,
---
Elena Williams
Github: elena

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.