At Zapier we're working with some rather complex and performance sensitive QuerySet building (most currently around experimenting with GraphQL) and I am constantly worried that the laziness of the ORM will surprise us in production (after a few levels of related nesting, the math of a mistake starts to really, really hurt). While not difficult to prevent, it does get a bit tedious maintaining constant vigilance :tm:. I was curious if sealing or locking a QuerySet has ever been seriously discussed before.
For example, you might do something like:
class Tag(models.Model):
name = models.CharField()
slug = models.SlugField()
# etc...
def get_url(self):
return reverse('tag_index', args=[self.slug])
class Post(models.Model):
title = models.CharField()
tags = models.ManyToManyField(Tag)
# etc...
posts = Post.objects.only(
'id', 'title',
).prefetch_related(
Prefetch('tags', queryset=Tag.objects.only('id', 'name')),
)
# we'd .seal() or .strict() and evaluate the queryset here:
for post in posts.seal():
# get_url() would normally execute the classic N+1 queries
# but after qs.seal() would instead raise some exception
print ', '.join(tag.get_url() for tag in post.tags.all())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SealedQuerySetException: Cannot load sealed deferred attribute 'slug' on prefetched Tag model.
Of course the obvious solution is to just add 'slug' to only(), but in a sufficiently complex application with many engineers working across various app boundaries it is both difficult to predict and test against. It requires lots of explicit self.assertNumQueries() and in the worse case can cause "production surprise" as deeply nested QuerySets can potentially explode in queries.
This doesn't apply only to deferred attributes, but also related OneToOne, ManyToOne, ManyToMany, etc. lazy resolution. FWIW, I would image the FK/M2M N+1 pattern is a much more common surprise than opting into .only() or .defer() as my example above outlines.
I had a go at implementing it outside of Django core, but ended up having to go down a monkey patching rabbit hole so I thought I'd reach out to the mailing list. I'd imagine setting a flag when evaluating a QuerySet to disable the laziness of a Model's fields and relations would be sufficient (IE: if it isn't in cache, raise SealedQuerySetException). It also seems backwards compatible, if you don't use .seal() you are still lazy like before.
So, I guess my questions are:
2. Does this seem useful to anyone else? Specifically opting to raise explicit exceptions instead of automatic (and sometimes surprising) queries.
3. Anyone have tips on implementing this as a library external to Django with lots of monkey patching?
I'd be happy to take a swing at it if there was a >50% chance that it would be at least considered.
-bryan, cto & cofounder @ zapier