Sealing or locking QuerySets -- a suggestion to prevent surprise N+1 queries.

328 views
Skip to first unread message

Bryan Helmig

unread,
Jan 3, 2018, 12:34:21 PM1/3/18
to Django developers (Contributions to Django itself)
At Zapier we're working with some rather complex and performance sensitive QuerySet building (most currently around experimenting with GraphQL) and I am constantly worried that the laziness of the ORM will surprise us in production (after a few levels of related nesting, the math of a mistake starts to really, really hurt). While not difficult to prevent, it does get a bit tedious maintaining constant vigilance :tm:. I was curious if sealing or locking a QuerySet has ever been seriously discussed before.

For example, you might do something like:

class Tag(models.Model):
    name = models.CharField()
    slug = models.SlugField()
    # etc...

    def get_url(self):
        return reverse('tag_index', args=[self.slug])

class Post(models.Model):
    title = models.CharField()
    tags = models.ManyToManyField(Tag)
    # etc...

posts = Post.objects.only(
    'id', 'title',
).prefetch_related(
    Prefetch('tags', queryset=Tag.objects.only('id', 'name')),
)

# we'd .seal() or .strict() and evaluate the queryset here:
for post in posts.seal():
    # get_url() would normally execute the classic N+1 queries
    # but after qs.seal() would instead raise some exception
    print ', '.join(tag.get_url() for tag in post.tags.all())

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SealedQuerySetException: Cannot load sealed deferred attribute 'slug' on prefetched Tag model.

Of course the obvious solution is to just add 'slug' to only(), but in a sufficiently complex application with many engineers working across various app boundaries it is both difficult to predict and test against. It requires lots of explicit self.assertNumQueries() and in the worse case can cause "production surprise" as deeply nested QuerySets can potentially explode in queries.

This doesn't apply only to deferred attributes, but also related OneToOne, ManyToOne, ManyToMany, etc. lazy resolution. FWIW, I would image the FK/M2M N+1 pattern is a much more common surprise than opting into .only() or .defer() as my example above outlines.

I had a go at implementing it outside of Django core, but ended up having to go down a monkey patching rabbit hole so I thought I'd reach out to the mailing list. I'd imagine setting a flag when evaluating a QuerySet to disable the laziness of a Model's fields and relations would be sufficient (IE: if it isn't in cache, raise SealedQuerySetException). It also seems backwards compatible, if you don't use .seal() you are still lazy like before.

So, I guess my questions are:

1. Are there any other past discussions around this topic I can look at? I did find https://code.djangoproject.com/ticket/26481 which seems similar, but doesn't mention relationships.
2. Does this seem useful to anyone else? Specifically opting to raise explicit exceptions instead of automatic (and sometimes surprising) queries.
3. Anyone have tips on implementing this as a library external to Django with lots of monkey patching?

I'd be happy to take a swing at it if there was a >50% chance that it would be at least considered.

-bryan, cto & cofounder @ zapier

Tobias McNulty

unread,
Jan 3, 2018, 1:51:10 PM1/3/18
to django-developers
1. I also could not find anything other than #26481 and a brief discussion of it on the list.

2. I've often wished for something like this, but ended up resorting to assertNumQueries() for lack of a better solution. So yes, it'd certainly be useful to me/us, and I'd be curious to see how one might go about implementing it.

3. My gut says that an API, in Django core, that builds off of #26481 (e.g., .only(strict=True) and .defer(strict=True)) is the right direction (even for related fields), and my preference would be for a loud failure (i.e., an exception rather than a log message) when this "strict" mode is enabled. One can easily downgrade that to a log message when needed, less so the other way around.


Tobias McNulty
Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/962ca077-cde8-476e-92db-47adf5785866%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

charettes

unread,
Jan 3, 2018, 6:34:58 PM1/3/18
to Django developers (Contributions to Django itself)
Hey everyone,

I also believe this is a feature that should be in core
judging by the number of times I had to enable query logging
in the past or rely on assertNumQueries to make sure no extra
queries would be inadvertently introduced in a critical code
path.

In the light of Anssi's comments on #26481 I tried implementing
a PoC as a third party app and while it required a bit of hacking
with internal APIs it was easier than I initially expected[0].

I only had to monkey patch ManyToManyField.contribute_to_class()
because unlike ForeignObject it doesn't rely on class level attributes
to create it's forward descriptor[1].

That gives me hope that it should be feasible to add such a feature
to core if deemed useful without invasive changes.

Cheers,
Simon

[0] https://github.com/charettes/django-seal
[1] https://github.com/django/django/blob/602481d0c9edb79692955e073fa481c322f1df47/django/db/models/fields/related.py#L1590

Andres Osinski

unread,
Jan 3, 2018, 11:14:21 PM1/3/18
to django-d...@googlegroups.com
I'm very interested in this feature and would love to assist in making it happen.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.

For more options, visit https://groups.google.com/d/optout.



--
Andrés Osinski

Shai Berger

unread,
Jan 4, 2018, 1:32:19 AM1/4/18
to django-d...@googlegroups.com
Hi all,

Django 2.0 has a new feature[1] which allows you to say "I expect no actual
database queries in this piece of code". It is not designed to stop a specific
queryset from spawning requests, so getting it to do exactly what's asked for
in this thread may be a little involved, but if your goal is to prevent
surprise queries, I think it is actually better than sealing a single
queryset.

HTH,
Shai

[1] https://docs.djangoproject.com/en/2.0/topics/db/instrumentation/

Adam Johnson

unread,
Jan 4, 2018, 5:55:21 AM1/4/18
to django-d...@googlegroups.com
1. I don't think the seal API on the QuerySet is enough, there are other ways implicit queries can be triggered without a QuerySet, e.g. Book(author_id=1).author . Shai's suggestion of using a query execution blocker is probably the best approach. If you were really worried you could even implement it the other way round, have a permanent block and then 'unlock it' around blocks of code you expect to make queries.

2. https://github.com/YPlan/django-perf-rec is assertNumQueries on steroids and we used it at YPlan to great success :)

3. auto-prefetch would get rid of a lot of N+1 query problems: https://groups.google.com/forum/#!topic/django-developers/EplZGj-ejvg
--
Adam

charettes

unread,
Jan 4, 2018, 9:32:13 PM1/4/18
to Django developers (Contributions to Django itself)
Shai,

`execute_wrapper` can be useful to prevent any queries from
happening at all but I believe there's merit in wanting to
prevent objects retrieved from a queryset from executing ghost
queries during manipulations of the returned `Model` instances
while allowing other unrelated queries to be performed.

For example, what if a database based cache backend needs to be
called during a queryset serialization, or that the `ContentType`
in-memory cache is not warmed up during serialization, or that
any other kind of query, that is not linked to the manipulated
queryset needs to be executed during the serialization process.

Adam,

While I understand the desire to make life easier for Django
newcomers through automated ORM optimization I feel like there
should still be a way to help developers familar with existing
manual optimization techniques to avoid unexpected queries when
they've clearly defined what they expect the ORM to do.

I feel like a few contributors in the thread you've linked to
express similar concerns.

e.g.

> I completely agree that visibility of this problem is a major
> issue, and would really welcome work on improving this,
> especially in DEBUG mode. One option might be a method that
> replaces lazy loads with exceptions.

- Luke Plant[0]

Best,
Simon

[0] https://groups.google.com/d/msg/django-developers/EplZGj-ejvg/cHyFdoaXCAAJ

Josh Smeaton

unread,
Jan 4, 2018, 9:41:49 PM1/4/18
to Django developers (Contributions to Django itself)
I wasn't aware of this new feature Shai, thanks for pointing it out!

For this particular case I'd prefer locking to be bound to a particular queryset rather than the database as a whole. I would also expect it to fail loudly when accessing a non-fetched related object (book.author), which can be a common cause of pain.

I'm also still very interested in auto-prefetch Adam, is there any help I can lend?

Gordon Wrigley

unread,
Jan 8, 2018, 6:04:52 AM1/8/18
to Django developers (Contributions to Django itself)
Regarding auto prefetch, after the discussion here I created a ticket https://code.djangoproject.com/ticket/28586
It didn't get much attention presumably because it was around Django 2 release time.
I have a todo note to add tests and doco and generally get it to a mergable state in order to push the conversation along.
Also the pull as it stands has a little bug I have to upload the fix for.
Reply all
Reply to author
Forward
0 new messages