More efficient m2m assignment SQL

92 views
Skip to first unread message

Jeremy Dunck

unread,
Jul 27, 2012, 1:15:06 AM7/27/12
to django-d...@googlegroups.com
I have 2 models:

class District(Model):
pass

class Voter(Model):
districts = ManyToManyField(District)

Sometimes I want to reset the m2m list:

voter.districts = other_districts

I noticed that this uses ManyRelatedManager._clear_items, which, among
other things, boils down to:

voter.districts.through._default_manager.filter(voter_id=voter.pk).delete()

I would have expected something along the lines of :

DELETE FROM `api_voter_districts` WHERE `voter_id` = 1

But instead it's 2 queries:

SELECT `api_voter_districts`.`id`, `api_voter_districts`.`voter_id`,
`api_voter_districts`.`district_id` FROM `api_voter_districts` WHERE
`api_voter_districts`.`voter_id` = 1

DELETE FROM `api_voter_districts` WHERE `id` IN (2, 3, 4, 5, 6,...)

Was it intentional to take 2 queries? That is, is there a use case
where avoiding the single DELETE query is preferable?

Anssi Kääriäinen

unread,
Jul 27, 2012, 2:26:05 AM7/27/12
to Django developers
To me it seems there is no reason to do two queries in delete if the
following conditions hold:
- there are no signals sent for the deleted objects
- the delete does not cascade further (actually you don't need to
collect even in this case - you can do the cascade query completely in
the DB without fetching the IDs first).

For automatic M2M tables the above conditions always hold, but it is
not guaranteed if you are using the "through" argument.

To me it seems the correct fix is checking the above conditions. If
they hold, issue immediate delete query, if not, you need to collect
the objects, then delete. Doing the delete in single query can be a
huge win if you are deleting a lot of objects in one go. Actually, if
you are deleting too many objects, you will likely go OOM.

I am not objecting fixing this just for automatic m2m tables if
checking the conditions mentioned above turns out to be complicated.

- Anssi

Jeremy Dunck

unread,
Jul 27, 2012, 11:50:42 AM7/27/12
to django-d...@googlegroups.com
On Thu, Jul 26, 2012 at 11:26 PM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> On 27 heinä, 08:15, Jeremy Dunck <jdu...@gmail.com> wrote:
...
>> I would have expected something along the lines of :
>>
>> DELETE FROM `api_voter_districts` WHERE `voter_id` = 1
>>
>> But instead it's 2 queries:
>>
>> SELECT `api_voter_districts`.`id`, `api_voter_districts`.`voter_id`,
>> `api_voter_districts`.`district_id` FROM `api_voter_districts` WHERE
>> `api_voter_districts`.`voter_id` = 1
>>
>> DELETE FROM `api_voter_districts` WHERE `id` IN (2, 3, 4, 5, 6,...)
...
> To me it seems there is no reason to do two queries in delete if the
> following conditions hold:
> - there are no signals sent for the deleted objects
> - the delete does not cascade further (actually you don't need to
> collect even in this case - you can do the cascade query completely in
> the DB without fetching the IDs first).
>
> For automatic M2M tables the above conditions always hold, but it is
> not guaranteed if you are using the "through" argument.

Ah, yes, hmm. Actually, it *is* possible to get signals attached to
the through.

from django.db.models import signals
signals.pre_delete.connect(handler, sender=Voter.districts.through)

So I don't think we can even take that shortcut on just the auto m2m
tables. As the "signals maintainer", let me just say: damn, they
cause a lot of trouble. :-)

OK, it would be easy enough to add a signal method to see if there are
signals attached for a given sender (model in this case).

I'll work up a patch for the simplified case of auto-intermediates and
no receivers for pre/post delete and see where that gets us in
performance.

Do we already have machinery in place for "does this model have the
potential to cascade"? I see Collector uses
._meta.get_all_related_objects, but that seems pretty heavy for the
simple need to branch depending on "yes or no".

Anssi Kääriäinen

unread,
Sep 20, 2012, 12:01:50 PM9/20/12
to Django developers
On 27 heinä, 18:50, Jeremy Dunck <jdu...@gmail.com> wrote:
> Ah, yes, hmm.  Actually, it *is* possible to get signals attached to
> the through.
>
> from django.db.models import signals
> signals.pre_delete.connect(handler, sender=Voter.districts.through)
>
> So I don't think we can even take that shortcut on just the autom2m
> tables.  As the "signals maintainer", let me just say: damn, they
> cause a lot of trouble. :-)
>
> OK, it would be easy enough to add a signal method to see if there are
> signals attached for a given sender (model in this case).
>
> I'll work up a patch for the simplified case of auto-intermediates and
> no receivers for pre/post delete and see where that gets us in
> performance.
>
> Do we already have machinery in place for "does this model have the
> potential to cascade"?  I see Collector uses
> ._meta.get_all_related_objects, but that seems pretty heavy for the
> simple need to branch depending on "yes or no".

This issue came up in #django-dev recently and I decided to do
something about this. I think I have something commit-quality done for
this, see: https://code.djangoproject.com/ticket/18676#comment:2 and
https://github.com/akaariai/django/compare/fast_delete

The code has full fast path support - we can bulk delete in the DB if
there are no signals, and no cascades further (including parent models
and generic foreign keys).

I will be pushing this into 1.5 before feature freeze if no blocker
issues arise.

- Anssi
Reply all
Reply to author
Forward
0 new messages