Prefetch object with to_attr set to the same name as the field triggers row deletions


mccc

Nov 5, 2015, 9:04:31 AM
to Django users
I'm using a Prefetch object to follow a GenericRelation, and I was experimenting with the to_attr parameter.
I had database query logging turned on to check whether it made any difference, and I noticed some DELETE statements against the related table, so I started investigating.
It looks like the deletions are triggered during the prefetch process when to_attr is set to the same name as the field; it does not seem to happen when the relationship is not a generic one.
While this usage is presumably wrong, it is not mentioned anywhere in the docs, and I think deleting rows from the database is a somewhat extreme way to educate users :)
Can someone enlighten me on what's going on here?

Thank you very much.

Simon Charette

Nov 5, 2015, 10:14:06 AM
to Django users
Hi mccc,

Could you provide an example of the models and actions required to trigger the deletion?

I suppose the deletion is triggered by the many-to-many relationship assignment logic, and we should simply raise a ValueError if `to_attr` is set to the same name as the referred relationship.

Thanks,
Simon

Michele Ciccozzi

Nov 5, 2015, 11:17:04 AM
to django...@googlegroups.com
Hello Simon,

I'm going to paste a trimmed-down version of the involved classes:

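from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
from django.contrib.contenttypes.models import ContentType
from django.db import models

# Account is defined elsewhere in the project (omitted from this trimmed-down version).

class CeleryTask(models.Model):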
    celery_task_id = models.CharField(max_length=50, primary_key=True)
    task_type = models.ForeignKey(ContentType)
    task_id = models.PositiveIntegerField()
    task = GenericForeignKey('task_type', 'task_id')

class AccountManagementTask(models.Model):
    account = models.ForeignKey(Account, null=True, editable=False)
    task_content_type = models.ForeignKey(ContentType)
    task_content_id = models.PositiveIntegerField()
    task_content = GenericForeignKey('task_content_type', 'task_content_id')
    celery_task = GenericRelation(CeleryTask, content_type_field='task_type', object_id_field='task_id',
                                  related_query_name='account_management_tasks')

(task_content is involved in a follow-up question that I'll ask below.)

This is the code triggering the weirdness:

from django.db.models import F, Prefetch

previous_tasks = (
    AccountManagementTask.objects.filter(account=account)
    .prefetch_related(Prefetch('celery_task', to_attr='celery_task'))
    .prefetch_related(Prefetch('task_content', to_attr='task_content'))
    .annotate(created_on=F('celery_task__created_on'))
)

Immediately afterwards, the QuerySet is handed to a simple DRF serializer, which never touches celery_task itself because the relevant value (created_on) is annotated:

from rest_framework import serializers

class AccountManagementTaskSerializer(serializers.ModelSerializer):
    def to_representation(self, instance):
        return {'id': instance.pk,
                'created_on': instance.created_on,
                'date_from': instance.task_content.date_from,
                'date_to': instance.task_content.date_to}

    class Meta:
        model = AccountManagementTask


Now the follow-up, which you might already have guessed: if I *do* use the same name when following the GenericForeignKey relation (task_content), the related content does not get deleted (no surprise there?) and my naïve serialization works, whereas if I specify a different to_attr name I get back a list and have to handle it differently.
How do these prefetch_related / Prefetch to_attr / Generic* things really work?

Thank you very much,
Michele


Simon Charette

Nov 5, 2015, 6:00:11 PM
to Django users
Good evening Michele,

I'll try to explain what happens here with a simplified model definition, assuming you are using Django 1.8. I used a GenericRelation like in your example, but this issue can be triggered with almost any M2M relationship.

Given the following models:

from django.db import models
from django.contrib.contenttypes.fields import (
    GenericForeignKey, GenericRelation
)
from django.contrib.contenttypes.models import ContentType

class TaggedItem(models.Model):
    tag = models.SlugField()
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')

class Bookmark(models.Model):
    url = models.URLField()
    tags = GenericRelation(TaggedItem)

Assigning directly to the `tags` attribute of a Bookmark instance (`Bookmark.tags` is a ReverseGenericRelatedObjectsDescriptor instance) results in a DELETE query followed by several INSERT or UPDATE queries, depending on whether the assigned objects already exist. For example:

>>> b = Bookmark.objects.create(url='www.djangoproject.com')
>>> # The following line will result in the following queries:
>>> # 1) DELETE FROM taggeditem WHERE content_type_id = <Bookmark's content type> AND object_id = b.id;
>>> # 2) INSERT INTO taggeditem ...;
>>> # 3) INSERT INTO taggeditem ...;
>>> b.tags = [TaggedItem(tag='Python'), TaggedItem(tag='Web')]
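To make that assignment path concrete, here is a toy, pure-Python model of the Django 1.8 behavior described above: a data descriptor whose `__set__` first clears every related row and then saves each new value. `ToyReverseRelation`, `ToyBookmark` and `QUERY_LOG` are hypothetical stand-ins (not Django's actual descriptor code), and the SQL strings are simplified, e.g. the real DELETE also filters on the content type.

```python
QUERY_LOG = []  # hypothetical stand-in for the SQL the ORM would emit


class ToyReverseRelation:
    """Toy data descriptor mimicking the 1.8-style "clear, then add" assignment."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.setdefault("_tags", [])

    def __set__(self, instance, values):
        # Step 1: clear() removes every existing related row for this object.
        QUERY_LOG.append("DELETE FROM taggeditem WHERE object_id = %d" % instance.pk)
        # Step 2: add() saves each assigned value, pointing it at this object.
        for value in values:
            QUERY_LOG.append(
                "INSERT INTO taggeditem (tag, object_id) VALUES (%r, %d)" % (value, instance.pk)
            )
        instance.__dict__["_tags"] = list(values)


class ToyBookmark:
    tags = ToyReverseRelation()

    def __init__(self, pk):
        self.pk = pk


b = ToyBookmark(pk=1)
b.tags = ["Python", "Web"]  # one DELETE, then one INSERT per tag
for sql in QUERY_LOG:
    print(sql)
```

The point of the toy is that a plain `=` on the instance is enough to run the DELETE: nothing in the calling code looks like a database write.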

Note that as of Django 1.9 the DELETE query is no longer run, since Django now updates the relationship in 'bulk' instead: it compares the existing related objects with the specified value and only performs the required inserts/updates.
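The difference between the two strategies can be sketched as a set comparison. This is a simplified model, assuming related objects can be compared by a key (here plain tag strings rather than primary keys); `plan_set` is a hypothetical helper, not Django's `set()` implementation.

```python
def plan_set(existing, new):
    """Return (to_delete, to_insert) for a diff-based assignment.

    Only rows that disappear are removed and only genuinely new rows are added,
    instead of deleting everything and re-inserting it.
    """
    existing, new = set(existing), set(new)
    to_delete = sorted(existing - new)
    to_insert = sorted(new - existing)
    return to_delete, to_insert


# Re-assigning exactly what is already there (which is what a same-named
# to_attr does with freshly prefetched rows) produces no work at all.
print(plan_set({"Python", "Web"}, ["Python", "Web"]))     # ([], [])
print(plan_set({"Python", "Web"}, ["Python", "Django"]))  # (['Web'], ['Django'])
```

Under the 1.8-style strategy the first call would still have issued a DELETE followed by two INSERTs.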

Now, as documented, when you specify a `to_attr` Django simply uses `setattr()` under the hood, and it ends up triggering the code path described above if the specified attribute happens to be a reverse relationship descriptor. Note that this behavior (direct assignment to a reverse relation) will be deprecated in Django 1.10 and removed in Django 2.0.
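The collision comes down to how Python resolves `setattr()`: a data descriptor (a class attribute whose type defines `__set__`) takes precedence over the instance `__dict__`, so storing prefetched results under the descriptor's own name runs its setter. In this toy model, `ToyRelation` and `ToyTask` are hypothetical, and `store_prefetched` is a hypothetical helper standing in for the prefetch step that honors `to_attr`.

```python
calls = []  # records every time the relation's setter runs


class ToyRelation:
    """Toy data descriptor: assigning to it would rewrite related rows."""

    def __get__(self, instance, owner):
        return self if instance is None else instance.__dict__.get("_related", [])

    def __set__(self, instance, values):
        calls.append(list(values))  # in Django 1.8 this is where clear() + add() ran
        instance.__dict__["_related"] = list(values)


class ToyTask:
    celery_task = ToyRelation()


def store_prefetched(obj, to_attr, results):
    # Hypothetical stand-in for attaching prefetched results via to_attr.
    setattr(obj, to_attr, results)


task = ToyTask()
store_prefetched(task, "prefetched_celery_tasks", ["row-1"])  # plain instance attribute
print(calls)  # []
store_prefetched(task, "celery_task", ["row-1"])  # routed through ToyRelation.__set__
print(calls)  # [['row-1']]
```

Choosing a `to_attr` that does not match any relation name keeps the results as a plain list on the instance and never reaches the relation's setter.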

I have to say I'm unsure how we should proceed here. I would favor raising a ValueError if the model that the prefetched objects are attached to has a reverse descriptor named after the specified `to_attr`, at least until this behavior is deprecated, since it looks like a major footgun to me.
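One way such a guard could look, as a framework-agnostic sketch: `check_to_attr` is a hypothetical helper that walks the class MRO the way Python's attribute lookup does and rejects a `to_attr` that resolves to a data descriptor. A real Django patch would more likely consult the model's `_meta` field information; this only illustrates the idea.

```python
class ToyDescriptor:
    """Minimal data descriptor standing in for a reverse relation descriptor."""

    def __get__(self, instance, owner):
        return self

    def __set__(self, instance, value):
        raise AssertionError("never reached when the guard is applied first")


def check_to_attr(model_cls, to_attr):
    """Raise ValueError if setattr(obj, to_attr, ...) would hit a data descriptor."""
    for klass in model_cls.__mro__:
        if to_attr in klass.__dict__:
            attr = klass.__dict__[to_attr]
            # Python treats an attribute as a data descriptor if its type defines __set__.
            if hasattr(type(attr), "__set__"):
                raise ValueError(
                    "to_attr=%r conflicts with a descriptor on %s" % (to_attr, model_cls.__name__)
                )
            return  # the first class defining the name wins, as in normal lookup


class ToyBookmark:
    tags = ToyDescriptor()


check_to_attr(ToyBookmark, "prefetched_tags")  # passes silently
try:
    check_to_attr(ToyBookmark, "tags")
except ValueError as exc:
    print(exc)  # to_attr='tags' conflicts with a descriptor on ToyBookmark
```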

I guess we should move the discussion to the development mailing list to gather feedback.

Until next time,
Simon

Michele Ciccozzi

Nov 6, 2015, 5:01:35 AM
to django...@googlegroups.com
Thank you very much, Simon, for the detailed answer.
Now I'm digging through Django code, pull requests, and tickets, and it's all very interesting.
Thank you also for the reminder of incoming deprecations: we'll definitely be on the lookout for that!

All the best,
Michele

Simon Charette

Nov 7, 2015, 5:35:57 AM
to Django users
Hi Michele,

I don't know if someone beat you to it, but a ticket for this exact issue just surfaced[1].

Please chime in on Trac if you'd like to follow the progress of the resolution.

Thanks,
Simon

[1] https://code.djangoproject.com/ticket/25693

Michele Ciccozzi

Nov 9, 2015, 4:28:03 AM
to django...@googlegroups.com
Hello Simon,

Thank you very much for this very interesting follow-up: it's like every day there's a new reason to keep digging into the source code!
I'll see you there ;)

All the best,
Michele
