./manage.py loaddata calls .save() on all models... should it?

234 views
Skip to first unread message

George Karpenkov

unread,
Mar 30, 2011, 2:21:43 AM3/30/11
to Django developers
If we'll look into core/management/commands/loaddata we'll see the
line
"obj.save(using=using)" which saves the data.

*however* consider the case when application has some custom database-
altering logic in .save method. The common thing that comes to mind is
timestamp, or something similar. What would happen is that instead of
loading data the data from the fixture will be partly changed, which
is really not what you want.

That's not the worst case though - I've just spent 40 minutes loading
the database from a fixture which contained data in django-tagging,
which inserts it's own "INSERT" SQL statements into save. So loaddata
was consistently crashing with "column blah is not unique" while the
data from the dump was perfectly fine. It made me quite sad.

so coming to think of it i can't really think of a use case where
you'd want the custom logic in .save() to be executed. All the data is
already there, and we *know* that it is valid data - so what else we
might possibly want to do with it? (unless our application
communicates over network with different services and it uses .save()
to maintain integrity with them, but I haven't seen a single django
website like that)
So I think that some lower-level logic should be called instead.

Any comments on why loaddata was implemented this way in the first
place?

George Karpenkov

unread,
Mar 30, 2011, 2:27:15 AM3/30/11
to Django developers
Oh okay, apparently there is a ticket from 4 years ago
http://code.djangoproject.com/ticket/4459

Russell Keith-Magee

unread,
Mar 30, 2011, 2:30:52 AM3/30/11
to django-d...@googlegroups.com
On Wed, Mar 30, 2011 at 2:21 PM, George Karpenkov
<true.c...@gmail.com> wrote:
> If we'll look into core/management/commands/loaddata we'll see the
> line
> "obj.save(using=using)" which saves the data.

... and if you dig a little deeper, you'll find that "obj" in that
context is a "DeserializedObject", and calling save() on a
deserialized object invokes a "raw save" on the underlying object's
base save.

That is, the save() method on the object *shouldn't* be invoked as a
result of loading a fixture. That's what was reported in #4459, and
fixed in r5658.

If you can provide a test case that demonstrates otherwise, please
open a ticket.

Yours,
Russ Magee %-)

George Karpenkov

unread,
Mar 30, 2011, 4:55:35 AM3/30/11
to Django developers
Oh thanks Russel!

Turns out django-tagging was creating those objects via the post-save
signal hook.

http://code.djangoproject.com/ticket/8399 <- here is the ticket which
proposes an option to disable the signal handling during the loaddata
operation.

Malcolm says that some people might want to use signals while running
loaddata - ie to create related objects.

I don't really understand why anyone would want that -- the only use
case I've ever seen for the loaddata operation was "dump the entire
database -> load entire database", though I guess use cases differ.

Any updated comments on disabling the signal handling for the loaddata
operation?

On Mar 30, 5:30 pm, Russell Keith-Magee <russ...@keith-magee.com>
wrote:
> On Wed, Mar 30, 2011 at 2:21 PM, George Karpenkov
>

Russell Keith-Magee

unread,
Mar 30, 2011, 6:26:12 AM3/30/11
to django-d...@googlegroups.com
On Wed, Mar 30, 2011 at 4:55 PM, George Karpenkov
<true.c...@gmail.com> wrote:
> Oh thanks Russel!
>
> Turns out django-tagging was creating those objects via the post-save
> signal hook.
>
> http://code.djangoproject.com/ticket/8399 <- here is the ticket which
> proposes an option to disable the signal handling during the loaddata
> operation.
>
> Malcolm says that some people might want to use signals while running
> loaddata - ie to create related objects.
>
> I don't really understand why anyone would want that -- the only use
> case I've ever seen for the loaddata operation was "dump the entire
> database -> load entire database", though I guess use cases differ.

There's the rub -- use cases do differ. Even in the "dump and load"
scenario, there are at least two ways to interpret the problem:

* Dump the *entire* database, and load the *entire* database
* Dump the "core" of the database, and reconstitute parts of the
database that can be automatically rebuilt.

Consider, for example, model permissions or contenttypes -- should
they be serialized and reloaded, or generated and reference (using
natural keys)?

> Any updated comments on disabling the signal handling for the loaddata
> operation?

Not beyond what is listed in that ticket. The original solution
proposed is the obvious solution and Malcolm has raised the obvious
objection. The ticket has been accepted; there is merit in the idea in
principle; it's just a matter of finding a workable solution that
doesn't suffer from the problem Malcolm highlights.

I can see how gsong's solution would work, but I'm not wild about it.
Firstly, it requires stack inspection, which wont necessarily catch
all the ways the signal may need to be silenced. Secondly, it requires
registration at the time the signal is defined, which means that if a
signal is provided as part of a third party app, the app author (and
not the end-user) makes the decision as to whether the signal should
be silenced. This may be possible in many cases, but there will be
cases where this isn't possible.

Yours,
Russ Magee %-)

Patryk Zawadzki

unread,
Mar 31, 2011, 3:02:13 PM3/31/11
to django-d...@googlegroups.com
On Wed, Mar 30, 2011 at 10:55 AM, George Karpenkov
<true.c...@gmail.com> wrote:
> Oh thanks Russel!
>
> Turns out django-tagging was creating those objects via the post-save
> signal hook.
>
> http://code.djangoproject.com/ticket/8399 <- here is the ticket which
> proposes an option to disable the signal handling during the loaddata
> operation.
>
> Malcolm says that some people might want to use signals while running
> loaddata - ie to create related objects.
>
> I don't really understand why anyone would want that -- the only use
> case I've ever seen for the loaddata operation was "dump the entire
> database -> load entire database", though I guess use cases differ.
>
> Any updated comments on disabling the signal handling for the loaddata
> operation?

Rather simple fix to the signal handler:

def on_something_saved(sender, instance, created, **kwargs):
if not kwargs.get('raw', False):
do_stuff()

--
Patryk Zawadzki
I solve problems.

Jeremy Dunck

unread,
Mar 31, 2011, 3:40:38 PM3/31/11
to django-d...@googlegroups.com, Patryk Zawadzki
On Thu, Mar 31, 2011 at 2:02 PM, Patryk Zawadzki <pat...@pld-linux.org> wrote:
...

> Rather simple fix to the signal handler:
>
> def on_something_saved(sender, instance, created, **kwargs):
>    if not kwargs.get('raw', False):
>        do_stuff()

Concur, if the handler shouldn't run on loaddata, checking raw is the
right change.

Hobson Lane

unread,
Aug 29, 2012, 5:11:11 PM8/29/12
to django-d...@googlegroups.com
OOPS, turns out I had a post_save receiver creating the reciprocal rel. So this is not a django problem.

On Wednesday, August 29, 2012 1:40:52 PM UTC-7, Hobson Lane wrote:
In django 1.4, my model save() gets called for ManyToManyField "through" models for loaddata on that model:

    class Entity(models.Model):
        related_entities = models.ManyToManyField('self', through='EntityRelationship', symmetrical=False, related_name='related_to+')
   
    class EntityRelationship(models.Model):
        from_entity       = models.ForeignKey(Entity, related_name='from_entities')
        to_entity           = models.ForeignKey(Entity, related_name='to_entities')

        def save(self, *args, **kwargs):
            print >>sys.stderr, EntityRelationship.objects.all()
            super(EntityRelationship, self).save(*args, **kwargs)
            print >>sys.stderr, EntityRelationship.objects.all()
            assert False

        class Meta:
            db_table     = u'EntityRelationship'
            ordering = ["to_entity"]
            unique_together = (('from_entity','to_entity'))

        def __unicode__(self):
            return '%s -> %s'%(self.from_entity.pk,self.to_entity.pk)
          
Interestingly, `save()` is not called for the forward relationship, the one explicity specified in the json fixture. loaddata must do a "raw" insert for this one. But is called as part of validation of the relationship when inserting a record for the reciprocal relationship:

    >>> manage.py loaddata fixtures/EntityRelationship.json
    [<EntityRelationship: 1 -> 2>]
    [<EntityRelationship: 1 -> 2>, <EntityRelationship: 2 -> 1>]

I wouldn't expect loaddata should insert a receiprocal relationship record not specified in the fixture. And doing so without calling save() for both inserts makes it very difficult to deal with fixtures that contain a random network of connected entities. You eventually get an IntegrityError during the loaddata because duplicate records (relationships) have already been entered. And it prevents the user from loading a fixture of asymmetrical (not necessarily a reciprocal relationship for every edge of the graph) relationships. At least I'm having trouble doing so.

Perhaps the '+' in the related name ('related_to+') is causing me grief, or some other subtlety of self-referential relationships in Django that I don't understand.
Reply all
Reply to author
Forward
0 new messages