ORM performance


msoulier

Sep 3, 2014, 11:22:56 PM
to django...@googlegroups.com
Hi,

I am looking at Django's performance when modifying large numbers of objects, each in a unique way that cannot be batched. If I make a simple change to one of my Django models and call save(), and then do the same thing in SQLAlchemy, I see roughly a 48-fold difference in the rate at which objects are written to my PostgreSQL database, with SQLAlchemy being the faster of the two.

The code is a simple property update and save, in a loop, trying to process as many objects as possible.

Is the Django ORM known to be slower in this regard, or is it likely something that I'm doing?

Thanks,
Mike

Tom Lockhart

Sep 3, 2014, 11:44:27 PM
to django...@googlegroups.com
I haven't had to deal with this myself, but the speed difference smacks of transactional issues. If you can wrap your whole loop, or pieces of it (say, chunks of 100 or 1000 objects), in a single transaction, you'll probably see a significant speedup.

https://docs.djangoproject.com/en/dev/topics/db/transactions/
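
A minimal sketch of the chunking idea, not taken from anyone's actual code in this thread. The helper names (chunked, save_in_chunks, mutate) are made up for illustration. The transaction wrapper is passed in so the sketch runs without Django; in a real project you would presumably pass django.db.transaction.atomic as the atomic argument.

```python
from contextlib import nullcontext
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def save_in_chunks(objects, mutate, chunk_size=500, atomic=nullcontext):
    """Apply `mutate` to each object and save it, one transaction per chunk.

    `atomic` is any zero-argument callable returning a context manager.
    Assumption: in Django this would be `django.db.transaction.atomic`;
    the default `nullcontext` is only a stand-in so the sketch runs alone.
    """
    saved = 0
    for chunk in chunked(objects, chunk_size):
        with atomic():  # one transaction wraps the whole chunk
            for obj in chunk:
                mutate(obj)   # the per-object change that cannot be batched
                obj.save()
                saved += 1
    return saved
```

Usage in a Django project might look like save_in_chunks(MyObject.objects.iterator(), change_one, 1000, transaction.atomic), where MyObject and change_one are placeholders. Whether this helps depends on whether commits are really the bottleneck.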

hth

- Tom

Benjamin Scherrey

Sep 4, 2014, 12:21:55 AM
to django-users
The short answer to your question is no, the Django ORM is not inherently slower in that regard and it's very likely something you're doing. The useful answer is probably more complicated. :-) Naive usage of the ORM without an understanding of how it translates to SQL is likely to result in some really awful non-performant database requests for any reasonably complex models/queries. The good news is that it isn't very hard to get quite good performance out of it. 

It's impossible to give you anything more specific to your particular problem without seeing code, of course. That said, some common issues when grabbing individual model instances related to a larger query are often dramatically improved by using select_related() or prefetch_related() as appropriate. Also, for doing large numbers of writes/updates you should look into doing the transaction management yourself, as Tom suggests. Postgres can really fly once you understand how the ORM works with it. MySQL does nicely as well, but I have less experience with it. Ultimately, if your request can't be efficiently modeled with the ORM (rare, but it does happen), then you can use .extra() to pass in some direct SQL quite easily.
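
To make the select_related() point concrete, here is a rough illustration at the SQL level, using the standard-library sqlite3 module rather than Django itself. The author/book tables and the CountingDB wrapper are invented for the example. The naive loop issues one query per row to look up the related object, while the JOIN version (roughly what select_related() asks the database for) needs a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 1);
""")


class CountingDB:
    """Thin wrapper that counts how many statements are sent."""

    def __init__(self, connection):
        self.connection = connection
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.connection.execute(sql, params).fetchall()


def titles_naive(db):
    """One query for the books, then one more per book for its author."""
    rows = []
    for title, author_id in db.run(
        "SELECT title, author_id FROM book ORDER BY id"
    ):
        (name,) = db.run(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        )[0]
        rows.append((title, name))
    return rows


def titles_joined(db):
    """A single JOIN fetches each book together with its author."""
    return db.run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )


naive_db, joined_db = CountingDB(conn), CountingDB(conn)
naive_rows = titles_naive(naive_db)      # 1 + 3 statements
joined_rows = titles_joined(joined_db)   # 1 statement
```

Both functions return the same rows; only the number of round trips differs, and that gap grows with the number of rows.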

Otherwise here's a decent little writeup of a good approach to providing better access to your ORM models: http://www.dabapps.com/blog/higher-level-query-api-django-orm/

Good luck,

  -- Ben


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/9049bff2-470e-4560-b93f-dee56bc924d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Chief Systems Architect Proteus Technologies
Personal blog where I am not your demographic.

This email intended solely for those who have received it. If you have received this email by accident - well lucky you!!

Sithembewena Lloyd Dube

Sep 4, 2014, 5:36:33 AM
to django...@googlegroups.com
Thanks to the OP for asking this, and to all who answered. Ben, special stars to you - you just shared some very valuable insight into efficient use of the ORM (it wasn't obvious to me).

Kind regards,
Lloyd






--
Regards,
Sithu Lloyd Dube

Tom Evans

Sep 4, 2014, 1:32:46 PM
to django...@googlegroups.com
On Thu, Sep 4, 2014 at 12:22 AM, msoulier <msou...@digitaltorque.ca> wrote:
> Hi,
>
> I am looking at Django's performance with respect to modifying large numbers
> of objects, each in a unique way that cannot be batched. If I make a simple
> change to one of my Django models and save(), and then do the same thing in
> sqlalchemy, I notice a performance difference of about 48 times as far as
> the rate that the objects are processed to my postgresql db.
>
> The code is a simple property update and save, in a loop, trying to process
> as many objects as possible.

Is the update invariant? By using the ORM like this:

    for obj in MyObject.objects.all():
        obj.foo = 'hello'
        obj.save()

then you have to pull all the data for each object out of the
database, convert each raw DB column to its Python type, assign it to
a model instance, convert each Python field back to its raw DB value
and save it in the database. That is N+1 queries, and many
conversions.

If the update is invariant, you can apply it without any of the
overhead, only 1 query and one conversion:

MyObject.objects.all().update(foo='hello')

If the update doesn't depend on the other fields in the model, this
avoids some of the overhead, still N+1 queries but virtually no
conversion overhead:

    for pk in MyObject.objects.all().values_list('pk', flat=True):
        MyObject.objects.filter(pk=pk).update(foo=make_new_foo())

>
> Is the Django ORM known to be slower in this regard, or is it likely
> something that I'm doing?

Are both the Django and the sqlalchemy versions doing the same sort of update?

Cheers

Tom

Michael P. Soulier

Sep 4, 2014, 2:06:53 PM
to django...@googlegroups.com
On 04/09/14 Tom Evans said:

> Is the update invariant? By using the ORM like this:

As I said, each update is unique and they cannot be batched.

> Are both Django and the sqlalchemy doing the same sort of update?

Yes. Identical.

Mike

Michael P. Soulier

Sep 4, 2014, 2:07:24 PM
to django...@googlegroups.com
On 03/09/14 Tom Lockhart said:

> I haven't had to deal with this myself, but the speed difference smacks of
> transactional issues. If you can run your loop by wrapping all of it or
> pieces of it (say, 100 or 1000 chunks) in a single transaction you'll
> probably see some significant speedup.

Yeah I tried that, and noticed no difference.

Mike

Michael P. Soulier

Sep 4, 2014, 2:10:31 PM
to django...@googlegroups.com
On 04/09/14 Benjamin Scherrey said:

> The short answer to your question is no, the Django ORM is not inherently
> slower in that regard and it's very likely something you're doing. The

Given that it's basically

    for obj in foo.objects.all():
        obj.prop = new_value
        obj.save()

I fail to see how it's something that I am doing. The whole thing runs in a
single transaction, and the only other addition would be Django's signals. I
believe I had none registered but I'll double check.

I should note that while sqlalchemy was 48 times faster, raw sql was roughly
100 times faster.
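
The raw SQL itself isn't shown in this thread, so the following is only a guess at its general shape: one parameterized UPDATE per row, each with its own value, all inside a single transaction. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL, and the item table, its columns and apply_unique_updates are hypothetical names.

```python
import sqlite3


def apply_unique_updates(conn, updates):
    """Write one distinct value per row inside a single transaction.

    `updates` is an iterable of (new_value, primary_key) pairs.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.executemany(
            "UPDATE item SET prop = ? WHERE id = ?", updates
        )
    return cur.rowcount  # total rows modified across all statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, prop TEXT)")
conn.executemany(
    "INSERT INTO item (id, prop) VALUES (?, ?)",
    [(i, "old") for i in range(1, 6)],
)
conn.commit()

# Each row gets its own value, so this cannot be a single bulk UPDATE.
changed = apply_unique_updates(
    conn, [(f"value-{i}", i) for i in range(1, 6)]
)
```

Here only one column is written per row and no model instances are built, which is some of the overhead an ORM save() loop carries on top.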

> Impossible to give you any more specific to your particular problem without
> seeing code, of course. That said, some common issues when grabbing
> individual model instances related to a larger query are often dramatically
> improved by using select_related() or prefetch_related() as appropriate. Also

There is a foreign key involved in this model, not that it is being accessed,
but I can update the query and retry.

> Otherwise here's a decent little writeup of a good approach to providing
> better access to your ORM models:
> http://www.dabapps.com/blog/higher-level-query-api-django-orm/

I'll take a look, thanks.

Mike