Expensive queryset cloning

65 views
Skip to first unread message

Alexander Schepanovski

unread,
Mar 14, 2011, 8:49:24 PM3/14/11
to Django developers
I was optimizing my django app and ran into this. My app was spending
too much time cloning querysets. I looked into code but didn't find
any simple way to make it faster. But this is not needed actually. In
most use cases "a parent" of a clone is thrown out. So usually one
just need to mutate queryset not clone it.

For example:
Item.objects.filter(category=12).exclude(visible=False).order_by('-
date')[:20]
clones 4 times and 0 is needed. Using Q objects we can reduce it to 3
(still too much) and complicate the code.

Even
Item.objects.get(pk=2)
clones 2 times.

Personally, I would like all querysets mutate not clone by default.
And when one need a clone just make it explicitly.

Carl Meyer

unread,
Mar 15, 2011, 12:14:11 AM3/15/11
to django-d...@googlegroups.com
Hi Alex,

On 03/14/2011 08:49 PM, Alexander Schepanovski wrote:
> Personally, I would like all querysets mutate not clone by default.
> And when one need a clone just make it explicitly.

This is not an option. It will break quite a lot of existing code, and
often in highly confusing ways. You'll need to find other ways to
optimize. I'd be surprised if the cloning of querysets is actually a
significant bottleneck relative to the database query itself, unless
you're doing loops with hundreds or thousands of clones, in which case
it can almost certainly be optimized via Q objects.

If you really think it will make a significant difference for your app,
you can probably achieve this for yourself by using your own subclass of
QuerySet and overriding the _clone method.

Carl

Alexander Schepanovski

unread,
Mar 15, 2011, 3:16:57 AM3/15/11
to Django developers
> I'd be surprised if the cloning of querysets is actually a
> significant bottleneck relative to the database query itself

I was too.
Query via ORM is 2-4 times slower for me than just database query +
network roundtrip. It is not only due queryset cloning, but cloning
takes 0.5-1 of that 2-4 times.
Also, I cache db queries issued via ORM, so for cache hit I need to
construct queryset but don't need to execute query.

> unless you're doing loops with hundreds or thousands of clones, in which case
> it can almost certainly be optimized via Q objects.
not loops but many ifs. I already use dicts and Q objects (when need
complex conditions) but I still have no less than 3 (filter + order_by
+ slice) clones for each query. Here we see an interesting thing -
cloning querysets made me write less beautiful code.

> If you really think it will make a significant difference for your app,
> you can probably achieve this for yourself by using your own subclass of
> QuerySet and overriding the _clone method.
Already do this, but it does not cover User model and such. I probably
go with some monkey patch for now.

> This is not an option. It will break quite a lot of existing code, and
> often in highly confusing ways.

Not so much code. In most cases cloning is not needed. But it will be
confusing breakage, I agree. On other hand for an unbiased person
cloning can be confusing too. I wonder how many people were surprised
why
qs.filter(...)
does not filter qs.

Ok, if not by default, still we can provide some .mutating() method
or .cloning(bool) method which switches queryset into/off mutating
mode. Then we can use it in some django code such as Manager.get to
avoid useless cloning.

David Cramer

unread,
Mar 16, 2011, 12:00:15 AM3/16/11
to Django developers
In our profiling we've also noticed the cloning to be one of the
slowest parts of the app (that and instantiation of model objects).

We haven't yet, but had been planning on exploring a way to mutate the
existing class in most circumstances, but haven't
dug into it too much yet.

On Mar 14, 11:16 pm, Alexander Schepanovski <suor....@gmail.com>
wrote:

Alexander Schepanovski

unread,
Mar 16, 2011, 3:31:45 AM3/16/11
to Django developers
> We haven't yet, but had been planning on exploring a way to mutate the
> existing class in most circumstances, but haven't
> dug into it too much yet.

I have and I use this monkey patch for now: https://gist.github.com/872145

Thomas Guettler

unread,
Mar 16, 2011, 4:14:24 AM3/16/11
to django-d...@googlegroups.com
Hi Alexander,

I have seen this in my app, too. It still runs fast enough. But
I guess the django code could be optimized.

Thomas

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de

akaariai

unread,
Mar 16, 2011, 10:28:54 AM3/16/11
to Django developers


On Mar 16, 10:14 am, Thomas Guettler <h...@tbz-pariv.de> wrote:
> Hi Alexander,
>
> I have seen this in my app, too. It still runs fast enough. But
> I guess the django code could be optimized.
>

I had a patch for this problem somewhere, but can't find it now.
Basically it added inplace() method to queryset, and after that no
cloning of the inner query class would happen. The outer QuerySet
would still be cloned, but that is relatively cheap. This was to
prevent usage of the old reference to the QuerySet accidentally.* This
was done by keeping a "usage" count both in the inner query instance
and in the outer QuerySet instance.

- Anssi

(*)
qs.filter(pk=1)
qs.filter(foo=bar) would be an error
but
qs = qs.filter(pk=1)
qs.filter(foo=bar) would be ok.

Alexander Schepanovski

unread,
Mar 16, 2011, 9:11:59 PM3/16/11
to Django developers
> I had a patch for this problem somewhere, but can't find it now.
> Basically it added inplace() method to queryset, and after that no
> cloning of the inner query class would happen. The outer QuerySet
> would still be cloned, but that is relatively cheap. This was to
> prevent usage of the old reference to the QuerySet accidentally.* This
> was done by keeping a "usage" count both in the inner query instance
> and in the outer QuerySet instance.

Can you find that patch and post somewhere?
If not still thanks for this idea.

akaariai

unread,
Mar 17, 2011, 2:43:19 AM3/17/11
to Django developers
On Mar 17, 3:11 am, Alexander Schepanovski <suor....@gmail.com> wrote:
> Can you find that patch and post somewhere?
> If not still thanks for this idea.

Unfortunately, no. Gone with my old laptop.

- Anssi
Reply all
Reply to author
Forward
0 new messages