QuerySet.exists() - Possible bug?

Jerome Leclanche

unread,

Feb 10, 2010, 2:11:50 AM2/10/10

to django-d...@googlegroups.com

Hi list

Using David Cramer's really nice replacement devserver
(http://github.com/dcramer/django-devserver), I noticed a bunch of
huge queries in some table generation code. Example:
http://dpaste.com/156907/
This is from one of my smallest tables, bigger ones with 120+ish rows
can take multiple ms to query.

I narrowed it down to a SomeModel.some_reverse_foreignkey.exists().
Using something as simple as .only("id").exists() will make the query
several dozen times faster.

Is this a bug, or am I missing something in exists() behaviour? From
what I understood, it's supposed to be faster than count() in
situations where I just need to know whether there are rows present.

Using psycopg2 with postgresql 8.4.

J. Leclanche / Adys

Russell Keith-Magee

unread,

Feb 10, 2010, 2:22:14 AM2/10/10

to django-d...@googlegroups.com

It certainly doesn't look right; .exists() should be faster than
.count(), and having all the extra columns in the query looks a little
weird.

However, to say for sure, we need a few more details:

* Does this only happen when using David's devserver, or does it
happen in a normal Django shell?

* Can you give a *specific* example of a model and query that is
causing this? Or, if it's happening in the process of running a
management command or other Django-internal activity, what do you need
to do to stimulate the problem?

* Are you doing anything interesting with select_related() or
anything like that?

* What version of Django (including SVN revision, if appropriate) are
you using?

Yours,
Russ Magee %-)

Jerome Leclanche

unread,

Feb 10, 2010, 2:29:33 AM2/10/10

to django-d...@googlegroups.com

Django version 1.2 alpha 1 SVN-12267

I'm not using select_related at all anywhere at the moment, I just
started database optimization. David's server is just a wrapper
django-toolbar-like and doesn't perform anything smart on the sql or
the like.

I haven't yet tested on a vanilla setup, but I suppose something like
this should do the trick:

Class MyModel(Model):
field_1 = IntegerField()
field_2 = IntegerField()
field_3 = IntegerField()
field_4 = IntegerField()
field_5 = IntegerField()

class OtherModel(Model):
my_m2m = ForeignKey(MyModel, related_name="my_relation")

>>> MyModel.objects.all()[:1][0].my_relation.exists()

J. Leclanche / Adys

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Karen Tracey

unread,

Feb 10, 2010, 3:05:26 AM2/10/10

to django-d...@googlegroups.com

On Wed, Feb 10, 2010 at 2:22 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:

It certainly doesn't look right; .exists() should be faster than
.count(), and having all the extra columns in the query looks a little
weird.

However, to say for sure, we need a few more details:

* Does this only happen when using David's devserver, or does it
happen in a normal Django shell?

I can recreate with just simple shell queries and checking the sql in django.db.connection

* Can you give a *specific* example of a model and query that is
causing this? Or, if it's happening in the process of running a
management command or other Django-internal activity, what do you need
to do to stimulate the problem?

Every exists() query I tried shows this behavior. For example:

>>> from django.contrib.auth.models import User
>>> User.objects.filter(username='kmt').exists()
True
>>> from django.db import connection
>>> connection.queries[-1]
{'time': '0.001', 'sql': u'SELECT (1) AS `a`, `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`username` = kmt '}
>>>

* Are you doing anything interesting with select_related() or
anything like that?

No.

* What version of Django (including SVN revision, if appropriate) are
you using?

I tried current trunk and backed off to the changeset where the function was added -- query was the same. Looking at the idiom it replaced, the extra select of the constant 1 value as 'a' was followed by a values('a') that turned the result into a ValuesQuerySet. I don't see where this is being done with the exists() implementation. As a result it seems to be a regular QuerySet with a query with default_cols=True, resulting in all the default columns getting pulled into the query.

Karen

Jeremy Dunck

unread,

Feb 10, 2010, 8:35:34 AM2/10/10

to django-d...@googlegroups.com

On Wed, Feb 10, 2010 at 2:05 AM, Karen Tracey <kmtr...@gmail.com> wrote:
...

>> * What version of Django (including SVN revision, if appropriate) are
>> you using?
>
> I tried current trunk and backed off to the changeset where the function was
> added -- query was the same. Looking at the idiom it replaced, the extra
> select of the constant 1 value as 'a' was followed by a values('a') that
> turned the result into a ValuesQuerySet. I don't see where this is being
> done with the exists() implementation. As a result it seems to be a regular
> QuerySet with a query with default_cols=True, resulting in all the default
> columns getting pulled into the query.

It seems odd to me that adding columns to the result of a returned row
would be significantly slower than the PK. It also seems to me that
using the default ordering might be unnecessary overhead, too.

But I wonder if different DBs return different things based on the
columns in the select clause? Perhaps Malcolm had a reason for making
the change. :-/

David Cramer

unread,

Feb 10, 2010, 10:45:35 PM2/10/10

to Django developers

MySQL, in this situation, would have to actually select a row to
return a result, so it's slower. If it was just select 1 as a from
table where indexed_value = N, it doesn't even hit the tables, just
the indexes.

It's definitely not more efficient, and probably just an oversight
somewhere.

On Feb 10, 7:35 am, Jeremy Dunck <jdu...@gmail.com> wrote:

Reply all

Reply to author

Forward