Proposal: QuerySet.exists() method

9 views
Skip to first unread message

Adrian Holovaty

unread,
Jul 13, 2007, 5:22:07 PM7/13/07
to django-d...@googlegroups.com
I'd like to add a QuerySet.exists() method, which would return True or
False if the given QuerySet contains at least one record. This would
be more efficient than qs.count() or len(qs) because it would perform
the following SQL under the hood:

SELECT 1 FROM [table] WHERE [where] LIMIT 1;

I'm biased, because I have an immediate need for this in a project,
but this seems general and useful enough for inclusion in QuerySet.
Thoughts?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

Tom Tobin

unread,
Jul 13, 2007, 5:31:42 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
>
> I'd like to add a QuerySet.exists() method, which would return True or
> False if the given QuerySet contains at least one record. This would
> be more efficient than qs.count() or len(qs) because it would perform
> the following SQL under the hood:
>
> SELECT 1 FROM [table] WHERE [where] LIMIT 1;

+0; seems like a reasonable addition. I wonder of "any" might be a
better method name (along the lines of the Python 2.5 built-in
function), but either name would be fine IMHO.

Tom Tobin

unread,
Jul 13, 2007, 5:32:18 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Tom Tobin <kor...@korpios.com> wrote:
> +0; seems like a reasonable addition. I wonder of "any" might be a
> better method name (along the lines of the Python 2.5 built-in
> function), but either name would be fine IMHO.

Err, I wonder *if*. ::sigh:: :-)

Frédéric Roland

unread,
Jul 13, 2007, 5:34:27 PM7/13/07
to django-d...@googlegroups.com
Why not QuerySet.isEmpty() ?

Adrian Holovaty a écrit :

Deryck Hodge

unread,
Jul 13, 2007, 5:52:11 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
>
> I'd like to add a QuerySet.exists() method, which would return True or
> False if the given QuerySet contains at least one record. This would
> be more efficient than qs.count() or len(qs) because it would perform
> the following SQL under the hood:
>
> SELECT 1 FROM [table] WHERE [where] LIMIT 1;
>
> I'm biased, because I have an immediate need for this in a project,
> but this seems general and useful enough for inclusion in QuerySet.
> Thoughts?
>

Just this week I was thinking it would be nice to have something like
this for what I'm working on -- doing a check against what is
potentially either a large result set or an empty one. +1 from me.

Cheers,
deyck

Ivan Sagalaev

unread,
Jul 13, 2007, 5:55:04 PM7/13/07
to django-d...@googlegroups.com
Adrian Holovaty wrote:
> I'd like to add a QuerySet.exists() method, which would return True or
> False if the given QuerySet contains at least one record. This would
> be more efficient than qs.count() or len(qs) because it would perform
> the following SQL under the hood:
>
> SELECT 1 FROM [table] WHERE [where] LIMIT 1;
>
> I'm biased, because I have an immediate need for this in a project,
> but this seems general and useful enough for inclusion in QuerySet.
> Thoughts?

I often want a queryset to behave like a list in this regard:

if queryset:
# not empty, do something


If you implement it as __nonzero__ this will work more efficient than it
does now calling __len__.

SmileyChris

unread,
Jul 13, 2007, 5:58:43 PM7/13/07
to Django developers

Adrian, I think it's useful enough. But why do you need a .exists() if
we could just use __nonzero__ (like Ivan suggested)?

Adrian Holovaty

unread,
Jul 13, 2007, 6:30:42 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, SmileyChris <smile...@gmail.com> wrote:
> Adrian, I think it's useful enough. But why do you need a .exists() if
> we could just use __nonzero__ (like Ivan suggested)?

I hadn't thought of that! Yes, we should definitely implement a
QuerySet.__nonzero__() method. However, I'd like there to be an
explicit method (named either exists() or any() or whatever), because
it's easier to say this:

To determine whether a QuerySet has at least one record, call its
exists() method.

...than this:

To determine whether a QuerySet has at least one record, put it in
an "if" statement

It's convenient to be able to access the result of exists() inline,
rather than having to use it in a certain context, such as within an
"if" clause. Sure, if we went with only __nonzero__(), you could call
__nonzero__() directly, but accessing the Python double-underscore
magic methods is slightly ugly. Make sense?

Jacob Kaplan-Moss

unread,
Jul 13, 2007, 6:41:56 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
> Yes, we should definitely implement a
> QuerySet.__nonzero__() method. However, I'd like there to be an
> explicit method (named either exists() or any() or whatever), because
[snip]

Agreed -- I'm +0 on .exists() and __nonzero__() being an alias.

Jacob

Brian Harring

unread,
Jul 13, 2007, 7:02:55 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
> On 7/13/07, SmileyChris <smile...@gmail.com> wrote:
> > Adrian, I think it's useful enough. But why do you need a .exists() if
> > we could just use __nonzero__ (like Ivan suggested)?
>
> I hadn't thought of that! Yes, we should definitely implement a
> QuerySet.__nonzero__() method. However, I'd like there to be an
> explicit method (named either exists() or any() or whatever), because
> it's easier to say this:
>
> To determine whether a QuerySet has at least one record, call its
> exists() method.
>
> ...than this:
>
> To determine whether a QuerySet has at least one record, put it in
> an "if" statement

tell them to do bool(QuerySetInstance) then; its basically what if is
doing anyways, and any decent python users will recognize the bool as
being redundant.

> It's convenient to be able to access the result of exists() inline,
> rather than having to use it in a certain context, such as within an
> "if" clause. Sure, if we went with only __nonzero__(), you could call
> __nonzero__() directly, but accessing the Python double-underscore
> magic methods is slightly ugly.

And kills kittens (the protocol methods there usually should be
accessible by the paired builtin ;).

Either way, -1 on exists, +1 on nonzero.
~harring

Michael Trier

unread,
Jul 13, 2007, 10:32:26 PM7/13/07
to django-d...@googlegroups.com
+1 on both. I was needing this just the other day.

Michael

James Bennett

unread,
Jul 13, 2007, 11:18:35 PM7/13/07
to django-d...@googlegroups.com
On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
> I'm biased, because I have an immediate need for this in a project,
> but this seems general and useful enough for inclusion in QuerySet.

Implementing the check in __nonzero__ and having people test by doing
'if some_queryset' or 'if not some_queryset' feels more Pythonic,
while implementing it as a specialized 'exists' method feels closer to
SQL. Since the Django ORM already leans more toward the Python side
than the SQL, I'd say do __nonzero__ and advise boolean checks on
QuerySets instead of implementing a method to do the same.


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Malcolm Tredinnick

unread,
Jul 13, 2007, 11:31:56 PM7/13/07
to django-d...@googlegroups.com
On Fri, 2007-07-13 at 16:22 -0500, Adrian Holovaty wrote:
> I'd like to add a QuerySet.exists() method, which would return True or
> False if the given QuerySet contains at least one record. This would
> be more efficient than qs.count() or len(qs) because it would perform
> the following SQL under the hood:
>
> SELECT 1 FROM [table] WHERE [where] LIMIT 1;

Due to Oracle inclusion, this has to be

select count(*) from [table] where [...]

and then check that the result is > 0, at least in the Oracle backend
(no "limit" extension in Oracle). The problem being that count(*) is not
an optimised operation in PostgreSQL, however, we haven't yet split up
those cases in a lot of the code (the recent Oracle merge moved a bunch
of similar things to the count(*) case and I keep meaning to look at
whether we can move them all to count(id_col), which is faster).

> I'm biased, because I have an immediate need for this in a project,
> but this seems general and useful enough for inclusion in QuerySet.
> Thoughts?

I'm +0.

__nonzero__ needs a bit more care, though, I think (although we need it
and it's my fault it hasn't been implemented so far; I'm dragging my
feet a bit too much in this area). We can implement __nonzero__ (more
importantly, make bool() make sense) using a single item lookahead cache
internally. So when you call __nonzero__, it pulls back the first item
to check if it exsits and the subsequently iterating over the queryset
will still return that first item before the rest. Net result is that it
doesn't require two SQL queries to do this:

if qs:
# do something with the results of qs

which is pretty easy to walk into in templates.

By the way, for all database backends except SQLite, you can implement
all of this sort of stuff (including exists and __len__) fairly fast
using cursor.rowcount (the number of rows in the result set).
Unfortunately, SQLite always returns -1 for rowcount. I'm building a few
of those optimisations into the QuerySet rewrite as a way to cut down on
queries.

Regards,
Malcolm


--
Experience is something you don't get until just after you need it.
http://www.pointy-stick.com/blog/

Ian Kelly

unread,
Jul 13, 2007, 11:56:27 PM7/13/07
to django-d...@googlegroups.com
> By the way, for all database backends except SQLite, you can implement
> all of this sort of stuff (including exists and __len__) fairly fast
> using cursor.rowcount (the number of rows in the result set).
> Unfortunately, SQLite always returns -1 for rowcount. I'm building a few
> of those optimisations into the QuerySet rewrite as a way to cut down on
> queries.

cursor.rowcount is also problematic in cx_Oracle. IIRC, it returns
the number of rows modified for an INSERT, UPDATE, or DELETE
statement, but for a SELECT statement it returns the number of rows
already fetched rather than the total number of rows.

Ian

Malcolm Tredinnick

unread,
Jul 14, 2007, 12:26:26 AM7/14/07
to django-d...@googlegroups.com

Ah, yes, that sounds familiar.

Regards,
Malcolm

--
Remember that you are unique. Just like everyone else.
http://www.pointy-stick.com/blog/

Leo Soto M.

unread,
Jul 14, 2007, 12:56:04 AM7/14/07
to django-d...@googlegroups.com
On 7/13/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

[...]


> Due to Oracle inclusion, this has to be
>
> select count(*) from [table] where [...]
>
> and then check that the result is > 0, at least in the Oracle backend
> (no "limit" extension in Oracle). The problem being that count(*) is not
> an optimised operation in PostgreSQL, however, we haven't yet split up
> those cases in a lot of the code (the recent Oracle merge moved a bunch
> of similar things to the count(*) case and I keep meaning to look at
> whether we can move them all to count(id_col), which is faster).

Just curious: In which backends is count(id_col) faster than count(*)?.

I'd say that any decent database engine should take both as the same
(as long as a PK exists and no outer joins are performed), but I could
be wrong and would be happy to know why.

--
Leo Soto M.

Gary Wilson

unread,
Jul 14, 2007, 1:03:10 AM7/14/07
to django-d...@googlegroups.com
James Bennett wrote:
> On 7/13/07, Adrian Holovaty <holo...@gmail.com> wrote:
>> I'm biased, because I have an immediate need for this in a project,
>> but this seems general and useful enough for inclusion in QuerySet.
>
> Implementing the check in __nonzero__ and having people test by doing
> 'if some_queryset' or 'if not some_queryset' feels more Pythonic,
> while implementing it as a specialized 'exists' method feels closer to
> SQL. Since the Django ORM already leans more toward the Python side
> than the SQL, I'd say do __nonzero__ and advise boolean checks on
> QuerySets instead of implementing a method to do the same.

I agree, James. I'm +1 for __nonzero__ and -1 for .exists()

Gary

Malcolm Tredinnick

unread,
Jul 14, 2007, 1:05:05 AM7/14/07
to django-d...@googlegroups.com

I don't know yet. I want to look at this. I do know that count(*) is
slower on PostgreSQL (proportionally when compared to other operations)
than on SyBase and Oracle. So research is needed. It may not even need
to be changed if the difference is not really significant (i.e. if it's
fast enough already).

Regards,
Malcolm

--
The hardness of butter is directly proportional to the softness of the
bread.
http://www.pointy-stick.com/blog/

Malcolm Tredinnick

unread,
Jul 14, 2007, 1:08:03 AM7/14/07
to django-d...@googlegroups.com
On Sat, 2007-07-14 at 13:31 +1000, Malcolm Tredinnick wrote:
> On Fri, 2007-07-13 at 16:22 -0500, Adrian Holovaty wrote:
> > I'd like to add a QuerySet.exists() method, which would return True or
> > False if the given QuerySet contains at least one record.
[...]

> I'm +0.
>
> __nonzero__ needs a bit more care, though, I think

I've inadvertently hijacked the thread. Sorry 'bout that. :-(

Implement one or both, since the API isn't really controversial
(exists() is a zero-argument method, except for "self") and we can dick
around with the implementation under the covers later.

Regards,
Malcolm

--
Tolkien is hobbit-forming.
http://www.pointy-stick.com/blog/

David Larlet

unread,
Jul 14, 2007, 4:38:06 AM7/14/07
to django-d...@googlegroups.com
2007/7/14, Gary Wilson <gary....@gmail.com>:

I agree too, if documentation is a problem it can be resolved with a
simple example. So I'm +1 for __nonzero__ and -1 for .exists()

David

Masida

unread,
Jul 14, 2007, 5:28:34 AM7/14/07
to Django developers
Although Oracle doesn't have LIMIT, it does have ROWNUM which
basically acts as a counter.
So I think you should do:

SELECT 1 FROM [table] WHERE [where] AND ROWNUM <= 1;

When there are JOINS you probably have to put the original SELECT in a
subquery of the ROWNUM <= 1 query.
More info about ROWNUM:
http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

I believe some databases are actually quite slow when doing a
COUNT(*). (Can't remember which RDBMS I experienced this).

Regards,
- Matthias

On Jul 14, 5:31 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Honza Král

unread,
Jul 14, 2007, 6:04:23 AM7/14/07
to django-d...@googlegroups.com
On 7/14/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
>
> On Fri, 2007-07-13 at 16:22 -0500, Adrian Holovaty wrote:
> > I'd like to add a QuerySet.exists() method, which would return True or
> > False if the given QuerySet contains at least one record. This would
> > be more efficient than qs.count() or len(qs) because it would perform
> > the following SQL under the hood:
> >
> > SELECT 1 FROM [table] WHERE [where] LIMIT 1;
>
> Due to Oracle inclusion, this has to be
>
> select count(*) from [table] where [...]

How about
SELECT 1 FROM DUAL WHERE EXISTS ( SELECT 1 FROM [table] WHERE [where] );
for oracle?
it is definitely better then requiring Oracle to count all the rows.

>
> and then check that the result is > 0, at least in the Oracle backend
> (no "limit" extension in Oracle). The problem being that count(*) is not
> an optimised operation in PostgreSQL, however, we haven't yet split up
> those cases in a lot of the code (the recent Oracle merge moved a bunch
> of similar things to the count(*) case and I keep meaning to look at
> whether we can move them all to count(id_col), which is faster).
>


--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

Tai Lee

unread,
Jul 15, 2007, 9:14:40 PM7/15/07
to Django developers
I'm also +1 __nonezero__ -1 .exists(). Using .count() already gives us
the required functionality and makes logical sense, the only question
then is how big of a performance gain do we get by using SELECT 1
LIMIT 1.

SELECT 1 FROM [table] LIMIT 1 also feels kinda hackish to me, and I'm
guessing it's likely not supported in all database engines used by
Django and/or is not the most efficient statement in all supported
engines, in which case .exists() would only become a performance
workaround for some engines and either using .count() or simply
having .exists() as an alias to .count() for those engines where
count(*) or count(pk) is most efficient.

I'd rather petition the PostgreSQL developers to optimise count(*) and
suggest Django users implement their own changes or workaround in the
meantime if they're working with large datasets where count(*) is a
serious performance penalty.

I like Malcom's idea of just checking the first value in a queryset to
save a second SQL statement, but I'm not sure it would be appropriate
in all cases. There may be times when people *just* want to check if
records exist, without actually selecting or iterating through
records. In those cases wouldn't it be more efficient to do a
single .count() than selecting all your records and only returning the
first one?

Tom Tobin

unread,
Jul 15, 2007, 9:34:27 PM7/15/07
to django-d...@googlegroups.com
On 7/15/07, Tai Lee <mrmac...@gmail.com> wrote:
>
> I'd rather petition the PostgreSQL developers to optimise count(*) and
> suggest Django users implement their own changes or workaround in the
> meantime if they're working with large datasets where count(*) is a
> serious performance penalty.

The two options (implement hackish exists() for PostgresSQL/etc., or
petition the database developers to optimize the call) aren't mutually
exclusive; we can strive for purity in the long run while dealing with
the practical reality at hand. There's enough weirdness abstracted
away in the db backends already; that's their point. :-)

Reply all
Reply to author
Forward
0 new messages