SELECT 1 FROM [table] WHERE [where] LIMIT 1;
I'm biased, because I have an immediate need for this in a project,
but this seems general and useful enough for inclusion in QuerySet.
Thoughts?
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com
+0; seems like a reasonable addition. I wonder of "any" might be a
better method name (along the lines of the Python 2.5 built-in
function), but either name would be fine IMHO.
Err, I wonder *if*. ::sigh:: :-)
Adrian Holovaty a écrit :
Just this week I was thinking it would be nice to have something like
this for what I'm working on -- doing a check against what is
potentially either a large result set or an empty one. +1 from me.
Cheers,
deyck
I often want a queryset to behave like a list in this regard:
if queryset:
# not empty, do something
If you implement it as __nonzero__ this will work more efficient than it
does now calling __len__.
Adrian, I think it's useful enough. But why do you need a .exists() if
we could just use __nonzero__ (like Ivan suggested)?
I hadn't thought of that! Yes, we should definitely implement a
QuerySet.__nonzero__() method. However, I'd like there to be an
explicit method (named either exists() or any() or whatever), because
it's easier to say this:
To determine whether a QuerySet has at least one record, call its
exists() method.
...than this:
To determine whether a QuerySet has at least one record, put it in
an "if" statement
It's convenient to be able to access the result of exists() inline,
rather than having to use it in a certain context, such as within an
"if" clause. Sure, if we went with only __nonzero__(), you could call
__nonzero__() directly, but accessing the Python double-underscore
magic methods is slightly ugly. Make sense?
Agreed -- I'm +0 on .exists() and __nonzero__() being an alias.
Jacob
tell them to do bool(QuerySetInstance) then; its basically what if is
doing anyways, and any decent python users will recognize the bool as
being redundant.
> It's convenient to be able to access the result of exists() inline,
> rather than having to use it in a certain context, such as within an
> "if" clause. Sure, if we went with only __nonzero__(), you could call
> __nonzero__() directly, but accessing the Python double-underscore
> magic methods is slightly ugly.
And kills kittens (the protocol methods there usually should be
accessible by the paired builtin ;).
Either way, -1 on exists, +1 on nonzero.
~harring
Michael
Implementing the check in __nonzero__ and having people test by doing
'if some_queryset' or 'if not some_queryset' feels more Pythonic,
while implementing it as a specialized 'exists' method feels closer to
SQL. Since the Django ORM already leans more toward the Python side
than the SQL, I'd say do __nonzero__ and advise boolean checks on
QuerySets instead of implementing a method to do the same.
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
Due to Oracle inclusion, this has to be
select count(*) from [table] where [...]
and then check that the result is > 0, at least in the Oracle backend
(no "limit" extension in Oracle). The problem being that count(*) is not
an optimised operation in PostgreSQL, however, we haven't yet split up
those cases in a lot of the code (the recent Oracle merge moved a bunch
of similar things to the count(*) case and I keep meaning to look at
whether we can move them all to count(id_col), which is faster).
> I'm biased, because I have an immediate need for this in a project,
> but this seems general and useful enough for inclusion in QuerySet.
> Thoughts?
I'm +0.
__nonzero__ needs a bit more care, though, I think (although we need it
and it's my fault it hasn't been implemented so far; I'm dragging my
feet a bit too much in this area). We can implement __nonzero__ (more
importantly, make bool() make sense) using a single item lookahead cache
internally. So when you call __nonzero__, it pulls back the first item
to check if it exsits and the subsequently iterating over the queryset
will still return that first item before the rest. Net result is that it
doesn't require two SQL queries to do this:
if qs:
# do something with the results of qs
which is pretty easy to walk into in templates.
By the way, for all database backends except SQLite, you can implement
all of this sort of stuff (including exists and __len__) fairly fast
using cursor.rowcount (the number of rows in the result set).
Unfortunately, SQLite always returns -1 for rowcount. I'm building a few
of those optimisations into the QuerySet rewrite as a way to cut down on
queries.
Regards,
Malcolm
--
Experience is something you don't get until just after you need it.
http://www.pointy-stick.com/blog/
cursor.rowcount is also problematic in cx_Oracle. IIRC, it returns
the number of rows modified for an INSERT, UPDATE, or DELETE
statement, but for a SELECT statement it returns the number of rows
already fetched rather than the total number of rows.
Ian
Ah, yes, that sounds familiar.
Regards,
Malcolm
--
Remember that you are unique. Just like everyone else.
http://www.pointy-stick.com/blog/
[...]
> Due to Oracle inclusion, this has to be
>
> select count(*) from [table] where [...]
>
> and then check that the result is > 0, at least in the Oracle backend
> (no "limit" extension in Oracle). The problem being that count(*) is not
> an optimised operation in PostgreSQL, however, we haven't yet split up
> those cases in a lot of the code (the recent Oracle merge moved a bunch
> of similar things to the count(*) case and I keep meaning to look at
> whether we can move them all to count(id_col), which is faster).
Just curious: In which backends is count(id_col) faster than count(*)?.
I'd say that any decent database engine should take both as the same
(as long as a PK exists and no outer joins are performed), but I could
be wrong and would be happy to know why.
--
Leo Soto M.
I agree, James. I'm +1 for __nonzero__ and -1 for .exists()
Gary
I don't know yet. I want to look at this. I do know that count(*) is
slower on PostgreSQL (proportionally when compared to other operations)
than on SyBase and Oracle. So research is needed. It may not even need
to be changed if the difference is not really significant (i.e. if it's
fast enough already).
Regards,
Malcolm
--
The hardness of butter is directly proportional to the softness of the
bread.
http://www.pointy-stick.com/blog/
> I'm +0.
>
> __nonzero__ needs a bit more care, though, I think
I've inadvertently hijacked the thread. Sorry 'bout that. :-(
Implement one or both, since the API isn't really controversial
(exists() is a zero-argument method, except for "self") and we can dick
around with the implementation under the covers later.
Regards,
Malcolm
--
Tolkien is hobbit-forming.
http://www.pointy-stick.com/blog/
I agree too, if documentation is a problem it can be resolved with a
simple example. So I'm +1 for __nonzero__ and -1 for .exists()
David
SELECT 1 FROM [table] WHERE [where] AND ROWNUM <= 1;
When there are JOINS you probably have to put the original SELECT in a
subquery of the ROWNUM <= 1 query.
More info about ROWNUM:
http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html
I believe some databases are actually quite slow when doing a
COUNT(*). (Can't remember which RDBMS I experienced this).
Regards,
- Matthias
On Jul 14, 5:31 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
How about
SELECT 1 FROM DUAL WHERE EXISTS ( SELECT 1 FROM [table] WHERE [where] );
for oracle?
it is definitely better then requiring Oracle to count all the rows.
>
> and then check that the result is > 0, at least in the Oracle backend
> (no "limit" extension in Oracle). The problem being that count(*) is not
> an optimised operation in PostgreSQL, however, we haven't yet split up
> those cases in a lot of the code (the recent Oracle merge moved a bunch
> of similar things to the count(*) case and I keep meaning to look at
> whether we can move them all to count(id_col), which is faster).
>
--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585
SELECT 1 FROM [table] LIMIT 1 also feels kinda hackish to me, and I'm
guessing it's likely not supported in all database engines used by
Django and/or is not the most efficient statement in all supported
engines, in which case .exists() would only become a performance
workaround for some engines and either using .count() or simply
having .exists() as an alias to .count() for those engines where
count(*) or count(pk) is most efficient.
I'd rather petition the PostgreSQL developers to optimise count(*) and
suggest Django users implement their own changes or workaround in the
meantime if they're working with large datasets where count(*) is a
serious performance penalty.
I like Malcom's idea of just checking the first value in a queryset to
save a second SQL statement, but I'm not sure it would be appropriate
in all cases. There may be times when people *just* want to check if
records exist, without actually selecting or iterating through
records. In those cases wouldn't it be more efficient to do a
single .count() than selecting all your records and only returning the
first one?
The two options (implement hackish exists() for PostgresSQL/etc., or
petition the database developers to optimize the call) aren't mutually
exclusive; we can strive for purity in the long run while dealing with
the practical reality at hand. There's enough weirdness abstracted
away in the db backends already; that's their point. :-)