Shortcut for session.query(somequery.exists()).scalar()

1,315 views
Skip to first unread message

Adrian

unread,
Dec 1, 2016, 9:13:07 AM12/1/16
to sqlalchemy
Is there any shorter/prettier way for this?

    session.query(session.query(Foo).exists()).scalar()

It's not hard to add a custom method to the base query class that returns

    self.session.query(self.exists()).scalar()

but it feels like something that should be part of SQLAlchemy.

Also, is there any case where `.enable_eagerloads(False)` right before `.exists()` could cause problems / different results?
If not, wouldn't it make sense for exists() to do that automatically? Loading a relationship doesn't ever seem to make sense
when checking exists().

mike bayer

unread,
Dec 1, 2016, 9:27:27 AM12/1/16
to sqlal...@googlegroups.com


On 12/01/2016 09:13 AM, Adrian wrote:
> Is there any shorter/prettier way for this?
>
> session.query(session.query(Foo).exists()).scalar()


I would normally do


session.query(Foo).count()

I've never embedded an EXISTS like that.


>
> It's not hard to add a custom method to the base query class that returns
>
> self.session.query(self.exists()).scalar()

how does this interact with filtering etc? seems like this method
would work just like count().


>
> but it feels like something that should be part of SQLAlchemy.
>
> Also, is there any case where `.enable_eagerloads(False)` right before
> `.exists()` could cause problems / different results?

this query is against column expressions, not entities, so eager loading
is not involved.

> If not, wouldn't it make sense for exists() to do that automatically?
> Loading a relationship doesn't ever seem to make sense
> when checking exists().

there's no entity loaded so relationships aren't involved.


>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Adrian

unread,
Dec 1, 2016, 9:37:55 AM12/1/16
to sqlalchemy
> I would normally do session.query(Foo).count()

COUNT is somewhat expensive compared to just checking whether rows exist, especially if lots of rows match (2.2M rows in the example):

In [2]: %timeit -n1 -r1 EventPerson.query.count()
1 loop, best of 1: 135 ms per loop
In [3]: %timeit -n1 -r1 db.session.query(EventPerson.query.exists()).scalar()
1 loop, best of 1: 2.44 ms per loop



> how does this interact with filtering etc?
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.exists :P

It keeps all the filters present on the query object (in that example, `self` is the existing query object)


> this query is against column expressions, not entities, so eager loading is not involved.
That's what I first thought, but the related OUTER JOINs are still added. Maybe a bug?
https://gist.github.com/ThiefMaster/00f812a5be0ce321c91241de297dbdd0

mike bayer

unread,
Dec 1, 2016, 9:44:29 AM12/1/16
to sqlal...@googlegroups.com


On 12/01/2016 09:37 AM, Adrian wrote:
>> I would normally do session.query(Foo).count()
>
> COUNT is somewhat expensive compared to just checking whether rows
> exist, especially if lots of rows match (2.2M rows in the example):

I would have assumed you had some filtering condition.

Checking an EXISTS for an entire table at once, that may either have
zero or 2.2M rows is a little unusual :) I'd do
session.query(Foo.id).first() in that case.

>
> In [2]: %timeit -n1 -r1 EventPerson.query.count()
> 1 loop, best of 1: 135 ms per loop
> In [3]: %timeit -n1 -r1
> db.session.query(EventPerson.query.exists()).scalar()
> 1 loop, best of 1: 2.44 ms per loop
>
>
>> how does this interact with filtering etc?
> http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.exists
> :P
>
> It keeps all the filters present on the query object (in that example,
> `self` is the existing query object)

well exists() returns a core SQL element whereas count() is special in
that it invokes a query. You're possibly asking for some
session.query.row_exists() type of thing.


>
>> this query is against column expressions, not entities, so eager
> loading is not involved.
> That's what I first thought, but the related OUTER JOINs are still
> added. Maybe a bug?
> https://gist.github.com/ThiefMaster/00f812a5be0ce321c91241de297dbdd0

it's not ideal it adds those in, there's the edge case that someone is
joined eager loading with innerjoin=True and is expecting that to
participate in the row being present, that's not the correct use of that
but it could break things for someone.

Adrian

unread,
Dec 1, 2016, 9:47:25 AM12/1/16
to sqlalchemy
Would you be interested in a PR adding `Query.row_exists()` or even `Query.row_exists(disable_eagerloads=True)` which would also disable eagerloads by default?

mike bayer

unread,
Dec 1, 2016, 9:59:12 AM12/1/16
to sqlal...@googlegroups.com
if it works like count(), then it would be making use of from_self()
which disables eager loads already.

Adrian

unread,
Dec 1, 2016, 11:49:07 AM12/1/16
to sqlalchemy
`from_self().exists()` seems to produce an unnecessarily complex query (still containing all the original columns)

mike bayer

unread,
Dec 1, 2016, 12:10:39 PM12/1/16
to sqlal...@googlegroups.com


On 12/01/2016 11:49 AM, Adrian wrote:
> `from_self().exists()` seems to produce an unnecessarily complex query
> (still containing all the original columns)

it is unnecessarily complex in some situations but in others it is not.
count() for many years tried to guess which situations it could do a
straight SELECT COUNT and in which it needed to wrap inside of a
subquery, which itself was an error-prone process. New edge cases
continued to be reported. Once I moved it to a simple from_self(), it
returns the right result in all cases, the complexity of the source code
went way down, and I got my time back from constantly having to fight
new edge cases with it.

It's a reasonable tradeoff that count() returns the right result in all
cases with simple source code, and that a more optimized query if
necessary is available through manual use of func.count(), which is also
documented.

Here's the ticket where the logic for count() was made to just use
from_self() in all cases:

https://bitbucket.org/zzzeek/sqlalchemy/issues/2093


Here is the diff where 60 lines of buggy, complicated code were replaced
with two that have never failed since:

https://bitbucket.org/zzzeek/sqlalchemy/commits/e7be5a5729cabc6133b4fc83c06f6c5277a7af19#Llib/sqlalchemy/orm/query.pyF2057

If I had a crew of people to help triage bug reports, review code, and
write very high quality tests, areas like these may have more
out-of-the-box optimizations. Short of having that, having code that is
straightforward and returns the right result without issue is the next
best thing.

mike bayer

unread,
Dec 1, 2016, 12:17:30 PM12/1/16
to sqlal...@googlegroups.com


On 12/01/2016 11:49 AM, Adrian wrote:
> `from_self().exists()` seems to produce an unnecessarily complex query
> (still containing all the original columns)


also please see the source for count() re: the original columns:

col = sql.func.count(sql.literal_column('*'))
return self.from_self(col).scalar()

replaced with '*'. a built-in exists() would do something similar.
Reply all
Reply to author
Forward
0 new messages