q = session.query(Snapshot).filter(Snapshot.totalqty==func.max(Snapshot.totalqty))
When I try and get the results of the query, I get the error below.
How should I use func.max here?
__version__ = 0.5.0rc3
In [54]: len(q.all())
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in ?
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/query.py",
line 994, in all
return list(self)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/query.py",
line 1082, in __iter__
return self._execute_and_instances(context)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/query.py",
line 1085, in _execute_and_instances
result = self.session.execute(querycontext.statement,
params=self._params, mapper=self._mapper_zero_or_none(),
_state=self._refresh_state)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/session.py",
line 749, in execute
return self.__connection(engine, close_with_result=True).execute(
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 826, in execute
return Connection.executors[c](self, object, multiparams, params)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 877, in execute_clauseelement
return self._execute_compiled(elem.compile(dialect=self.dialect,
column_keys=keys, inline=len(params) > 1), distilled_params=params)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 889, in _execute_compiled
self.__execute_raw(context)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 898, in __execute_raw
self._cursor_execute(context.cursor, context.statement,
context.parameters[0], context=context)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 942, in _cursor_execute
self._handle_dbapi_exception(e, statement, parameters, cursor)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/engine/base.py",
line 924, in _handle_dbapi_exception
raise exc.DBAPIError.instance(statement, parameters, e,
connection_invalidated=is_disconnect)
ProgrammingError: (ProgrammingError) (1111, 'Invalid use of group
function') u'SELECT snapshot.symbol AS snapshot_symbol,
snapshot.strategy AS snapshot_strategy, snapshot.longshort AS
snapshot_longshort, snapshot.datetime AS snapshot_datetime,
snapshot.date AS snapshot_date, snapshot.year AS snapshot_year,
snapshot.month AS snapshot_month, snapshot.qty AS snapshot_qty,
snapshot.totalqty AS snapshot_totalqty, snapshot.price AS
snapshot_price, snapshot.possize AS snapshot_possize, snapshot.pnl AS
snapshot_pnl, snapshot.realized AS snapshot_realized,
snapshot.pnl_hedged AS snapshot_pnl_hedged, snapshot.is_stop AS
snapshot_is_stop, snapshot.stop_flag AS snapshot_stop_flag \nFROM
snapshot \nWHERE snapshot.totalqty = max(snapshot.totalqty)' []
http://weblogs.sqlteam.com/jeffs/jeffs/archive/2007/07/20/60261.aspx
in this case you probably want
query.filter(Snapshot.totalqty==func.max(Snapshot.totalqty).select()).
Indeed, that does work:
session.query(Snapshot.symbol).filter(Snapshot.datetime==func.max(Snapshot.datetime).select()).all()
and I read the tutorial you pointed me an certainly get the idea that
everything in the select list must be contained in the group by clause
or used in an aggregate function, but I am still unclear on what the
role of the "select" method is on the filter clause above, and how it
pertains to the tutorial message.
I'm now having a related problem that perhaps will shed light on my
confusions...
Here is a query that lists the sum(pnl) for each symbol and strategy
in my snapshots table
session.query(Snapshot.strategy, Snapshot.symbol,
func.sum(Snapshot.pnl)).group_by(Snapshot.strategy,
Snapshot.symbol).all()
That works fine. But what if I only want to list the rows where the
sum(pnl)<-15000 ? I tried a few things:
session.query(Snapshot.strategy, Snapshot.symbol,
func.sum(Snapshot.pnl)).group_by(Snapshot.strategy,
Snapshot.symbol).filter(func.sum(Snapshot.pnl)<-15000).all()
but got the error below
raise exc.DBAPIError.instance(statement, parameters, e,
connection_invalidated=is_disconnect)
ProgrammingError: (ProgrammingError) (1111, 'Invalid use of group
function') u'SELECT snapshot.strategy AS snapshot_strategy,
snapshot.symbol AS snapshot_symbol, sum(snapshot.pnl) AS sum_1 \nFROM
snapshot \nWHERE sum(snapshot.pnl) < %s GROUP BY snapshot.strategy,
snapshot.symbol' [-15000]
So I thought I'd try your "select" magic method, which as noted above
I don't really understand how to use. But I get a new error when I
call the select method on the sum func, eg
In [47]: session.query(Snapshot.strategy, Snapshot.symbol,
func.sum(Snapshot.pnl)).group_by(Snapshot.strategy,
Snapshot.symbol).filter(-15000>func.sum(Snapshot.pnl).select()).all()
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in ?
File "<string>", line 1, in <lambda>
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/query.py",
line 52, in generate
fn(self, *args[1:], **kw)
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/orm/query.py",
line 624, in filter
raise sa_exc.ArgumentError("filter() argument must be of type
sqlalchemy.sql.ClauseElement or string")
ArgumentError: filter() argument must be of type
sqlalchemy.sql.ClauseElement or string
Sorry to be flailing around a bit here -- hopefully you can shed some
light for me!
JDH
[SNIP]
> Here is a query that lists the sum(pnl) for each symbol and strategy
> in my snapshots table
>
> session.query(Snapshot.strategy, Snapshot.symbol,
> func.sum(Snapshot.pnl)).group_by(Snapshot.strategy,
> Snapshot.symbol).all()
>
> That works fine. But what if I only want to list the rows where the
> sum(pnl)<-15000 ? I tried a few things:
>
> session.query(Snapshot.strategy, Snapshot.symbol,
> func.sum(Snapshot.pnl)).group_by(Snapshot.strategy,
> Snapshot.symbol).filter(func.sum(Snapshot.pnl)<-15000).all()
>
> but got the error below
>
> raise exc.DBAPIError.instance(statement, parameters, e,
> connection_invalidated=is_disconnect)
> ProgrammingError: (ProgrammingError) (1111, 'Invalid use of group
> function') u'SELECT snapshot.strategy AS snapshot_strategy,
> snapshot.symbol AS snapshot_symbol, sum(snapshot.pnl) AS sum_1 \nFROM
> snapshot \nWHERE sum(snapshot.pnl) < %s GROUP BY snapshot.strategy,
> snapshot.symbol' [-15000]
>
I'm no SQL expert, so please take this with a pinch of salt, but as far
as I know, conditions in the 'WHERE' clause of an SQL statement are
applied BEFORE any grouping, so you can't use grouping functions (such
as SUM, MAX etc) in the WHERE clause. Instead, you add a 'HAVING'
statement on the end to filter the rows AFTER the grouping.
ie. The SQL you want is something like:
SELECT snapshot.strategy AS snapshot_strategy,
snapshot.symbol AS snapshot_symbol,
sum(snapshot.pnl) AS sum_1
FROM snapshot
GROUP BY snapshot.strategy, snapshot.symbol
HAVING sum(snapshot.pnl) < 15000
In SA, I think you might be able to write that as (untested):
session.query(Snapshot.strategy,
Snapshot.symbol,
func.sum(Snapshot.pnl))
.group_by(Snapshot.strategy, Snapshot.symbol)
.having(func.sum(Snapshot.pnl)<-15000).all()
BTW, I think the 'no grouping functions in WHERE clause' rule is also
the reason why your MAX query didn't work. The fix that Mike gave you
turned that part of your query into a subquery that only produced that
single value. This statement:
func.max(Snapshot.datetime).select()
...gets turned into something like 'SELECT max(datetime) FROM snapshot'.
This then gets embedded as a subquery into your larger query. It's
probably worth printing the SQL produced by each of the queries so that
you can see the difference.
Hope that helps,
Simon
> I'm no SQL expert, so please take this with a pinch of salt, but as far
> as I know, conditions in the 'WHERE' clause of an SQL statement are
> applied BEFORE any grouping, so you can't use grouping functions (such
> as SUM, MAX etc) in the WHERE clause. Instead, you add a 'HAVING'
> statement on the end to filter the rows AFTER the grouping.
Ahh, that helps a lot.
> BTW, I think the 'no grouping functions in WHERE clause' rule is also
> the reason why your MAX query didn't work. The fix that Mike gave you
> turned that part of your query into a subquery that only produced that
> single value. This statement:
I see. That is why the select worked in my first test case but not
the second. In the max case, there was only a single value to return.
In the sum case, there was a sum grouped by (strategy, symbol).
Indeed, the having clause does what I want::
In [39]: q = session.query(Snapshot.strategy, Snapshot.symbol,
sum_pnl).group_by(Snapshot.strategy, Snapshot.symbol).having(sum_pnl<-15000)
In [40]: print q SELECT snapshot.strategy AS snapshot_strategy,
snapshot.symbol AS
snapshot_symbol, sum(snapshot.pnl) AS sum_1
FROM snapshot GROUP BY snapshot.strategy, snapshot.symbol
HAVING sum(snapshot.pnl) < %s
This produces a list of (strategy, symbol, sum(pnl)) as desired.
Now what I'm trying to figure out how to do is get a count over each
strategy of the number of symbols where the sum(pnl)<-15000. So I
need to do one group_by over (strategy, symbol) to get the right sums,
and then one group_by over strategy alone to get the symbol counts
where the threshold criterion is met.
To be honest, I don't really know how to do this in pure SQL, so this
is part sqlalachemy, part SQL question. In the past, I have done
naive sql queries and done the extra logic in python, so this time
around I am trying to be a little more persistent in figuring out the
sql way to do things.
Thanks for your explanation!
JDH
I think the query should look something like this:
SELECT strategy, COUNT(*)
FROM (SELECT snapshot.strategy AS strategy,
snapshot.symbol AS snapshot_symbol,
sum(snapshot.pnl) AS sum_1
FROM snapshot
GROUP BY snapshot.strategy, snapshot.symbol
HAVING sum(snapshot.pnl) < 1500) AS strategies
GROUP BY strategy
Run that by hand on your database and see if you get the results you
expect. The nested query gets the list of strategies that match the
original criteria, and the outer query uses that to produce the counts.
(Note that there are other ways to get the same result. For example, you
could JOIN your snapshot table to the subquery, which might be useful if
you wanted other columns from it in the outer query)
It should be fairly easy to build that query with SA's underlying
expression language. I'm not certain how to do it through session.query,
but I'm sure it's possible.
Simon
The snippet you posted does do what I want when inserted directly into
mysql database. I am getting close on the sqlalchemy incantation::
In [106]: q = session.query(Snapshot.strategy, Snapshot.symbol,
sum_pnl).group_by(Snapshot.strategy,
Snapshot.symbol).having(sum_pnl<-15000)
In [107]: newq = session.query(Snapshot.strategy, func.count('*'))
In [108]: print newq.select_from (q.subquery
()).group_by(Snapshot.strategy )
SELECT anon_1.strategy AS anon_1_strategy, count(%s) AS count_1
FROM (SELECT snapshot.strategy AS strategy, snapshot.symbol AS
symbol, sum(snapshot.pnl) AS sum_1
FROM snapshot GROUP BY snapshot.strategy, snapshot.symbol
HAVING sum(snapshot.pnl) < %s) AS anon_1 GROUP BY snapshot.strategy
The main problem is that the last "GROUP BY snapshot.strategy" is not
what I want, since I need to be doing "GROUP BY anon_1.strategy". Is
there some label magic I can employ to explicitly refer to the
Snapshot.query from the original "q" rather than the one from "newq"?
Thanks,
JDH
I haven't used session.query to do anything more than simple queries, so
I don't honestly know. Does this section of the documentation help at
all?
http://www.sqlalchemy.org/docs/05/ormtutorial.html#datamapping_joins_ali
ases
Actually, the section after that (Using Subqueries) probably does
something very close to what you want. What's the result of these lines:
q1 = (session.query(Snapshot.strategy, Snapshot.symbol, sum_pnl)
.group_by(Snapshot.strategy, Snapshot.symbol)
.having(sum_pnl<-15000)).subquery()
q2 = (session.query(Snapshot.strategy, func.count('*'))
.join(q1, Snapshot.strategy = q1.c.strategy)
.groupby(Snapshot.strategy))
print q2
Simon
> Actually, the section after that (Using Subqueries) probably does
> something very close to what you want. What's the result of these lines:
>
> q1 = (session.query(Snapshot.strategy, Snapshot.symbol, sum_pnl)
> .group_by(Snapshot.strategy, Snapshot.symbol)
> .having(sum_pnl<-15000)).subquery()
>
> q2 = (session.query(Snapshot.strategy, func.count('*'))
> .join(q1, Snapshot.strategy = q1.c.strategy)
> .groupby(Snapshot.strategy))
I assume you mean '=='? I get a foreign key error on the join::
sum_pnl = func.sum(Snapshot.pnl)
q1 = (session.query(Snapshot.strategy, Snapshot.symbol, sum_pnl)
.group_by(Snapshot.strategy, Snapshot.symbol)
.having(sum_pnl<-15000)).subquery()
q2 = (session.query(Snapshot.strategy, func.count('*'))
.join(q1, Snapshot.strategy==q1.c.strategy)
.groupby(Snapshot.strategy))
File "/home/titan/johnh/dev/lib/python2.4/site-packages/SQLAlchemy-0.5.0rc3dev_r5205-py2.4.egg/sqlalchemy/sql/util.py",
line 109, in join_condition
raise exc.ArgumentError(
ArgumentError: Can't find any foreign key relationships between
'snapshot' and '{ANON 157186924 anon}'
Still playing with the aliases in the link you referred me to above
but haven't gotten there yet...
you need an extra tuple on the join, query.join((q1, s.s==q1.c.s))
i think I might need to look into raising an error when the arguments
aren't sent properly, im not sure why it doesn't do that already.
> you need an extra tuple on the join, query.join((q1, s.s==q1.c.s))
This gets past the syntax error, but does not produce the right
results. I had to take some time off today to work on other problems,
but am now returning to this query. To better take advantage of all
of your generous time :-) I wrote a free-standing example that
populates a test database. The initial query Simon suggested works
and produces the desired output -- the goal is to replicate this with
a sqlalchemy query. I also include the join we were working on. Now
that the syntax is correct, it runs, but gives the wrong output.
Since someone proposed a bowling example earlier I decided to run with
that since it fits my problem quite well: instead of finding the
number of symbols per strategy where the sum(pnl)<-150000, we are
looking for the number of bowlers per league where the
sum(frames)>200. Example below
Thanks for any additional input!
import datetime
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import func
Base = declarative_base()
class Frame(Base):
__tablename__ = 'frame'
league = sa.Column(sa.String(12), primary_key=True)
bowler = sa.Column(sa.String(12), primary_key=True)
frame = sa.Column(sa.Integer, primary_key=True)
score = sa.Column(sa.Integer)
def __init__(self, league, bowler, frame, score):
self.league = league
self.bowler = bowler
self.frame = frame
self.score = score
def __repr__(self):
return "Game('%s', '%s', '%d', '%d')"%(self.league,
self.bowler, self.frame, self.score)
def populate(session):
'add some random bowling data to the dbase'
import random
for league in 'strikers', 'punters', 'plungers', 'scorers':
for i in range(random.randint(3,10)):
bowler = chr(i+65)
for frame in range(1, 11):
score = random.randint(0,30)
session.add(Frame(league, bowler, frame, score))
session.commit()
if __name__=='__main__':
engine = sa.create_engine("sqlite:///test.db")
Base.metadata.bind = engine
Session = orm.sessionmaker()
session = Session(bind=engine)
Base.metadata.drop_all()
Base.metadata.create_all()
populate(session)
# this is what we are trying to achieve
query = """\
SELECT league, COUNT(*)
FROM (SELECT frame.league AS league,
frame.bowler AS frame_bowler,
sum(frame.score) AS sum_1
FROM frame
GROUP BY frame.league, frame.bowler
HAVING sum(frame.score) < 200) AS frames
GROUP BY league
"""
print 'desired', session.execute(query).fetchall()
# this is what Simon suggested
total_score = func.sum(Frame.score)
q1 = (session.query(Frame.league, Frame.bowler,
total_score).group_by(Frame.league,
Frame.bowler).having(total_score<200)).subquery()
q2 = (session.query(Frame.league, func.count('*')).join((q1,
Frame.league==q1.c.league)).group_by(Frame.league))
print q2
print q2.all()
Hi John,
I had to play around with this for a while, but I got there in the end.
The problem with the version I suggested was that the subquery produces
a number of rows for each league. When we join that subquery back to the
'league' table, you get a combinatorial explosion in the number of rows
returned. Joining is generally only useful if the field that you are
joining on is unique in at least one of the tables ('league' wasn't
unique in either).
The actual solution is much simpler than I expected. I didn't realise
that session.query doesn't actually need to start with a mapped class or
attribute at all - you can ask it directly for columns in the subquery.
Here's the result:
q1 = (session.query(Frame.league, Frame.bowler, total_score)
.group_by(Frame.league, Frame.bowler)
.having(total_score<200)).subquery()
q2 = (session.query(q1.c.league, func.count('*'))
.group_by(q1.c.league))
By using 'q1.c.league' instead of 'Frame.league', SA doesn't try and put
the 'league' table in the outer query. The SQL looks like this:
SELECT anon_1.league AS anon_1_league,
count(?) AS count_1
FROM (SELECT frame.league AS league,
frame.bowler AS bowler,
sum(frame.score) AS sum_1
FROM frame
GROUP BY frame.league, frame.bowler
HAVING sum(frame.score) < ?) AS anon_1
GROUP BY anon_1.league
Which is pretty much the query we wanted, apart from the names. I hope
it works in your original example as well!
Cheers,
Simon
> Which is pretty much the query we wanted, apart from the names. I hope
> it works in your original example as well!
This worked great -- and I learned a bunch of useful sql and
sqlalchemy tricks along the way. Many thanks for taking the time to
walk me through this one.
JDH