Using count and distinct with window function

1,114 views
Skip to first unread message

Eduardo

unread,
May 16, 2012, 6:07:26 AM5/16/12
to sqlal...@googlegroups.com
Hello,
I have got a query of the following type:

rows = session.query(*[func.count().over().label("count")]+map(lambda column: MyClass.__dict__[columns],columns)).filter(...).limit(n).offset(m).all()

it returns the number of results together with values of selected columns. The problem is when I try to apply the distinct query I see that func.count().over().label("count") does not return correct results, that is, it returns the number of results for the query by which the distinct function is omitted.
Is there any workaround for this problem?
The query contains no joins.
Thank you
ED

Eduardo

unread,
May 16, 2012, 10:05:08 AM5/16/12
to sqlal...@googlegroups.com
To put it more clear why:
rows = session.query(*[func.count().over().label("count")]+map(lambda column: MyClass.__dict__[columns],columns)).filter(...).limit(n).offset(m).all()
works and
rows = session.query(*[func.count().
over().label("count")]+map(lambda column: MyClass.__dict__[columns],columns)).filter(...).distinct().limit(n).offset(m).all()
does not?
Thanks

Michael Bayer

unread,
May 16, 2012, 11:30:01 AM5/16/12
to sqlal...@googlegroups.com
whats the SQL you're looking for ?    work it out from a SQL perspective first - get an exact SELECT, then run it straight on that database, to work out what should be done here.  Window functions + DISTINCT + LIMIT/OFFSET sounds like it's really getting out there, and I'm not even sure what various DB's would be doing in response to that.


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/W80zRGtGptAJ.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Eduardo

unread,
May 16, 2012, 12:13:28 PM5/16/12
to sqlal...@googlegroups.com

Hi,
I am trying to write this SQL coomand:

SELECT DISTINCT(col1, col2,col3) FROM mytable WHERE col1='something'


in sqlalchemy form:


columns=['col1','col2','col3']

rows = session.query(*[func.count().over().label("count")]+map(lambda column: MyClass.__dict__[columns],columns)).filter(...).limit(n)
.offset(m).all()

this query works it returns the results together with the number of hits.
The problem is that when I apply the distinct() command
rows = session.query(*[func.count().over().label("count")]+map(lambda column: MyClass.__dict__[columns],columns)).filter(...).limit(n).offset(m).distict().all()
 func.count().over().label("count") does not show the correct number of hits because func.count() relates to all column in table and to those selected - that is col1, col2 and col3
My question is how to write the count function that will correctly count number of distinct hits but only for the selected columns.

I tried with:
 
func.count().over(partition_by=MyClass.col1,MyClass.col2,MyClass.col3).label("count")

and it did not work

If I want to use:


 
func.count(distinct(MyClass.col1,MyClass.col2,MyClass.col3)).over().label("count")

only one column is allowed as the input argument for the count() function.


Is there any way to solve this?
Thanks



Am Mittwoch, 16. Mai 2012 12:07:26 UTC+2 schrieb Eduardo:

Michael Bayer

unread,
May 16, 2012, 1:28:15 PM5/16/12
to sqlal...@googlegroups.com
I dont see any "COUNT", OVER, LIMIT, or anything else in that SQL statement, so I dont see why you're using func.count().over(), limit(), offset(), etc.

The query as written is equivalent to "SELECT DISTINCT col1, col2, col3 FROM mytable WHERE col1='something'",  which is:

from sqlalchemy.sql import column, table, distinct
from sqlalchemy.orm import Session

mytable = table('mytable', column('col1'), column('col2'), column('col3'))
print Session().query(
        mytable.c.col1, 
            mytable.c.col2, 
            mytable.c.col3).\
            distinct().\
        filter(mytable.c.col1=='something')

but this is not what you asked for earlier, so I suspect this is not specifically what you want.

The PG dialect supports special extensions for DISTINCT which allows DISTINCT ON (list of columns), if you pass a list of them to query.distinct(), for example.



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/11I7cmn-UaoJ.

Eduardo

unread,
May 16, 2012, 1:48:43 PM5/16/12
to sqlal...@googlegroups.com


Am Mittwoch, 16. Mai 2012 12:07:26 UTC+2 schrieb Eduardo:

Eduardo

unread,
May 16, 2012, 1:50:14 PM5/16/12
to sqlal...@googlegroups.com
SORRY

I want to create query that will return as a first element the number of hits


SELECT COUNT(*) FROM(SELECT (DISTINCT(col1,col2,col3)) FROM mytable where col1='something' LIMIT 1 OFFSET 2)

and in the second part should be values of the hits

 
SELECT (DISTINCT(col1,col2,col3)) FROM mytable where col1='something' LIMIT 1 OFFSET 2

is it possible to solve this with one query I want to avoid having to apply first query.count() and then query.all() to get results is this possible to do this with one query.
thanks




Michael Bayer

unread,
May 16, 2012, 1:59:10 PM5/16/12
to sqlal...@googlegroups.com
print Session().query(
        mytable.c.col1, 
            mytable.c.col2, 
            mytable.c.col3).\
            distinct().\
        filter(mytable.c.col1=='something').\
        limit(1).offset(2).\
        from_self(func.count('*'))

or just say count() instead of from_self(), will execute it.   you don't need "distinct(col1, col2, col3)" here, only "DISTINCT col1, col2, col3", since you aren't applying DISTINCT to a subset of the available columns.  DISTINCT(x, y, z) is a Postgresql-specific extension (see http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-DISTINCT )




--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/xr5fGu1B0yoJ.

Eduardo

unread,
May 16, 2012, 2:20:24 PM5/16/12
to sqlal...@googlegroups.com


Am Mittwoch, 16. Mai 2012 19:59:10 UTC+2 schrieb Michael Bayer:
If I put count() the result will be the number of hits I would like to have both the number of hits and resulting column values for example
[(33,val1,val2,val3),(33, val21,val22,val23),...]
If I execute it with
Session().query(
        mytable.c.col1, 
            mytable.c.col2, 
            mytable.c.col3).\
            distinct().\
        filter(mytable.c.col1=='something').\
        limit(1).offset(2).\
        count()
I get only [33] (for example)
if I use
from_self()
I get an error
 ProgrammingError: (ProgrammingError) column "anon_1.col1" must appear in the GROUP BY clause or be used in an aggregate function

Thanks
 
To unsubscribe from this group, send email to sqlalchemy+unsubscribe@googlegroups.com.

Michael Bayer

unread,
May 16, 2012, 2:52:32 PM5/16/12
to sqlal...@googlegroups.com

On May 16, 2012, at 2:20 PM, Eduardo wrote:

>
>
> Am Mittwoch, 16. Mai 2012 19:59:10 UTC+2 schrieb Michael Bayer:
> If I put count() the result will be the number of hits I would like to have both the number of hits and resulting column values for example

you'd need to use GROUP BY for that. if you say "SELECT COUNT(*) FROM <anything>", you get just one column back - the number of rows in <anything>.


> If I execute it with
> Session().query(
> mytable.c.col1,
> mytable.c.col2,
> mytable.c.col3).\
> distinct().\
> filter(mytable.c.col1=='something').\
> limit(1).offset(2).\
> count()
> I get only [33] (for example)
> if I use
> from_self()
> I get an error
> ProgrammingError: (ProgrammingError) column "anon_1.col1" must appear in the GROUP BY clause or be used in an aggregate function

can't reproduce - see sample:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base= declarative_base()

class A(Base):
__tablename__ = "a"

id = Column(Integer, primary_key=True)
col1 = Column(String)
col2 = Column(String)
col3 = Column(String)

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.create_all(e)

s = Session(e)
s.add_all([
A(col1='col1A', col2='col21', col3='col31'),
A(col1='col1A', col2='col22', col3='col32'),
A(col1='col1A', col2='col23', col3='col33'),
A(col1='col1B', col2='col24', col3='col34'),
])
print s.query(
A.col1,
A.col2,
A.col3).\
distinct().\
filter(A.col1=='col1A').\
limit(1).offset(2).\
from_self(func.count('*')).\
all()

output (after create table):

SELECT count(%(param_1)s) AS count_1
FROM (SELECT DISTINCT a.col1 AS a_col1, a.col2 AS a_col2, a.col3 AS a_col3
FROM a
WHERE a.col1 = %(col1_1)s
LIMIT %(param_2)s OFFSET %(param_3)s) AS anon_1
2012-05-16 14:51:03,099 INFO sqlalchemy.engine.base.Engine {'param_1': '*', 'col1_1': 'col1A', 'param_3': 2, 'param_2': 1}
[(1L,)]

it would be very helpful if you could supply actual specific working code here instead of excerpts, it takes a lot of time trying to guess exactly what issue you're having, thanks


Reply all
Reply to author
Forward
0 new messages