How to generate PostgreSQL "DISTINCT ON" ?

1,017 views
Skip to first unread message

Jeff Putsch

unread,
Jun 2, 2008, 1:01:34 PM6/2/08
to sqlalchemy
I see a note in a change log that this is possible.

I've got the following SQLA code:

ors = []
ors.append(func.lower(pwEntries.uname).like("%%jeff%%"))
ors.append(func.lower(pwEntries.gcos).like("%%jeff%%"))
print pwEntries.query().filter(
or_(*ors)
).order_by(func.lower(pwEntries.eid).desc()).order_by(pwEntries.uname).compile()

And it generates the following query:

SELECT pw_entries.domain_id AS pw_entries_domain_id,
pw_entries.domain_name AS pw_entries_domain_name,
pw_entries.nis_account_id AS pw_entries_nis_account_id, pw_entries.uid
AS pw_entries_uid, pw_entries.eid AS pw_entries_eid, pw_entries.uname
AS pw_entries_uname, pw_entries.gid AS pw_entries_gid, pw_entries.gcos
AS pw_entries_gcos, pw_entries.home AS pw_entries_home,
pw_entries.shell AS pw_entries_shell, pw_entries.canonical AS
pw_entries_canonical, pw_entries.vendor AS pw_entries_vendor,
pw_entries.reserved AS pw_entries_reserved, pw_entries.expires AS
pw_entries_expires, pw_entries.terminated AS pw_entries_terminated
FROM pw_entries
WHERE lower(pw_entries.uname) LIKE %(lower_1)s OR
lower(pw_entries.gcos) LIKE %(lower_2)s ORDER BY lower(pw_entries.eid)
DESC, pw_entries.uname

Which is good. I'd like the select to be modified to have a "DISTINCT
ON (pw_entries.eid, pw_entries.uname, pw_entries.uid)" before the rest
of the columns, something like this:

SELECT DISTINCT ON (pw_entries.eid, pw_entries.uname, pw_entries.uid)
pw_entries.domain_id AS pw_entries_domain_id,

I've tried adding a distinct() call with an array argument, that
didn't work.

I've also tried putting a select(distinct=[..]) call in the chain with
no luck.

Any advice/guidance is very much appreciated.

Thanks,

Jeff.


Michael Bayer

unread,
Jun 2, 2008, 3:29:24 PM6/2/08
to sqlal...@googlegroups.com

On Jun 2, 2008, at 1:01 PM, Jeff Putsch wrote:

>
> Which is good. I'd like the select to be modified to have a "DISTINCT
> ON (pw_entries.eid, pw_entries.uname, pw_entries.uid)" before the rest
> of the columns, something like this:
>
> SELECT DISTINCT ON (pw_entries.eid, pw_entries.uname, pw_entries.uid)
> pw_entries.domain_id AS pw_entries_domain_id,
>
> I've tried adding a distinct() call with an array argument, that
> didn't work.
>
> I've also tried putting a select(distinct=[..]) call in the chain with
> no luck.
>

we support a "distinct()" operator which should acheive this, but
looking at the PG source code it is not correctly implemented. There
seems to be an extension to the "distinct" keyword argument with PG
which works like this:

select(...., distinct=[<list of expressions>])

That should work in 0.4, though this seems to be very old code without
unit tests. In 0.5, I'm really not sure I like that so look for a
possible change of that to: select([distinct(<list of exprs>)])
(ticket:1069)

Michael Bayer

unread,
Jun 2, 2008, 3:37:52 PM6/2/08
to sqlal...@googlegroups.com

On Jun 2, 2008, at 1:01 PM, Jeff Putsch wrote:

> print pwEntries.query().filter(
> or_(*ors)
> ).order_by
> (func.lower(pwEntries.eid).desc()).order_by(pwEntries.uname).compile()
>

reading further, it seems like "DISTINCT ON" doesn't even work the way
it does in MySQL - theres no comma between the DISTINCT_ON portion of
things and the rest of the columns and it is not actually returning
columns....so my notion of using select([distinct(...)]) is not
appropraite here. Additionally, you're trying to do this with ORM
and Query only has a "distinct()" method which does the regular SQL
form of "DISTINCT".

I think the only way this would be possible in ORM for the time being
is query.from_statement(<statement>), where <statement> is a select()
construct or a full text string. the solution for select() /query()
would be adding arguments to (select()/query()).distinct().

Jeff Putsch

unread,
Jun 2, 2008, 4:04:55 PM6/2/08
to sqlalchemy

On Jun 2, 12:37 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> I think the only way this would be possible in ORM for the time being  
> is query.from_statement(<statement>), where <statement> is a select()  
> construct or a full text string.  the solution for select() /query()  
> would be adding arguments to (select()/query()).distinct().

Thanks, the use of "from_statement" is what I suspected I'd need to
do, but I figured I'd double check first.

For what it's worth it would be nice for the PostgreSQL adapter to
support something like:

query().distinct([...])

where the array/list were a list of columns for the DISTINCT ON.

Thanks for the help.

Jeff.

Michael Bayer

unread,
Jun 2, 2008, 4:31:46 PM6/2/08
to sqlal...@googlegroups.com

yeah thats what I think needs to be done here (ticket is updated).

Reply all
Reply to author
Forward
0 new messages