distinct with a field name

738 views
Skip to first unread message

jcorbett

unread,
Jun 1, 2009, 11:47:35 PM6/1/09
to web2py Web Framework
I was trying to do a select, where I have distinct records, but only
for certain fields. While I don't want to restrict the field on
input, when doing a query sometimes I want to eliminate duplates of a
particular field. Normally in SQL you could do a DISTINCT [field
name], but the distinct in the select seems to do a global distinct on
all fields.

A simple example would be a table with an id, name, created
(timestamp), and possibly other fields. although normally multiple
inserts with the same name are allowed (because of the other fields),
when doing a report on this data I want to restrict the output by the
"newest" items with a unique name (eliminating rows with duplicate
names).

Right now with web2py all I can see is some form of post processing,
but that would be unfortunate. Any other ideas? Am I forced to write
custom sql?

Jason

mdipierro

unread,
Jun 2, 2009, 12:03:12 AM6/2/09
to web2py Web Framework
db(....).select(...,distinct=True)

I think it does what you ask. if you give it a try, please let us
know.

Massimo

Horst Herb

unread,
Jun 2, 2009, 2:26:28 AM6/2/09
to web...@googlegroups.com
On Tue, Jun 2, 2009 at 2:03 PM, mdipierro <mdip...@cs.depaul.edu> wrote:
>
> db(....).select(...,distinct=True)
>
> I think it does what you ask. if you give it a try, please let us
> know.

I don't think it does - I have the same problem as jcorbett. As soon
as you include the "id" in the query, "distinct=True" will return all
rows, because they are distinct by id.

Some SQL dialects (eg in Postgres) however let you specify "select
distinct on(columns), other columns ..." or similar which seems quite
convenient

However, after reading
http://www.databasejournal.com/features/postgresql/article.php/3437821/SELECT-DISTINCT-A-SQL-Case-Study.htm
I started thinking about alternatives

Horst

SergeyPo

unread,
Jun 2, 2009, 2:48:42 AM6/2/09
to web2py Web Framework
This works for sure in sqlite, mysql and oracle:

db().select(db.tablename.fieldname, distinct=True)

and this is correct approach because I can not imagine biz logic when
you want to select distinct values AND their ID's etc. fields. You
normally use select distinct when you need a list of available values
in a field.

On Jun 2, 10:26 am, Horst Herb <my.list.subscripti...@gmail.com>
wrote:
> On Tue, Jun 2, 2009 at 2:03 PM, mdipierro <mdipie...@cs.depaul.edu> wrote:
>
> > db(....).select(...,distinct=True)
>
> > I think it does what you ask. if you give it a try, please let us
> > know.
>
> I don't think it does - I have the same problem as jcorbett. As soon
> as you include the "id" in the query, "distinct=True" will return all
> rows, because they are distinct by id.
>
> Some SQL dialects (eg in Postgres) however let you specify "select
> distinct on(columns),  other columns ..." or similar which seems quite
> convenient
>
> However, after readinghttp://www.databasejournal.com/features/postgresql/article.php/343782...

Horst Herb

unread,
Jun 2, 2009, 3:10:37 AM6/2/09
to web...@googlegroups.com
On Tue, Jun 2, 2009 at 4:48 PM, SergeyPo <ser...@zarealye.com> wrote:
>
> This works for sure in sqlite, mysql and oracle:
>
> db().select(db.tablename.fieldname, distinct=True)
>
> and this is correct approach because I can not imagine biz logic when
> you want to select distinct values AND their ID's etc. fields. You
> normally use select distinct when you need a list of available values
> in a field.

practical example from my own web app:
A doctor prescribes medication. He can select it from a pharmaceutical database.
He can select the drug by generic or brand name.
Each drug comes in a variety of forms, strengths, packet sizes etc.
plus some incur health system subsidies, some barnd price premiums
etc.

In most instances, I am not interested whatsoever in 20 manufacturers
producing the same product and it coming in a variety of packet sizes
- all I want is Amoxycilin trihydrate tablets 500mg (and not a hundred
variations thereof cluttering my selection widget). There are some
60,000 drug product entries in the larger drug database - and less
than 2000 unique drugs

But, in order to call up product information and to check allergies
and interactions - I need the id of the drug.

Just one of a great many practical examples.

Horst

mdipierro

unread,
Jun 2, 2009, 10:19:56 AM6/2/09
to web2py Web Framework
Try the following:

edit gluon/sql.py and replace

if attributes.get('distinct', False):
sql_s += ' DISTINCT'

with

if attributes.get('distinct', False)==True:
sql_s += ' DISTINCT'
ekif attributes.get('distinct', False):
sql_s += ' DISTINCT ON %s' % attributes['distinct']

then in your query use

db(...).select(...,distinct=db.table.field1|db.table.field2)

Does it do what you need?

SergeyPo

unread,
Jun 2, 2009, 10:32:00 AM6/2/09
to web2py Web Framework
If you are ok that your selection widget contains uncomplete
information, than you are right. But what I am telling about, is that
you give the user a selector with unique drug names at first:

db().select(db.drugs.name, distinct=True)

user selects 'Amoxycilin', you make another query:

Amoxycilines = db(db.drugs.name.like('Amoxycilin')).select()

and than if you want to offer google's 'I am feeling lucky' approach
you just give the user Amoxycilines[0],
and if you want the user to choose from variety, you print all
Amoxycilines to selector widget.

'I am feeling lucky' approach that works without much overhead:

an_Amoxycilin = db(db.drugs.name.like('Amoxycilin')).select(limitby=
(0,1))

On Jun 2, 11:10 am, Horst Herb <my.list.subscripti...@gmail.com>
wrote:
Reply all
Reply to author
Forward
0 new messages