How to find duplicate rows with Web2py DAL


Loreia

Aug 16, 2012, 5:20:54 AM
to web...@googlegroups.com
Hi,

I am looking for a way to automatically find any possible duplicate records (rows) in one table of my Postgres database.
Quick googling reveals it is easy to do with pure SQL, e.g. one way specific to Postgres: http://stackoverflow.com/questions/1746213/how-to-delete-duplicate-entries-in-postgresql

But I would like to be able to do it with the web2py DAL (which would make it database independent).
Is there a way to do it with the web2py DAL?

I just want to be able to find those duplicates, I don't need to delete them.

Thank you and best regards,
Loreia

Niphlod

Aug 16, 2012, 7:23:39 AM
to web...@googlegroups.com
Group by your unique columns, count the rows, and find the ones with a count > 1.

db.define_table('finddup',
    Field('f1_name'),
    Field('f2_name'),
    Field('f3_name')
)
fd = db.finddup
count = fd.id.count()
rtn = db(fd.id > 0).select(fd.f1_name, fd.f2_name, fd.f3_name, count,
                           groupby=fd.f1_name | fd.f2_name | fd.f3_name,
                           having=count > 1)
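For reference, a minimal self-contained sketch of the raw SQL that a GROUP BY / HAVING query like the one above boils down to, demonstrated here with Python's sqlite3 on an in-memory table (the table and column names simply mirror the hypothetical 'finddup' example):

```python
import sqlite3

# In-memory table mirroring the hypothetical 'finddup' example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE finddup "
    "(id INTEGER PRIMARY KEY, f1_name TEXT, f2_name TEXT, f3_name TEXT)"
)
rows = [("a", "b", "c"), ("a", "b", "c"), ("x", "y", "z")]
conn.executemany(
    "INSERT INTO finddup (f1_name, f2_name, f3_name) VALUES (?, ?, ?)", rows
)

# Group on the columns that should be unique; any group with more than
# one row is a set of duplicates.
dups = conn.execute("""
    SELECT f1_name, f2_name, f3_name, COUNT(id) AS n
    FROM finddup
    GROUP BY f1_name, f2_name, f3_name
    HAVING COUNT(id) > 1
""").fetchall()
# ('a', 'b', 'c') was inserted twice, so it is reported with n = 2
```

This only finds the duplicates, as asked; deleting them would need an extra step (e.g. keeping the row with the lowest id per group).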

Loreia

Aug 17, 2012, 11:09:40 AM
to web...@googlegroups.com
Thanks a lot.

BR
Loreia

Niphlod

Aug 17, 2012, 3:06:59 PM
to web...@googlegroups.com
np.
BTW: there is no simpler way to do it. Just mind that if there are millions of rows (especially if the columns are not indexed), it is going to take some time.

Loreia

Aug 17, 2012, 3:48:39 PM
to web...@googlegroups.com
Don't worry, this table is modest in size.
Any solution is good enough for me, as long as it gets the job done :-))

Thanks and BR
Loreia
