Removing Duplicates

Huff

Jun 4, 2013, 5:52:00 PM

to buz...@googlegroups.com

I recently started using buzhug, and I have a question that I hope is easy to answer; I may just be missing something obvious.

I have a database containing thousands of rows, and I expect it to grow to hundreds of thousands, if not over a million. I intend to use one field as a key (separate from __id__), but the data is not checked for duplicates before it is inserted into the database. What is the best way to remove any record whose 'key' field duplicates that of another record in the database?

Currently I'm brute-forcing it: I use a select() to get the 'key' and '__id__' for all entries, then iterate over those values one at a time, putting each new 'key' value into a "uniques" array and each repeated one into a "duplicates" array. I then delete all the duplicate entries.
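
For reference, a minimal sketch of that brute-force pass in plain Python. It assumes the select() result has already been turned into (record id, key) pairs; the function name find_duplicate_ids and the sample rows are hypothetical, and no buzhug calls are made here. It swaps the "uniques" array for a set, so each membership check is constant time on average instead of a scan over everything seen so far.

```python
def find_duplicate_ids(rows):
    """Return the ids of rows whose key already appeared earlier in `rows`.

    `rows` is any iterable of (record_id, key) pairs. The first row seen
    for each key is treated as the one to keep.
    """
    seen = set()          # keys encountered so far
    duplicate_ids = []    # ids of rows to delete afterwards
    for record_id, key in rows:
        if key in seen:
            duplicate_ids.append(record_id)
        else:
            seen.add(key)
    return duplicate_ids


# Hypothetical sample data standing in for the ('__id__', 'key') pairs.
rows = [(0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "b")]
print(find_duplicate_ids(rows))  # [2, 4]
```

The returned ids would then be used to look up and delete the matching records. The pass is still linear in the number of rows, but it avoids the quadratic cost of testing each key against a growing list.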

Is there a more efficient way to do this with buzhug, something I'm just missing as a new user of the system?

Huff

Jun 5, 2013, 2:59:53 PM

to buz...@googlegroups.com

I suppose an easier way to ask this question is: does buzhug have an equivalent to the "distinct" operator in SQL? It doesn't have to perform deletions, just filter the result of a select() statement so that only the first row for each unique value of the specified field is kept.

Ex (hypothetical syntax): dB.select(['key','blah'], blah>10).distinct('key')
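
Absent a built-in, the same effect could be approximated with a small generator wrapped around any result set. This is only a sketch: distinct() here is a hypothetical helper, not a buzhug method, and it assumes each record exposes its fields as attributes. A namedtuple stands in for real records.

```python
from collections import namedtuple


def distinct(records, field):
    """Yield only the first record seen for each value of `field`.

    Assumes records expose fields as attributes (record.key, etc.).
    """
    seen = set()
    for record in records:
        value = getattr(record, field)
        if value not in seen:
            seen.add(value)
            yield record


# Hypothetical stand-in for the rows a select() might return.
Row = namedtuple("Row", ["key", "blah"])
rows = [Row("a", 11), Row("b", 12), Row("a", 13)]
print([r.blah for r in distinct(rows, "key")])  # [11, 12]
```

Because it is a generator, it filters lazily and only holds the set of seen values in memory, not a second copy of the records.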
