How to keep only one of each duplicate record


damien henrotin

Nov 16, 2020, 8:11:08 AM
to OpenRefine
Hello all,

I am trying to exclude duplicate results, but I need to keep one copy of each.

Here is an example (not the real data, just an illustration):

I have many RECORDS (not rows), and I have a column with one number per record.


I would like to keep only one of each:

N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
N12

Can you help me? Thanks

damien henrotin

Nov 16, 2020, 8:12:24 AM
to OpenRefine
duplicates.JPG

damien henrotin

Nov 16, 2020, 8:13:40 AM
to OpenRefine
Here is the project
test_duplicates.openrefine.tar.gz

Thad Guidry

Nov 16, 2020, 10:15:48 AM
to openr...@googlegroups.com
If I understand you correctly, you are looking for a way to filter or facet by duplicates?
Take a look at Facet -> Customized facets -> Duplicates facet (then you might click on "false" to select only the rows whose values are not duplicated).
Eliminating the duplicates is a separate problem. You can handle it by flagging the rows and then optionally deleting them, or by simply blanking out the values after the first occurrence (Blank down).
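For readers who want to see the idea behind a duplicates facet outside of OpenRefine, here is a minimal Python sketch. It is not OpenRefine's implementation; the function name `duplicates_facet` and the sample values are made up for illustration. It marks each value as true when it occurs more than once in the column, which is roughly what the "true"/"false" choices in the facet let you select.

```python
from collections import Counter


def duplicates_facet(values):
    """Return one boolean per value: True if that value occurs more than once.

    Hypothetical helper that only mimics the idea of a duplicates facet.
    """
    counts = Counter(values)
    return [counts[value] > 1 for value in values]


# Hypothetical sample column (not the poster's real data).
numbers = ["N1", "N2", "N2", "N3", "N1", "N4"]
flags = duplicates_facet(numbers)

for number, is_duplicate in zip(numbers, flags):
    print(number, is_duplicate)
```

Note that selecting "true" shows every copy of a duplicated value, including the first one, which is why a facet alone does not answer "keep one of each".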




damien henrotin

Nov 16, 2020, 12:32:44 PM
to OpenRefine
Hello Thad,

Thanks for your reply.

I'm fine with the Duplicates facet. The problem is how to remove the duplicates in my example. If I click "true" to see all the duplicates, how can I remove the extra copies and keep only one of each?

Thad Guidry

Nov 16, 2020, 12:55:47 PM
to openr...@googlegroups.com
If some ordering is important in your other record columns, then you might want to sort on those columns and reorder permanently.
Then try Blank Down on your Number column to see how it works.  You can always Undo as you experiment :-)
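As a rough illustration of the sort, Blank Down, and remove-blanks sequence, here is a small Python sketch. It is only an analogy for what the OpenRefine operations do, not their actual code; the `blank_down` helper, the tuple layout, and the sample rows are all assumptions made up for this example.

```python
def blank_down(values):
    """Replace each value that equals the last non-blank value above it with None."""
    result = []
    previous = None
    for value in values:
        if value is not None and value == previous:
            result.append(None)  # repeat of the value above: blank it
        else:
            result.append(value)
        if value is not None:
            previous = value
    return result


# Hypothetical rows: (number, other column). Not the poster's real data.
rows = [("N2", "b"), ("N1", "a"), ("N2", "c"), ("N1", "d"), ("N3", "e")]

# Step 1: sort so that equal numbers sit next to each other
# (Python's sort is stable, so the original order is kept within each group).
rows_sorted = sorted(rows, key=lambda row: row[0])

# Step 2: blank down the number column.
blanked = blank_down([row[0] for row in rows_sorted])

# Step 3: keep only the rows whose number was not blanked.
kept = [row for row, number in zip(rows_sorted, blanked) if number is not None]
print(kept)
```

Which copy survives depends on the order before the blank down, which is why sorting on the other columns first matters if you care which one is kept.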



damien henrotin

Nov 16, 2020, 3:47:13 PM
to OpenRefine
Thad, I tried, but I did not get the right result. Could you show me using my example project, please?

Thank you

Thad Guidry

Nov 16, 2020, 3:54:29 PM
to openr...@googlegroups.com
Here is an example of what I mean.
There are also several guides and videos on the internet; just search for "OpenRefine remove duplicates" or "OpenRefine removing duplicates", etc.



damien henrotin

Nov 16, 2020, 4:02:23 PM
to OpenRefine
Nice, thank you Thad. Just one more thing: you suggested converting my column to numeric values. Is it better to do that before applying a sort?

Thad Guidry

Nov 16, 2020, 4:08:25 PM
to openr...@googlegroups.com
Sort works on any of the value types that are listed in the Sort dialog. Did you take a look? :-)
You also have control over ascending/descending order with the radio button option.
Click on the various radio buttons in the Sort dialog to see what changes and how. (You can even position blank values first or last with the other option in the Sort dialog.)
Remember if you mess up on the Sort, you can always Undo and try again! :-)
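To illustrate why the value type chosen for a sort can matter, here is a tiny Python sketch with made-up values. It is not OpenRefine code; it only shows the general difference between ordering strings and ordering numbers.

```python
# Hypothetical cell values stored as text.
values = ["10", "9", "2", "100"]

# Sorting as text compares character by character.
as_text = sorted(values)

# Sorting as numbers compares the numeric value of each cell.
as_numbers = sorted(values, key=int)

print(as_text)     # ['10', '100', '2', '9']
print(as_numbers)  # ['2', '9', '10', '100']
```

Either ordering places identical cell values next to each other, so for the purpose of a later blank down both should work; the choice mostly affects how the result reads.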



damien henrotin

Nov 16, 2020, 4:08:27 PM
to OpenRefine
OK, nice, Thad. I think I'm on the right track.

Capture d’écran 2020-11-16 à 22.07.26.jpg

Thad Guidry

Nov 16, 2020, 4:12:36 PM
to openr...@googlegroups.com
Great Damien, glad to hear it!
Facets are also a great way to double-check things after you perform operations. Use as many facets as you need for your inspection tasks.



damien henrotin

Nov 16, 2020, 4:15:28 PM
to OpenRefine
Indeed :-)

As always, thank you for your advice, Thad!
