Remove rows if 2 rows match in one column and do not in another column

55 views
Skip to first unread message

Salem Ababneh

unread,
May 28, 2018, 4:48:50 PM5/28/18
to OpenRefine
Hello wonderful community.

I was thinking if you can help me with this problem.

Lets say I have 3 columns, customer name, address, mobile number, and I want each mobile number for each customer to be unique, I don't want it to be repeated with another customer.
Unfortunately I have one mobile number for 3 different customers, which is not right, so I delete the three of them...

Is there a way I can automate this through OpenRefine? To figure out that this number was used by different customer names?

Thanks

Example:

Row 1 - Salem - Subway street - 0799897979
Row 2 - Jack - Back alley - 0799897979
Row 3 - Sandy - Scotch building  - 0797775215

The result:

Row 3 - Sandy - Scotch building  - 0797775215

Thad Guidry

unread,
May 28, 2018, 5:02:09 PM5/28/18
to openr...@googlegroups.com
Look at our Duplicates Facet in the Column menu.
We also have a uniques() GREL expression.

-Thad

Salem Ababneh

unread,
May 28, 2018, 5:12:07 PM5/28/18
to OpenRefine
Hello Thad,

Duplicates Facet can help me find duplicates in one column, but not find duplicates in one and column and not in another column.

And to add a new column that has a value of true or false for numbers that are not unique for the same customer.

And I'm reading about uniques() and cannot find a way that would help me.

Can you please Thad give me more details of what you were thinking.

Thank you

Thad Guidry

unread,
May 28, 2018, 5:16:32 PM5/28/18
to openr...@googlegroups.com
Use multiple Text Facets... or...

If you are trying to eliminate multiple duplicate rows (many columns together)...

Then make a new column that concatenates all your columns

Then run the Duplicates Facet on that newly created column.
Then flag or star or remove rows if need be from the ALL menu.

-Thad

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Salem Ababneh

unread,
May 28, 2018, 5:32:37 PM5/28/18
to OpenRefine
Hey Thad,

Thank you again for the fast reply.

But when I did concatenate the two columns, then it only gave me the rows when the mobile number and name are the same.

What I want is to find when the mobile number was repeated by another name, not the same name.
Like:

Salem - 07965656565
Salem - 07965656565
Sandy - 07974512121
Jack - 07222487224
Sindy - 0772242778
Jonathan - 07222487224
Richard - 07965656565

To have these results after I run code:

Salem - 07965656565 - true
Salem - 07965656565 - true
Sandy - 07974512121 - false
Sandy - 07974512121 - false
Sandy - 07974512121 - false
Jack - 07222487224 - true
Sindy - 0772242778 - false
Jonathan - 07222487224 - true
Richard - 07965656565 - true

And I will simply remove the true ones, and have it automated each time I upload a report :D

And using multiple Facets will require manual work and not automated each time. To manually select which is different for each row. And I have about 10k+ rows.

I'm new to OpenRefine, maybe I don't understand what you are telling me.

Ettore Rizza

unread,
May 29, 2018, 4:39:12 AM5/29/18
to OpenRefine
Hi Salem,

I think the easiest way is to turn your phone number column into records and check which of these records contains different names. Here is a demo in screencast. 




The GREL formula I used is :

if(row.record.cells['name'].value.length() > 1, true, false)

(replace the word "name" by the name of your column containing your customers names)

Ettore Rizza

unread,
May 29, 2018, 8:08:15 AM5/29/18
to OpenRefine
My apologies, the right GREL formula for your case is :

if(row.record.cells['name'].value.uniques().length() > 1, true, false)

Salem Ababneh

unread,
May 29, 2018, 8:12:52 AM5/29/18
to OpenRefine
Hello Ettore, thank you for your quick reply.

I tried the solution you gave me, but I had an error with Sandy.


Sandy should be false, because the mobile number for her wasn't given for another person, so it is unique for her, so it should give false for her.
But the other ones are correct because the mobile number was given to another name.

I added a column to know which is correct and not.

Really appreciate your massive help and support, and kindly check the picture I attached to view what happened.

Salem Ababneh

unread,
May 29, 2018, 8:16:37 AM5/29/18
to OpenRefine
Ettore, you are an amazing person. Really appreciate your help, it worked perfectly.

Thank you for your support and kind help.
And thank you Thad for your help.
Reply all
Reply to author
Forward
0 new messages