Comparing multiple columns

345 views
Skip to first unread message

Liam O'Dwyer

unread,
Jul 27, 2021, 7:32:00 AM7/27/21
to OpenRefine
Hi,
I have a dataset with a column of mixed values that I want to check against other columns in that row. I compared against an individual column using a custom text facet:
value == cells["Address line 1"].value

Is there a way I can specify multiple columns to that operation? Or ideally not specify the column at all? I don't care which column the data reappears in, just want to see if it is somewhere else in that row! 
Any help appreciated,
Liam

Owen Stephens

unread,
Jul 27, 2021, 7:48:34 AM7/27/21
to OpenRefine
There's a bit of a trick to working across multiple columns. You can get column names by using the expression:
row.columnNames
This gives you an array of column names. Once you have this, you can use a loop control such as "forEach" or "filter" to feed these values into an expression one by one - so using this you can essentially loop through all the cells in a row. In this case you want to compare this to your value, so I'd use:

filter(row.columnNames,cn,cells[cn].value==value)

This takes each column name, assigns it to the variable 'cn' then uses that to check whether your value matches the value in the column currently being checked - and because we are using `filter` the output is an array of the matching values. We can then check the length of the array output by `filter` to find how many matches there are. You want this to be greater than 1 (because the expression checks all columns, including the one you are creating the filter on, which of course always includes the value you are checking for. So the custom text facet is:

filter(row.columnNames,cn,cells[cn].value==value).length()>1

If there are any issues let me know

Owen

Liam O'Dwyer

unread,
Jul 27, 2021, 10:04:02 AM7/27/21
to OpenRefine
That worked perfectly, thanks so much for the explanation! 

Best,

Liam

Reply all
Reply to author
Forward
0 new messages