GREL: why is there no "IN" function?

Ettore Rizza

Apr 12, 2017, 4:25:26 PM
to OpenRefine
Hi everyone, 

While answering a question on StackOverflow, I noticed that the documentation on "arrays" could be enriched, notably on the different ways to extract one or more elements. 

But before I do that, I want to make sure I don't forget anything. 

Let's take an example of an array: the list of column names in a project. 

row.columnNames

[ "ID", "TYPE", "TXT_FRE", "x", "y", "Code postal", "Commune" ]
 

If you want to select all the elements except the first one, you can use slice(), right?

row.columnNames.slice(1)

[ "TYPE", "TXT_FRE", "x", "y", "Code postal", "Commune" ]
 


To select the first 3 elements, it would be: 

row.columnNames.slice(0,3)

[ "ID", "TYPE", "TXT_FRE" ]
 


Brackets can also be used to isolate one or more elements, for example all except the last: 

row.columnNames[0, -1]

[ "ID", "TYPE", "TXT_FRE", "x", "y", "Code postal" ]
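For readers who think in Python, the three selections above correspond roughly to ordinary list slices. This is only a standalone sketch on a plain list, not something to paste into OpenRefine; the variable name column_names is made up for the illustration.

```python
# Plain Python list standing in for row.columnNames (illustrative only).
column_names = ["ID", "TYPE", "TXT_FRE", "x", "y", "Code postal", "Commune"]

all_but_first = column_names[1:]    # comparable to slice(1)
first_three = column_names[0:3]     # comparable to slice(0, 3)
all_but_last = column_names[0:-1]   # comparable to [0, -1]

print(all_but_first)
print(first_three)
print(all_but_last)
```

As in the GREL examples, the end index is exclusive and a negative index counts from the end of the list.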
 


If we want to select an element by its value, we could use filter(): 

filter(row.columnNames, v, v == "ID")

[ "ID" ]
 


But how do you select multiple elements of an array by their values? So far, I have found no alternative to using Python/Jython: 

return [x for x in row['columnNames'] if x in ["ID", "TYPE", "Code postal"]]

[ "ID", "TYPE", "Code postal" ]
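Outside OpenRefine, the same idea can be tried on a plain list. The sketch below is a minimal illustration under that assumption; column_names and wanted are hypothetical names, and a set is used for the wanted values only because membership tests on a set are cheap.

```python
# Stand-in for row['columnNames'] (illustrative data from the example above).
column_names = ["ID", "TYPE", "TXT_FRE", "x", "y", "Code postal", "Commune"]

# Values to keep; a set makes each "in" test a constant-time lookup.
wanted = {"ID", "TYPE", "Code postal"}

# Keep only the names that appear in the wanted set, preserving column order.
selected = [name for name in column_names if name in wanted]

print(selected)
```

The result follows the order of column_names, not the order in which the wanted values were written.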



This kind of thing would be easy to do in GREL if the contains() function accepted a regular expression, but it accepts only a string. And the or() function can take only two arguments, no more. I do not know much about Java, but it looks like an IN function for arrays would be relatively simple to create.

Or is there another method to get the same result in GREL that I have forgotten?

Owen Stephens

Apr 13, 2017, 3:52:27 AM
to OpenRefine
There are a couple of ways you can combine 'filter' with other functions to select multiple values:

1 Use match with a regular expression and check the match was successful (I've used a check on length to see if 'match' worked here but there are other ways you could do this)

filter(row.columnNames, v, v.match(/(ID|TYPE|Code postal)/).length>0)

2 Use 'or' to combine multiple conditions (unfortunately 'or' only takes 2 conditions, which means that testing more than 2 conditions leads to some slightly awkward nested syntax)

filter(row.columnNames, v, or(v=="ID",or(v=="TYPE",v=="Code postal")))

I suspect there might be some other ways of doing this too, but those are the two that spring to mind. Not as neat as the 'in' syntax though
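For comparison, here is a rough Python analogue of the two approaches, run on a plain list rather than inside OpenRefine. It assumes that GREL's match() only succeeds when the pattern covers the whole string, which is why re.fullmatch is used; the variable names are hypothetical.

```python
import re

# Stand-in for row.columnNames (illustrative data from the thread).
column_names = ["ID", "TYPE", "TXT_FRE", "x", "y", "Code postal", "Commune"]

# Approach 1: regular expression with alternation, whole-string match.
pattern = re.compile(r"(ID|TYPE|Code postal)")
by_regex = [n for n in column_names if pattern.fullmatch(n) is not None]

# Approach 2: explicit conditions chained with "or".
by_or = [n for n in column_names
         if n == "ID" or n == "TYPE" or n == "Code postal"]

print(by_regex)
print(by_or)
```

Both lists should contain the same three names; the regex version is shorter to extend, while the chained comparison avoids any surprises from regex metacharacters in the values.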

Owen

Ettore Rizza

Apr 13, 2017, 9:58:39 AM
to OpenRefine
Hi Owen,

I don't know why the solution with value.match() doesn't work on my project... I do not like this function very much; I find it counterintuitive and laborious. While waiting to find out where my error is, I added a Jython recipe on how to compare an OpenRefine array with a list of values in a file.

Owen Stephens

Apr 13, 2017, 10:10:06 AM
to OpenRefine
Hi Ettore,

It doesn't work because I missed some parentheses after 'length' :) Try:

filter(row.columnNames, v, v.match(/(ID|TYPE|Code postal)/).length()>0)

Owen

Ettore Rizza

Apr 13, 2017, 10:25:40 AM
to OpenRefine
Arf, of course. Now it works! This .length()>0 is an interesting trick, thanks!

Owen Stephens

Apr 13, 2017, 10:59:23 AM
to OpenRefine
Another trick with 'match' output is to test its type: if the match expression has worked (even if you haven't captured anything) it will have a type of 'array'. If it fails, the type will be 'undefined'.

value.match(/.*/).type()

Will return 'array' (always, because that match expression will always work)

Owen Stephens

May 22, 2017, 10:03:49 AM
to OpenRefine
Just to say that the GOKb Utilities extension I've just announced includes an 'inArray' GREL function:

To install the extension, download the zip file from https://github.com/ostephens/refine-gokbutils/archive/master.zip, unzip the files and drop the resulting folder into the /extensions folder in OpenRefine.

You can use it to test for the existence of a specific value in an array. For example, given an array ["12","23","34"]:

value.inArray("12") would return true
value.inArray("1") would return false
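To make the expected behaviour concrete, here is a small Python sketch of the same test. It is not the extension's actual code; in_array is a hypothetical stand-in that only mirrors the two results described above, namely that a value counts as present when it equals a whole element, not when it is merely a substring of one.

```python
def in_array(values, needle):
    """Hypothetical stand-in: True if needle equals one element of values."""
    return needle in values


values = ["12", "23", "34"]

print(in_array(values, "12"))  # whole element present
print(in_array(values, "1"))   # only a substring of "12", so not present

# By contrast, a substring test on the joined string would accept "1".
print("1" in ",".join(values))
```

The last line shows why an element-wise test is useful: joining the array and using a string "contains" check would report a match for "1".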

Feedback welcome here or in GitHub issues at https://github.com/ostephens/refine-gokbutils/issues

Owen

Ettore RIZZA

May 22, 2017, 10:05:35 AM
to openr...@googlegroups.com
<3
