GREL 'match' and 'contains' commands are not working.

함형건

Sep 24, 2014, 2:16:10 AM
to openref...@googlegroups.com
I have used GREL commands with regular expressions, and found that 'match' and
'contains' are not working at all.
They only result in 'null' or '[]'. 'contains' works fine when I use it with plain strings, but when it comes to regular expressions it behaves strangely again.

The cell is a very simple string like 'abcdefgh',
and I input value.match(/a.c/).
The result is 'null'.
Other attempts have not worked, either.

value.match(/a.c/) -> null
value.match('/a.c/') -> null
value.match('/a......h/') -> [ ]
value.contains('/a.c/') -> false


Any idea as to what went wrong?


Thad Guidry

Sep 24, 2014, 12:24:02 PM
to openref...@googlegroups.com
Try evaluating the output of the match() function, which is actually an Array (technically, a set of capture groups; the code is here: https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/expr/functions/strings/Match.java).

contains() does return a Boolean, and works differently.

match() does not return a Boolean, as you are expecting; instead it returns capture groups. The GREL code actually utilizes part of this: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html

Anyway,

In your GREL, you can use something like this if you want to see whether your regex pattern matches some groups or not:

value.match(/a.c/) != null

value.match(/a.c/) == null

or

isNull(value.match(/a.c/))

You can also use an if() clause for further comparison, or to take a different path such as joining or replacing values for each matching pattern, like so:

forEach(
  "Marcos Valério delatou Luiz Inácio, dizendo que ele recebeu dinheiro no mensalão.".split(/\s+/),
  v,
  if(v.match(/[A-Z].*/) != null, v, '|')
)
  .join(' ')
  .replace(/( ?\| ?)+/, '|')
  .replace(/\|$/, '')

Here's a recipe that I have used quite often:

All the best,

Thad Guidry

Sep 24, 2014, 12:38:19 PM
to openref...@googlegroups.com
I should probably have given you an actual example that shows capture groups within a regex. Capture groups are defined by parentheses around the pattern you're looking for.

value.match(/(100)|(100003)/)

The above GREL looks for a pattern group of 100 OR a pattern group of 100003 in your cell values and returns the array.

You could do something like

value.match(/(a.c)|(a......h)/)

Use the pipe separator between your capture groups when you want to find 1 capture group pattern OR another capture group pattern.
You also need to make sure that you use standard Java style Regex syntax.
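
As a rough illustration of what an alternation of capture groups returns, here is a minimal Python sketch. It assumes that match() has to match the entire cell value and returns one array entry per capture group; `grel_like_match` is a hypothetical helper written for this example, not an OpenRefine function, and Python's `re` module stands in for Java's regex engine.

```python
import re

def grel_like_match(value, pattern):
    """Hypothetical stand-in for GREL match(): whole-string match,
    returning a list with one entry per capture group, or None."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None

# Only the alternative that matched gets a value; the other group stays empty.
print(grel_like_match("100003", r"(100)|(100003)"))
print(grel_like_match("100", r"(100)|(100003)"))
# Neither alternative covers the whole value, so there is no match at all.
print(grel_like_match("1000", r"(100)|(100003)"))
```

Under these assumptions, the group that did not take part in the match comes back empty (None here), and a value that neither alternative covers completely gives no array at all.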


함형건

Sep 26, 2014, 1:57:18 PM
to openref...@googlegroups.com
I've tested GREL again, and it seems that, for some reason, the match and contains commands don't work at all.

I attached screen captures from my work. As I said before, the cell is a very simple string like 'abcdefgh', and a GREL command like "value.match(/a.c/)" produces 'null', which doesn't make any sense at all.

'value.match(/a.c/)==azz' results in 'true', which is nonsensical.

I still have no idea what went wrong.
     

example1_'match'.jpg
example2_'match'.jpg
example3_'match'.jpg
example4_'match'.jpg
example5_'match'.jpg
example6_'contains'.jpg

Thad Guidry

Sep 26, 2014, 5:03:07 PM
to openref...@googlegroups.com
No, you need to use CAPTURE GROUPS, which are expressed with
parentheses: ( and )

You forgot to add the parentheses.

Like this:

value.match(/(a......h)/)

You can then do something like this:

isNotNull(value.match(/(a......h)/))

to get a basic True or False output for the idea of "Does this capture
group pattern match this value?", or in other words, "isNotNull(): True
or False?"

Thad Guidry

Sep 26, 2014, 5:14:48 PM
to openref...@googlegroups.com
Does this updated explanation of match() make more sense to you ?

Updated match() explanation on wiki:
https://github.com/OpenRefine/OpenRefine/wiki/GREL-String-Functions#matchstring-s-regexp-p

함형건

Sep 27, 2014, 12:11:41 AM
to openref...@googlegroups.com
 
Thank you very much for your kind replies; now I've got it.

The 'match' command seems to work somewhat differently from other GREL commands like 'split' and 'partition', which can all be used with regex but do not need parentheses.

Adding parentheses to the 'match' command does the trick, but I am still wondering whether there is any way to extract just a portion of the string using 'match'.

For example, in the simple cell I showed before, what I wanted to do was extract just part of the string 'abcdefgh' and produce 'abc' using a regex like /a.c/.

With the 'match' command, 'value.match(/(a......h)/)' results in 'abcdefgh': it handles the string in its entirety but does not extract the part of the string that matches a given regex.

I can extract part of the string using a command like 'partition', but it would be very useful if I could do it with 'match', too.

And for the 'contains' command, adding parentheses still doesn't make any difference.

Thank you again!

           
  

Thad Guidry

Sep 27, 2014, 3:20:03 PM
to openref...@googlegroups.com
match() does return an array of matches, but not the extracted
substring; that is the job of partition().

Like this:

"abcdefgh".partition(/a.c/)[1]

value.partition(/a.c/)[1]

You use square brackets after the partition function to pick which
part of the Array output you want. Indexing starts at [0]. You can use
square brackets on all of the GREL functions that output an Array or
Set.
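
To make the array indexing concrete, here is a small Python sketch of what a regex partition does. `partition_regex` is a hypothetical helper written for this example, not an OpenRefine function, and the no-match return value is an assumption about how partition() behaves.

```python
import re

def partition_regex(value, pattern):
    """Hypothetical analogue of GREL partition() with a regex:
    returns [text before, first matching fragment, text after]."""
    m = re.search(pattern, value)
    if m is None:
        # Assumed behaviour when the fragment is not found.
        return [value, "", ""]
    return [value[:m.start()], m.group(0), value[m.end():]]

parts = partition_regex("abcdefgh", r"a.c")
print(parts)     # all three pieces
print(parts[1])  # index 1 is the fragment that matched the regex
```

Index [0] is the text before the fragment, [1] is the fragment itself, and [2] is everything after it, which is why [1] yields 'abc' for the cell discussed above.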

See the updated example I just added to the wiki:
https://github.com/OpenRefine/OpenRefine/wiki/GREL-String-Functions#partitionstring-s-string-or-regex-frag-optional-boolean-omitfragment

Is the wiki example now clear enough ?

barrious12

Sep 28, 2014, 12:27:03 PM
to openref...@googlegroups.com
Thank you, it's very clear.
One last question: it seems that the 'contains' command doesn't handle regex. Is that right?
    
    
      

Thad Guidry

Sep 28, 2014, 7:55:14 PM
to openref...@googlegroups.com
Correct, contains() expects strings, and does not parse or use
regex. See the reference material on our wiki to see which functions
can use regex.

barrious12

Oct 24, 2014, 12:27:24 AM
to openref...@googlegroups.com
Hello, can I ask another question about the match() function?
It seems to me that 'match' works for English but sometimes doesn't work for other languages like Korean, while other GREL string functions work for Korean without any problems.
[가-힣] in Korean means the same as [a-z] in English. In the attached samples, value.match(/([가-힣].*)/) worked fine, but value.match(/([가-힣]*)/) produced only 'null's.
Can you check it out?

Thank you! 

'match' example' image.jpg
'match' example image2.jpg

Owen Stephens

Feb 20, 2015, 4:28:57 AM
to openref...@googlegroups.com
I think this question might be better on the general OpenRefine group https://groups.google.com/forum/#!forum/openrefine rather than on the Dev group

I'm afraid not reading Korean makes this challenging to debug, and I may be completely wrong, but one possibility is that you aren't matching the whole string. 'match' requires the regular expression to match the whole string: essentially, value.match(/.*/) is evaluated as value.match(/^.*$/). Using the Latin alphabet to demonstrate, take the following strings:

abcd
1234
1234abcd
abcd1234

This is how different 'match' expressions work for these strings:

value.match(/([a-z].*)/)

This will find matches where the string starts with a lowercase alpha - followed by any number of any other characters

abcd -> ["abcd"]
1234 -> null
1234abcd -> null # contains lowercase alphas but doesn't start with one, so no match
abcd1234 -> ["abcd1234"]

value.match(/([a-z]*)/)

This will find matches where the string, of any length, contains *only* lowercase alphas

abcd -> ["abcd"]
1234 -> null
1234abcd -> null
abcd1234 -> null #no match because string contains characters other than lowercase alphas

value.match(/(.*[a-z]+.*)/)

This will match a string of any length which contains one or more lowercase alphas plus any other characters

abcd -> ["abcd"]
1234 -> null #contains no lowercase alphas
1234abcd -> ["1234abcd"]
abcd1234 -> ["abcd1234"]
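
The three result lists above can be reproduced outside OpenRefine with a short Python sketch. It assumes, as described in this post, that match() behaves like a whole-string match that returns the capture groups; `grel_like_match` is a hypothetical helper for this example, and Python's `re.fullmatch` stands in for the Java regex engine.

```python
import re

def grel_like_match(value, pattern):
    """Hypothetical stand-in for GREL match(): the pattern must cover the
    whole value; returns the list of capture groups, or None."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None

values = ["abcd", "1234", "1234abcd", "abcd1234"]
patterns = [r"([a-z].*)", r"([a-z]*)", r"(.*[a-z]+.*)"]

# One result row per (pattern, value) pair, mirroring the lists above.
results = {p: [grel_like_match(v, p) for v in values] for p in patterns}
for pattern, row in results.items():
    print(pattern)
    for value, groups in zip(values, row):
        print(f"  {value} -> {groups}")

# The group captures only 'a.c'; the trailing .* lets the pattern reach
# the end of the string without being captured.
extracted = grel_like_match("abcdefgh", r"(a.c).*")
print(extracted)
```

If the same whole-string rule holds in GREL, the last lines suggest that something like value.match(/(a.c).*/)[0] would pull 'abc' out of 'abcdefgh', though that expression has not been verified in OpenRefine here.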

I hope this helps

Owen