Array of match with two string

Hosioneh

Aug 10, 2018, 2:14:14 PM
to OpenRefine

I have a cell like this:

The brown cat and white cat are over there

and I want to use the match function with a regex to split it at the first "cat", which is the brown cat.
I used value.match(/.*(cat)(.*)/), but it finds the white cat and splits there instead.


1. I need a match + regex expression that returns this array:
"The brown" "cat" "and white cat are over there"

2. I also need a match + regex expression that returns this one:
"The brown" "cat" "and white" "cat" "are over there"

Thad Guidry

Aug 10, 2018, 3:30:21 PM
to openr...@googlegroups.com
Use our new find() function instead.  Available in OpenRefine 3.0



Hosioneh

Aug 10, 2018, 3:45:50 PM
to OpenRefine
I installed v3.0 and tried this:
value.find(/(.*)(cat)(.*)/)

no success :(

aurielle perlmann

Aug 10, 2018, 3:57:39 PM
to OpenRefine
Not sure if I replied correctly, but this worked for me:
value.replace(/(cat)/, ';$1;').split(';')
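For anyone who wants to try the same idea outside OpenRefine, here is a rough Python sketch of the marker-and-split trick. It is not GREL: the helper name `split_keeping` and its parameters are made up for illustration, and it assumes the marker character `;` never appears in the cell. As far as I know, GREL's replace() with a regex replaces every match, so the expression above gives the five-piece array; the `first_only` switch shows how the three-piece array could be produced with the same approach.

```python
import re

text = "The brown cat and white cat are over there"

def split_keeping(text, word, first_only=False, marker=";"):
    """Wrap matches of `word` in `marker`, then split on `marker`.

    Hypothetical helper; assumes `marker` does not already occur in `text`.
    """
    count = 1 if first_only else 0  # 0 tells re.sub to replace every match
    pattern = "(" + re.escape(word) + ")"
    marked = re.sub(pattern, marker + r"\1" + marker, text, count=count)
    return marked.split(marker)

all_pieces = split_keeping(text, "cat")
first_pieces = split_keeping(text, "cat", first_only=True)

print(all_pieces)    # ['The brown ', 'cat', ' and white ', 'cat', ' are over there']
print(first_pieces)  # ['The brown ', 'cat', ' and white cat are over there']
```

Note that the pieces keep their surrounding spaces; trim them afterwards if the spaces are not wanted.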

Hosioneh

Aug 10, 2018, 4:02:19 PM
to OpenRefine
Thank you

John Little

Aug 10, 2018, 4:06:32 PM
to openr...@googlegroups.com
I like that approach, Aurielle. (I like the find function too, but I also couldn't get it to work for this problem.) Anyway, I'm not sure why, but I always find value.match with a regex harder to use than value.replace with a regex.

For example, I think the problem with match here has something to do with whether the regex quantifiers are greedy or reluctant. I'd be happy to learn the match solution as well if anyone can explain it in this context.


Here are two incomplete solutions I came up with.

Does adding a question mark before the first capture group make the match non-greedy? Or is it the question mark after the first .* that makes it non-greedy?

value.match(/.*?(cat)(.*)/)
# result: [ "cat", " and white cat are over there" ]

I also tried this, and it captured one more word, but I don't understand the regex any better.
value.match(/(\w+\s)+?(cat)\s(.*)$/)
# result: [ "brown ", "cat", "and white cat are over there" ]
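To make the greedy versus reluctant difference concrete, here is a small Python sketch (Python, not GREL, so treat it as an approximation). It uses `re.fullmatch` on the assumption that GREL's match() has to match the whole string. The `?` that matters is the one directly after `.*`: it makes that quantifier reluctant, so the first `cat` is captured instead of the last. Putting the reluctant prefix in its own group also returns the leading text, which gives the three-piece array from the original question.

```python
import re

text = "The brown cat and white cat are over there"

# Greedy: the leading .* takes as much as it can, so (cat) lands on the LAST "cat".
greedy = re.fullmatch(r".*(cat)(.*)", text).groups()

# Reluctant: .*? takes as little as it can, so (cat) lands on the FIRST "cat".
lazy = re.fullmatch(r".*?(cat)(.*)", text).groups()

# Capturing the reluctant prefix as well returns the text before the first "cat".
three = re.fullmatch(r"(.*?)(cat)(.*)", text).groups()

# A repeated group keeps only its last repetition, which is why "The " is lost.
repeated = re.fullmatch(r"(\w+\s)+?(cat)\s(.*)$", text).groups()

print(greedy)    # ('cat', ' are over there')
print(lazy)      # ('cat', ' and white cat are over there')
print(three)     # ('The brown ', 'cat', ' and white cat are over there')
print(repeated)  # ('brown ', 'cat', 'and white cat are over there')
```

If the same pattern syntax carries over, the GREL equivalent of the third case would presumably be value.match(/(.*?)(cat)(.*)/), though I have not verified that in OpenRefine.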

Thad Guidry

Aug 10, 2018, 4:13:00 PM
to openr...@googlegroups.com
Sorry, just re-read your actual need.

It is a split() but keeping the fragment.

On partition() we have the optional Boolean omitFragment
On split() we have the optional Boolean preserveAllTokens

What you need...

A split() that also has the optional Boolean omitFragment

You can open a new issue for the need if this is what you want.
The functionality would then look like this...

value.split("cat", true, false)

[ "The brown ", "cat", " and white ", "cat", " are over there" ]
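The three-argument split() above is a proposed signature, not something that exists yet. As a point of comparison only, here is how the same two behaviours look in Python, which already has a "split but keep the fragment" mode: a capturing group in `re.split` keeps every separator, and `str.partition` splits at the first occurrence only.

```python
import re

text = "The brown cat and white cat are over there"

# A capturing group in the pattern makes re.split return the separators too.
every_cat = re.split(r"(cat)", text)

# str.partition splits at the first occurrence only and keeps the separator.
first_cat = list(text.partition("cat"))

print(every_cat)  # ['The brown ', 'cat', ' and white ', 'cat', ' are over there']
print(first_cat)  # ['The brown ', 'cat', ' and white cat are over there']
```

The first result corresponds to the second array in the original question, the second result to the first array.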

Thad Guidry

Aug 10, 2018, 4:36:15 PM
to openr...@googlegroups.com
In case it isn't already obvious:

In OpenRefine we decided to have useful GREL functions that each do a limited amount of work but can be combined in various ways.
This particular use case highlights two function areas we already have. For the second one, we might provide additional functions if the community needs them and asks for them:

1. finding and matching fragments of a string

2. splitting a string by fragments
