Perl uses the g flag to request a match that resumes where the last
match left off. This functionality is provided implicitly by the
Matcher class: Repeated invocations of the find method will resume
where the last match left off, unless the matcher is reset.
Remember that match() does output an Array currently.
So... can you provide your use case pattern(s) and the kind of regex
match that you are looking for?
Is the issue close to this use case and does the answer help somewhat
https://github.com/OpenRefine/OpenRefine/issues/647 ?
HOWEVER:
If you just want to get some work done FAST --
Then my suggestion is just to use Jython as your expression language
in OpenRefine for this...and perhaps using re.search() instead:
http://www.jython.org/docs/library/re.html#search-vs-match
But also make sure to read through that whole Jython reference document
to see if you can make it work for you.
The basic wiring for Regex functions with Jython as your expression
language will look something like this:
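import re
# search() scans the whole cell value and returns None when nothing matches
g = re.search(ur"\u2014 (.*),\s*BWV", value)
# guard against None so non-matching rows do not raise an error
return g.group(1) if g else None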
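A small sketch, not part of the original message: re.search() stops at the first match, whereas re.findall() keeps scanning from where the previous match ended, much like Matcher.find() or Perl's g flag. The sample value, the pattern, and the variable names below are made up for illustration.

```python
import re

# Hypothetical cell value containing two download links.
value = (
    '<p><a href="/a.pdf" id="l1" class="dldlnk">A</a> '
    '<a href="/b.pdf" id="l2" class="dldlnk">B</a></p>'
)

# One opening <a> tag carrying class "dldlnk" (assumed attribute order).
pattern = re.compile(r'<a href="[^"]*" id="[^"]*" class="dldlnk"[^>]*>')

first = pattern.search(value)       # first match only, or None
all_links = pattern.findall(value)  # every non-overlapping match

first_tag = first.group(0) if first else None
link_count = len(all_links)
```

Inside an OpenRefine Jython expression, `value` is supplied by OpenRefine and the last step would be something like `return len(all_links)`.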
> --
> You received this message because you are subscribed to the Google Groups
> "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to openrefine+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
--
-Thad
andy
Sep 30, 2014, 5:15:38 PM
to openr...@googlegroups.com
Hi,
and thank you.
On Tue, Sep 30, 2014 at 6:12 PM, Thad Guidry <thadg...@gmail.com> wrote:
> Remember that match() does output an Array currently.
I know that it's an array, but I always get length = 1, so I think the error is mine.
> So... can you provide your use case pattern(s) and the kind of Regex
> that you are looking for matching ?
I'm attaching a sample project. I apply "length(value.match(/.*(<a href=".*?" id=".*?" class="dldlnk".*?>).*?/))" to the first column and I obtain "1".
But as you can see here (the same source code) I should obtain "2".
to openr...@googlegroups.com
There are different ways of doing this - one is to split the string into an array, and then use the match statement against each item in the array - something like:
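The expression that followed in the original message is not preserved here. Purely as a rough sketch of the split-then-match idea, written in Jython-compatible Python rather than GREL: the helper name `matching_links`, the choice of "</a>" as the separator, and the sample string are all assumptions.

```python
import re

# Assumed shape of one link; the capture group holds the opening tag.
LINK = re.compile(r'(<a href="[^"]*" id="[^"]*" class="dldlnk"[^>]*>)')

def matching_links(value):
    """Split on the closing tag, then match each piece on its own."""
    found = []
    for piece in value.split("</a>"):
        m = LINK.search(piece)
        if m:
            found.append(m.group(1))
    return found

# Hypothetical cell value with two links.
sample = ('<a href="/a.pdf" id="l1" class="dldlnk">A</a>'
          '<a href="/b.pdf" id="l2" class="dldlnk">B</a>')
links = matching_links(sample)
```

Because each piece contains at most one link, a single-group pattern is enough, and the number of links is just `len(links)`.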
to openr...@googlegroups.com
Hi,
On Tue, Sep 30, 2014 at 11:40 PM, Owen Stephens <ow...@ostephens.com> wrote:
> Hope this helps
I'm sure that this will help me.
I have a question for you. In the "match" documentation I read (match(string s, regexp p)):
Attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups.
Then I should obtain an array with two items with my regex. Am I wrong? And why?
Thank you and be patient :)
Thad Guidry
Sep 30, 2014, 6:10:41 PM
to openrefine
Andy,
Since your use case has some HTML, I thought you might want to know
about this, just in case you are not aware of the built-in HTML GREL
functions...
to openrefine
>
> I have a question for you. In the "match" documentation I read (match(string
> s, regexp p)):
>>
>> Attempts to match the string s in its entirety against the regex pattern p
>> and returns an array of capture groups.
>
>
> Then I should obtain an array with two items with my regex. Am I wrong? And
> why?
would capture two groups. If you know that 'value' will only ever contain two relevant groupings then this would do the job. However, you can't do this when you don't know how many times the relevant grouping will be repeated. I don't think 'match' is the right tool for capturing an arbitrary number of repeated groupings.
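The point about capture groups can be illustrated with a Python analogue (an assumption: Python's re module stands in for GREL's match here, and the sample value is invented). The number of groups returned follows the number of parentheses in the pattern, not the number of links in the text.

```python
import re

# Hypothetical cell value with two links.
value = ('x<a href="/a.pdf" id="l1" class="dldlnk">A</a>'
         'y<a href="/b.pdf" id="l2" class="dldlnk">B</a>z')

tag = r'<a href="[^"]*" id="[^"]*" class="dldlnk"[^>]*>'

# One pair of parentheses: one capture group, however many links exist.
# The leading .* is greedy, so the group ends up holding the last link.
one = re.match(r'.*(' + tag + r').*\Z', value, re.S)

# Two pairs of parentheses: two capture groups, one per link.
two = re.match(r'.*?(' + tag + r').*(' + tag + r').*\Z', value, re.S)

one_groups = one.groups()
two_groups = two.groups()
```

This only works when the number of links is known in advance, which is why a whole-string match is a poor fit for an arbitrary number of repeats.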
Thad's suggestion of using parseHtml is an excellent one - you may be able to do what you need with something simple like:
value.parseHtml().select("a.dldlnk")
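For comparison, here is a hedged sketch of the same parse-then-select idea outside GREL, using only Python's standard HTML parser. The class name `LinkCollector` and the sample markup are assumptions; this is not how parseHtml() is implemented, only an illustration of why parsing beats a regex for repeated elements.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 / Jython

class LinkCollector(HTMLParser):
    """Collect the href of every <a> element whose class list has 'dldlnk'."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "dldlnk" in classes:
            self.hrefs.append(attrs.get("href"))

# Hypothetical cell value: two matching links and one unrelated link.
sample = ('<a href="/a.pdf" id="l1" class="dldlnk">A</a>'
          '<a href="/other" class="nav">skip</a>'
          '<a href="/b.pdf" id="l2" class="dldlnk">B</a>')

collector = LinkCollector()
collector.feed(sample)
hrefs = collector.hrefs
```

The parser visits every start tag, so the result grows with the number of links and needs no knowledge of how many there are.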
Thad Guidry
Sep 30, 2014, 7:11:07 PM
to openrefine
Andy,
You are not iterating over the instances of each <a> link for example.
Try looking at how this GREL expression works and copy and paste it
and play with it...
andy
Oct 2, 2014, 5:38:24 AM