In a regular expression "replace", how to pass the match as an argument to a function?


Yunde Zhong

Nov 12, 2015, 4:06:59 AM
to julia-users
Hi, All, 

I am new to Julia, coming from Matlab. I am learning regular expressions today.

Here is an example: 

First, I'd like to highlight the words "goat" and "boat" in the sentence "a goat in a boat".

x = "oat";
reg = Regex("(\\w+(?>$x))");

y = "a goat in a boat";
z = replace(y, reg, s"*\g<1>*")   

The result is: 
"a *goat* in a *boat*"

Then, I'd like to convert these two words to uppercase, that is, to get the new string "a GOAT in a BOAT".

I expected that Julia might have something like this: z = replace(y, reg, s"$(uppercase(\g<1>))"), but it does not work.

So here is my question: how can I use a function call as the replacement in a regular expression replace?

Thanks, 

Yunde

JobJob

Nov 12, 2015, 9:10:21 AM
to julia-users
Does this do what you need?

julia> replace(y, reg, uppercase)
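
For readers on a current Julia release: the three-argument form replace(string, pattern, replacement) used in this thread dates from Julia 0.4. Below is a minimal sketch of the same two replacements, assuming Julia 1.x, where replace takes a pattern => replacement pair. The variable names highlighted and shouted are only illustrative and do not appear in the thread.

```julia
# Sketch assuming Julia 1.x: `replace` takes a `pattern => replacement` pair.
x = "oat"
reg = Regex("(\\w+(?>$x))")   # same pattern as in the original question
y = "a goat in a boat"

# Substitution string: refer to capture group 1 with \g<1>.
highlighted = replace(y, reg => s"*\g<1>*")

# Function replacement: the function is called with each matched substring.
shouted = replace(y, reg => uppercase)

println(highlighted)
println(shouted)
```

When the replacement is a function, it receives only the matched text (a SubString), not the full match object with its captures.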

JobJob

Nov 12, 2015, 9:24:08 AM
to julia-users
Forgot to say: welcome to Julia :)

btw you can see the different methods for a function with e.g.:
julia> methods(replace)

Also, to see the available documentation for a function, type ? followed by its name, e.g.:
?replace


Yunde Zhong

Nov 12, 2015, 11:30:14 AM
to julia...@googlegroups.com
Yes, it is what I was looking for. It allows me to pass in local variables. Thanks, JobJob. 

One more question. This replace function gives access to each match as a SubString. Is there any function that gives access to each match as a RegexMatch, to expose the "captures"?

For example, I'd like to convert times to remove "am" and "pm". Below is one solution (which parses the time string twice):

y = "time format 02:45pm, 14:20, 03:45am"
f(x) = replace(x, r"^[01]\d:[0-6]\d(am|pm)"i, x -> lowercase(x[end-1]) == 'a' ? x[1:end-2] : string(parse(Int, x[1:2]) + 12, x[3:end-2]));
replace(y, r"([01]\d)(:[0-6]\d)(am|pm)"i, f)
# "time format 14:45, 14:20, 03:45"

If the replace function gave access to each match as a RegexMatch, then the solution could be something like this:

replace(y, r"([01]\d)(:[0-6]\d)(am|pm)"i, x -> lowercase(x.captures[3]) == "am" ? x.match[1:end-2] : string(parse(Int, x.captures[1]) + 12, x.captures[2]))

and it only needs to parse the time string once. 

Thanks, again. Julia is great. 

Yunde



JobJob

Nov 12, 2015, 3:04:58 PM
to julia-users
I like that idea, but it's not really available AFAICT.

The function eachmatch(::Regex, ::String) gives you an iterator over the match objects.
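
As a small illustration of what that iterator yields, here is a sketch that walks over the matches in the example string and prints the fields of each RegexMatch. The names pattern and matches are only illustrative.

```julia
# Sketch: inspect the RegexMatch objects produced by eachmatch.
y = "time format 02:45pm, 14:20, 03:45am"
pattern = r"([01]\d)(:[0-6]\d)(am|pm)"i

matches = collect(eachmatch(pattern, y))

for m in matches
    # m.match    : the matched substring
    # m.offset   : index in y where the match starts
    # m.captures : the captured groups, in order
    println(m.offset, "  ", m.match, "  ", m.captures)
end
```

Only the two times carrying an am/pm suffix match; "14:20" is skipped because the pattern requires the suffix.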

In any case, I had a little play with this for fun :) - so here's one way to do it:
(n.b. the function arguments don't really need to be typed, I just do it to make the code clearer to read):

replacerange(s::AbstractString, replacement::AbstractString, range::UnitRange{Int64}) = 
  s[1:range.start-1]*replacement*s[range.stop+1:end]

replacematches(s::AbstractString, r::Regex, f::Function) = 
    foldl((_s,m) -> begin
        offsetd = length(_s) - length(s) #needed because previous replaces may change the length of _s
        repl_range = (offsetd + m.offset):(offsetd + m.offset + length(m.match)-1)
        replacerange(_s, f(m), repl_range)
    end, s, eachmatch(r,s))


y = "time format 02:45pm, 14:20, 03:45am"
replacematches(y, r"([01]\d)(:[0-6]\d)(am|pm)"i, 
  x -> lowercase(x.captures[3]) == "am" ? x.match[1:end-2] : string(parse(Int, x.captures[1]) + 12, x.captures[2]))

Yunde Zhong

Nov 12, 2015, 4:54:29 PM
to julia-users
Thanks for your help. It works very well. It also provides a very good example for me to learn the "foldl" function. 

Inspired by your reply and the implementation of "replace" function defined in "...\share\julia\base\strings\util.jl", I defined a similar version for my research, which is copied below for your reference.

function replacematches(f::Function, s::AbstractString, r::Regex)
  out = IOBuffer();
  pos = foldl((_pos,m) -> begin
          write(out, SubString(s, _pos, m.offset-1)) ; 
          ns::String = f(m);
          write(out, ns);
          _pos = m.offset + length(m.match);
        end, 1, eachmatch(r,s))
  write(out, SubString(s, pos));
  takebuf_string(out);
end

y = "time format 02:45pm, 14:20, 03:45am. The End."

replacematches(y, r"([01]\d)(:[0-6]\d)(am|pm)"i) do x 
  lowercase(x.captures[3]) == "am" ? x.match[1:end-2] : string(parse(Int, x.captures[1]) + 12, x.captures[2])
end
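
For readers on a current Julia release: takebuf_string no longer exists in Julia 1.x, where String(take!(io)) is used instead. The following is a rough adaptation of the function above under that assumption, not the original code. It swaps foldl for a plain loop, and it uses prevind and ncodeunits so that the index arithmetic also holds for non-ASCII text. The variable name z is only illustrative.

```julia
# Sketch assuming Julia 1.x: call f on each RegexMatch and splice in its result.
function replacematches(f, s::AbstractString, r::Regex)
    out = IOBuffer()
    pos = firstindex(s)                  # start of the text not yet copied
    for m in eachmatch(r, s)
        # Copy the unmatched text that precedes this match.
        write(out, SubString(s, pos, prevind(s, m.offset)))
        # Write the replacement computed from the full match object.
        write(out, f(m))
        # Continue right after the matched text.
        pos = m.offset + ncodeunits(m.match)
    end
    write(out, SubString(s, pos))        # copy the remaining tail
    return String(take!(out))
end

y = "time format 02:45pm, 14:20, 03:45am. The End."

z = replacematches(y, r"([01]\d)(:[0-6]\d)(am|pm)"i) do m
    lowercase(m.captures[3]) == "am" ? m.match[1:end-2] : string(parse(Int, m.captures[1]) + 12, m.captures[2])
end

println(z)
```

Like the versions in the thread, this adds 12 to every "pm" hour, so inputs such as "12:30pm" would need separate handling.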


Thanks 

Yunde 