ngx.re.match for multiline string returns null


RJoshi

Nov 30, 2015, 10:12:32 AM
to openresty-en
Hello,
  I am trying to use ngx.re.match on a multi-line string, but it returns nil. It works fine with a single-line string. I tried the "m" option, but it doesn't help.

### Single Line ###
local s="\n\n<key>abcd</key>fdfdsds"
local m=ngx.re.match(s,"<key>(.*)</key>", "jo")  

print(m[1])
-- abcd


### Multi Line ###
But if I change the string to span multiple lines, it returns nil:

local s=[[ \n\n<key>abcd
 </key>fdfdsds]]

local m=ngx.re.match(s,"<key>(.*)</key>", "jom")  
print(m[1])
-- m is nil

Lord Nynex

Nov 30, 2015, 4:23:12 PM
to openre...@googlegroups.com
Hello,

You should probably not use a greedy regex for this. It is inefficient and can have undesired results. I also recommend using a regex coach on the data first. I use https://chrome.google.com/webstore/detail/regexr/igncokdengclanbghanhfichlnhlpihl?hl=en

Make sure you have a build of PCRE that has jit enabled. 
I don't think there is any performance benefit to jit compiling this greedy regex. 
If this regex runs in a buffered context you will need to assemble the full buffer before matching. 

Here is a more efficient working example

$ resty -e '
local s=[[ \n\n<key>abcd
 </key>fdfdsds]]
local m=ngx.re.match(s,[[<key>\s*(.+?)\s*</key>]], "m")
print(m[1])
'
abcd


RJoshi

Nov 30, 2015, 5:19:00 PM
to openresty-en
Thanks for your help. Yes, I have PCRE JIT enabled, as I am building with ./configure --with-pcre-jit

I have a SOAP request body from which I am trying to extract the values of three different elements.

For example (each element's value could span multiple lines):
<key>abc</key>
<msg-id>1234</msg-id>
<body>xxxxx</body>

Currently, I am calling string.find three times, but I want to make this more efficient.

 -- "-" is a magic character in Lua patterns, so the hyphen in msg-id is escaped as "%-"
 local token_start, token_end, key = request_body:find('<key>(.*)</key>')
 local token_start, token_end, msg_id = request_body:find('<msg%-id>(.*)</msg%-id>', token_end)
 local token_start, token_end, body = request_body:find('<body>(.*)</body>', token_end)

The following works fine for single-line input but doesn't work with multi-line input.

local m=ngx.re.match(s,"<key>(.*)</key>.*<msg-id>(.*)</msg-id>.*<body>(.*)</body>", "jo")
print (m[1] .. m[2] .. m[3])

Your suggestion works fine for a single match. How can I match all three elements?
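
For the plain-Lua route, two details of Lua patterns matter here: "." already matches newlines, and "-" is a magic character that has to be escaped as "%-" in a tag name like msg-id. A minimal sketch, assuming a made-up sample body and a hypothetical helper named find_element (neither is from the original post):

```lua
-- Hypothetical sample body; the element values span several lines.
local request_body = "<key>abc\n</key>\n<msg-id>12\n34</msg-id>\n<body>xx\nxxx</body>"

-- Illustrative helper: return the text of the first <tag>...</tag> found at or
-- after position `init`, plus the position just past the match.
local function find_element(body, tag, init)
  -- Escape Lua pattern magic characters in the tag name, e.g. "-" in "msg-id".
  local escaped = tag:gsub("%p", "%%%0")
  -- (.-) is the lazy form of (.*): it stops at the first closing tag.
  local _, stop, value = body:find("<" .. escaped .. ">(.-)</" .. escaped .. ">", init)
  if not stop then
    return nil, init
  end
  return value, stop + 1
end

local key, pos = find_element(request_body, "key", 1)
local msg_id
msg_id, pos = find_element(request_body, "msg-id", pos)
local body = find_element(request_body, "body", pos)
print(key, msg_id, body)
```

Using the lazy (.-) instead of (.*) keeps each capture from running on to a later closing tag of the same name.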

Ming

Nov 30, 2015, 9:19:00 PM
to openre...@googlegroups.com
"." means any character except newline, so ".*" cannot match across line breaks.

Look at this:

MingdeMac-mini:bin ming$ ./resty -e '
> local s = [[\n\n<key>abdc
> </key>asdasdasd]]
> local m = ngx.re.match(s, [[<key>([\S\s]*)</key>]],"jom")
> print (m[1])
> '
abdc

You can use http://regexpal.com/ to test your regular expression before writing code.
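
The "m" option from the original question only changes where ^ and $ can match; it does not affect ".". The option that lets "." cross newlines is "s" (dot-all). A small sketch comparing the two, assuming it runs under the resty CLI or inside OpenResty (the compare_flags name is only for illustration):

```lua
-- Illustrative helper: match the same subject with "m" and with "s".
local function compare_flags()
  local s = "\n\n<key>abcd\n </key>fdfdsds"
  -- "m": "." still stops at the newline, so there is no match (nil).
  local with_m = ngx.re.match(s, [[<key>(.*)</key>]], "jom")
  -- "s": "." also matches newlines, so the capture spans both lines.
  local with_s = ngx.re.match(s, [[<key>(.*)</key>]], "jos")
  return with_m, with_s
end

-- The ngx table exists only under OpenResty, so skip the demo elsewhere.
if ngx then
  local with_m, with_s = compare_flags()
  print(with_m == nil)
  print(with_s and with_s[1])
end
```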

RJoshi

Dec 1, 2015, 7:22:07 AM
to openresty-en
Perfect. Thanks for your help. I am able to extract all three tokens with this.

a=ngx.re.match(s,[[<key>([\S\s]*)</key>[\S\s]*<msg-id>([\S\s]*)</msg-id>[\S\s]*<body>([\S\s]*)</body>]], "jom")
print(a[1] .. "|" .. a[2] .. "|" .. a[3])
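
Because [\S\s]* is greedy, the pattern above can capture too much if a closing tag such as </key> appears more than once in the body. A hedged variant using lazy quantifiers, the "s" option, and a nil check before indexing the result; the extract_fields name and the sample body are assumptions for illustration:

```lua
-- Illustrative helper: return key, msg-id and body, or nil plus a message.
local function extract_fields(s)
  local m, err = ngx.re.match(s,
    [[<key>\s*(.*?)\s*</key>.*?<msg-id>\s*(.*?)\s*</msg-id>.*?<body>\s*(.*?)\s*</body>]],
    "jos")
  if not m then
    -- err is set only when the regex itself failed; otherwise nothing matched.
    return nil, err or "no match"
  end
  return m[1], m[2], m[3]
end

-- The ngx table exists only under OpenResty, so skip the demo elsewhere.
if ngx then
  local sample = "<key>abc\n</key>\n<msg-id>1234</msg-id>\n<body>xx\nxx</body>"
  print(extract_fields(sample))
end
```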

Maanas Royy

Dec 1, 2015, 9:25:37 PM
to openresty-en

We can achieve the same using Lua's string.find as well. Which is the more efficient solution, ngx.re.match or Lua's string.find? Any suggestions?

RJoshi

Dec 2, 2015, 11:11:38 PM
to openresty-en
ngx.re.match with the "jo" options is significantly faster than string.match, but ngx.re.match without "jo" seems to be slower.

For example, in the test below, ngx.re.match with "jo" takes 3 seconds vs. 10 seconds for string.match on my laptop.

local ngx_re_match = ngx.re.match
 local string_match = string.match
 local s=[[\n5463463463565fhfghfghfgdhfhgfhhdfg\n<key>abcd</key>fdfdsdsfgdfhhfghgfdhgfhfgh<msgid>1234</msgid>4656456645366456464363466dfdfd <v1:Cmd                         Name=\"1\">dsfdsfsdfsfsf</v1:Cmd>]]
 local regex = "<key>(.*)</key>.*<msgid>(.*)</msgid>.*<v1:Cmd.*>(.*)</v1:Cmd>"
 local regex_jo  = "jo"
 local count = 500000
 local t1= os.time()
 for i = 1, count do
 --print(ngx_re_match(s,"<key>(.*)</key>.*<msgid>(.*)</msgid>.*<v1:Cmd.*>(.*)</v1:Cmd>","jo"))
 local a =ngx_re_match(s,regex, regex_jo)
 end
 print(os.time()-t1)

 local t1= os.time()
 for i = 1, count do
 --print(string.match(s,regex))
 local a = string_match(s, regex)  -- string.match takes (s, pattern [, init]); it has no options argument
 end
 print(os.time()-t1)
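
Since os.time() has one-second resolution, os.clock() (CPU time, with sub-second resolution) may give a finer comparison. A minimal timing sketch; the time_it helper and the shortened sample string are assumptions, and only the string.match half is shown so that it also runs in plain Lua:

```lua
-- Hypothetical helper: run fn() n times and return the CPU seconds used.
local function time_it(n, fn)
  local start = os.clock()
  for _ = 1, n do
    fn()
  end
  return os.clock() - start
end

-- Shortened, made-up subject and a Lua pattern with lazy captures.
local s = "<key>abcd</key>xx<msgid>1234</msgid>yy"
local pattern = "<key>(.-)</key>.-<msgid>(.-)</msgid>"

local elapsed = time_it(100000, function()
  return string.match(s, pattern)
end)
print(string.format("string.match: %.3f s", elapsed))
```

Passing a closure that calls ngx.re.match(s, regex, "jo") to the same helper would time the other side under OpenResty.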

Yichun Zhang (agentzh)

Dec 4, 2015, 1:26:13 AM
to openresty-en
Hello!

On Thu, Dec 3, 2015 at 12:11 PM, RJoshi wrote:
> e.g in below test, ngx.re.match with "jo" takes 3 second vs 10 seconds for
> string.match on my laptop.
>

Well, using lua-resty-core can make things even faster, if you are not using it already.

See https://github.com/openresty/lua-resty-core

Regards,
-agentzh