question about org.apache.oro.text.regex package


melinite

Mar 22, 2012, 4:52:30 PM
to Railo
Profiling our production servers shows that the
org.apache.oro.text.regex package accounts for 30% to 46% of
application execution time.

Specifically, the classes org.apache.oro.text.regex.Perl5Matcher and
org.apache.oro.text.regex.OpCode.

Any specific reason for this? What internal uses do these classes have
inside the railo engine besides obvious regex functionality?

Is query of queries involved?

Our production environment performs well; we just wanted to do some
profiling.

melinite

Mar 22, 2012, 4:54:30 PM
to Railo
And is there a reason for using Perl5Matcher rather than a native Java regex call?

Peter Boughton

Mar 22, 2012, 7:40:57 PM
to ra...@googlegroups.com
melinite wrote:
> Any specific reason for this? What internal uses do these classes have
> inside the railo engine besides obvious regex functionality?

I'm not sure of anything specific, but you can of course browse or
download the Railo source code and have a look yourself. :)

Query of query code starts here:
https://github.com/getrailo/railo/blob/master/railo-java/railo-core/src/railo/runtime/tag/Query.java#L445

It seems to just pass the query straight to HSQL, so unless HSQL
itself uses ORO, I suspect that's a red herring.


Have you already ruled out that this is simply a single badly
performing regex?

It is possible for some seemingly innocent patterns to cause
exponentially bad performance, so if you can narrow down any of that
time to one or more REFind/REMatch/REReplace calls then feel free to
post the regex and see if it can be improved.
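
To illustrate what a "seemingly innocent" pattern can look like, here is a minimal sketch. It uses java.util.regex rather than ORO, and the class and method names are made up for the example. The nested quantifier in (a+)+b lets a backtracking engine try many ways of splitting a run of a's before it can report a failure, while a+b accepts the same strings with only one way to split. How large the difference is depends on the engine and the JVM version, so no timings are claimed here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class BacktrackingDemo {
    // Nested quantifiers: many ways to split a run of 'a's between the inner and outer loop.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    // Accepts the same strings, but there is only one way to consume the run.
    static final Pattern FLAT = Pattern.compile("a+b");

    // Builds a string of n 'a' characters (hypothetical helper for the demo).
    static String repeatA(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    // Times a full-match attempt that is expected to fail.
    static long nanosToFail(Pattern p, String input) {
        long start = System.nanoTime();
        boolean matched = p.matcher(input).matches();
        long elapsed = System.nanoTime() - start;
        if (matched) {
            throw new IllegalStateException("expected no match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n = 8; n <= 20; n += 4) {
            String input = repeatA(n); // no trailing 'b', so both patterns must fail
            System.out.printf("n=%d nested=%dus flat=%dus%n",
                    n,
                    nanosToFail(NESTED, input) / 1000,
                    nanosToFail(FLAT, input) / 1000);
        }
    }
}
```

If the nested column grows much faster than the flat one as n increases, that is the kind of behaviour to look for in a profiled regex.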

> and reason for perl5matcher vs using a native java regex call?

Compatibility with ACF (Adobe ColdFusion).

(In fact, I suspect it's ultimately backwards compatibility with CF5,
which wasn't Java-based.)


If the regex engine was simply switched, any REReplace calls with
backreferences would break, because the native Java regex
(java.util.regex/JUR) uses $1 instead of \1 for backreferences in the
replacement string.
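
As a rough sketch of the difference, the hypothetical helper below rewrites a CFML-style replacement string so that JUR accepts it. It is an illustration only: it handles \1 to \9 and literal $ signs, treats any other backslash as a literal character (an assumption), and ignores the case-conversion constructs.

```java
import java.util.regex.Pattern;

final class ReplacementSyntax {
    // Converts "\1"-style group references to JUR's "$1" form and escapes
    // characters that JUR would otherwise treat as special in a replacement.
    static String toJavaReplacement(String cfmlStyle) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cfmlStyle.length(); i++) {
            char c = cfmlStyle.charAt(i);
            boolean digitFollows = i + 1 < cfmlStyle.length()
                    && cfmlStyle.charAt(i + 1) >= '0'
                    && cfmlStyle.charAt(i + 1) <= '9';
            if (c == '\\' && digitFollows) {
                out.append('$').append(cfmlStyle.charAt(++i)); // \1 becomes $1
            } else if (c == '\\' || c == '$') {
                out.append('\\').append(c); // keep literal backslash or dollar sign
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String replacement = toJavaReplacement("\\2 at \\1 costs $5");
        String result = Pattern.compile("(\\w+)@(\\w+)")
                .matcher("user@host")
                .replaceAll(replacement);
        System.out.println(result);
    }
}
```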

Also, Apache ORO supports the \u, \l, \U, \L and \E constructs (which
allow uppercasing/lowercasing of specific characters/text) and JUR
doesn't support those (nor even $u equivalents), so that's another issue.
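
With JUR, the effect of something like \u\1 has to be produced in code instead. The sketch below shows one way to do it with Matcher.appendReplacement; the class name, method name and the "capitalize each word" task are assumptions chosen for the example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaseConversion {
    private static final Pattern WORD_START = Pattern.compile("\\b(\\w)");

    // Uppercases the first character of every word, roughly what a
    // replacement of "\u\1" would do in an engine that supports \u.
    static String capitalizeWords(String input) {
        Matcher m = WORD_START.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String upper = m.group(1).toUpperCase(Locale.ROOT);
            // quoteReplacement guards against '$' or '\' in the replacement text
            m.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("hello big world"));
    }
}
```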


If you're in a position to fork your code and experiment, I have
created a set of functions that wrap JUR (and provide some extra
functionality besides), which you can set up in Railo as if they were
built-in functions. (I haven't packaged them as a Railo Extension, but
installation is just copying a few files.) Anyway, if you're
interested, see cfregex.net

Unfortunately you can't just do a drop-in replacement, since I also
took the opportunity to make the API more consistent/useful than the
one CF provides, but if it's not a specific regex problem, or you want
some of the improvements JUR allows (atomic grouping and possessive
quantifiers are useful for improving performance), then it might be
worth trying out.
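
For anyone unfamiliar with those two JUR features, here is a small sketch of what they do. The patterns are toy examples picked for the illustration: a possessive quantifier (a++) or an atomic group ((?>a+)) keeps whatever it matched and never gives characters back, which is what lets the engine fail quickly instead of backtracking.

```java
import java.util.regex.Pattern;

final class PossessiveDemo {
    // Greedy: a+ first takes all three 'a's, then gives one back so the final 'a' can match.
    static boolean greedy(String s) {
        return Pattern.matches("a+a", s);
    }

    // Possessive: a++ takes all the 'a's and refuses to give any back.
    static boolean possessive(String s) {
        return Pattern.matches("a++a", s);
    }

    // Atomic group: same no-backtracking behaviour, written as a group.
    static boolean atomic(String s) {
        return Pattern.matches("(?>a+)a", s);
    }

    public static void main(String[] args) {
        System.out.println("greedy:     " + greedy("aaa"));
        System.out.println("possessive: " + possessive("aaa"));
        System.out.println("atomic:     " + atomic("aaa"));
    }
}
```

The greedy version matches "aaa" and the other two do not, so these constructs are only a safe optimisation where giving characters back could never help the match anyway.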

Michael Offner

Mar 23, 2012, 5:05:45 AM
to ra...@googlegroups.com
Peter already gave a great explanation of everything.
The Apache ORO package is only used for the re... functions, nothing else.
We would love to get rid of it and use the Java regex engine instead, but as Peter wrote, this is not possible, because the re... functions use the Perl regex dialect.
Perhaps we could add a setting in the Railo admin for this:
Dialect used for regex:
(x) Perl (CFML default)
( ) Java

/micha




melinite

Mar 23, 2012, 8:44:42 AM
to Railo
We do use REMatch quite often in our content validator regexes.

I will try to refactor those to use JUR and maybe look at some of
Peter's stuff as well.
I will post some profiling statistics.

Again, I want to stress to those reading this thread that this is not
an issue with Railo. Our production environment performs extremely
well on Railo.
We are just gathering some internal performance metrics.

thanks.




melinite

Mar 23, 2012, 10:34:31 AM
to Railo
OK, just confirmed:

java.util.regex.Pattern blows away REMatch and REMatchNoCase.

On our dev box, a loop of a thousand iterations using the following
runs in about 200ms:

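***************************************
// single-quoted CFML string, so the backslashes reach the regex engine as written
variables.pattern = createObject("java", "java.util.regex.Pattern").compile('\[\$SURVEY\:(.*?)\$\]');

// variables.thisregex is assumed to be an existing array
variables.matcher = variables.pattern.matcher(variables.thisemail);
// collect capture group 1 of every match
while (variables.matcher.find()) {
    arrayAppend(variables.thisregex, variables.matcher.group(1));
}
***************************************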

vs

2000ms for

*********************
variables.thisregex=
REMatchNoCase('\[\$SURVEY\:(.*?)\$\]',variables.thisemail);
*********************


Looks like we are going to drop REMatch in favor of JUR.
Again, I want to state that this is purely a micro performance tweak.
In real-world usage, the cost of REMatch should not be noticeable.
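
For reference, a plain Java sketch of the same extraction follows. The class and method names are hypothetical. It also highlights two things worth checking before comparing numbers: the JUR snippet above compiles the pattern without a case-insensitive flag, whereas REMatchNoCase ignores case, and it collects capture group 1, whereas REMatch (as far as I know) returns the whole matched substrings. The sketch compiles the pattern once with CASE_INSENSITIVE and offers both variants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SurveyTags {
    // Compiled once and reused; CASE_INSENSITIVE mirrors the "NoCase" behaviour.
    private static final Pattern TAG =
            Pattern.compile("\\[\\$SURVEY:(.*?)\\$\\]", Pattern.CASE_INSENSITIVE);

    // Returns only the text captured inside each tag (group 1).
    static List<String> names(String email) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(email);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Returns each complete tag, delimiters included (group 0).
    static List<String> wholeMatches(String email) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(email);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String email = "Hi [$SURVEY:alpha$] and [$survey:beta$].";
        System.out.println(names(email));
        System.out.println(wholeMatches(email));
    }
}
```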

Igal

Mar 23, 2012, 12:29:57 PM
to ra...@googlegroups.com
+1 on adding the setting option to choose Java regex dialect.