Performance test feedback

795 views
Skip to first unread message

jj...@yottaa.com

unread,
Jul 26, 2015, 10:42:01 PM7/26/15
to re2j-discuss
Hi,

I am trying to find a highperformance java regular expression, RE2j Seems should be a good option.

I then ran a very simple *test*.


         @Test


        public void testRe() throws InterruptedException{


                Pattern re = Pattern.compile(".*(d|e|cart|jinjian|.*)");

               for(int i = 0 ; i < 10000 ; i++)

                  re.matches("aajinjianaksdjfl");

               System.gc();

               Thread.sleep(5000);

               boolean result = false;


               long s = System.nanoTime();

               for(int i = 0 ; i < 1000000 ; i++)

                       result = re.matches("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl");

               long e = System.nanoTime();

               
               System.out.println( TimeUnit.NANOSECONDS.toMillis( e - s) );

               System.out.println(result);



        }


 I ran this code by using Java Regex and Re2j.

Java version only takes 220ms, but re2j need 14000 ms. 

Is my test here wrong? 

Russ Harmon

unread,
Jul 27, 2015, 12:23:07 PM7/27/15
to jj...@yottaa.com, re2j-discuss
Just a guess (someone more familiar with re2j's internals may know better), but your regex ".*(d|e|cart|jinjian|.*)" will always match, since the regex engine can ignore the entire parenthesized group via the second ".*". You should try moving it out of the parenthesized group.

--
You received this message because you are subscribed to the Google Groups "re2j-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to re2j-discuss...@googlegroups.com.
To post to this group, send email to re2j-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/re2j-discuss/62f5768e-60c6-48ed-9cfa-bb9b3f6e85b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jj...@yottaa.com

unread,
Jul 27, 2015, 12:51:25 PM7/27/15
to re2j-discuss, eatnu...@google.com
Hi,

I changed, but the re2j is still much slower than the JDK. (Java 8)

Alan Donovan

unread,
Jul 27, 2015, 5:25:31 PM7/27/15
to jj...@yottaa.com, re2j-discuss, eatnu...@google.com
The Java implementation may be slower for many inputs, but it shouldn't be this much slower than the same algorithm written in Go, which takes only 1.1s (Go 1.4.1) or 2.1s (1.5rc2).

Thanks for reporting this.  I'll take a look.


jj...@yottaa.com

unread,
Jul 28, 2015, 10:43:19 AM7/28/15
to re2j-discuss, jj...@yottaa.com
Update: my previous testing result have a type, the re2j takes 1400ms.

Here is the updated version:

pattern is :  ".*d|e|cart|jinjian|kk.*"

String is: "aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl"

Java takes: 330ms
re2j  takes: 4257ms

Thanks for your time to take a look at it.

bba...@gmail.com

unread,
Sep 5, 2015, 1:00:18 PM9/5/15
to re2j-discuss, jj...@yottaa.com
Hi, I was also able to reproduce the issue.
Though I'm not sure is this a symptom of an edge case in regular expression, or maybe a bug in the library.


Full code, to reproduce it:


```
@State(Scope.Benchmark)
public class JmxCollectorBenchmark {

    @Benchmark
    @BenchmarkMode({Mode.AverageTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void testRe() throws InterruptedException {

        com.google.re2j.Pattern re = com.google.re2j.Pattern.compile(".*d|e|cart|jinjian|kk.*");

        for (int i = 0; i < 10000; i++) {
            re.matches("aajinjianaksdjfl");
        }

        for (int i = 0; i < 100000; i++) {
            re.matches("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl");
        }



    }
    
    @Benchmark
    @BenchmarkMode({Mode.AverageTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void testJavaRegex() throws InterruptedException {

        java.util.regex.Pattern re = java.util.regex.Pattern.compile(".*d|e|cart|jinjian|kk.*");

        for (int i = 0; i < 10000; i++) {
            re.matcher("aajinjianaksdjfl").matches();
        }

        for (int i = 0; i < 100000; i++) {
            re.matcher("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl").matches();
        }


    }
}

```

brian....@gmail.com

unread,
Sep 16, 2015, 5:08:26 AM9/16/15
to re2j-discuss, jj...@yottaa.com
I've opened https://github.com/google/re2j/issues/12 to track this.

Brian


On Monday, 27 July 2015 03:42:01 UTC+1, jj...@yottaa.com wrote:

Felix Lee

unread,
Oct 12, 2015, 7:15:35 PM10/12/15
to re2j-discuss
Is anyone working on this? I've been doing some re2j vs java.util.regex comparisons, and there seem to be a few obvious points of attack:

* j.u.regex will sometimes use Boyer-Moore search. I think that doesn't apply to this particular pattern, but it's a big win when it can be used.

* re2j does a lot more allocs. I haven't narrowed this down yet.

* in my experiments, re2j seems to be spending most of its time in System.arrayCopy. This is copying the capture bounds in the NFA cursors, and it happens even if there are no () in the regex, since it's also used to capture group 0 (the bounds of the whole match). I suspect this is unnecessary work, since in a typical NFA there are only a few points where capture bounds get updated, so using shared immutable capture bounds might be a big win.

Alan Donovan

unread,
Oct 12, 2015, 7:21:38 PM10/12/15
to Felix Lee, re2j-discuss
I spent a day or so trying to profile the inner loop.  Long story short, I did notice two things: (1) the Java profiling tools are terrible, and (2) the allocation-free hot trace really is 15-40x slower than the equivalent Go program.  I dropped this when I went on vacation and haven't touched it since.  RE2/J is very low on my list of priorities since I never even use Java any more.  Any volunteers to take over this package from me? :)



--
You received this message because you are subscribed to the Google Groups "re2j-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to re2j-discuss...@googlegroups.com.
To post to this group, send email to re2j-d...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages