Preliminary Gherkin parser benchmarks

aslak hellesoy

Oct 9, 2009, 8:31:14 AM

to cu...@googlegroups.com

Mike Sassak has been working on a benchmark suite for the new
Ragel-based Gherkin parser and Cucumber's current Treetop parser. I
just added the C Gherkin parser to the benchmark suite and the results
are fantastic. The benchmark compares the time it takes to parse
feature files. It parses 500 generated feature files with 2562
scenarios and 15298 steps.
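
For anyone who wants to try something similar on their own features, here is a minimal sketch of this kind of harness using Ruby's standard `Benchmark` module. It is not the actual benchmark suite: the two lambdas are hypothetical stand-ins for real parsers, and `generate_feature` is an invented helper that builds small synthetic features in memory.

```ruby
require 'benchmark'

# Hypothetical helper (not from the real suite): builds one synthetic
# feature with the given number of three-step scenarios.
def generate_feature(index, scenarios = 5)
  lines = ["Feature: Generated feature #{index}"]
  scenarios.times do |n|
    lines << "  Scenario: Generated scenario #{n}"
    lines << "    Given a precondition"
    lines << "    When an action happens"
    lines << "    Then an outcome is observed"
  end
  lines.join("\n") + "\n"
end

# Stand-ins for real parsers (assumption): each takes feature text.
# Swap in calls to the parsers you actually want to compare.
PARSERS = {
  'line_count' => ->(text) { text.lines.count },
  'step_scan'  => ->(text) { text.scan(/^\s*(?:Given|When|Then|And|But) /).size }
}

FEATURES = Array.new(50) { |i| generate_feature(i) }

# Benchmark.bm prints the user / system / total / real columns;
# 12 is the width reserved for the row labels.
Benchmark.bm(12) do |bm|
  PARSERS.each do |name, parser|
    bm.report("#{name}:") { FEATURES.each { |text| parser.call(text) } }
  end
end
```

Each `report` block times one parser over the same set of inputs, so the rows can be compared directly.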

So if any of you are in that ballpark and have experienced looong
parse times (I'm sure you have), then you have something to look
forward to. Here are the results:

                  user     system      total        real
c_gherkin:    0.020000   0.020000   0.040000 (  0.036335)
rb_gherkin:   8.400000   0.130000   8.530000 (  8.681294)
cucumber:    56.820000   0.590000  57.410000 ( 58.562938)
tt:          51.940000   0.500000  52.440000 ( 53.302512)

(cucumber is Treetop parsing plus AST building; tt is Treetop parsing
only.) The good news here is that building the AST accounts for only
about 5 seconds, or roughly 10% of the total time with the current
Treetop architecture.

The c_gherkin parser is approx. 1500 times (!!) faster than the
Treetop parser, and as you can see, parsing time is almost
negligible, even with a huge number of features, scenarios and steps.
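
For reference, these figures can be re-derived from the table. A small sketch in Ruby, with the numbers copied from the `total` and `real` columns (the hash layout and constant names are only for illustration):

```ruby
# Values copied from the benchmark table: 'total' is user + system CPU
# seconds, 'real' is wall-clock seconds.
RESULTS = {
  c_gherkin:  { total: 0.04,  real: 0.036335 },
  rb_gherkin: { total: 8.53,  real: 8.681294 },
  cucumber:   { total: 57.41, real: 58.562938 },
  tt:         { total: 52.44, real: 53.302512 }
}

# AST building = (Treetop parsing + AST building) - (Treetop parsing only)
AST_SECONDS = RESULTS[:cucumber][:total] - RESULTS[:tt][:total]
AST_SHARE   = AST_SECONDS / RESULTS[:cucumber][:total]

# Speedups relative to Treetop parsing, based on wall-clock time
C_SPEEDUP  = RESULTS[:tt][:real] / RESULTS[:c_gherkin][:real]
RB_SPEEDUP = RESULTS[:tt][:real] / RESULTS[:rb_gherkin][:real]

puts format('AST building: %.2f s (%.1f%% of the cucumber total)', AST_SECONDS, AST_SHARE * 100)
puts format('c_gherkin vs tt:  %.0fx', C_SPEEDUP)
puts format('rb_gherkin vs tt: %.1fx', RB_SPEEDUP)
```

Going by the `total` column, AST building comes to 4.97 seconds, a little under 9% of the cucumber run, and the wall-clock ratio between tt and c_gherkin works out to roughly 1467.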

So once we replace the Treetop parser with the Ragel-based Gherkin
parser, parsing 500 feature files with 2562 scenarios and 15298 steps,
and then building Cucumber's AST, will take approx. 5 seconds instead
of a minute.

Cucumber on JRuby is going to need a pure-Java parser, and that's
fairly easy to add. It will probably be almost as fast as the C
parser.

Thanks again Gregory and Mike for the great work so far.

Aslak

Jari Bakken

Oct 9, 2009, 8:57:05 AM

to cu...@googlegroups.com

On Fri, Oct 9, 2009 at 2:31 PM, aslak hellesoy <aslak.h...@gmail.com> wrote:
>
> The c_gherkin parser is approx. 1500 times (!!) faster than the
> Treetop parser, and as you can see, parsing time is almost
> negligible, even with a huge number of features, scenarios and steps.
>

Looks awesome!

Luke Melia

Oct 9, 2009, 9:29:38 AM

to cu...@googlegroups.com

On Oct 9, 2009, at 8:31 AM, aslak hellesoy wrote:

> Mike Sassak has been working on a benchmark suite for the new
> Ragel-based Gherkin parser and Cucumber's current Treetop parser. I
> just added the C Gherkin parser to the benchmark suite and the results
> are fantastic.

> Thanks again Gregory and Mike for the great work so far.

+1! I'm stoked for this -- it will make a big difference to our
productivity. Thanks, guys!

Cheers,
Luke

--
Luke Melia
lu...@lukemelia.com
http://www.lukemelia.com/

Matt Wynne

Oct 9, 2009, 5:58:01 PM

to cu...@googlegroups.com

On 9 Oct 2009, at 13:31, aslak hellesoy wrote:

>
> Mike Sassak has been working on a benchmark suite for the new
> Ragel-based Gherkin parser and Cucumber's current Treetop parser. I
> just added the C Gherkin parser to the benchmark suite and the results
> are fantastic. The benchmark compares the time it takes to parse
> feature files. It parses 500 generated feature files with 2562
> scenarios and 15298 steps.
>
> So if any of you are in that ballpark and have experienced looong
> parse times (I'm sure you have), then you have something to look
> forward to. Here are the results:
>
>                   user     system      total        real
> c_gherkin:    0.020000   0.020000   0.040000 (  0.036335)
> rb_gherkin:   8.400000   0.130000   8.530000 (  8.681294)
> cucumber:    56.820000   0.590000  57.410000 ( 58.562938)
> tt:          51.940000   0.500000  52.440000 ( 53.302512)
>
> (cucumber is treetop parsing+AST building, tt is only treetop
> parsing). The good news here is that building the AST only accounts
> for 5 seconds, or currently 10% of the time with the current Treetop
> architecture.
>
> The c_gherkin parser is approx. 1500 times (!!) faster than the
> Treetop parser, and as you can see, parsing time is almost
> negligible, even with a huge number of features, scenarios and steps.

This is really superb. Great work guys.

cheers,
Matt Wynne

http://www.songkick.com
http://blog.mattwynne.net

Mike Sassak

Oct 9, 2009, 6:26:03 PM

to cu...@googlegroups.com

Thanks everyone. I think I should dampen some expectations on the
speed, though. Because of the way the Ragel state machine works, you
can't really fake the results like we had hoped. The Ruby parser is
more or less feature-complete, however, and it's anywhere from 4 to
10 times faster than Treetop depending on the Ruby interpreter used.
Judging from that, I won't be surprised at all to see an increase of
40 times for the C parser, but 1500 times is definitely a red
herring. :-)

Whatever the case, the difference is going to be quite noticeable on
large feature suites.
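
To put rough numbers on that, here is a back-of-the-envelope sketch in Ruby using the `total` figures from the table earlier in the thread. It assumes the AST-building cost stays at its current Treetop-era value whatever parser feeds it, which may not hold once the parser is swapped; `projected_total` is a made-up helper name.

```ruby
# 'total' column values from the benchmark table earlier in the thread
TREETOP_PARSE_SECONDS = 52.44                          # tt: Treetop parsing only
AST_BUILD_SECONDS     = 57.41 - TREETOP_PARSE_SECONDS  # cucumber minus tt

# Projected parse + AST time for a parser that is `speedup` times faster
# than Treetop. Assumption: AST building cost does not change.
def projected_total(speedup)
  TREETOP_PARSE_SECONDS / speedup + AST_BUILD_SECONDS
end

[4, 10, 40, 1500].each do |speedup|
  puts format('%5dx faster parser: about %.1f s in total', speedup, projected_total(speedup))
end
```

Under that assumption, most of the gain is already there at 40 times: parsing drops to a bit over a second, and the remaining time is almost all AST building.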
