Gherkin Ragel parser - ready to be ported to C# and used in SpecFlow

aslak hellesoy

Nov 30, 2009, 8:27:22 PM
to specflow, ghnatiuk, msassak
Hi all,

The Gherkin project (http://github.com/aslakhellesoy/gherkin) is
nearing a 1.0 release. It is a very fast lexer/parser that supports
almost 40 spoken languages (i18n). (The C version parses 500 feature
files with 15,000 steps in less than a second.) Cucumber will soon
be using Gherkin, and I'm inviting SpecFlow to do the same.

Currently the lexer is available in Ruby, C and Java, and the parser
in Ruby and Java. In other words, it should be fairly easy to port
the Java code to C#, and we'd then have a pure C# lexer and parser
without any dependencies on Ruby.

Ragel needs some glue code for each target language. The Java one is here:
http://github.com/aslakhellesoy/gherkin/blob/master/ragel/lexer.java.rl.erb
This should be fairly easy to port to C#.

Once the C# lexer is ported, building it is just a matter of adding
some ragel -J to
http://github.com/aslakhellesoy/gherkin/blob/master/tasks/ragel_task.rb

and an extra :msbuild or :nant target to
http://github.com/aslakhellesoy/gherkin/blob/master/tasks/compile.rake
This target would compile the generated C# sources and create dll(s).

The Java parser is here:
http://github.com/aslakhellesoy/gherkin/blob/master/java/src/gherkin/Parser.java
A parser instance is passed to the constructor of a lexer instance,
and the lexer fires events to the parser while scanning.

The parser's state machine is based on a state transition table:
http://github.com/aslakhellesoy/gherkin/blob/master/lib/gherkin/parser/root.txt

When a parser is initialized, it reads that table (actually using an
internal lexer instance) and sets up an internal state machine. This
state machine validates that events from the lexer occur in the
right order, and raises an exception (a parse error) if they don't.
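
To make the idea concrete, here is a minimal Ruby sketch of a table-driven event validator. The transition table, the state names and the EventValidator class are invented for illustration; they are not the contents of root.txt and not Gherkin's actual parser API.

```ruby
# Hypothetical transition table: state => { event => next state }.
# The real table lives in root.txt; these states are assumptions.
TRANSITIONS = {
  root:    { feature: :feature },
  feature: { background: :steps, scenario: :steps },
  steps:   { step: :steps, scenario: :steps, eof: :done }
}.freeze

class ParseError < StandardError; end

# Receives events from a lexer and checks that they arrive in a legal order.
class EventValidator
  attr_reader :state

  def initialize(table = TRANSITIONS)
    @table = table
    @state = :root
  end

  # Moves to the next state, or raises ParseError for an illegal event.
  def event(name, line)
    next_state = @table.fetch(@state, {})[name]
    unless next_state
      raise ParseError, "Parse error on line #{line}: unexpected #{name} in state #{@state}"
    end
    @state = next_state
  end
end

validator = EventValidator.new
validator.event(:feature, 1)
validator.event(:scenario, 3)
validator.event(:step, 4)
validator.event(:eof, 5)
```

A lexer would hold a reference to such an object and call event for every token it recognises, so an out-of-order keyword surfaces as a ParseError with the offending line number.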

Is this something you would be interested in porting to C# and using in
SpecFlow? I think it would be a great way to have better interop
between Cucumber and SpecFlow. (I have also started on the new
cukes.info website, and the plan is to let the user select a
programming language, and have all of the code snippets on the site be
in the selected language - including C#).

If you have any questions about this I'd be glad to help.

Aslak

Gáspár Nagy

Dec 15, 2009, 10:24:54 AM
to SpecFlow
Hi,

Thanks for the info. We have started to work on the parser
integration. We will update you and the community here about the
progress.

Br,
Gaspar


aslak hellesoy

Dec 15, 2009, 10:28:14 AM
to specflow
> Hi,
>
> Thanks for the info. We have started to work on the parser
> integration. We will update you and the community here about the
> progress.
>

Excellent! Will you do this in a fork of the gherkin project? Just ask
here on the specflow list if you have any questions about the gherkin
codebase.

Aslak

Gáspár Nagy

Dec 15, 2009, 10:36:03 AM
to SpecFlow
> Excellent! Will you do this in a fork of the gherkin project? Just ask
> here on the specflow list if you have any questions about the gherkin
> codebase.
>
> Aslak

OK/Thx.

sztupi

Jan 26, 2010, 11:56:46 AM
to SpecFlow
hi,

the first version of the C# port is done, forked and checked in as
http://github.com/techtalk/gherkin/tree/dotnet-port. Included are the C#
core files and the C# template (of course), some changes in the rake
files to generate the language-specific lexers, and also the tests
ported from lexer_spec.rb to .NET (all of them pass).

Of course, there are quite a few tasks remaining for a full
port and to let the parser be integrated into SpecFlow. Some of
these I'd like to discuss first before I move on, and I couldn't find
any group for Gherkin development itself (I didn't want to 'spam'
the Cucumber group with development issues; as far as I can see, that one
is more for end users, and the same applies to this one too). Any
suggestions on where we should move those discussions?

thx,
sztupi


Aslak Hellesøy

Jan 26, 2010, 1:58:11 PM
to SpecFlow, Gregory Hnatiuk, Mike Sassak

On Jan 26, 5:56 pm, sztupi <attila.sztu...@techtalk.at> wrote:
> hi,
>
> the first version of the c# port is done, forked and checked in as
> http://github.com/techtalk/gherkin/tree/dotnet-port. Included are c#
> core files and the c# template (of course), some changes in the rake
> files to generate the language-specific lexers, and also the tests
> ported from the lexer_spec.rb to .net (all of them pass).
>

Excellent work!! I haven't tried out the code yet, just looked at the
diff. I hope the Java port was helpful (looks like it was). I saw you
ported a lot of tests. As you have probably noticed, there are a lot
of RSpec specs and Cucumber features. For simpler maintenance, do you
think it would be possible to run these on IronRuby?

I'll try to get this running on OS X/Mono. Do you have any suggestions
for packaging and releasing? I would like to have a scripted release
process that makes it easy to release the .NET .dll when I make a
release of the gems (and in the future - a pure Java release). Is it
time to put up a website that can host releases? Or is it good enough
to commit binaries to Git and have a direct download link to the .dll
in the Wiki?

> Of course, there're quite a few tasks remaining to be done for a full
> port and to let the parser to be integrated into specflow. Some of
> these I'd like to discuss first before I move on, and I couldn't find
> any groups for the gherkin development itself (I didn't want to 'spam'
> the cucumber group for development issues, as far as I see that is
> more for the end-users, and the same applies for this one too). Any
> suggestion where should we move those discussions?
>

I have copied Mike and Greg who wrote most of Gherkin. I'm not sure if
they are on this list, but I am. Just keep Gherkin/SpecFlow
discussions on this list for the time being. If there is something
more general to be discussed (changes to the grammar or other non-
SpecFlow related issues) then the Cucumber list is the best list to
use.

Cheers,
Aslak

Mike Sassak

Jan 26, 2010, 3:02:53 PM
to SpecFlow
Great news about the C# port! Both Greg and I are on the SpecFlow ML, so you can assume we're following along.

Mike

sztupi

Jan 28, 2010, 11:42:31 AM
to SpecFlow
hi,

In the meantime I've managed to port all the specs & implementation
(including the I18NLexer and Parser), and added support for invoking
the build and unit tests from rake.
For the build support, I'm using the albacore gem. I don't know
whether that'd help you in porting it to Mono. I'd be quite glad if
someone could check that I didn't introduce any problems (even if you're
not building for .NET) in a non-Windows environment (and maybe update
the dependencies in the gemspec file, which I didn't dare to
touch :))

I've also found a bug that affects (at least) the Ruby parser too: if
a step line ends with CRLF, the whole step is pushed to the
listener twice (just change the input of the first Background spec to
end with '\r\n' to reproduce it). Someone with more experience with
Ragel may want to have a look at it.
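
A regression check for this could look roughly like the Ruby sketch below. The step_events scanner is a toy stand-in written for this example, not the Ragel-generated lexer; the point is only the shape of the check: the same feature text with LF and with CRLF line endings should yield identical step events.

```ruby
STEP_LINE = /\A(Given|When|Then|And|But) (.*)\z/

# Toy scanner (an assumption for illustration): returns [keyword, text, line]
# for every step line. String#chomp strips "\n" as well as "\r\n".
def step_events(source)
  events = []
  source.each_line.with_index(1) do |line, number|
    match = STEP_LINE.match(line.chomp.strip)
    events << [match[1], match[2], number] if match
  end
  events
end

lf_source   = "Feature: f\n  Background:\n    Given a step\n"
crlf_source = lf_source.gsub("\n", "\r\n")

# Each step should be reported exactly once, whatever the line ending.
puts step_events(crlf_source) == step_events(lf_source)
```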

> Excellent work!! I haven't tried out the code yet, just looked at the
> diff. I hope the Java port was helpful (looks like it was). I saw you
> ported a lot of tests. As you have probably noticed, there are a lot
> of RSpec specs and Cucumber features. For simpler maintenance, do you
> think it would be possible to run these on IronRuby?

Yes, I considered reusing the specs/features instead of porting them,
but the last time I tried to work with IronRuby, it was quite slow and
painful to debug anything that ran through it, so I decided to take
the easier path for now. Generally I agree that we should
migrate to IronRuby later.

> I'll try to get this running on OS X/Mono. Do you have any suggestions
> for packaging and releasing? I would like to have a scripted release
> process that makes it easy to release the .NET .dll when I make a
> release of the gems (and in the future - a pure Java release). Is it
> time to put up a website that can host releases? Or is it good enough
> to commit binaries to Git and have a direct download link to the .dll
> in the Wiki?

Personally I'd prefer to set up a project site where we'd store the
releases; storing them in the source repository feels a bit... unclean
to me. You mentioned anyway that having a separate site for
documenting Gherkin itself might be a good idea later. The releases
could go to the same place.

br,
sztupi

Gregory Hnatiuk

Jan 28, 2010, 5:19:01 PM
to spec...@googlegroups.com
On Thu, Jan 28, 2010 at 11:42 AM, sztupi <attila....@techtalk.at> wrote:
> hi,
>
> In the meantime I've managed to port all the specs & implementation
> (including the I18NLexer and Parser), and added support for invoking
> the build and unit tests from rake.
> For the build support, I'm using the albacore gem. I don't know
> whether that'd help you in porting it to Mono. I'd be quite glad if
> someone could check that I didn't introduce any problems (even if you're
> not building for .NET) in a non-Windows environment (and maybe update
> the dependencies in the gemspec file, which I didn't dare to
> touch :))
>
> I've also found a bug that affects (at least) the Ruby parser too: if
> a step line ends with CRLF, the whole step is pushed to the
> listener twice (just change the input of the first Background spec to
> end with '\r\n' to reproduce it). Someone with more experience with
> Ragel may want to have a look at it.

Thanks for pointing this one out.  We're taking a look at it and writing some tests to expose the extent of \r\n messing with parsing (it's definitely more than just Steps).

Greg

aslak hellesoy

Feb 3, 2010, 8:31:50 AM
to specflow
On Thu, Jan 28, 2010 at 5:42 PM, sztupi <attila....@techtalk.at> wrote:
> hi,
>
> In the meantime I've managed to port all the specs & implementation
> (including the I18NLexer and Parser), and added support for invoking
> the build and unit tests from rake.
> For the build support, I'm using the albacore gem. I don't know
> whether that'd help you in porting it to Mono.

In order to be able to "rake dotnet" on OSX/Linux+Mono, I think we need to (monkey)patch albacore so that it can invoke "mono xbuild" instead of the hardcoded MSBuild.exe. I'll see if I can make that work. Not sure if any additional porting is needed: http://www.mono-project.com/Microsoft.Build

I'll close the 2 .NET related issues in the gherkin issue tracker and register new ones as I come across them.

Cheers,
Aslak
 

aslak hellesoy

Feb 3, 2010, 8:33:36 AM
to specflow


If I run into problems with xbuild, are you open to moving to NAnt? (Might be easier to get working on Mono)

Aslak
 

aslak hellesoy

Feb 4, 2010, 9:48:00 AM
to specflow
I have merged in your branch, sztupi. I did a couple of quick hacks to compile.rake to be able to build on OS X/Mono. The only thing that fails for me is running the NUnit tests: http://github.com/aslakhellesoy/gherkin/issues/issue/44

Any chance you could take a look at that?

I have started to tag issues, so you can keep an eye on C# related ones here: http://github.com/aslakhellesoy/gherkin/issues/labels/c%23

Aslak
 

sztupi

Feb 5, 2010, 10:26:21 AM
to SpecFlow
hi,

I've merged your changes and checked the Mono build on Linux. It
seems that monobuild handles embedded resources differently, but I
think I've fixed that now. At least I could make all the tests run, and
I've just pushed the changes back to our repo. Please check whether it
works on OS X as well.

thx,
sztupi


aslak hellesoy

Feb 5, 2010, 10:57:09 AM
to specflow
On Fri, Feb 5, 2010 at 4:26 PM, sztupi <attila....@techtalk.at> wrote:
> hi,
>
> I've merged your changes and checked the Mono build on Linux. It
> seems that monobuild handles embedded resources differently, but I
> think I've fixed that now. At least I could make all the tests run, and
> I've just pushed the changes back to our repo. Please check whether it
> works on OS X as well.


Thanks sztupi. All NUnit tests passing on OS X. Pushed.

I guess the last thing now is to run the Cucumber and RSpec suite against the .NET port (through IronRuby) just to make sure there are no holes.
(We've made some changes to crlf/lf detection lately, and still have a few corner cases to fix). I'll make a separate ticket for that.

Cheers,
Aslak
 

László Zabb (Vari)

Feb 11, 2010, 4:31:44 AM
to SpecFlow
Hi!

I have a question.
As far as I can see, if I make a syntax mistake (a typo), I get a
very generic error message like:
"Lexing error on line X" and the parsing stops.

Two things I'd like to know:

1) Is it possible to get an error message with more information?
2) Is it possible to let the parser continue after an error
instead of stopping?

Is there another port (Java, Ruby, etc.) where this is better?

Br,
(Vari)


aslak hellesoy

Feb 11, 2010, 6:41:43 AM
to spec...@googlegroups.com
On Thu, Feb 11, 2010 at 5:31 AM, László Zabb (Vari) <varaz...@gmail.com> wrote:
> Hi!
>
> I have a question.
> As far as I can see, if I make a syntax mistake (a typo), I get a
> very generic error message like:
> "Lexing error on line X" and the parsing stops.
>
> Two things I'd like to know:
>
> 1) Is it possible to get an error message with more information?
> 2) Is it possible to let the parser continue after an error
> instead of stopping?


Since you are replying in a thread about Gherkin, I assume the question is about the Gherkin parser. (SpecFlow isn't using the Gherkin parser yet, it has its own).

Gherkin does lexing with Ragel and parsing with a custom parser based on a state transition table. During lexing, only keywords are recognised (Given, When, Then, Scenario, Examples etc). If the lexer finds a keyword that is misspelt it will just tell you the location and exit.

The parsing happens after lexing. This is to ensure that keywords come in the right order. For example, you can't have a tag before a comment, only after. You can't have a table unless a step or examples section comes before it etc.

This means you can get both lexing errors and parsing errors. Lexing errors raise an exception immediately, telling you the line number. The parser can be configured to either raise an exception or continue with an error message.
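
The difference between the two kinds of error can be sketched in Ruby as follows. This is a toy model under stated assumptions: the lex function, the OrderChecker class and its raise_on_error option are hypothetical names made up for this example, and the only ordering rule checked is that nothing may come before Feature. It is not Gherkin's real API.

```ruby
class LexingError < StandardError; end
class ParseError < StandardError; end

KEYWORDS = %w[Feature Scenario Given When Then].freeze

# Toy lexer: recognises only keywords and fails immediately on anything else.
def lex(source)
  tokens = []
  source.each_line.with_index(1) do |line, number|
    text = line.strip
    next if text.empty?
    keyword = text[/\A\w+/]
    raise LexingError, "Lexing error on line #{number}" unless KEYWORDS.include?(keyword)
    tokens << [keyword, number]
  end
  tokens
end

# Toy parser: checks keyword order and either raises on the first problem
# or records an error message and keeps going.
class OrderChecker
  attr_reader :errors

  def initialize(raise_on_error: true)
    @raise_on_error = raise_on_error
    @errors = []
    @seen_feature = false
  end

  def check(tokens)
    tokens.each do |keyword, line|
      if keyword == "Feature"
        @seen_feature = true
      elsif !@seen_feature
        message = "Parse error on line #{line}: #{keyword} before Feature"
        raise ParseError, message if @raise_on_error
        @errors << message
      end
    end
    self
  end
end

tokens  = lex("Scenario: early\nFeature: late\n")
checker = OrderChecker.new(raise_on_error: false).check(tokens)
puts checker.errors
```

A misspelt keyword never reaches the checker at all, which is why the lexing error can only report a line number.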
 
> Is there another port (Java, Ruby, etc.) where this is better?


The Gherkin code can be built for pure Ruby, Ruby with a fast C extension, pure Java or pure C#. Which one is better depends on what you're trying to accomplish. Are you writing a tool that will parse feature files?

Aslak

László Zabb (Vari)

Feb 12, 2010, 5:55:15 AM
to SpecFlow
Hi!

Thanks for the answer.
That was what I wanted to know.

Br.
Vari


Gáspár Nagy

Mar 9, 2010, 9:48:57 AM
to SpecFlow
Hello!

I've had some time to finish the Gherkin parser integration into SpecFlow
started by sztupi. I want to see whether there are any breaking changes for
SpecFlow users if they start to use the new version with the new
parser. It's going well in general, but I have discovered three
problematic points. Aslak, it would be great if you could comment
on these.

1. Small incompatibility: @tag is not allowed now in the line of the
"Scenario" keyword.
As far as I can see, the old Cucumber parser allowed tags on the same line, like:
  @mytag Scenario: Add two numbers
but the new parser does not allow this anymore. I just wanted to ask
whether this change is on purpose or a bug in the new parser (or was a
bug in the old one).
Actually I think no one used this syntax anyway so this is not really
a problematic breaking change.

2. Comment handling.
We finally realized that our parser in SpecFlow was more tolerant of
comments. SpecFlow actually allowed comments everywhere in the
file, which turned out to be useful, and many of our projects have used it.
The new Gherkin parser allows comments only at specific points and not
everywhere. E.g. you cannot comment a row of a table:
  | color  |
  | purple | # i don't like it...
  | red    |
As this is already used quite extensively in our projects, this
breaking change is quite painful.
Since this is only about comments, I have written a fix that removes
all comments from the file before handing it over to the Gherkin
parser, and this works, but I wanted to discuss it with you. Do you
have any plans to make comment handling more tolerant in the
Gherkin parser? If not, we could introduce a config option in
SpecFlow where users could control whether they want
fully Gherkin-compatible comment handling or the more tolerant one. Or
shall we simply use the more tolerant comment handling? What do
you think?
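
A comment-stripping pre-pass of the kind described in point 2 might look like the Ruby sketch below. It is not the actual SpecFlow fix, just an illustration under two assumptions: a '#' starts a comment only at the start of a line or after whitespace, and a "# language:" header must survive. Note that a step such as "Given issue #5" would be truncated by this rule, which is exactly why escaping of '#' becomes a question.

```ruby
LANGUAGE_HEADER = /\A\s*#\s*language\s*:/

# Removes whole-line and end-of-line comments, keeping one output line per
# input line so that line numbers in later error messages still match.
def strip_comments(source)
  source.each_line.map do |line|
    next line if line =~ LANGUAGE_HEADER
    ending = line[/\r?\n\z/] || ""
    line.chomp.sub(/(\A|\s+)#.*\z/, "").rstrip + ending
  end.join
end

table = "| color  |\n| purple | # i don't like it...\n| red    |\n"
puts strip_comments(table)
```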

3. Language names
I have seen in the i18n file that your goal is to use the ISO 639-2
language codes. But as far as I can see, even the current codes do not
fully conform to the standard.
In SpecFlow, the situation is a little more problematic. When we
convert the scenario step arguments to the parameters of the step
definition methods, we need to specify a culture that is used to
convert the parameters (e.g. if there is a float parameter, we have to
parse the float string, so we need a culture to decide what the decimal
separator is). In the .NET world the culture is usually represented
by the CultureInfo class, which can be constructed using the xx-yy
form, where xx is an ISO 639-1 code and yy is a region code based on
ISO 3166. The full list of supported languages can be found here:
http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
I'm quite sure that you have to keep the existing language codes for
backwards compatibility, but in SpecFlow using the xx-yy codes is much
more convenient for the users.
My idea was that we create a translation table that maps
the Gherkin language codes to the .NET language codes.
SpecFlow would primarily recommend the .NET codes but would also allow
the Gherkin language codes. What do you think about that?
Could we add this language code mapping directly to the i18n.yml
file for each language to keep the mapping consistent? (e.g.
net_code: en-US)
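
As a rough illustration of that mapping idea, the Ruby sketch below looks up a .NET culture name from either kind of code. The YAML excerpt is hypothetical: net_code is only the key proposed in this message, the two entries are examples, and net_culture_for is a helper name invented here.

```ruby
require "yaml"

# Hypothetical excerpt: the net_code key is the proposal from this message,
# not something that exists in i18n.yml today.
I18N_EXCERPT = <<~YML
  "en":
    name: English
    net_code: en-US
  "de":
    name: German
    net_code: de-DE
YML

LANGUAGES = YAML.safe_load(I18N_EXCERPT)

# Accepts either a Gherkin language code or a .NET culture name and
# returns the .NET culture name, or nil when the code is unknown.
def net_culture_for(code)
  return LANGUAGES[code]["net_code"] if LANGUAGES.key?(code)
  LANGUAGES.values.map { |entry| entry["net_code"] }.find { |net| net.casecmp?(code) }
end

puts net_culture_for("en")
puts net_culture_for("de-DE")
```

Keeping the mapping next to the keywords means a feature file header could use either code and still resolve to one culture for number parsing.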

Br,
Gaspar

aslak hellesoy

Mar 9, 2010, 10:27:51 AM
to spec...@googlegroups.com


On Tue, Mar 9, 2010 at 3:48 PM, Gáspár Nagy <gaspa...@gmail.com> wrote:
> Hello!


Hi Gáspár!
 
> I've had some time to finish the Gherkin parser integration into SpecFlow
> started by sztupi. I want to see whether there are any breaking changes for
> SpecFlow users if they start to use the new version with the new
> parser. It's going well in general, but I have discovered three
> problematic points. Aslak, it would be great if you could comment
> on these.
>
> 1. Small incompatibility: @tag is not allowed now in the line of the
> "Scenario" keyword.
> As far as I can see, the old Cucumber parser allowed tags on the same line, like:
>   @mytag Scenario: Add two numbers
> but the new parser does not allow this anymore. I just wanted to ask
> whether this change is on purpose or a bug in the new parser (or was a
> bug in the old one).
> Actually I think no one used this syntax anyway so this is not really
> a problematic breaking change.


Greg, Mike and I had a discussion about whether or not to support tags on the same line, and decided not to.
We believe enforcing some opinions on formatting is good (we believe tags on the same line are ugly) and, as you point out, I haven't seen anyone using it. (Some Java people might want to try it since the Java syntax allows something similar - too bad for them hehe).
 
> 2. Comment handling.
> We finally realized that our parser in SpecFlow was more tolerant of
> comments. SpecFlow actually allowed comments everywhere in the
> file, which turned out to be useful, and many of our projects have used it.
> The new Gherkin parser allows comments only at specific points and not
> everywhere. E.g. you cannot comment a row of a table:
>   | color  |
>   | purple | # i don't like it...
>   | red    |
> As this is already used quite extensively in our projects, this
> breaking change is quite painful.

I believe this is something we should support. The tricky part here is pretty printing. In addition to parsing Gherkin source code, the Gherkin gem also has a pretty printer. You can try it out with:

gem install gherkin
gherkin reformat path/to/a.feature

If we allow comments to exist in more places we also need to make the pretty printer smart enough to output them like they were in the source (just prettier, which mostly means sane indentation). Taking your example, we probably need to change the listener event from:

  listener.table(rows, current_line)

to:

  listener.table_row(rows, eol_comment, current_line)

and invoke that once per row. The eol_comment would be nil when there is no comment at the end of the line. This would also open the door to comments between table rows.
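
A listener for such a per-row event might be sketched in Ruby like this. The table_row name and argument order follow the proposal in this message; the CollectingListener class and the emit_table_rows scanner are assumptions made up for the example and are not part of Gherkin.

```ruby
# Hypothetical listener for the proposed per-row event.
class CollectingListener
  attr_reader :rows

  def initialize
    @rows = []
  end

  def table_row(cells, eol_comment, line)
    @rows << { cells: cells, comment: eol_comment, line: line }
  end
end

# Toy row scanner: splits each table line into cells and an optional
# end-of-line comment, then fires one table_row event per line.
def emit_table_rows(lines, listener, first_line: 1)
  lines.each_with_index do |text, offset|
    row, comment = text.split(/\s+#\s*/, 2)
    inner = row.strip.sub(/\A\|/, "").sub(/\|\z/, "")
    cells = inner.split("|", -1).map(&:strip)
    listener.table_row(cells, comment, first_line + offset)
  end
end

listener = CollectingListener.new
emit_table_rows(["| color |", "| purple | # i don't like it...", "| red |"], listener)
listener.rows.each { |row| puts row.inspect }
```

Because the comment travels with its row, a pretty printer built on this listener could write it back on the same line after realigning the cells.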

> Since this is only about comments, I have written a fix that removes
> all comments from the file before handing it over to the Gherkin
> parser, and this works, but I wanted to discuss it with you. Do you
> have any plans to make comment handling more tolerant in the
> Gherkin parser?

We hadn't planned to, but now that you propose it I'm not at all opposed to it. I think it is a good idea actually.

If we allow end-of-line comments for tables we should also allow it in other places (scenario lines, step lines etc) and also update the listener API accordingly.
 
> If not, we could introduce a config option in
> SpecFlow where users could control whether they want
> fully Gherkin-compatible comment handling or the more tolerant one.

I'd much rather we have one grammar than dialects. Gherkin should be more tolerant.
 
> Or
> shall we simply use the more tolerant comment handling? What do
> you think?
>
> 3. Language names
> I have seen in the i18n file that your goal is to use the ISO 639-2
> language codes. But as far as I can see, even the current codes do not
> fully conform to the standard.

Are you saying that the current codes in i18n.yml are not in accordance with ISO 639-2? If so, which ones? I know there are a couple of fun (invented) languages like LOLCAT, Texan, Australian etc., but are there any others?
 
> In SpecFlow, the situation is a little more problematic. When we
> convert the scenario step arguments to the parameters of the step
> definition methods, we need to specify a culture that is used to
> convert the parameters (e.g. if there is a float parameter, we have to
> parse the float string, so we need a culture to decide what the decimal
> separator is). In the .NET world the culture is usually represented
> by the CultureInfo class, which can be constructed using the xx-yy
> form, where xx is an ISO 639-1 code and yy is a region code based on
> ISO 3166. The full list of supported languages can be found here:
> http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
> I'm quite sure that you have to keep the existing language codes for
> backwards compatibility, but in SpecFlow using the xx-yy codes is much
> more convenient for the users.

I'm not at all opposed to using slightly different i18n names if there is a better standard for it. Ruby (or Cuke4Duke) currently doesn't use the language codes for anything Locale specific (like parsing floats etc) so changing some language codes isn't a huge problem. Even if it would break backwards compatibility it would be a small change for people.

Looking at Java's Locale: http://java.sun.com/javase/6/docs/api/java/util/Locale.html it's using the same ISO standards as .NET, so this looks like a wise thing to go with.
 
> My idea was that we create a translation table that maps
> the Gherkin language codes to the .NET language codes.

Nah, let's just change to ISO-639 + ISO-3166.
 
> SpecFlow would primarily recommend the .NET codes but would also allow
> the Gherkin language codes. What do you think about that?
> Could we add this language code mapping directly to the i18n.yml
> file for each language to keep the mapping consistent? (e.g.
> net_code: en-US)


All in all - great news on the progress to get Gherkin into SpecFlow - it looks like you're going to beat us (Cucumber) to it! This is great though - your suggestions to change the grammar to accommodate more flexible comments - and a change of i18n to align more with the "big" platforms is exactly the sort of synergy effect I was hoping to get by sharing the parser.

For the i18n.yml file - just fork and send me a pull request when you have changed it. (I will backport the changes to Cucumber). The grammar changes may require some Ragel fu from Mike and/or Greg - I'm sure they'll chip in.

Aslak
 
> Br,
> Gaspar

aslak hellesoy

Mar 9, 2010, 11:33:19 AM
to spec...@googlegroups.com
On Tue, Mar 9, 2010 at 3:48 PM, Gáspár Nagy <gaspa...@gmail.com> wrote:
> Hello!
>
> I've had some time to finish the Gherkin parser integration into SpecFlow
> started by sztupi. I want to see whether there are any breaking changes for
> SpecFlow users if they start to use the new version with the new
> parser. It's going well in general, but I have discovered three
> problematic points. Aslak, it would be great if you could comment
> on these.
>
> 1. Small incompatibility: @tag is not allowed now in the line of the
> "Scenario" keyword.
> As far as I can see, the old Cucumber parser allowed tags on the same line, like:
>   @mytag Scenario: Add two numbers
> but the new parser does not allow this anymore. I just wanted to ask
> whether this change is on purpose or a bug in the new parser (or was a
> bug in the old one).
> Actually I think no one used this syntax anyway so this is not really
> a problematic breaking change.
>
> 2. Comment handling.
> We finally realized that our parser in SpecFlow was more tolerant of
> comments. SpecFlow actually allowed comments everywhere in the
> file, which turned out to be useful, and many of our projects have used it.
> The new Gherkin parser allows comments only at specific points and not
> everywhere.

Have you identified other places where you'd like to put comments, but Gherkin disallows?

One issue about eol comments that I have thought of is Cucumber's output formatting.
Cucumber outputs the location (file:line) as a comment after scenarios, steps (and in the future, also scenario outline example tables).
If we allow comments at the eol, this would result in rather mangled output:

  Given I have 5 cukes # Yum, I like cukes # features/step_definitions/cuke_steps.rb:42

Not very pretty, and potentially more line breaks, making output harder to read.

It would be interesting to know a little more about what you use eol comments for. Can it be replaced by better use of Gherkin?

Aslak

Gáspár Nagy

Mar 9, 2010, 11:57:55 AM
to SpecFlow
Hi,

> > 1. Small incompatibility: @tag is not allowed now in the line of the
> > "Scenario" keyword.

> Greg, Mike and I had a discussion about whether or not to support tags on
> the same line, and decided not to.

Ok. I'm also fine with this.

> > 2. Comment handling.


> We hadn't planned to, but now that you propose it I'm not at all opposed to
> it. I think it is a good idea actually.
>

> I'd much rather we have one grammar than dialects. Gherkin should be more
> tolerant.

Great, I absolutely agree... I can produce a list of examples where
comment placement was problematic (the end of a table row was just one
example, but there are a few more).

The only question that the more tolerant comment handling brings up is
whether we need to escape the # somehow (for cases where the user wants
to write a literal #). Have you considered this?

> > 3. Language names


> Are you saying that the current codes in i18n.yml are not in accordance with
> ISO 639-2? If so, which ones? I know there are a couple of fun(invented
> languages like LOLCAT, Texan, Australian etc, but are there any others?

Yes, you are right. Most of them do conform. Actually I have found
only two that do not fit:
- Swedish: se (ISO: sv)
- Catalan: cat (ISO: ca)

> Looking at Java's Locale: http://java.sun.com/javase/6/docs/api/java/util/Locale.html it's using the
> same ISO standards as .NET, so this looks like a wise thing to go with.

Oh, that's even better.

> > My idea was that we create a translation table that gives a matching
> > between the gherkin language codes and the .NET language codes.
>
> Nah, let's just change to ISO-639 + ISO-3166.

Good. The other benefit is that this pair also supports
differentiation of writing variants, like Serbian Latin (sr-Latn) and
Serbian Cyrillic (sr-Cyrl). Unfortunately, it does not support
differentiating accented from unaccented spellings, like the ro and ro2
entries in the i18n file.

> For the i18n.yml file - just fork and send me a pull request when you have
> changed it. (I will backport the changes to Cucumber).

OK. I'll do that.

> The grammar changes may require some Ragel fu from Mike and/or Greg - I'm sure they'll chip in.

Fine. Thanks for your comments.

Br,
Gaspar

Gáspár Nagy

unread,
Mar 9, 2010, 12:05:24 PM3/9/10
to SpecFlow
And what I forgot: I have also compared the performance of the new
parser with SpecFlow's ANTLR-based parser, and the performance seems to
be nearly the same (both are fast enough). So this is good news.

> > Looking at Java's Locale: http://java.sun.com/javase/6/docs/api/java/util/Locale.html it's using the

aslak hellesoy

Mar 9, 2010, 12:30:34 PM
to spec...@googlegroups.com
On Tue, Mar 9, 2010 at 5:57 PM, Gáspár Nagy <gaspa...@gmail.com> wrote:
Hi,

> > 1. Small incompatibility: @tag is not allowed now in the line of the
> > "Scenario" keyword.
> Greg, Mike and I had a discussion about whether or not to support tags on
> the same line, and decided not to.

Ok. I'm also fine with this.

> > 2. Comment handling.
> We hadn't planned to, but now that you propose it I'm not at all opposed to
> it. I think it is a good idea actually.
>
> I'd much rather we have one grammar than dialects. Gherkin should be more
> tolerant.

Great, I absolutely agree... I can produce a list of examples where
the comment placement was problematic (the end of a table row was just one
example, but there are a few more).


That would be very helpful. The best approach would be to add some features that you wish Gherkin could parse to spec/gherkin/fixtures/*.feature.
 
The only question that the more tolerant comment handling raises is
whether we need to escape the # somehow (for cases where the user wants
to write a literal # in the text). Have you considered this?


I hadn't when I wrote my first reply, but Greg raised the same point as you. If end-of-line comments were allowed, people wouldn't be able to use # in their steps, which I think is a bit of a downer. The parser would probably also be more complex. Furthermore, the output would look ugly, as I pointed out in my other email.

So having thought a little more about comments, here is what I propose:

* Comments can start at the beginning of any line, except inside multiline strings (pystrings).
* Comments at the end of a line are not allowed anywhere.
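
To make the two rules concrete, here is a minimal sketch in Python. It is not the Ragel grammar and not Gherkin's actual implementation; the function name `comment_line_numbers` and the `SAMPLE` text are made up for illustration. It also assumes that "beginning of a line" tolerates leading whitespace and that pystrings are delimited by lines starting with `"""`.

```python
SAMPLE = '''# top comment
Feature: Cukes
  # indented comment
  Scenario: Eating
    Given I have 5 cukes # not a comment
    And the following text:
      """
      # inside a pystring
      """
'''


def comment_line_numbers(source):
    """Return the 1-based numbers of lines treated as comments.

    Assumed rules (a reading of the proposal, not the real grammar):
    - a line is a comment if its first non-blank character is '#'
    - lines inside a pystring are never comments
    - a '#' later in a line is ordinary text, not a comment
    """
    comments = []
    in_pystring = False
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith('"""'):
            # A delimiter line opens or closes a pystring.
            in_pystring = not in_pystring
            continue
        if not in_pystring and stripped.startswith("#"):
            comments.append(number)
    return comments


if __name__ == "__main__":
    print(comment_line_numbers(SAMPLE))
```

With these rules, a step such as `Given I have 5 cukes # not a comment` keeps its `#` as part of the step text, so no escaping mechanism is needed.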

> > 3. Language names
> Are you saying that the current codes in i18n.yml are not in accordance with
> ISO 639-2? If so, which ones? I know there are a couple of fun (invented)
> languages like LOLCAT, Texan, Australian etc., but are there any others?

Yes, you are right. Most of them already follow the ISO codes. Actually I have found
only two that do not fit:
- Swedish: se (ISO: sv)

I think moving from sv to se would be better. That's what Swedish domain names use :-)
 
- Catalan: cat (ISO: ca)

> Looking at Java's Locale: http://java.sun.com/javase/6/docs/api/java/util/Locale.html it's using the
> same ISO standards as .NET, so this looks like a wise thing to go with.

Oh, that's even better.

> > My idea was that we create a translation table that gives a matching
> > between the gherkin language codes and the .NET language codes.
>
> Nah, let's just change to ISO-639 + ISO-3166.

Good. The other benefit is that this pair also supports
differentiation of writing variants, like Serbian Latin (sr-Latn) and
Serbian Cyrillic (sr-Cyrl). Unfortunately, it does not support
differentiating accented from unaccented spellings, like the ro and ro2
entries in the i18n file.


I suppose we could add an extra line in the YAML for Gherkin languages that don't exist in the ISO standards. Example:

"ro2":
  name: Romanian (diacritical)
  native: română (diacritical)
  locale: ro

...

"en-lol":
  name: LOLCAT
  native: LOLCAT
  locale: en

That way, if a language has a locale, use that instead of the key. This will make it possible to use non-standard languages.
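
A minimal sketch of that lookup in Python, assuming the YAML has already been loaded into a dictionary. The `LANGUAGES` dictionary and the `resolve_locale` function are hypothetical names, and the "fr" entry is only an illustrative language without a `locale` line; the "ro2" and "en-lol" entries mirror the example above.

```python
# Hypothetical in-memory form of a few i18n.yml entries.
LANGUAGES = {
    "ro2": {
        "name": "Romanian (diacritical)",
        "native": "română (diacritical)",
        "locale": "ro",
    },
    "en-lol": {"name": "LOLCAT", "native": "LOLCAT", "locale": "en"},
    # Illustrative entry with no explicit locale: the key is the locale.
    "fr": {"name": "French", "native": "français"},
}


def resolve_locale(key, languages=LANGUAGES):
    """Return the entry's explicit locale, falling back to the key itself."""
    entry = languages[key]
    return entry.get("locale", key)


if __name__ == "__main__":
    for key in LANGUAGES:
        print(key, "->", resolve_locale(key))
```

Entries whose key is already a valid ISO code need no extra line, so only the non-standard languages carry a `locale`.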
 
> For the i18n.yml file - just fork and send me a pull request when you have
> changed it. (I will backport the changes to Cucumber).

OK. I'll do that.

> The grammar changes may require some Ragel fu from Mike and/or Greg - I'm sure they'll chip in.

Fine. Thanks for your comments.


Great news about the speed, BTW. Good to know we're on par with ANTLR (although I had hoped we'd be faster). Maybe you can tweak some of the Ragel flags to improve speed on .NET if you need it.

Aslak
 
Br,
Gaspar
