Debugging a new language runtime (Ruby) and need guidance

Chad Slaughter

Nov 19, 2014, 7:24:17 PM

to antlr-di...@googlegroups.com

Hello,

I created an implementation of Ruby language support in ANTLR 4. The code generation works, but I am having trouble debugging the runtime. The runtime mirrors the structure of the Python3 and Java runtimes, so I am hoping that someone can point me towards the right part of the runtime to look at, perhaps because they have seen a similar bug or can imagine where it might live.

The bug is that lexer token matching is too greedy. The .*? construct seems to match up to the last closing delimiter instead of the first one it finds. The example below is illustrative.

I am using the Creole grammar: https://github.com/antlr/grammars-v4/tree/master/creole

This rule in the creole grammar:

NOWIKI
: '{{{' .*? '}}}'
;

Given an Input text:

{{{ text }}}
{{{ text }}}
{{{ text }}}

Instead of returning 7 tokens, it only returns 3.

The Java runtime, using TestRig with the -tokens option, produces the expected output:

[@0,0:11='{{{ text }}}',<16>,1:0]
[@1,12:12='\n',<15>,1:12]
[@2,13:24='{{{ text }}}',<16>,2:0]
[@3,25:25='\n',<15>,2:12]
[@4,26:37='{{{ text }}}',<16>,3:0]
[@5,38:38='\n',<15>,3:12]
[@6,39:38='<EOF>',<-1>,4:13]

Using my Ruby runtime and a Ruby version of TestRig with -tokens produces:

[@0,0:37='{{{ text }}}\n{{{ text }}}\n{{{ text }}}',<16>,1:0]
[@1,38:38='\n',<15>,3:12]
[@2,39:38='<EOF>',<-1>,4:13]
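To make the expected versus observed behavior concrete, here is a minimal sketch that uses plain Ruby regular expressions as a stand-in for the NOWIKI rule. It is not ANTLR runtime code, and the method name nowiki_matches and its greedy: keyword are made up for illustration. The non-greedy pattern stops at the first '}}}', while the greedy one runs to the last '}}}', which is the shape of the Ruby runtime output above:

```ruby
# Illustration only: Ruby regexes stand in for the lexer rule
#   NOWIKI : '{{{' .*? '}}}' ;
# nowiki_matches is a hypothetical helper, not part of any ANTLR runtime.
def nowiki_matches(input, greedy:)
  # The /m flag lets '.' match newlines, as the ANTLR wildcard does.
  pattern = greedy ? /\{\{\{.*\}\}\}/m : /\{\{\{.*?\}\}\}/m
  input.scan(pattern)
end

input = "{{{ text }}}\n{{{ text }}}\n{{{ text }}}\n"

non_greedy = nowiki_matches(input, greedy: false)
greedy = nowiki_matches(input, greedy: true)

puts non_greedy.length    # 3 matches, one per line
puts greedy.length        # 1 match
puts greedy.first.length  # 38 characters, i.e. offsets 0..37
```

The single 38-character greedy match lines up with the 0:37 span of the first token in the Ruby dump.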

The Ruby runtime tokenizes simple grammars correctly. I know this is a shot in the dark, but I am out of ideas. I am looking for a better understanding of the part of the runtime that does this work, or a hint about where to look for my bug.

Thanks for your time,

Chad

Loring Craymer

Nov 19, 2014, 8:15:57 PM

to antlr-di...@googlegroups.com

The antlr behavior is correct; in previous versions of ANTLR, it was possible to set greedy=false (and it might be possible in ANTLR 4, if undocumented). Otherwise, the .*? should probably be replaced with

( ~'}' | '}' ~'}' | '}' '}' ~'}' )*
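For anyone who wants to try this alternative outside ANTLR, the following is a rough Ruby regex translation of the loop above. It is a sketch, not ANTLR code, and the constant EXPLICIT_NOWIKI and the helper explicit_nowiki_matches are names made up here. Because the loop can never consume a full '}}}', the match ends at the first closing delimiter without needing a non-greedy operator:

```ruby
# Hypothetical regex translation of the suggested rule body:
#   '{{{' ( ~'}' | '}' ~'}' | '}' '}' ~'}' )* '}}}'
# Each alternative consumes at most two '}' before a non-'}' character.
EXPLICIT_NOWIKI = /\{\{\{(?:[^}]|\}[^}]|\}\}[^}])*\}\}\}/

# Return every substring of input that the explicit pattern matches.
def explicit_nowiki_matches(input)
  input.scan(EXPLICIT_NOWIKI)
end

input = "{{{ text }}}\n{{{ text }}}\n{{{ text }}}\n"
puts explicit_nowiki_matches(input).length  # 3

# Single and double braces inside the body are still allowed.
puts explicit_nowiki_matches("{{{ a } b }} c }}}\n{{{ d }}}\n").length  # 2
```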

Loring

Chad Slaughter

Nov 19, 2014, 9:54:37 PM

to antlr-di...@googlegroups.com

On Wednesday, November 19, 2014 7:15:57 PM UTC-6, Loring Craymer wrote:

The antlr behavior is correct; in previous versions of ANTLR, it was possible to set greedy=false (and it might be possible in ANTLR 4, if undocumented). Otherwise, the .*? should probably be replaced with

I am not disputing the behavior. I agree the ANTLR Java runtime is correct. The grammar as given is correct, and I am not looking to change it.

My new ANTLR runtime, implemented in Ruby, is exhibiting the incorrect behavior, and I am looking for clues as to how to find the problem.

Thank you,

Chad

Jim Idle

Nov 19, 2014, 10:16:18 PM

to antlr-di...@googlegroups.com

Stop using Ruby - that's synonymous with "bug" right? ;)

Seriously though, if the generated code, when visually compared with the Java equivalent does not offer any clues, then I would first single step through the generated Java code and understand how it is deciding that the token is complete. If that does not tell you why your Ruby code is not doing the same, then single step through the Ruby code (perhaps even side by side with Java) and that should show you the difference.

Based on the information you provided so far, it is difficult to suggest what might be different but it is likely some difference between what Java does with operator precedence vs Ruby, or comparison differences, such as null, or ordinal/cardinal values, etc. Single stepping is your friend here.
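Short of single stepping, one cheap first step is to diff the -tokens dumps from the two runtimes and look at the first token that differs. A minimal sketch, where first_divergence and the two constants are hypothetical names and the dumps are the ones posted earlier in this thread (the Java dump is cut to its first three lines):

```ruby
# Hypothetical helper: find the first line where two -tokens dumps differ.
# Returns nil when the dumps are identical.
def first_divergence(expected, actual)
  expected_lines = expected.lines.map(&:chomp)
  actual_lines = actual.lines.map(&:chomp)
  count = [expected_lines.length, actual_lines.length].max
  (0...count).each do |i|
    next if expected_lines[i] == actual_lines[i]
    return { index: i, expected: expected_lines[i], actual: actual_lines[i] }
  end
  nil
end

# Single-quoted heredocs keep the backslashes in the dumps literal.
JAVA_DUMP = <<~'DUMP'
  [@0,0:11='{{{ text }}}',<16>,1:0]
  [@1,12:12='\n',<15>,1:12]
  [@2,13:24='{{{ text }}}',<16>,2:0]
DUMP

RUBY_DUMP = <<~'DUMP'
  [@0,0:37='{{{ text }}}\n{{{ text }}}\n{{{ text }}}',<16>,1:0]
  [@1,38:38='\n',<15>,3:12]
  [@2,39:38='<EOF>',<-1>,4:13]
DUMP

diff = first_divergence(JAVA_DUMP, RUBY_DUMP)
puts "first difference at token #{diff[:index]}" if diff
```

Knowing which token diverges first narrows down which lexer rule and which part of the input to step through.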

Jim

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chad Slaughter

Nov 20, 2014, 1:13:26 PM

to antlr-di...@googlegroups.com

On Wednesday, November 19, 2014 9:16:18 PM UTC-6, Jim Idle wrote:

Stop using Ruby - that's synonymous with "bug" right? ;)

Languages aren't bugs, just their implementations.

Seriously though, if the generated code, when visually compared with the Java equivalent does not offer any clues, then I

Yes, the generated code is equivalent.

would first single step through the generated Java code and understand how it is deciding that the token is complete. If that does not tell you why your Ruby code is not doing the same, then single step through the Ruby code (perhaps even side by side with Java) and that should show you the difference.

I don't have the Java tools to do this, and with my question I was trying to avoid a brute-force approach.

Based on the information you provided so far, it is difficult to suggest what might be different but it is likely some difference between what Java does with operator precedence vs Ruby, or comparison differences, such as null, or ordinal/cardinal values, etc. Single stepping is your friend here.

Operator precedence was one of the things I checked at the start of the project. I did an extra pass and found one wrong operator, but it was in unrelated code.

Thanks for your help.

- Chad

Chad Slaughter

Nov 20, 2014, 7:49:28 PM

to antlr-di...@googlegroups.com

I found and fixed the problem.

The LexerATNConfig instance variable passedThroughNonGreedyDecision was sometimes nil/null after calling the method checkNonGreedyDecision. As a result, the non-greedy decision was not being passed through, which resulted in greedy behavior.
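For anyone porting a runtime from Java, here is a minimal sketch of how a flag like this can end up nil in Ruby. It is not the actual runtime code: LexerConfig, DecisionState, and both method names are hypothetical, and this thread does not show the real fix. The point is only that Ruby's || and && return one of their operands, so a direct transliteration of a Java-style boolean expression can return nil where Java would give false, and nil == false is false in Ruby:

```ruby
# Hypothetical stand-ins; these are not the actual Ruby runtime classes.
DecisionState = Struct.new(:non_greedy)
LexerConfig = Struct.new(:passed_through_non_greedy_decision)

# Direct transliteration of a Java-style boolean expression.
# In Ruby, || and && return one of their operands, so this can return nil.
def check_non_greedy_loose(source, target)
  source.passed_through_non_greedy_decision ||
    (target.is_a?(DecisionState) && target.non_greedy)
end

# Same logic, coerced to a strict true/false with !!.
def check_non_greedy_strict(source, target)
  !!check_non_greedy_loose(source, target)
end

source = LexerConfig.new(false)
target = DecisionState.new(nil)  # flag that was never initialized

p check_non_greedy_loose(source, target)   # nil
p check_non_greedy_strict(source, target)  # false
p nil == false                             # false
```

Both nil and false are falsy in an if, so the difference only shows up where the value is compared or stored, which is part of what makes this kind of bug hard to spot.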

Thanks,

- Chad

Jim Idle

Nov 20, 2014, 10:59:01 PM

to antlr-di...@googlegroups.com

Good catch :)

Terence Parr

Jan 2, 2015, 12:48:02 PM

to antlr-di...@googlegroups.com

Hi Chad. Sorry for the delay. Have you been able to make progress on the Ruby target? I think the most successful approach is to first get all of the unit tests working. Eric Vergnaud has altered the test rig mechanism to handle multiple targets in the latest master.

Ter
