Hello,
I created a Ruby language target for ANTLR4. The code generation works, but I am having trouble debugging the runtime. The runtime mirrors the structure of the Python3 and Java runtimes, so I am hoping someone can point me towards the right part of the runtime to look at, on the chance that someone has seen a similar bug or can imagine where it might live.
The bug is that lexer token matching is too greedy. The .*? construct seems to match up to the last occurrence of the closing string instead of the first one it finds. The example below illustrates this.
This rule in the creole grammar:
NOWIKI
: '{{{' .*? '}}}'
;
Given this input text:
{{{ text }}}
{{{ text }}}
{{{ text }}}
Instead of returning 7 tokens, it only returns 3.
The Java runtime, using TestRig with the -tokens option, produces the expected output:
[@0,0:11='{{{ text }}}',<16>,1:0]
[@1,12:12='\n',<15>,1:12]
[@2,13:24='{{{ text }}}',<16>,2:0]
[@3,25:25='\n',<15>,2:12]
[@4,26:37='{{{ text }}}',<16>,3:0]
[@5,38:38='\n',<15>,3:12]
[@6,39:38='<EOF>',<-1>,4:13]
My Ruby runtime, using a Ruby version of TestRig with the -tokens option, produces:
[@0,0:37='{{{ text }}}\n{{{ text }}}\n{{{ text }}}',<16>,1:0]
[@1,38:38='\n',<15>,3:12]
[@2,39:38='<EOF>',<-1>,4:13]
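The difference between the two outputs looks like the difference between a non-greedy and a greedy pattern. As a rough analogy only, here is a small sketch using Ruby's built-in Regexp (this is not the ANTLR lexer or its ATN simulation, and the variable names are just for illustration). The first pattern stops at the first '}}}', the second runs to the last one:

```ruby
# Analogy only: Ruby's Regexp engine, not the ANTLR lexer.
# Same input as above, including the trailing newline.
input = "{{{ text }}}\n{{{ text }}}\n{{{ text }}}\n"

# Non-greedy: each match ends at the first '}}}' it reaches
# (the behaviour the Java runtime shows for NOWIKI).
non_greedy = input.scan(/\{\{\{.*?\}\}\}/m)

# Greedy: a single match runs through to the last '}}}'
# (the behaviour my Ruby runtime appears to show).
greedy = input.scan(/\{\{\{.*\}\}\}/m)

puts non_greedy.length  # 3 NOWIKI-like matches
puts greedy.length      # 1 match spanning all three lines
```

The /m flag makes the dot match newlines, which is what lets the greedy pattern swallow all three lines.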
The Ruby runtime tokenizes simple grammars correctly. I know this is a shot in the dark, but I am out of ideas. I am looking for a better understanding of the part of the runtime that does this work, a hint about where to look for my bug, or any other ideas that might help.
Thanks for your time,
Chad