ANTLR4 Parsing of deeply nested if/else statements in C#

1,507 views
Skip to first unread message

gautha...@o9solutions.com

unread,
Jun 30, 2017, 11:57:18 AM6/30/17
to antlr-discussion

I have a grammar with if/elseif/else statements (C# code generation), and I've run into an issue whereby a deeply nested if/else block (24 levels deep) that parses successfully in ANTLR 4.0 (albeit very slowly), now hangs on ANTLR 4.6.1 - it ran for over 2 hours without completing before I killed it. 

While trying to debug this issue, I ran across this thread and tried the sll = true option in the else-if and else rule elements. However, the parsers generated with and without the sll=true option are identical. Is this option supported for C# code generation?

Eric Vergnaud

unread,
Jun 30, 2017, 10:08:26 PM6/30/17
to antlr-discussion
HI,
SLL is a runtime option, not a generation option.
Eric

gautha...@o9solutions.com

unread,
Jul 1, 2017, 7:44:14 AM7/1/17
to antlr-di...@googlegroups.com

Eric,

I added the option as shown in the thread I referenced.  I would have expected that this would change the code of the generated parser to switch the prediction-mode for the if/else blocks.
If not, what effect should this directive have on the generated code?


Thanks,
Gautham

Mike Lischke

unread,
Jul 1, 2017, 10:33:34 AM7/1/17
to antlr-di...@googlegroups.com

I have a grammar with if/elseif/else statements (C# code generation), and I've run into an issue whereby a deeply nested if/else block (24 levels deep) that parses successfully in ANTLR 4.0 (albeit very slowly), now hangs on ANTLR 4.6.1 - it ran for over 2 hours without completing before I killed it. 

Can you show us the grammar? I just ran a quick test with IF THEN ELSEIF and ENDIF from an SQL grammar, 120 levels deep, that parses in 6ms (C++ target). I think that is comparable, even though a different language to parse. The relevant rules in the (My)SQL grammar are:

ifStatement:
IF_SYMBOL ifBody END_SYMBOL IF_SYMBOL
;

ifBody:
expr thenStatement (ELSEIF_SYMBOL ifBody | ELSE_SYMBOL compoundStatementList)?
;

thenStatement:
THEN_SYMBOL compoundStatementList
;


OK, SQL has a THEN keyword. Could that make such a big difference? Have you tried with a simpler grammar, which only supports the if-then-else construct? Does it require the same amount of time?

And for the SLL option: that PR was never merged, so I doubt it is supported at all.


gautha...@o9solutions.com

unread,
Jul 1, 2017, 1:41:49 PM7/1/17
to antlr-discussion

Mike,
   The entire grammar file is over 2000 lines, so I can't post it, but below are screen-shots of relevant sections.  There are over 70 rule-elements for the "statement" rule.
Below the grammar excerpt is the nested if-else condition that hangs with ANTLR 4.6.1 (only a portion is shown, the condition is 24 levels deep).

I'm surprised about the not-merged comment... the option was accepted by the code-generator, so there is at least some level of support i think.  Just to be sure, when I used a nonsensical option name, an exception was thrown during code-generation that the option was unknown.

Another observation - the grammar used to have an "end if" rule element following "elseStat" in the first rule below.  None of our queries actually need or use it, so I removed it.  With this change, ANTLR 4.0, parsing time for the nested if came down from almost 10 minutes to under 4 minutes.  Version 4.6.1 still got hung up for almost 2 hours before I killed it.  In production usage, no one really uses such deep nesting, but this is one our test cases, and this issue has stopped us  from upgrading to the latest ANTLR version. 

Any suggestions on things I can try are most welcome.

Thanks,
Gautham


Mike Lischke

unread,
Jul 2, 2017, 6:22:37 AM7/2/17
to antlr-di...@googlegroups.com
   The entire grammar file is over 2000 lines, so I can't post it, but below are screen-shots of relevant sections.

You could put it online somewhere and post a link (no Github project for this?). But the images work as well (mostly).

I'm surprised about the not-merged comment... the option was accepted by the code-generator, so there is at least some level of support i think.  Just to be sure, when I used a nonsensical option name, an exception was thrown during code-generation that the option was unknown.

When I use this option I get a warning:



Any suggestions on things I can try are most welcome.

Is `ifStatement` part of the the `statement` rule? If so you could simplify this to:

ifStatement
    : ifStat (ELSE (expression | THEN block))?
;

and remove `ifStat`, `elseIfStat` and  `elseStat`.

Your `ifStatement` rule probably is creating ambiguities which cost time. Not only do you have that kleene loop and the optional `elseStat` part (both start with ELSE), the entire rule also can match nothing (empty alt at the end). This empty alt is the first thing I'd remove. Try not to duplicate things if not necessary (e.g. `elseIfStat` is `ifStat` just with a leading ELSE).

In general I recommend to experiment with your grammar. The ALL(*) algorithm is very sensitive to the input and can easily consume a lot of time if a grammar drives it to that (e.g. via leading predicates, lots of ambiguities etc.). So, if you cannot figure out why a certain part is slow, start over. Use a simple version for the start and see if that is good enough, then add to it until you see times going up unacceptably.




gautha...@o9solutions.com

unread,
Jul 2, 2017, 8:10:00 AM7/2/17
to antlr-discussion

Mike,
    Thanks for the response - I will rewrite the if/else rules along the lines you suggested and see if that helps with performance.

Gautham

gautha...@o9solutions.com

unread,
Jul 5, 2017, 8:27:31 AM7/5/17
to antlr-discussion

An update - I tried the simplified version of the if-else rules - it did not make any difference.  ANTLR 4.0 still parses the nested rule in 4 minutes, while 4.6.1 hangs.  For now I'll refrain from upgrading and stay with 4.0.  Maybe I'll try again when NuGet packages for 4.7 become available.

Thanks again for your help Mike,
Gautham

Jim Idle

unread,
Jul 6, 2017, 1:03:15 AM7/6/17
to antlr-di...@googlegroups.com
I think that your formulation is problematic. It will lead to a lot of ambiguity. I think that you want this:

statement:
    IF e=expression THEN b=block (ELSE t=THEN? block)?
   | other statements
 ...

Which will handle all your nests. The ELSE THEN construct seems redundant and a mistake in the language. If it is your own design then you can take it out completely, but if not and it has some significance then specify it as above and look for presence in the walker.

There is no way nested IF should take 4 minutes to parse. Should be instant.

You don't need the special rules for IF. You don't even need "block" but it might help if you are doing code gen etc. Also, I would not use @after and so on. Use a parse tree walker - you will find it very much easier.

Give it a try at least eh? 

Basically, don't try to over-specify it in rules, parse anything that is more or less valid, then reject in the walker as a semantic error (which will have much more information available so you can give your users a better error message).

Jim


This Electronic Mail (e-mail) contains confidential and privileged information intended only for the use of the individual or entity to which it is sent.  If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is STRICTLY PROHIBITED.  If you have received this communication in error, please immediately notify the sender by reply e-mail or telephone. 

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gautha...@o9solutions.com

unread,
Jul 9, 2017, 10:38:01 AM7/9/17
to antlr-discussion

Jim,
   Thanks for your suggestion.  The grammar we have is currently in production with lots of usage, so modifying language is not an easy option.  I tried some  alternative rule-rewriting from an earlier response without much luck.  I'll try your suggestion when I test out the 4.7 version of the parser as soon as it's NuGet package comes out.

Gautham
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.

Eric Vergnaud

unread,
Jul 10, 2017, 11:24:12 AM7/10/17
to antlr-discussion
The 4.7.1 Nuget package is already out, see https://www.nuget.org/packages/Antlr4.Runtime.Standard/

gautha...@o9solutions.com

unread,
Jul 11, 2017, 7:12:45 AM7/11/17
to antlr-discussion

Eric,
   I recall trying this out a couple of weeks ago - however, the code-generator package was still at 4.6.1 (now it's at 4.6.2), and I had to generate the parser using it.  Then I got linker errors about missing symbols in client projects using the parser.  I figured this was due to the mismatch between generator library and run-time, so I abandoned the effort and went back to 4.6.1.

Thanks,
Gautham  

Eric Vergnaud

unread,
Jul 12, 2017, 2:10:26 PM7/12/17
to antlr-discussion
The  Antlr4.Runtime.Standard and  Antlr4.Runtime.Core nugget packages published by the ANTLR organization are meant to run parsers generated using the official ANTLR tool.
If you're using the VS integration from tunnelvisionlabs, then indeed you can only rely on their runtime.
Reply all
Reply to author
Forward
0 new messages