Astonishingly poor performance from a simple grammar change

Ross Patterson

unread,

Sep 13, 2022, 8:17:39 PM9/13/22

to antlr-di...@googlegroups.com

As I noted yesterday on the "No reserved words" Discussion, I attempted to use what I believe is the standard fix to the keywords-can-also-be-identifiers grammar problem - listing the keyword tokens as alternatives in the identifier parser rules for the grammars-v4/rexx parser. Boy, was I surprised! With the full list of 64 Rexx keywords, parsing an 11K line program (via "grun Rexx file_ -gui b2h.cmd"), which had previously taken abou 15 seconds, became seemingly interminable. I killed the process after it had consumed 7 minutes of CPU time. I'm going to let one run as long as it needs overnight tonight, just to see how long it takes.

This change in performance is inexplicable to me. Yes, I added 64 terminals to 2 productions. But adding just the 6 that the test program required had no measurable impact. I diffed the 6-keyword and 64-keyword versions of the generated Java parser, and there was one significant difference. The fast one with just a few keywords used quite a few "switch (_input.LA(1))" statements with a "case" clause for each keyword. The slow one with all the keywords instead used "switch ( getInterpreter().adaptivePredict(_input,10,_ctx) )" (with different values in place of the "10"), and "case" clauses like "case 1" and "case 2". I don't know what "getInterpreter().adaptivePredict()" does, but it must be much more involved that just getting a single lookahead token.

Anybody understand what's wrong here? Or have a better suggestion for letting identifiers have the same names as keywords?

Ross

Ross Patterson

unread,

Sep 13, 2022, 11:19:01 PM9/13/22

to antlr-discussion

OK, I didn't let it run overnight, but I let grun run until it had consumed 2 hours of CPU time, and it still hadn't launched the view window. How does one go about debugging a glacially-slow parser?

Ross

Terence Parr

unread,

Sep 13, 2022, 11:27:35 PM9/13/22

to antlr-di...@googlegroups.com

Hi just to be sure, this is the Java target?

Can I get a grammar and input string?

Ter

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/efa3b9bf-a251-4545-8bfc-aa0d89602986n%40googlegroups.com.

--

Dictation in use. Please excuse homophones, malapropisms, and nonsense.

Ross Patterson

unread,

Sep 15, 2022, 9:51:30 PM9/15/22

to antlr-discussion

I'm working on a proper demostration. This is more complex to prove than I expected.

As they used to say on TV, "Film at 11." :-)

Ross

Ross Patterson

unread,

Oct 8, 2022, 8:32:37 PM10/8/22

to antlr-discussion

I'm back to working on this problem. I've got 4.11.2-SNAPSHOT running on my machine, and it demostrates the issue in both the Python3 and Java runtimes. The Python version of the parser crashes with "RecursionError: maximum recursion depth exceeded", and the tracback shows the following sequence. The Java version, in debug under Eclipse, just runs forever, but any time I stop it, the stack for the TestRig thread looks the same.

Traceback (most recent call last):
File "C:\Ross\Source\antlr\pyenv\rexx\build\mockit.py", line 218, in <module>
main(sys.argv)
File "C:\Ross\Source\antlr\pyenv\rexx\build\mockit.py", line 24, in main
parse_tree = parser.file_()
File "C:\Ross\Source\antlr\pyenv\rexx\build\RexxParser.py", line 690, in file_
self.program_()
File "C:\Ross\Source\antlr\pyenv\rexx\build\RexxParser.py", line 755, in program_
self.instruction_list()
File "C:\Ross\Source\antlr\pyenv\rexx\build\RexxParser.py", line 1071, in instruction_list
self.instruction()
File "C:\Ross\Source\antlr\pyenv\rexx\build\RexxParser.py", line 1120, in instruction
la_ = self._interp.adaptivePredict(self._input,10,self._ctx)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 346, in adaptivePredict
alt = self.execATN(dfa, s0, input, index, outerContext)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 446, in execATN
alt = self.execATNWithFullContext(dfa, D, s0_closure, input, startIndex, outerContext)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 575, in execATNWithFullContext
reach = self.computeReachSet(previous, t, fullCtx)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 734, in computeReachSet
self.closure(c, reach, closureBusy, False, fullCtx, treatEofAsEpsilon)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 1093, in closure
self.closureCheckingStopState(config, configs, closureBusy, collectPredicates,
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 1136, in closureCheckingStopState
self.closure_(config, configs, closureBusy, collectPredicates, fullCtx, depth, treatEofAsEpsilon)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 1187, in closure_
self.closureCheckingStopState(c, configs, closureBusy, continueCollecting, fullCtx, newDepth, treatEofAsEpsilon)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 1136, in closureCheckingStopState
self.closure_(config, configs, closureBusy, collectPredicates, fullCtx, depth, treatEofAsEpsilon)
File "C:\Ross\Source\antlr\pyenv\lib\site-packages\antlr4\atn\ParserATNSimulator.py", line 1187, in closure_
self.closureCheckingStopState(c, configs, closureBusy, continueCollecting, fullCtx, newDepth, treatEofAsEpsilon)
...

After that, it's alternating calls to closureCheckingStopState and closure_. In Eclipse, I put a breakpoint in execATN() after the call to execATNWithFullContext(), and it never returned.

I can provide a grammar and input string, but this doesn't fail on any case I have except for a 11,702 line input file. I was trying the most complex Rexx program I know of. If you want, I'll ZIP it up with my working copy of the grammar and send it along.

Ross

Terence Parr

unread,

Oct 9, 2022, 4:45:23 PM10/9/22

to antlr-di...@googlegroups.com

This is no doubt one of the land mines that can occur with this ALL(*) parsing algorithm. Some grammars require super deep look ahead and a lot of context, which ken mean ridiculous execution times. Maybe look at the ANTLR grammar Profiler and see if you can find out which decisions are ambiguous in which require a lot of look ahead. Then rewrite those and a simpler form. Ridiculous speed ups can occur.

Ter

To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/27b48b5d-c12e-4104-b57a-2c48a9686c0en%40googlegroups.com.

adr...@sutherlandonline.org

unread,

Oct 9, 2022, 5:15:32 PM10/9/22

to antlr-discussion

I have experience with parsing REXX (note however that currently my cREXX initiative is defining a "level b" rexx language that does not allow identifiers to be keywords - frankly for the sake of sanity but that is a different argument!) - level c (classic rexx) is for the future.

Anyway there are other areas of difficulty including the concatenation operator (a space or no space at all) that make even expression parsing troublesome.

The parsers I am aware of are handcrafted.

For cREXX (apologies in advance - I wanted a public domain solution) we use re2c and lemon. Lemon is a simple soul but has the ability to fallback a keyword to an identifier. Also the ANSI REXX Draft standard has quite a few comments on (for example) when THEN can be an identifier - for me this is a symptom of how troublesome they found it too.

My suggestions are

- Prioritise keywords over identifiers, use gating if and when possible - that will surely get rid of some ambiguities.

- I stage the parsing - get an intermediate Parse / AST Tree and then manipulate it into the right shape.

I think it is feasible with ANTLR - but it is unfair to expect it to work it out alone - you need to craft the grammar - and (I'm sorry) it will be a nightmare :-)