ANTLR maximum recursion depth exceeded error when parsing a C file with large array

236 views
Skip to first unread message

Gautam Thaker

unread,
Nov 25, 2020, 9:36:53 AM11/25/20
to antlr-discussion
I tried to ask this on stackoverflow but them immediately closed the question for "lack of details/clarity." (I had hoped that a simple example that leads to core dump would have been deemed worthy, but I will try to ask here.)

I am using ANTLR 4.8 with its python binding to parse C source file(s). A problem that I have been seeing is a max recursion depth error when testing on array’s with  large amounts of data within them.

Content of a file that produces an error looks like:

<code>

    const unsigned char foo[] = {
        99,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,
        0,64,0,0,0,115,248,1,0,0,100,0,90,0,100,91,
        90,1,100,92,90,2,101,2,101,1,23,0,90,3,100,4,
        100,5,132,0,90,4,100,6,100,7,132,0,90,5,100,8,
        100,9,132,0,90,6,100,10,100,11,132,0,90,7,100,12,
        100,13,132,0,90,8,100,14,100,15,132,0,90,9,100,16,
        100,17,132,0,90,10,100,18,100,19,132,0,90,11,100,20,
        100,21,132,0,90,12,100,93,100,23,100,24,132,1,90,13,
        101,14,101,13,106,15,131,1,90,16,100,25,106,17,100,26,
        100,27,131,2,100,28,23,0,90,18,101,19,106,20,101,18,
        100,27,131,2,90,21,100,29,90,22,100,30,90,23,100,31,
        103,1,90,24,100,32,103,1,90,25,101,25,4,0,90,26,
        90,27,100,94,100,33,100,34,156,1,100,35,100,36,132,3,
        90,28,100,37,100,38,132,0,90,29,100,39,100,40,132,0,
        90,30,100,41,100,42,132,0,90,31,100,43,100,44,132,0,
        90,32,100,45,100,46,132,0,90,33,100,47,100,48,132,0,
        90,34,100,95,100,49,100,50,132,1,90,35,100,96,100,51,
        100,52,132,1,90,36,100,97,100,54,100,55,132,1,90,37,
        ...
        //  A total of 2,432 lines of data, nothing else in the file.
    };

</code>

The single file contains one array with 2,432  lines of data and nothing else.
The outputted error when parsing the array is:

<code>

    ...
    File "../antlr4/tree/Tree.py", line 147, in walk
        self.walk(listener, child)
      File "../antlr4/tree/Tree.py", line 147, in walk
        self.walk(listener, child)
      File "../antlr4/tree/Tree.py", line 145, in walk
        self.enterRule(listener, t)
      File "../antlr4/tree/Tree.py", line 159, in enterRule
        ctx.enterRule(listener)
      File "../Parser.py", line 14461, in enterRule
        listener.enterInitializerlist(self)
    RuntimeError: maximum recursion depth exceeded

</code>

ANTLR VERSION = 4.8

Does anyone know why this is occurring or any advice/fixes for the problem? I am using Python bindings for ANTLR4. I will appreciate any comments or hints.

Mike Lischke

unread,
Nov 25, 2020, 11:12:07 AM11/25/20
to antlr-discussion

I tried to ask this on stackoverflow but them immediately closed the question for "lack of details/clarity." (I had hoped that a simple example that leads to core dump would have been deemed worthy, but I will try to ask here.)

The answer here is the same like on SO: add your relevant grammar rules! We cannot guess them.

Mike

Gautam Thaker

unread,
Nov 25, 2020, 1:57:11 PM11/25/20
to antlr-discussion
Yes, I should have made this clear. I am using  (what I believe to be) the latest version of file C.g4

https://raw.githubusercontent.com/antlr/grammars-v4/master/c/C.g4

I then do:

java -cp ./etc/antlr-4.8-complete.jar org.antlr.v4.Tool -Dlanguage=Python2 -o ./src/autogen ./grammars/C.g4 -visitor

Which produces the following files that I use 

src/autogen/grammars/C.tokens
src/autogen/grammars/CLexer.py
src/autogen/grammars/CLexer.tokens
src/autogen/grammars/CListener.py
src/autogen/grammars/CParser.py
src/autogen/grammars/CVisitor.py

Bill Dickenson

unread,
Nov 25, 2020, 2:07:10 PM11/25/20
to antlr-discussion
So I assume you sliced this down to a smaller array to test it ? We had a similar problem that was solved by increasing the amount of memory and swap the machine had. So you may not be having a "rule" problem as much as a size issue.
Cut it down to 100 lines of array def and try it again. Or if you are are AWS bump to the 8 core machine for a while. AWS says it can dynamically burst but ANTLR doesn't seem to like it. Also increase your stack size parm.

Oddly enough, one of our test programs that a client gave us from GitHub looked like this.

Gautam Thaker

unread,
Nov 25, 2020, 2:52:42 PM11/25/20
to antlr-discussion
We tested  by repeatedly reducing the file size to see where was the breaking point.  At 58 lines  is the threshold, more than this we get a max recursion error.  Our "ulimit -a" output is shown below.

It seems counter intuitive that a C grammer would need deep recursion to handle a simple array. (But I am not grammers person, I admit that readily.)   GCC for CLANG/LLVM run thru this file in milliseconds, for what that is worth. Sure I can always go for a super beefy machine, but I just wanted to ask if this seemed reasonable.


  core file size          (blocks, -c) 0
   data seg size           (kbytes, -d) unlimited
   scheduling priority             (-e) 0
   file size               (blocks, -f) unlimited
   pending signals                 (-i) 256934
   max locked memory       (kbytes, -l) 65536
   max memory size         (kbytes, -m) unlimited
   open files                      (-n) 1024
   pipe size            (512 bytes, -p) 8
 POSIX message queues     (bytes, -q) 819200
 real-time priority              (-r) 0
 stack size              (kbytes, -s) 8192
 cpu time               (seconds, -t) unlimited
 max user processes              (-u) 256934
 virtual memory          (kbytes, -v) unlimited
 file locks                      (-x) unlimited

Martin Mirchev

unread,
Nov 25, 2020, 2:56:05 PM11/25/20
to antlr-discussion
Hi, the issue is in the grammar itself. It is recursive as that is how it is defined in the standard, I suggest a little rework at some parts of it. I can make it such if you want to.

Gautam Thaker

unread,
Nov 25, 2020, 6:04:44 PM11/25/20
to antlr-discussion
If you think there is fix that can be made to C.g4 that will overcome this it would be great and very much appreciated by us. We live in fear that we are building some capabilities on ANTLR but at some unknown point in future ,working with some given source tree our tool may fail entirely!

THanks again.

Gautam

Loring Craymer

unread,
Nov 25, 2020, 9:23:00 PM11/25/20
to antlr-discussion
initializerList is recursive rather than iterative.  Change it to 
initializerList
        :
        designation? initializer
        ( ','   designation?  initializer )?
       ;

and it might work.

John B Brodie

unread,
Nov 25, 2020, 9:43:42 PM11/25/20
to antlr-di...@googlegroups.com, Loring Craymer

Greetings!

i think that just allows at most a pair of initializer in the initialiazerList

maybe

initializerList
        :
        designation? initializer
        ( ','   designation?  initializer )*
       ;

would be better.

Note that i have NOT actually looked at the grammar, so maybe i am off-base....

Hope this helps
   -jbb
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/8088b0f7-e152-4fd4-a7a3-9d007688ecban%40googlegroups.com.

Loring Craymer

unread,
Nov 25, 2020, 10:06:18 PM11/25/20
to antlr-discussion
Yes, that happens when writing quickly--you start to question your stars.

Gautam Thaker

unread,
Nov 26, 2020, 9:38:30 AM11/26/20
to antlr-discussion
I have been able to confirm that changes suggested by john.b  fixes my issues. I tested the changes against entire 2432  line input file (it was previously getting "max recursion exceeded" with just 60 lines). And I have noticed (though have not carefully measured) that parsing seems faster now as well.

If this is indeed a good change would we need anything similar in CPP14.g4   grammer file? There the relevant section seems to read:

(from: https://github.com/antlr/grammars-v4/blob/master/cpp/CPP14Parser.g4)
initializerList:
initializerClause Ellipsis? (
Comma initializerClause Ellipsis?
)*;
bracedInitList: LeftBrace (initializerList Comma?)? RightBrace; 
 
My test case file looks like:

const unsigned char _Py_M__importlib_external[] = {
    99,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,
    0,64,0,0,0,115,248,1,0,0,100,0,90,0,100,91,
    90,1,100,92,90,2,101,2,101,1,23,0,90,3,100,4,
    100,5,132,0,90,4,100,6,100,7,132,0,90,5,100,8,
    100,9,132,0,90,6,100,10,100,11,132,0,90,7,100,12,
  ... 
 ....
};   //  a total of 2432 lines of integers.

John B Brodie

unread,
Nov 26, 2020, 12:11:36 PM11/26/20
to antlr-di...@googlegroups.com

Greetings!

there is no direct recursion in the rules you cite, e.g. no rule mentions itself on the right hand side of the colon.

however there is circular mutual recursion between the initializerClause, initializerList, and bracedInitList rules.

my C++ is very rusty, but I believe these rules allow for the initialization of deeply nested structures. Like for a silly example {{{{{{{{{{0}}}}}}}}}} as an initial value.

So, in theory, it could be a problem. But, in practice, is probably not. I don't think a human would code a collection of struct that are soooooo deeply nested. But then maybe a code generation tool that produces C++ code might. But I doubt it....

Hope this helps...

   -jbb

Gautam Thaker

unread,
Nov 27, 2020, 2:48:18 PM11/27/20
to antlr-di...@googlegroups.com
What next step should I take? Should I open an Issue at:

https://github.com/antlr/grammars-v4/issues/new

and document this thread and some one with authority can make and commit this change? Change is just 3-4 lines, but of course important.  (I can create a PULL request if needed, but I am hoping that for this small edit someone can just do it. But I can generate a PULL request if that is the common procedure.)

Also, from John Brodie's response it seems like we can live without changing anything in CPP14.g4 file.

G

You received this message because you are subscribed to a topic in the Google Groups "antlr-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/antlr-discussion/AouKqFLP-8M/unsubscribe.
To unsubscribe from this group and all its topics, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/f8ea8a8a-ec64-4fc1-89e0-b4fa8f258eccn%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages