Hi,
I'm using the attached grammar.
The parser is entering an infinite loop and eventually I get java.lang.OutOfMemoryError.
(I also created this as a github issue - https://github.com/antlr/antlr4/issues/2815 ).
When trying to parse the text:releaseDate GE 2020-04-28T00:00:00
(without double quotes surrounding the date), the parser enters infinite loop.
When I parse the text:releaseDate "GE 2020-04-28T00:00:00"
(with double quotes surrounding the date), everything is OK.
I'm using latest antlr-4.8 and Java 11.
The infinite loop is in DefaultErrorStrategy.consumeUntil
:
while (ttype != Token.EOF && !set.contains(ttype) ) { //System.out.println("consume during recover LA(1)="+getTokenNames()[input.LA(1)]); // recognizer.getInputStream().consume(); recognizer.consume(); ttype = recognizer.getInputStream().LA(1); }
I'm quite new to Antlr, so it's hard for me to locate the root cause. It seems like a bug,
Can anyone assist in resolving that?
Thank you,
Ori.
Greetings!
your "2020-04-28T00:00:00" input string is a mal-formed date-time-string because the trailing 'Z' is missing
i am surprised you did not see errors reported from the lexer when the ':' were encountered...
and so the parser tries to discard tokens in order to attempt to re-synchronize with the input stream
but it can not
because DATE_TIME_STRINGS recognize the EMPTY string. so we try continuously try to discard (possibly) erroneous input characters. And DATE_TIME_STRINGS represent ZERO characters that must be discarded, so we discard them. And then DATE_TIME_STRINGS represent ZERO characters that must be discarded, so we discard them. And then DATE_TIME_STRINGS represent ZERO characters ------ and so the infinite loop. I see the loop as a stack overflow rather than out of memory, again maybe check your error reporting.
I modified your DATE_TIME_STRING stuff alot. Making it Parser rules, mostly, rather than Lexer rules.
Also while I was playing with your grammar, I thought that
maybe ENCODED_STRING token was being toooo agresive in consuming
characters. So I added the LineTerminator and \t characters to
its exclusion.
Attached please find the grammar file I used in my tests.
Hope this helps...
-jbb
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/b59c1d9c-749d-4808-979b-067aafe5f876%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/fc2b1f03-5e1a-e5f4-e832-9564ae1b0ef6%40gmail.com.
<Query.g4>
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-di...@googlegroups.com.
the second date is scanned as an ENCODED_STRING token because the comma is not in its exclusion set.
it is usually a really good idea to dump the token stream in
order to be really sure of the Lexer's activity...
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/8fb8b487-691f-4d2b-a1e0-afad7677fec7%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/8fb8b487-691f-4d2b-a1e0-afad7677fec7%40googlegroups.com.
the only infinite loop i am aware of is the one you tripped upon --- a mal-formed lexer that wants to match the zero-length input string.
so i guess your best protection against loops is to have a well-formed grammar.
not much help, i suspect...
-jbb
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/d2b199a9-12e9-473d-886d-29e8ee4cb193%40googlegroups.com.