Antlr4 - Parser rule specific exception handling not kicking in?

Ashish Shanker

Jan 3, 2018, 9:54:34 PM
to antlr-discussion

Let's say I have a grammar that looks like this in Antlr4:
start:
stmts EOF
;

stmts
:
(sqlCommand)+
;
catch[RecognitionException e] { ...stmts specific error processing...}

sqlCommand
:
deleteCommand
| insertCommand
...
...
;
catch[RecognitionException e] { ... sqlCommand specific error processing ...}

deleteCommand:
   ( withExpressions )?
   DELETE_TOKEN
   ( topClause )?
   deleteFromClause
   ( outputClause )*
   ( deleteWhereFromClause )?
   ( whereClause )?
   ( optionClause )?
   semicolon
;
catch[RecognitionException e] { ... deleteCommand specific error processing ... }


Assume for a moment that all the rules are appropriately defined, along with the necessary lexer rules. A valid input like the following produces a well-formed ParseTree without errors:

delete schemaA.tableA where colA=20

However, if there is a typo in the input, as below, the parse errors out at the stmts level:

delete schemaA.tableA whre colA=20

I get a NoViableAltException like so:

line 1:27 no viable alternative at input 'delete schemaA.tableA whre colA'

Note that I say "it errors out at the stmts level" because only the catch block of the stmts() method in the generated parser code gets hit, so I only see { ...stmts specific error processing ...} execute.

Upon stepping through the Antlr-generated code, it appears that the adaptivePredict() call fails in the stmts() method, and hence the catch block of the stmts rule gets exercised and {... stmts specific error processing...} executes. I stepped through the code in Antlr4's ParserATNSimulator, and it appears adaptivePredict() fails because, when it tries to compute the reach set, it is unable to get to a unique alt even after consuming more tokens. So at some point it gives up, and by that time it has consumed input up to colA.
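To make the sequence described above concrete, here is a toy model in plain Java. It is only a sketch: none of these classes are ANTLR types or generated code, `predictCommand()` is a hypothetical stand-in for adaptivePredict(), and the toy syntax (`delete <table> [<alias>] [where ...]`, with the qualified table name treated as a single token) is an assumption chosen so that an unknown word after the table name is still legal lookahead. Under those assumptions, the lookahead scan fails inside stmts() before deleteCommand() is ever entered, so only the stmts handler runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT ANTLR classes or generated parser code.
class PredictionFailureDemo {
    // Stand-in for a prediction failure; records where the lookahead scan gave up.
    static class NoViableAlt extends RuntimeException {
        final int offendingIndex;
        NoViableAlt(int offendingIndex) {
            super("no viable alternative at token index " + offendingIndex);
            this.offendingIndex = offendingIndex;
        }
    }

    final List<String> tokens;
    final List<String> handlersHit = new ArrayList<>();
    final List<String> rulesEntered = new ArrayList<>();
    int offendingIndex = -1;

    PredictionFailureDemo(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    // Lookahead that returns null past the end of the input.
    String la(int i) {
        return i < tokens.size() ? tokens.get(i) : null;
    }

    // Hypothetical stand-in for prediction: scans ahead without consuming anything.
    // Assumed toy syntax: delete <table> [<alias>] [where ...]
    int predictCommand() {
        int i = 0;
        if (!"delete".equals(la(i))) throw new NoViableAlt(i);
        i += 2;                                  // skip 'delete' and the table name
        if (la(i) != null && !"where".equals(la(i))) {
            i++;                                 // any other word is read as an alias
        }
        if (la(i) == null || "where".equals(la(i))) return 1;
        throw new NoViableAlt(i);                // gives up one token after the typo
    }

    void stmts() {
        rulesEntered.add("stmts");
        try {
            predictCommand();                    // throws before deleteCommand() runs
            deleteCommand();
        } catch (NoViableAlt e) {
            offendingIndex = e.offendingIndex;
            handlersHit.add("stmts");            // stmts-specific error processing
        }
    }

    void deleteCommand() {
        // A rule-specific handler here would never run for the typo input,
        // because this method is not entered when prediction fails in stmts().
        rulesEntered.add("deleteCommand");
    }

    public static void main(String[] args) {
        PredictionFailureDemo p = new PredictionFailureDemo(
                "delete", "schemaA.tableA", "whre", "colA", "=", "20");
        p.stmts();
        System.out.println("entered=" + p.rulesEntered
                + " handlers=" + p.handlersHit
                + " offendingIndex=" + p.offendingIndex);
    }
}
```

In this toy, the typo input reports token index 3 (colA) rather than index 2 (whre), because 'whre' is accepted as lookahead and the scan only fails on the token after it. Whether the real grammar behaves this way for the same reason depends on rules not shown here.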

My questions are:
(a) Is this expected behavior? Shouldn't stmts()'s adaptivePredict() have found sqlCommand as a unique alt upon seeing "delete schemaA.tableA"? From there, it should have gone into the deleteCommand() method of the generated code based on the DELETE_TOKEN, and ideally it should have hit the catch block of deleteCommand() and done the deleteCommand-specific error processing. I understand that what I am describing isn't LL*, but I was hoping it would give me more intelligence instead of simply erroring out at the top level.

(b) My ultimate need is to identify the exact location of the error, i.e. I would like the parser to say that the error is at the 'whre' token instead of the colA token. It seems to me that I am unable to pinpoint 'whre' as the error token because, unlike older versions, Antlr4 separates "right path determination" from "grammar rule execution". So adaptivePredict() essentially discovers the right path by hunting through a graph (as many levels as necessary), which is an in-memory data structure. Then it applies the outcome to the generated parser code. In old Antlr, it appears, path determination and grammar rule execution happened through the same code flow in the generated parser code. So it was easier to pinpoint the token where the error started, and you could even get a partial AST.

I am tempted to simply look at the position of the error as reported by the NoViableAltException (1:27) and take the token just before it (ignoring whitespace) to be the starting point of the error. But if LL* is going to look ahead to an arbitrary depth, is that token guaranteed to be where the error started? Is it possible that there would be more than one trailing token before adaptivePredict() gives up?
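The "token just before the reported position" idea can be sketched in plain Java as follows. This is a minimal illustration, not ANTLR code: `tokenize` is a hypothetical mini-lexer (words made of letters, digits and '_', plus single-character punctuation) standing in for the real lexer, and `tokenBefore` is a hypothetical helper name. It shows only the mechanics of the lookup; it does not settle whether the previous token is always where the error began, which is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "previous token" heuristic; not part of the ANTLR runtime.
class PreviousTokenHeuristic {
    // Minimal token record: text plus 0-based start column.
    static final class Tok {
        final String text;
        final int column;
        Tok(String text, int column) {
            this.text = text;
            this.column = column;
        }
    }

    // Hypothetical mini-lexer: whitespace is skipped, so it never appears as a token.
    static List<Tok> tokenize(String line) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c) || c == '_') {
                while (i < line.length()
                        && (Character.isLetterOrDigit(line.charAt(i)) || line.charAt(i) == '_')) {
                    i++;
                }
            } else {
                i++;                             // single-character punctuation token
            }
            out.add(new Tok(line.substring(start, i), start));
        }
        return out;
    }

    // Returns the last token that starts before errorColumn, or null if there is none.
    static Tok tokenBefore(List<Tok> tokens, int errorColumn) {
        Tok prev = null;
        for (Tok t : tokens) {
            if (t.column >= errorColumn) break;
            prev = t;
        }
        return prev;
    }

    public static void main(String[] args) {
        List<Tok> toks = tokenize("delete schemaA.tableA whre colA=20");
        Tok suspect = tokenBefore(toks, 27);     // 27 is the column from "line 1:27"
        System.out.println(suspect.text + " at column " + suspect.column);
        // prints "whre at column 22"
    }
}
```

For this particular input the heuristic lands on 'whre', but that is a property of this example only; if prediction can run several tokens past the real mistake, the same lookup would point at the wrong token.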

I wanted to get thoughts from the community on whether there is a better way, or a right Antlr way, of doing this.

