I'm trying to improve the error recovery handling for my particular use case, and one of the things I'm attempting to do is figure out what the "best" recovery action would be at any given time by calculating the number of tokens that would match if I took a particular recovery action. For example, I calculate the number of matches I might be able to achieve if I inserted a "missing" token, deleted an "extraneous" token, or even substituted an "invalid" token. Then using these counts, I select the one with the highest match count as the recover action to perform.
My current code is based on recursively evaluating the ATNState#getTransitions(), similar to the limited token look ahead the DefaultErrorStrategy uses. While it mostly works, I don't think I have it 100% correct, guessing something is off around epsilon/rule transitions.
At the moment I'm only using this logic in the recoverInline() function of my custom error strategy. However I'd like to be able to do something similar in sync(), which is a much more complex case. Due to the current short comings of the code, I haven't yet tried integrating with sync().
Code:
int getBestMatchCount(
int tokenIndex, ATNState stateToEval, int currentDepth,
ATN atn, ParserRuleContext ruleContext, TokenStream tokenStream) {
int tokenType = tokenStream.LA(tokenIndex);
IntervalSet expectingAtState = atn.nextTokens(stateToEval, ruleContext);
if (!expectingAtState.contains(tokenType)) {
return 0; // No more matches. Return current count
}
// Increment the token index (so when we recurse we eval the next token)
int matchCount = 1;
tokenIndex++;
// If we haven't hit max depth, loop through each transition available from our current state and recurse
if (currentDepth <= RECOVERY_EVAL_MAX_DEPTH && tokenType != Token.EOF) {
currentDepth++; // Increment current depth so we have an ending
var bestCount = 0;
for (var transition : stateToEval.getTransitions()) {
int transitionMatchCount = getBestMatchCount(
tokenIndex, transition.target, currentDepth, atn, ruleContext, tokenStream);
bestCount = Math.max(bestCount, transitionMatchCount);
}
matchCount += bestCount;
}
return matchCount;
}
Two Questions:1. Are there any API's in the ANTLR Java runtime that would make calculations of this sort easier or more straight forward?
2. Is it even possible to do something like this in sync()? Not real clear on how I would insert a missing/invalid token.
In case it's relevant, my use case is for parsing textual aviation notices, AIRMET's specifically. These tend to be short with fairly well defined format, and with zero chance of fixing any errors as they are produced by various goverment agencies around the world. Further, I fully realize that what I suggest above has a performance cost, but given the shortness and relative frequency, I'm pretty sure the cost will be doable.
Example bulletin:
WAIY32 LIIB 190526
LIRR AIRMET 5 VALID 190530/190830 LIIB-
LIRR ROMA FIR MOD TURB FCST ENTIRE FIR SFC/FL150 STNR NC
Thanks,
Mark Russell