Java: Improving error recovery by doing a look-ahead match counting?

49 views
Skip to first unread message

Mark Russell

unread,
Feb 14, 2024, 12:38:25 PMFeb 14
to antlr-discussion
I'm trying to improve the error recovery handling for my particular use case, and one of the things I'm attempting to do is figure out what the "best" recovery action would be at any given time by calculating the number of tokens that would match if I took a particular recovery action.  For example, I calculate the number of matches I might be able to achieve if I inserted a "missing" token, deleted an "extraneous" token, or even substituted an "invalid" token.  Then using these counts, I select the one with the highest match count as the recover action to perform.

My current code is based on recursively evaluating the ATNState#getTransitions(), similar to the limited token look ahead the DefaultErrorStrategy uses.  While it mostly works, I don't think I have it 100% correct, guessing something is off around epsilon/rule transitions.

At the moment I'm only using this logic in the recoverInline() function of my custom error strategy.  However I'd like to be able to do something similar in sync(), which is a much more complex case.  Due to the current short comings of the code, I haven't yet tried integrating with sync().

Code:
    int getBestMatchCount(
        int tokenIndex, ATNState stateToEval, int currentDepth,
        ATN atn, ParserRuleContext ruleContext, TokenStream tokenStream) {

        int tokenType = tokenStream.LA(tokenIndex);
        IntervalSet expectingAtState = atn.nextTokens(stateToEval, ruleContext);
        if (!expectingAtState.contains(tokenType)) {
            return 0; // No more matches.  Return current count
        }

        // Increment the token index (so when we recurse we eval the next token)
        int matchCount = 1;
        tokenIndex++;

        // If we haven't hit max depth, loop through each transition available from our current state and recurse
        if (currentDepth <= RECOVERY_EVAL_MAX_DEPTH && tokenType != Token.EOF) {
            currentDepth++; // Increment current depth so we have an ending

            var bestCount = 0;
            for (var transition : stateToEval.getTransitions()) {
                int transitionMatchCount = getBestMatchCount(
                        tokenIndex, transition.target, currentDepth, atn, ruleContext, tokenStream);
                bestCount = Math.max(bestCount, transitionMatchCount);
            }
            matchCount += bestCount;
        }
        return matchCount;
    }

Two Questions:
1. Are there any API's in the ANTLR Java runtime that would make calculations of this sort easier or more straight forward?
2. Is it even possible to do something like this in sync()?  Not real clear on how I would insert a missing/invalid token.

In case it's relevant, my use case is for parsing textual aviation notices, AIRMET's specifically. These tend to be short with fairly well defined format, and with zero chance of fixing any errors as they are produced by various goverment agencies around the world.  Further, I fully realize that what I suggest above has a performance cost, but given the shortness and relative frequency, I'm pretty sure the cost will be doable.

Example bulletin:
    WAIY32 LIIB 190526
    LIRR AIRMET 5 VALID 190530/190830 LIIB-
    LIRR ROMA FIR MOD TURB FCST ENTIRE FIR SFC/FL150 STNR NC

Thanks,
Mark Russell

Wanadoo

unread,
Feb 14, 2024, 2:06:55 PMFeb 14
to antlr-di...@googlegroups.com
Hi,

May I suggest that you open a discussion in the antlr5-specs GitHub repository, such that we can account for this ask ?

Eric
Envoyé de mon iPhone

Le 14 févr. 2024 à 18:38, Mark Russell <russell...@gmail.com> a écrit :

I'm trying to improve the error recovery handling for my particular use case, and one of the things I'm attempting to do is figure out what the "best" recovery action would be at any given time by calculating the number of tokens that would match if I took a particular recovery action.  For example, I calculate the number of matches I might be able to achieve if I inserted a "missing" token, deleted an "extraneous" token, or even substituted an "invalid" token.  Then using these counts, I select the one with the highest match count as the recover action to perform.
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/4eb2e9b0-aeb7-4458-a393-b649d6015cben%40googlegroups.com.

Mark Russell

unread,
Feb 14, 2024, 2:12:17 PMFeb 14
to antlr-discussion

Just didn't want to cross post unless it was requested.

Mark

Mark Russell

unread,
Feb 14, 2024, 3:02:25 PMFeb 14
to antlr-discussion
I just re-read your response (multi-tasking), and wanted to be clear.  

I'm not looking for code changes in ANTLR4/5, I'm simply trying to determine the best approach for implementing a custom ErrorStrategy.  I doubt what I'm looking for would be applicable to the general language parsing cases due to likely performance problems

Mark
Reply all
Reply to author
Forward
0 new messages