ANTLR code generator placing calls to predicates where they'll never be called.

avidt...@gmail.com

Feb 6, 2013, 5:28:51 PM

to antlr-di...@googlegroups.com

I'm following the examples in the ANTLR v3 text on creating various predicates, and I've run into the problem described in the title.

My goal: ignore text unless it is explicitly defined by a grammar rule.

For example, my input file has "sections":

SECTION ONE---------------

SECTION TWO----------------

etc.

Ultimately, I want to skip over any text within a section that is not something I'm explicitly looking for.

I have that working as long as the text does *not* have a token explicitly defined in the grammar (tokens section or literal).

For example, the text in section one is ignored:

SECTION ONE---------------
foo bar

SECTION TWO----------------

However, this text in section one causes a mismatched token exception when TWO is encountered:

SECTION ONE---------------
foo bar TWO

SECTION TWO----------------

Examining the ANTLR-generated code (C# target) I see that the section one rule/function does the following:

First, it calls .Match on the expected literals 'SECTION', 'ONE' and '---------------';

then, loop until an alternative cannot be determined:

Loop Step 1. an input.LA(1)

Loop Step 2. determine an alternative path

Loop Step 3. take that path (.Match or function call)

Since 'foo' and 'bar' are not defined in my grammar, all is well.

Since TWO is defined in the tokens section (or as a literal, 'TWO'), step 1 above returns a token type that causes step 2 to decide the rule should exit, because the prediction checks are based on the tokens that rule ignoreText can match (see the grammar below).
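The loop described above can be modeled with a small, self-contained C# sketch. This is not the actual ANTLR output: the token-type constants and the SkipIgnoredText helper are hypothetical stand-ins, chosen only to show why the loop exits at TWO and leaves it for the next rule, which then expects 'SECTION'.

```csharp
using System;

static class LoopSketch
{
    // Hypothetical token types; the real values come from the generated parser.
    public const int EOF = -1, TOKEN = 4, NUMBER = 5, EOL = 6,
                     SECTION = 7, ONE = 8, TWO = 9, DASHES = 10;

    // Models the ignoreText* loop. Returns the index of the first token
    // the loop refuses to consume.
    public static int SkipIgnoredText(int[] types, int pos)
    {
        while (true)
        {
            // Step 1: look at the type of the next token, like input.LA(1).
            int la1 = pos < types.Length ? types[pos] : EOF;

            // Step 2: pick an alternative from the token type alone.
            bool stayInLoop = la1 == TOKEN || la1 == NUMBER || la1 == EOL;
            if (!stayInLoop)
            {
                return pos; // no alternative matches, so the loop exits
            }

            // Step 3: take that path (here: simply consume the token).
            pos++;
        }
    }

    public static void Main()
    {
        // "foo bar TWO" followed by the "SECTION TWO ---" header.
        int[] input = { TOKEN, TOKEN, TWO, SECTION, TWO, DASHES };

        int stop = SkipIgnoredText(input, 0);
        Console.WriteLine(stop);                   // prints 2
        Console.WriteLine(input[stop] == SECTION); // prints False
    }
}
```

In this model the loop stops at index 2, on the stray TWO, so whatever runs next sees TWO where it expects 'SECTION'. That matches the mismatched token exception described above.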

If I define a predicate to determine whether my text-to-ignore is a section header, that predicate is never called, because step 2 above *ANDs* the call to the predicate with the token-type checks. For a token like TWO, the token-type check fails and the predicate is never reached.

I've tried the predicate in two locations: in the section's rule and in the 'ignoreText' rule. The same thing happens in both: the predicate is never called.

My grammar:

ignoreTextTest: sectionOne
sectionTwo
EOF;

sectionOne: 'SECTION' 'ONE' '---------------'
ignoreText*;

sectionTwo: 'SECTION' 'TWO' '---------------'
ignoreText*;

ignoreText: (token | EOL)*;

token: (TOKEN | NUMBER | symbol);

symbol: (
TICK |
TILDE |
BANG |
AT |
... all the special symbols on the keyboard, lexer rules for each, but left out of this post).

NUMBER : '0' .. '9'+;
TOKEN: ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9') ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9')*;

EOL : '\r'? '\n' | '\r';
WS : (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;};

To test ignored text for section headers, I defined an @members function:

protected bool isSectionHeader() // Yes, could be optimized. This is an example.
{
    // input.LT(n) returns a token, so compare its Text, not the token itself.
    if (input.LT(1).Text == "SECTION" && input.LT(2).Text == "ONE" && input.LT(3).Text == "---------------")
    {
        return true;
    }
    if (input.LT(1).Text == "SECTION" && input.LT(2).Text == "TWO" && input.LT(3).Text == "---------------")
    {
        return true;
    }
    return false;
}

If I place the predicate in the sectionOne rule or in the ignoreText rule, the result is the same as described above: the ANTLR-generated code tests for (TOKEN | NUMBER | symbol) and *ANDs* that test with a call to my predicate. Thus, the predicate never gets called.
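To make the AND-ing concrete, here is a minimal C# sketch of the shape of that decision. It is an illustration under assumptions, not the generated code: the token-type constants, the Predict method, and the NotSectionHeader stand-in (playing the role of !isSectionHeader()) are all hypothetical. Because && short-circuits, the predicate only runs when the token-type test has already passed.

```csharp
using System;

static class GatedPredicateSketch
{
    // Hypothetical token types; the real values come from the generated parser.
    public const int TOKEN = 4, TWO = 9;

    // Counts how often the predicate stand-in is evaluated.
    public static int PredicateCalls = 0;

    // Hypothetical stand-in for the !isSectionHeader() predicate.
    static bool NotSectionHeader()
    {
        PredicateCalls++;
        return true;
    }

    // Returns 1 to stay in the loop, 2 to exit it.
    public static int Predict(int la1)
    {
        // The token-type test comes first; the predicate is AND-ed onto it.
        if (la1 == TOKEN && NotSectionHeader())
        {
            return 1;
        }
        return 2;
    }

    public static void Main()
    {
        Console.WriteLine(Predict(TWO));   // prints 2: the loop exits
        Console.WriteLine(PredicateCalls); // prints 0: predicate was skipped
        Console.WriteLine(Predict(TOKEN)); // prints 1: the loop continues
        Console.WriteLine(PredicateCalls); // prints 1: predicate ran this time
    }
}
```

In this model the predicate can only veto an alternative that the token-type test has already accepted. It never gets a say when the lookahead is TWO, which is the case that matters here.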

Per the ANTLR text, gated predicates are meant to determine whether or not an alternative should be taken. So what am I doing wrong, given that the predicate, where the auto-generated code places it, never gets the chance to do that?