ANTLR4 include implementation in the Lexer

TR

Feb 13, 2017, 6:46:56 AM
to antlr-discussion
Hi,

I am trying to implement an include macro that works at the lexer stage in ANTLR4.
I initially implemented it in the parser and that worked just fine; however, my requirements force me to support things like:

file1.txt:     > 5
file2.txt:     if ( x #include file1.txt ) { ... }

This requires evaluating the included file within the parse context of the calling parser (afaik this cannot be trivially resolved within the parser stage, because the included file is only a fragment of an expression).
I have been experimenting with an implementation (see below), but I cannot be certain I didn't miss anything, which is why I am asking for your knowledge & experience to suggest corrections/improvements.


I based it on an implementation from here for ANTLR3.

This is the lexer pattern definition:

INCLUDE : '#include' WS+ IDENT {
    final String importText = getText();
    final String filename   = extractName(importText);
    final Reader reader     = getReaderForFile(filename);
    if ( reader != null ) {
        try {
            final CompileContext ss = new CompileContext(getInputStream());
            includes.push(ss);

            final ANTLRInputStream antlrStream = new ANTLRInputStream(reader);
            setInputStream(antlrStream);

        } catch ( final Throwable e ) {
            // Report the file that failed to load and keep the original cause.
            throw new Error("Cannot import source=[" + filename + "]", e);
        }
    }
} -> skip;


@lexer::members {
   private final Stack<CompileContext> includes = new Stack<CompileContext>();


   private static class CompileContext {
       public final IntStream input;
       public final int       marker;

       private CompileContext(final CharStream input) {
          this.input  = input;
          this.marker = input.mark();
       }
   }


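   // When an included stream hits EOF, resume the including stream instead of ending the token stream.
   @Override
   public Token nextToken() {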
       Token token = super.nextToken();

       if ( token.getType() == Token.EOF && !includes.empty() ) {
          final CompileContext ss = includes.pop();
          setInputStream(ss.input);
          getInputStream().release(ss.marker);

          token = this.nextToken();
       }

       if ( token.getStartIndex() < 0 ) {
          token = this.nextToken();
       }

       return token;
   }


   // Unfortunately we had to override reset() and duplicate its body to avoid resetting the DFA's state.
   @Override
   public void reset() {
      // wack Lexer state variables
      if ( _input !=null ) {
         _input.seek(0); // rewind the input
      }

      _token = null;
      _type = Token.INVALID_TYPE;
      _channel = Token.DEFAULT_CHANNEL;
      _tokenStartCharIndex = -1;
      _tokenStartCharPositionInLine = -1;
      _tokenStartLine = -1;
      _text = null;

      _hitEOF = false;
      _mode = Lexer.DEFAULT_MODE;
      _modeStack.clear();

      getInterpreter().setLine(1);
      getInterpreter().setCharPositionInLine(0);

      // Do not reset() the interpreter, since that resets the DFA state via prevAccept.reset().
      // getInterpreter().reset();
   }
}

I'd appreciate any insight, ideas, suggestions, improvements and bug fixes.


Thanks!
  -- TR



Eric Vergnaud

Feb 13, 2017, 10:01:23 AM
to antlr-discussion
Hi,

To my knowledge, all languages that support includes, such as C and C++, rely on a preprocessor.
In short, you need a fairly simple grammar to detect includes and produce a new input.
From there you can fall back to the regular case.

Eric
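
For illustration, the preprocessor approach suggested above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not code from either poster: the class name IncludePreprocessor, the in-memory Map standing in for file access, and the regular expression used in place of a small include grammar are all hypothetical. It expands each "#include name" textually, so the result can be lexed and parsed as one ordinary input.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ANTLR): expands "#include name" textually before lexing. */
public final class IncludePreprocessor {
    // Assumed directive syntax: "#include", whitespace, then a file name without spaces.
    private static final Pattern INCLUDE = Pattern.compile("#include\\s+(\\S+)");

    // File name -> file contents; an assumption standing in for real file I/O.
    private final Map<String, String> sources;
    // Names of the files currently being expanded, used to detect include cycles.
    private final Deque<String> active = new ArrayDeque<>();

    public IncludePreprocessor(Map<String, String> sources) {
        this.sources = sources;
    }

    /** Returns the contents of the named source with every include replaced by the included text. */
    public String expand(String name) {
        String text = sources.get(name);
        if (text == null) {
            throw new IllegalArgumentException("Cannot include source=[" + name + "]");
        }
        if (active.contains(name)) {
            throw new IllegalStateException("Recursive include of [" + name + "]");
        }
        active.push(name);
        try {
            Matcher m = INCLUDE.matcher(text);
            StringBuilder out = new StringBuilder();
            int last = 0;
            while (m.find()) {
                out.append(text, last, m.start()); // text before the directive
                out.append(expand(m.group(1)));    // expanded contents of the included file
                last = m.end();
            }
            out.append(text, last, text.length()); // remainder after the last directive
            return out.toString();
        } finally {
            active.pop();
        }
    }
}
```

With the two files from the original post, expand("file2.txt") yields "if ( x > 5 ) { ... }", which can then be handed to the lexer as a single stream, for example via new ANTLRInputStream(expanded). The trade-off of purely textual expansion is that token line and column numbers refer to the expanded text rather than to the original files, so error reporting would need a separate position mapping.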