ANTLR4 include implementation in the Lexer

antlr4, include, lexer

TR

Feb 13, 2017, 6:46:56 AM

to antlr-discussion

Hi,

I am trying to implement an include macro that works at the lexer stage in ANTLR4.

I initially implemented it in the parser, and that worked just fine. However, my requirements forced me to support things like:

file1.txt: > 5
file2.txt: if ( x #include file1.txt ) { ... }

This requires evaluating the included file within the parse context of the calling parser (as far as I know, this cannot be trivially resolved within the parser stage).

I have been experimenting with an implementation (see below), but I cannot be certain I didn't miss anything, which is why I am asking for your knowledge and experience to suggest corrections and improvements.

I based it on an implementation from here for ANTLR3.

This is the lexer pattern definition:

INCLUDE : '#include' WS+ IDENT {
    final String importText = getText();
    final String filename   = extractName(importText);
    final Reader reader     = getReaderForFile(filename);
    if ( reader != null ) {
        try {
            // Remember the current stream (and a marker on it) so that nextToken() can resume it at EOF.
            final CompileContext ss = new CompileContext(getInputStream());
            includes.push(ss);
            // Switch the lexer over to the included file.
            final ANTLRInputStream antlrStream = new ANTLRInputStream(reader);
            setInputStream(antlrStream);
        } catch ( final Throwable e ) {
            throw new Error("Cannot import source=[" + sourceName + "]", e);
        }
    }
} -> skip;

@lexer::members {
   // Streams that were suspended by an #include, innermost on top.
   private final Stack<CompileContext> includes = new Stack<CompileContext>();

   private static class CompileContext {
       public final IntStream input;
       public final int       marker;

       private CompileContext(final CharStream input) {
          this.input  = input;
          this.marker = input.mark();
       }
   }

   @Override
   public Token nextToken() {
       Token token = super.nextToken();
       if ( token.getType() == Token.EOF && !includes.empty() ) {
          // End of an included file: restore the including stream and keep lexing.
          final CompileContext ss = includes.pop();
          setInputStream(ss.input);
          getInputStream().release(ss.marker);
          token = this.nextToken();
       }
       if ( ((CommonToken) token).getStartIndex() < 0 ) {
          token = this.nextToken();
       }
       return token;
   }

   // Unfortunately we had to override and duplicate this method in order to avoid resetting the DFA's state.
   @Override
   public void reset() {
      // wack Lexer state variables
      if ( _input != null ) {
         _input.seek(0); // rewind the input
      }
      _token = null;
      _type = Token.INVALID_TYPE;
      _channel = Token.DEFAULT_CHANNEL;
      _tokenStartCharIndex = -1;
      _tokenStartCharPositionInLine = -1;
      _tokenStartLine = -1;
      _text = null;
      _hitEOF = false;
      _mode = Lexer.DEFAULT_MODE;
      _modeStack.clear();
      getInterpreter().setLine(1);
      getInterpreter().setCharPositionInLine(0);
      // Do not reset() the interpreter, since it resets the DFA state when calling prevAccept.reset().
      // getInterpreter().reset();
   }
}
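To make the intended control flow easier to check, here is a minimal, ANTLR-free sketch of the same push/pop idea. It is not the lexer above: tokens are simply whitespace-separated words, and the class name IncludeStackDemo, the Frame helper and the in-memory files map are hypothetical stand-ins for the real streams and for getReaderForFile.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class IncludeStackDemo {
    /** One source being read: its whitespace-separated words and the index of the next one. */
    private static final class Frame {
        final String[] words;
        int next = 0;

        Frame(String text) {
            String trimmed = text.trim();
            this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }
    }

    private final Map<String, String> files; // hypothetical in-memory stand-in for getReaderForFile
    private final Deque<Frame> suspended = new ArrayDeque<>(); // plays the role of `includes`
    private Frame current;

    public IncludeStackDemo(String text, Map<String, String> files) {
        this.files = files;
        this.current = new Frame(text);
    }

    /** Returns the next word, or null once the outermost source is exhausted. */
    public String nextToken() {
        while (true) {
            if (current.next >= current.words.length) {
                if (suspended.isEmpty()) {
                    return null; // real end of input
                }
                current = suspended.pop(); // end of an included source: resume the includer
                continue;
            }
            String word = current.words[current.next++];
            if (!word.equals("#include")) {
                return word;
            }
            if (current.next >= current.words.length) {
                throw new IllegalStateException("#include without a file name");
            }
            String name = current.words[current.next++];
            String body = files.get(name);
            if (body == null) {
                throw new IllegalStateException("Cannot import source=[" + name + "]");
            }
            suspended.push(current); // remember where to resume
            current = new Frame(body);
        }
    }

    /** Collects every token, following includes as they are encountered. */
    public static List<String> tokens(String text, Map<String, String> files) {
        IncludeStackDemo lexer = new IncludeStackDemo(text, files);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of("file1.txt", "> 5");
        // prints [if, (, x, >, 5, ), {, ..., }]
        System.out.println(tokens("if ( x #include file1.txt ) { ... }", files));
    }
}
```

Like the code as posted, this sketch does not guard against a file that includes itself, so a check of the stack before pushing is probably worth adding in both.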

I'd appreciate any insight, ideas, suggestions, improvements and bug fixes.

Thanks!

  -- TR

Eric Vergnaud

Feb 13, 2017, 10:01:23 AM

to antlr-discussion

Hi,

To my knowledge, languages such as C and C++ that support includes rely on a preprocessor.

In short, you need a rather simple grammar to detect includes and produce a new input.

From there you can fall back to the regular case.

Eric
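The preprocessor approach suggested above could look roughly like the following sketch. It swaps the "simple grammar" for a regular expression to stay self-contained, and it assumes the directive is "#include" followed by whitespace and a file name without spaces. The class name IncludePreprocessor and the loader function are hypothetical; in a real setup the loader would read the file, and the expanded string would then be handed to the regular lexer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludePreprocessor {
    // Assumed directive shape: "#include", whitespace, then a file name without spaces.
    private static final Pattern INCLUDE = Pattern.compile("#include\\s+([\\w.\\-/]+)");

    private final Function<String, String> loader;            // hypothetical: file name -> text, or null
    private final Deque<String> active = new ArrayDeque<>();  // files currently being expanded

    public IncludePreprocessor(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the text with every include directive replaced by the expanded file content. */
    public String expand(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = INCLUDE.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy the text before the directive
            out.append(expandFile(m.group(1)));  // splice in the included content
            last = m.end();
        }
        out.append(text, last, text.length());   // copy the remainder
        return out.toString();
    }

    private String expandFile(String name) {
        if (active.contains(name)) {
            throw new IllegalStateException("Recursive include of " + name);
        }
        String body = loader.apply(name);
        if (body == null) {
            throw new IllegalArgumentException("Cannot import source=[" + name + "]");
        }
        active.push(name);
        try {
            return expand(body); // included files may contain includes themselves
        } finally {
            active.pop();
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of("file1.txt", "> 5");
        IncludePreprocessor pp = new IncludePreprocessor(files::get);
        // prints: if ( x > 5 ) { ... }
        System.out.println(pp.expand("if ( x #include file1.txt ) { ... }"));
    }
}
```

One trade-off of expanding before lexing is that line and column numbers in later error messages refer to the expanded text rather than to the original files, unless a mapping back is kept separately.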
