ANTLR 4 CPP Target (Memory and Slow)


Cleverson Ledur

Oct 11, 2016, 1:31:34 PM
to antlr-discussion
I am parsing C++ using ANTLR4 with the CPP target. The problem is that it takes too long to parse medium-sized files (>2 MB) and consumes too much memory (>2 GB).

I read in The Definitive ANTLR 4 Reference (page 243) that it is possible to choose between the SLL(*) and LL(*) algorithms to speed up parsing.

I tried inserting the following line of code in my main code to enable SLL parser:

parser.getInterpreter().setSLL(true);

This gives me the following error:

error: no matching function for call to ‘antlrcpptest::cincleParser::getInterpreter()’
     parser.getInterpreter().setSLL(true);
                           ^
In file included from ../libraries/antlr4/include/Lexer.h:34:0,
                 from ../libraries/antlr4/include/antlr4-runtime.h:56,
                 from ./generated/cincleLexer.h:7,
                 from generated/main.cpp:31:
../libraries/antlr4/include/Recognizer.h:99:8: note: candidate: template<class T> T* antlr4::Recognizer::getInterpreter() const
     T* getInterpreter() const {
        ^~~~~~~~~~~~~~
../libraries/antlr4/include/Recognizer.h:99:8: note:   template argument deduction/substitution failed:
generated/main.cpp:156:27: note:   couldn't deduce template parameter ‘T’
     parser.getInterpreter().setSLL(true);


Does the ANTLR CPP target support this? If not, can you suggest anything to speed up my parsing and save memory?

Thank you in advance

Mike Lischke

Oct 12, 2016, 4:08:12 AM
to antlr-di...@googlegroups.com
> I am parsing C++ using ANTLR4 with CPP target. The problem is that this takes too long to parse medium size files (>2Mbytes) and consumes too much memory (>2GB).

I'm not sure we can do much about the memory consumption; it's dictated by the way the runtime stores its data. As for the parsing time: did you try multiple parse runs in a row? ANTLR4 has a significant warmup phase. For my test suite, for example, the warmup run takes 6 s, while all following runs take only ~0.8 s. I have seen much worse numbers too (I wrote something about this in an earlier mail here): for a big expression query the warmup is ~8 s, while all following runs take only ~10 ms (so the ratio is almost 1000:1).
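
One way to check whether warmup dominates is to time the same parse several times in one process and compare the first run with the later ones. The following is only a sketch: `timeRuns` is a hypothetical helper, not part of the ANTLR runtime, and the callable passed to it is assumed to create the lexer and parser for the same input and invoke the start rule.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not an ANTLR API): calls `work` `runs` times and
// returns the wall-clock duration of each call in milliseconds.
std::vector<double> timeRuns(const std::function<void()> &work, std::size_t runs) {
  std::vector<double> timings;
  timings.reserve(runs);
  for (std::size_t i = 0; i < runs; ++i) {
    auto start = std::chrono::steady_clock::now();
    work(); // in a real test: lex and parse the same input again
    auto elapsed = std::chrono::steady_clock::now() - start;
    timings.push_back(std::chrono::duration<double, std::milli>(elapsed).count());
  }
  return timings;
}
```

If the first entry of the returned vector is far larger than the rest, the cost is mostly warmup rather than steady-state parsing.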


> I read in The Definitive ANTLR 4 Reference (page 243) that it is possible to use the SLL(*) algorithm or LL(*) to speed up.
>
> I tried inserting the following line of code in my main code to enable SLL parser:
>
> parser.getInterpreter().setSLL(true);

It seems you have an older edition of the book. The API has changed since then, and the book's examples are written in Java, so you cannot copy the code 1:1. In C++ the call is:

parser.getInterpreter<ParserATNSimulator>()->setPredictionMode(PredictionMode::SLL);
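
The compiler error in the original post follows from the signature it quotes: `T` appears only in the return type of `getInterpreter()`, so it cannot be deduced from an empty argument list and has to be named at the call site. The sketch below reproduces that shape with hypothetical stand-in classes (`DemoParser`, the simplified `Recognizer` and `ParserATNSimulator`, and the `enableSLL` helper are illustrations only, not the runtime's implementation).

```cpp
// Hypothetical stand-ins for the runtime classes; only the shape of the
// API matters here.
struct ATNSimulator {
  virtual ~ATNSimulator() = default;
};

enum class PredictionMode { SLL, LL };

struct ParserATNSimulator : ATNSimulator {
  PredictionMode mode = PredictionMode::LL;
  void setPredictionMode(PredictionMode m) { mode = m; }
  PredictionMode getPredictionMode() const { return mode; }
};

class Recognizer {
public:
  // T is used only in the return type, so the caller must spell it out.
  template <class T>
  T *getInterpreter() const {
    return dynamic_cast<T *>(_interpreter);
  }

protected:
  ATNSimulator *_interpreter = nullptr;
};

class DemoParser : public Recognizer {
public:
  DemoParser() { _interpreter = &_simulator; }

private:
  ParserATNSimulator _simulator;
};

// parser.getInterpreter()                      -> cannot deduce T, does not compile
// parser.getInterpreter<ParserATNSimulator>()  -> compiles
inline PredictionMode enableSLL(DemoParser &parser) {
  auto *interpreter = parser.getInterpreter<ParserATNSimulator>();
  interpreter->setPredictionMode(PredictionMode::SLL);
  return interpreter->getPredictionMode();
}
```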

Here is a parse function I actually use:

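#include <chrono>
#include <iostream>
#include <memory>
#include <string>

#include "antlr4-runtime.h"
#include "MySQLLexer.h"  // generated headers; file names assumed from the class names
#include "MySQLParser.h"

using namespace antlr4;
using namespace antlr4::atn; // ParserATNSimulator, PredictionMode

void parse(const std::string &sql, bool dumpTokenStream, bool dumpParseTree) {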
  ANTLRInputStream input(sql);
  MySQLLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  MySQLParser parser(&tokens);

  parser.setBuildParseTree(true);

  // First parse with the bail error strategy to get quick feedback for correct queries.
  parser.setErrorHandler(std::make_shared<BailErrorStrategy>());
  parser.getInterpreter<ParserATNSimulator>()->setPredictionMode(PredictionMode::SLL);
  parser.removeErrorListeners();

  try {
    tokens.fill();
  } catch (IllegalStateException &) {
    std::cout << "Error: illegal state found, probably unfinished string." << std::endl;
  }

  if (dumpTokenStream) {
    for (auto token : tokens.getTokens())
      std::cout << token->toString() << std::endl;

    std::cout << std::endl;
  }

  tree::ParseTree *tree = nullptr; // initialized so the nullptr check below is well defined
  auto start = std::chrono::steady_clock::now();
  try {
    tree = parser.query();
  } catch (ParseCancellationException &pce) {
    // If parsing was cancelled we either really have a syntax error or we need to do a second step,
    // now with the default strategy and LL parsing.
    tokens.reset();
    parser.reset();
    parser.setErrorHandler(std::make_shared<DefaultErrorStrategy>());
    parser.getInterpreter<ParserATNSimulator>()->setPredictionMode(PredictionMode::LL);
    parser.addErrorListener(&ConsoleErrorListener::INSTANCE);
    tree = parser.query();
  }
  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start);

  if (parser.getNumberOfSyntaxErrors() > 0 || lexer.getNumberOfSyntaxErrors() > 0) {
    std::cout << "Errors encountered: "
              << parser.getNumberOfSyntaxErrors() + lexer.getNumberOfSyntaxErrors() << std::endl;
    std::cout << "Query: " << sql << std::endl;
  }

  std::cout << "Parse time: " << duration.count() / 1000.0 << " ms" << std::endl;

  if (dumpParseTree && tree != nullptr) {
    std::cout << std::endl << "Parse tree: " << tree->toStringTree(&parser) << std::endl;
  }
}

Side note: the separate tokens.fill() call is normally not necessary; it is only used here for timing reasons.


Mike Lischke

Oct 12, 2016, 8:02:03 AM
to antlr-di...@googlegroups.com


>   tree::ParseTree *tree;
>   auto start = std::chrono::steady_clock::now();
>   try {
>     tree = parser.query();

I'm sorry, I used code here that is not yet usable except on my local machine. The tree var must be a Ref<ParseTree> currently.
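
To illustrate the difference, here is a minimal sketch that assumes `Ref<T>` is an alias for `std::shared_ptr<T>`, as the runtime defined it around that time. `ParseTree`, `query` and `describe` below are hypothetical stand-ins, not the runtime's classes. A default-constructed `Ref` is empty, so the later `tree != nullptr` check is safe even if parsing throws before anything is assigned.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Assumption: mirrors the alias the runtime used at the time.
template <class T>
using Ref = std::shared_ptr<T>;

// Hypothetical stand-in for the runtime's parse tree type.
struct ParseTree {
  std::string text;
};

// Hypothetical stand-in for a start rule such as parser.query().
Ref<ParseTree> query(bool fail) {
  if (fail)
    throw std::runtime_error("parse cancelled");
  return std::make_shared<ParseTree>(ParseTree{"(query ...)"});
}

std::string describe(bool fail) {
  Ref<ParseTree> tree; // empty by default, compares equal to nullptr
  try {
    tree = query(fail);
  } catch (const std::runtime_error &) {
    // tree stays empty
  }
  return tree != nullptr ? tree->text : "<no tree>";
}
```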