Probably a pretty noob issue...


Ben Griffin

Aug 9, 2018, 4:17:40 AM
to antlr-discussion
So we bit the bullet and, after a couple of decades of using our own hand-rolled parser, opted for ANTLR4 (C++ target) to replace our parsers.
What surprised us was a roughly 20x slowdown (a 2000% drop in performance).
The language is not complex: the lexer is about 100 lines of grammar and the parser about 30. I have attached both.
Most of the parses (90%+) are over small input strings - a few hundred bytes or so.

Maybe it's the way we've implemented it?  

The code itself is involved in parameterised macro expansion - similar to LaTeX macros, though the syntax is different* - and we are generating files (approx. 50KB each) from 'seed' macros.
The macros call macros, which call macros (and can be iterative, recursive, etc. - just like LaTeX macros).
Each process involves about 2,000 macros and then maybe 1,000 seeds - each with a slightly different context (generating unique results), even though they all use the same set of 2,000 macros.

We are using the visitor approach, because each macro will be visited many times (some hundreds of thousands of times).
So what we currently do is load all the macros into the application and, for each macro (a sketch of these steps follows the list):

(1) creating a new ANTLRInputStream from the definition,
(2) attaching a new lexer to that,
(3) creating a new CommonTokenStream with that,
(4) creating a new parser from that,
(5) creating and attaching a parser visitor,
(6) calling setBuildParseTree(true),
(7) parsing it into a parse tree.
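
For concreteness, steps (1)-(7) in the ANTLR4 C++ runtime look roughly like this. This is a minimal sketch only: the generated class names aLexer/aParser and the entry rule 'macro' are assumptions, not taken from the attached grammars.

#include <string>

#include "antlr4-runtime.h"
#include "aLexer.h"   // generated from aLexer.g4 (name assumed)
#include "aParser.h"  // generated from aParser.g4 (name assumed)

using namespace antlr4;

void buildTree(const std::string& definition) {
    ANTLRInputStream input(definition);        // (1)
    aLexer lexer(&input);                      // (2)
    CommonTokenStream tokens(&lexer);          // (3)
    aParser parser(&tokens);                   // (4)
    parser.setBuildParseTree(true);            // (6) - also the default
    tree::ParseTree* tree = parser.macro();    // (7) 'macro' rule assumed
    // (5) a visitor is created separately and run over 'tree' per call.
    // Note: the parser owns the tree, so input/lexer/tokens/parser must
    // all outlive any later visits to it.
    (void)tree;
}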

Then, for each instance of a macro being called, we use the visitor to run through the parse tree, and when a parameter reference (an 'injection') is invoked we pass the expansion up to the parent to evaluate that part of the expression (sketched below). Macros are divided into user macros, which have an expansion component, and 'internal' macros, which do something (e.g. iEq compares its first two parameters and returns either the third or the fourth parameter depending on whether they match). Although we pre-parse user macro parameters, we cannot pre-parse internal macro parameters, as they cause side effects. So we cannot pre-expand all the macros, as some of them are context dependent.
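
To illustrate the injection mechanism only - the rule and token names ('injection', 'NUMBER') are hypothetical, since the real grammars are in the attachments - a visitor that resolves ⍟n against the caller's parameter list might look like:

#include <string>
#include <vector>

#include "aParserBaseVisitor.h"  // generated visitor base (name assumed)

// Hypothetical sketch: resolve an injection point such as ⍟1 against the
// parameters supplied by the calling macro instance.
class MacroVisitor : public aParserBaseVisitor {
public:
    explicit MacroVisitor(std::vector<std::string> params)
        : params_(std::move(params)) {}

    antlrcpp::Any visitInjection(aParser::InjectionContext* ctx) override {
        // ⍟1 refers to the caller's first parameter (1-based in the syntax).
        size_t i = std::stoul(ctx->NUMBER()->getText());
        return params_.at(i - 1);
    }

private:
    std::vector<std::string> params_;
};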

A potential solution might be some partial pre-expansion - this would reduce the number of macro calls, but at the expense of having much larger macros to visit on each instance. When we tried this about a decade ago, there were no significant benefits with our old parser.

Profiling reveals a huge time hit caused by memory management and dynamic casting:
1.26 min   12.7% 1.26 min szone_malloc_should_clear
55.30 s       9.3% 55.30 s std::__1::__shared_weak_count::__release_shared()
43.85 s       7.4% 43.85 s free_tiny
43.07 s       7.2% 43.07 s __dynamic_cast
38.38 s       6.4% 38.38 s std::__1::__shared_weak_count::__add_shared()
31.06 s       5.2% 31.06 s tiny_free_no_lock

... and when charged by non-system callers (a possible mitigation is sketched after this list):
1.88 min   19.0% 1.88 min   ParserATNSimulator::closure_(shared_ptr<ATNConfig> const&, ATNConfigSet*, unordered_set<shared_ptr<ATNConfig>, ATNConfig::Hasher, ATNConfig::Comparer, allocator<shared_ptr<ATNConfig> > >&, bool, bool, int, bool)
44.07 s    7.4% 44.07 s ParserATNSimulator::getEpsilonTarget(shared_ptr<ATNConfig> const&, Transition*, bool, bool, bool, bool)
29.49 s    4.9% 29.49 s PredictionContext::merge(shared_ptr<PredictionContext> const&, shared_ptr<PredictionContext> const&, bool, PredictionContextMergeCache*)
28.21 s    4.7% 28.21 s ArrayPredictionContext::ArrayPredictionContext(shared_ptr<SingletonPredictionContext> const&)
21.94 s    3.7% 21.94 s ATNConfigSet::add(shared_ptr<ATNConfig> const&, PredictionContextMergeCache*)
19.82 s    3.3% 19.82 s ParserATNSimulator::closureCheckingStopState(shared_ptr<ATNConfig> const&, ATNConfigSet*, unordered_set<shared_ptr<ATNConfig>, ATNConfig::Hasher, ATNConfig::Comparer, allocator<shared_ptr<ATNConfig> > >&, bool, bool, int, bool)
16.95 s    2.8% 16.95 s ParserATNSimulator::ruleTransition(shared_ptr<ATNConfig> const&, RuleTransition*)
16.61 s    2.8% 16.61 s PredictionContext::mergeSingletons(shared_ptr<SingletonPredictionContext> const&, shared_ptr<SingletonPredictionContext> const&, bool, PredictionContextMergeCache*)
14.80 s    2.4% 14.80 s ArrayPredictionContext::~ArrayPredictionContext()
13.17 s    2.2% 13.17 s SingletonPredictionContext::create(shared_ptr<PredictionContext> const&, unsigned long)
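
An aside not taken from the post: the ParserATNSimulator frames above are the cost of adaptive prediction, and a common ANTLR4 mitigation is two-stage parsing - predict in the cheaper SLL mode and re-parse in full LL mode only if SLL reports a syntax error. Whether it helps this particular grammar is untested:

#include "antlr4-runtime.h"
#include "aParser.h"  // generated parser (name assumed)

// Switch prediction to SLL, which is usually much faster than full LL;
// a robust setup would catch a parse error and retry that input in LL.
void useSLLPrediction(aParser& parser) {
    parser.getInterpreter<antlr4::atn::ParserATNSimulator>()
        ->setPredictionMode(antlr4::atn::PredictionMode::SLL);
}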

Having said all that, frankly I am now lost. It may be that I am missing something obvious...
Any advice on what to look for would be welcome!

 -Ben.


* Syntax example - FYI, and otherwise not interesting.
⌽foo(⌽bar(⍟1),⎡a,b⎤)

⌽ - invoke a macro (by name, e.g. 'foo') with a comma-delimited set of parameters enclosed in either 'standard' brackets or 'fat' brackets... so ⌽foo❪bar❫ === ⌽foo(bar)

⍟1 - injection point for a called parameter (in this case, parameter 1).

⎡...⎤ - braces (for passing parameters that have commas in them).

⎣...⎦ - literals (for passing anything as text)

We also identify whitespace as tabs and newlines, which are optionally stripped in a macro definition.


Here is a user macro 'rec' (which is recursive):

⌽iEq(⍟1,,,[⌽iLeft(⍟1,1)]⌽rec(⌽iRight(⍟1,-1)))

so ⌽rec(ABC) would expand to "[A][B][C]": iEq sees a non-empty first parameter, so it emits [⌽iLeft(ABC,1)] = [A] and recurses on ⌽iRight(ABC,-1) = BC, terminating once the argument is empty.



Attachments: aLexer.g4, aParser.g4

Ben Griffin

Aug 9, 2018, 10:32:13 AM
to antlr-discussion
Resolved: the construction of the parse tree is the slow part, whereas the visitor is fast - so it's important to build each macro's parse tree once and reuse it for all future visits, rather than re-parsing (a sketch follows).
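
A minimal sketch of that fix, under the same assumptions as above (generated classes aLexer/aParser, entry rule 'macro'): parse each macro definition once and cache the tree together with everything it points into, since the parser owns the tree and the tree references the token stream.

#include <memory>
#include <string>
#include <unordered_map>

#include "antlr4-runtime.h"
#include "aLexer.h"
#include "aParser.h"

using namespace antlr4;

// Everything the parse tree depends on must stay alive with it.
struct CachedMacro {
    std::unique_ptr<ANTLRInputStream> input;
    std::unique_ptr<aLexer> lexer;
    std::unique_ptr<CommonTokenStream> tokens;
    std::unique_ptr<aParser> parser;
    tree::ParseTree* tree = nullptr;  // owned by 'parser'
};

std::unordered_map<std::string, CachedMacro> g_trees;

// Parse a macro definition at most once; later calls return the cached
// tree, which the (fast) visitor can then walk any number of times.
tree::ParseTree* treeFor(const std::string& name, const std::string& def) {
    auto it = g_trees.find(name);
    if (it != g_trees.end()) return it->second.tree;

    CachedMacro m;
    m.input  = std::make_unique<ANTLRInputStream>(def);
    m.lexer  = std::make_unique<aLexer>(m.input.get());
    m.tokens = std::make_unique<CommonTokenStream>(m.lexer.get());
    m.parser = std::make_unique<aParser>(m.tokens.get());
    m.tree   = m.parser->macro();  // 'macro' entry rule is an assumption
    return g_trees.emplace(name, std::move(m)).first->second.tree;
}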