It is surprising that the Python runtime is so much slower than the Java runtime.
Has there been any work done to get the Python runtime faster?
The Python code doesn't look that pythonic to me, which might be a contributing factor to the lack of speed. For example in ListTokenSource.py
ListTokenSource.nextToken()
has the following lines:
if len(self.tokens) > 0:
    previousStop = self.tokens[len(self.tokens) - 1].stop
A more pythonic (and faster) version of this would be:
if self.tokens:
    previousStop = self.tokens[-1].stop
The index access via -1
is three times faster:
Python 3.9.1 (default, Dec 10 2020, 10:36:35)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
🐍 — 1 » x = list("foobarbaz")
🐍 — 2 » x
🐍 — 2 « ['f', 'o', 'o', 'b', 'a', 'r', 'b', 'a', 'z']
🐍 — 3 » %timeit x[len(x)-1]
188 ns ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
🐍 — 4 » %timeit x[-1]
66.4 ns ± 0.821 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Searching for [len(
reveals several spots in the code where that change might be applied. I don't know if these are performance critical spots and if these micro optimisations would change that much, but they should be safe to apply.
There might be many more of these low-hanging fruits.
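For anyone who wants to reproduce the comparison outside IPython, the same micro-benchmark can be run with the stdlib `timeit` module (a small sketch; absolute numbers will of course vary by machine):

```python
import timeit

x = list("foobarbaz")

# Sanity check: both expressions address the same element.
assert x[len(x) - 1] == x[-1] == "z"

# Time one million accesses of each form.
t_len = timeit.timeit("x[len(x) - 1]", globals={"x": x}, number=1_000_000)
t_neg = timeit.timeit("x[-1]", globals={"x": x}, number=1_000_000)

print(f"x[len(x) - 1]: {t_len:.3f} s")
print(f"x[-1]:         {t_neg:.3f} s")
```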
Would there be interest in a patch that applies this kind of change? Where does the source code of the Python runtime live? (https://pypi.org/project/antlr4-python3-runtime/ only points to https://www.antlr.org/.)
Servus,
Walter
Several clients have asked us over time to help them improve the performance of their parsers, so we put together a list of things we normally look into and wrote an article on that: https://tomassetti.me/improving-the-performance-of-an-antlr-parser/
It would be very useful for us to receive comments and suggestions to improve the article and learn more about optimizing parsers.
The section "Ensuring you are parsing the whole input"
- Tbh. I don't understand why this section exists in this blog post. EOF at a rule's end doesn't affect performance at all.
It may well be out of place in the article but forgetting about the EOF bit me pretty hard and it is important to know.
[snipped]
- A missing EOF doesn't parse input differently; it only ignores parts that follow what's defined by your parser rules. Up to this point the parser run is exactly the same with and without EOF. It might fool you into believing your input is correct, though, even if it contains more than what is allowed.
In my case the absence of EOF did cause it to parse the input differently (or seemed to) when it failed. It reported some very strange failures for the input which did not correspond to the grammar. It was very misleading. Adding the EOF brought back sanity.
From memory, it would report errors earlier in the input than it 'should' have. It wasn't ANTLR's fault, only mine, but it was several hours wasted.
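The anecdote above is easy to model with a toy parser (plain Python, nothing ANTLR-specific): without an explicit end-of-input check, a parser happily accepts any input with a valid prefix, which is exactly the trap a missing EOF sets up.

```python
def parse_number(s):
    """Toy parser: matches digits at the start of the input and, like a
    start rule without EOF, silently ignores whatever follows."""
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if i == 0:
        raise SyntaxError("expected a digit")
    return s[:i]

def parse_number_eof(s):
    """Same parser, but with an explicit end-of-input check, which is
    what appending EOF to the start rule gives you."""
    matched = parse_number(s)
    if len(matched) != len(s):
        raise SyntaxError(f"extraneous input {s[len(matched):]!r}")
    return matched

print(parse_number("123abc"))       # '123' - the trailing garbage is ignored
try:
    parse_number_eof("123abc")
    eof_caught = False
except SyntaxError:
    eof_caught = True               # the EOF check flags the extra input
print("extra input detected:", eof_caught)
```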
Hi Federico, Gabriele,
Several clients have asked us over time to help them improve the performance of their parsers, so we put together a list of things we normally look into and wrote an article on that: https://tomassetti.me/improving-the-performance-of-an-antlr-parser/
It would be very useful for us to receive comments and suggestions to improve the article and learn more about optimizing parsers.
That's an idea I have also had in mind for quite some time. ANTLR4 parser optimisation is something that frequently comes up, if only for your own software. As such your blog post is very welcome!
Below are some thoughts I collected while reading it:
- In general you tend to write pretty lengthy articles. That's not bad per se, but it could use some more structure. You could either split the individual sections into their own parts (pages) of the overall posting or at least provide a way to easily skip over certain areas (e.g. a table of contents with linked titles, collapsible sections or similar).
- At least from my standpoint the discussion of different runtimes is something I would not want in a posting about performance optimisation. Of course you will see big differences in execution, depending on the used target language (and if you mention it at all then it should definitely state what the fastest runtimes are: C++ and C#). But first, you can rarely change the runtime at will, so it is usually not an option for performance tuning, and second, every other optimisation you can do (mostly grammar tuning) applies to all runtimes equally.
- The C++ runtime never had memory-related performance problems that later got resolved. It's a fact that the C++ runtime has to use certain memory management functionality that can be slow (relatively speaking, of course; even the slowest approach is orders of magnitude faster than any scripting language like JavaScript or Python).
- Talking about ASTs in the context of ANTLR4-based parsers is a bit misleading. IMO there's enough confusion about parse trees vs. ASTs, e.g. on Stack Overflow. ASTs are neither needed to optimise parser performance, nor are they even created by ANTLR4-based parsers.
- This sentence: "That is to say, we cannot tell you how using a different runtime will impact your grammar." is probably wrong. I guess you wanted to write "parser" instead of "grammar".
- The "Some Example Numbers" section is a bit strange, as if you want to prove that Python is slower than Kotlin. But that's not the subject of this article. I believe the vast majority of your readers will believe that Python is slower. It just makes the entire article longer with no benefit for the reader.
After that we come to the meat of the article, which I enjoyed reading.
- It would be interesting to see real numbers that prove that reducing ambiguity really improves performance. The ALL(*) algorithm, as I understood it, just picks one alt, depending on certain rules. There's no time penalty when resolving ambiguities. I'd be happy to learn something new here, if I'm wrong.
- Complexity of the grammar is certainly a point! When you mention a path the parser chooses, you should probably say which path you actually mean: the path through the ATN (which requires additional explanation in your article, and which would be more interesting for hard-core optimisation than, say, switching the runtime).
- I also think the article could benefit from an abstract description of the parsing process (unlimited lookahead, possibly guided by predicates), which would help in the section further down where you mention the effect of predicates.
- This sentence "If the two alternatives are an over-complicated rule or a semantic predicate, the semantic predicate can be the better option." is a bit confusing. A predicate alone cannot be an alternative. It is always just a gateway to an alternative. Additionally, if a guarded alternative appears after an unguarded one, it will not speed up the parsing process, regardless of the complexity of the unguarded alt, because alt evaluation happens top-down in a linear fashion.
- The section "Are semantic predicates good or bad?" could use some additional content, because predicates are a very important aspect when it comes to parsing speed.
- I would expect that you answer the question you asked in the title.
- In parser rules a predicate must appear on the left-hand side to have any impact on the parsing process (see also https://github.com/antlr/antlr4/blob/master/doc/predicates.md#semantic-predicates). But that has only as much performance impact as the time needed to evaluate the predicate. However, alternatives containing a left-hand predicate will not be stored in the DFA cache, so that they can be re-evaluated on each parse run. That could have an additional impact if a rule with gated alternatives has many alternatives. It would be good to do measurements with and without predicates to find reliable numbers.
- For lexer rules it's absolutely essential to put the predicate at the end, because here the effect of no caching is really heavy. All lexer rules are handled as if they were alts in a single big outer rule. So, if only one lexer rule contains a predicate on the left-hand side, no lexer rule is cached at all! When you consider that all your keywords are also lexer rules, you can easily imagine how thousands of lexer rules behave, performance-wise, if they are not cached. Side note: the DFA cache was one of the major performance improvements in ANTLR4.
For the C++ runtime this became an extra burden because of the way memory is handled. I described this in my announcement blog post: https://soft-gems.net/the-antlr4-c-target-is-here/. But that is a runtime specific issue, that can easily be avoided by using predicates as recommended (as you have mentioned in your blog post).
- This sentence "The downside is that it is practically impossible to know beforehand whether moving the semantic predicate inside the lexer rule will improve performance or worsen it." can be improved by considering that the left-hand side is really bad.
- I doubt that "It can also improve it because it can quickly kill the exploration of a rule that is not valid in most cases." is true, given the above information. You can use gated alts (or conditional lexer rules), but the shortcut benefit is far less than the no-caching penalty. Also, you should make it clearer that lexer predicates are executed for each iteration in a lexer rule and should therefore be especially quick to evaluate. Example: "ID: [a-zA-Z]+ {isKeyword($text)}?;" If you have an ID with 1000 characters, the predicate "isKeyword" is executed 1000 times.
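The per-iteration evaluation can be illustrated with a toy model of the lexer loop (plain Python, not generated ANTLR code; `is_keyword` is a hypothetical predicate):

```python
calls = 0

def is_keyword(text):
    """Hypothetical predicate; the counter records how often it runs."""
    global calls
    calls += 1
    return text in {"select", "from", "where"}

def match_id(input_text):
    """Toy model of a lexer matching ID: [a-zA-Z]+ {isKeyword($text)}? -
    per the description above, the predicate is re-evaluated on every
    character the rule consumes."""
    text = ""
    for ch in input_text:
        if not ch.isalpha():
            break
        text += ch
        is_keyword(text)   # one evaluation per consumed character
    return text

token = match_id("identifier!")
print(f"matched {token!r}, predicate evaluated {calls} times")
```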
The section "Handling Expressions":
- Expression rules are especially sensitive because usually expressions can be nested with no limit. This actually has some severe implications:
- ANTLR4-based parsers enter a new stack frame for each evaluation of a rule. If you have a large set of expression rules (implementing precedence as explained in the article), which call each other recursively, you might quickly reach the maximum stack depth of your environment. The worst result of that effect is that with large enough expressions your entire app can crash unpredictably! Depending on the target language you don't have much control over how to deal with such a situation. Increasing the stack depth is possible in some languages (e.g. C++), but not on all platforms. Threads (except for the main thread) usually have a pretty small stack size (IIRC only 256KB or 512KB).
- Because of that effect it might sometimes not be possible to parse in a separate thread. This might cause performance problems too.
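The stack-depth hazard is easy to reproduce with any recursive-descent style parser. This toy Python parser (not ANTLR-generated code) uses one stack frame per nesting level, just like a generated rule method, and overflows once the input nests deeper than the interpreter's recursion limit:

```python
import sys

def parse_parens(s, pos=0):
    """Count nested '(' recursively: one stack frame per nesting level."""
    if pos < len(s) and s[pos] == "(":
        return 1 + parse_parens(s, pos + 1)
    return 0

sys.setrecursionlimit(3000)

depth = parse_parens("(" * 100)      # shallow nesting parses fine
print("parsed depth:", depth)

try:
    parse_parens("(" * 100_000)      # deeper than the recursion limit
    overflowed = False
except RecursionError:
    overflowed = True                # a real parser would crash here
print("stack overflow on deep input:", overflowed)
```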
- Rewriting nested expression rules as a single (direct left recursive) rule is only possible for pretty simple expression types. As soon as you have more than a few operators, things really get messy, and your argument that hierarchical expression rules are ugly and/or cumbersome is something I dare to disagree with :-) Of course it also depends on the person working with the grammar, but IMO it's much easier to understand how precedence is organised when you use hierarchical expression rules.
- Direct left recursive rules are no improvement from a memory or speed standpoint, because internally they are converted to non-left-recursive rules and enter a new stack frame on each invocation. Additionally, it might be more difficult to understand operator precedence with them.
- "The more complicated design has a better performance because it is not ambiguous." is a confusing sentence. Where's ambiguity involved here? Direct left recursive rules are not ambiguous. Ambiguity is the situation where more than a single alt can be taken with the same input.
- The example taken from the documentation doesn't help the grammar writer to find a better way here, as it demonstrates how operator precedence is implemented in left recursive rules. If you want to explain what happens internally with left-recursive rules you could instead provide the algorithm how to manually convert direct-left-recursive rules to non-recursive rules. I have described this before on SO: https://stackoverflow.com/questions/41788100/antlr4-mutual-left-recursion-grammar/41789097#41789097. But again: this won't help much to improve performance.
The section "Ensuring you are parsing the whole input"
- Tbh. I don't understand why this section exists in this blog post. EOF at a rule's end doesn't affect performance at all.
- The hint about the lexer that could slow down because of a missing EOF is highly misleading, since EOF is itself a (built-in) token and only used in the parser grammar.
- A missing EOF doesn't parse input differently; it only ignores parts that follow what's defined by your parser rules. Up to this point the parser run is exactly the same with and without EOF. It might fool you into believing your input is correct, though, even if it contains more than what is allowed.
- Adding a catch-all lexer rule can also have an adverse effect, because it will change error handling. Without it you might get an error like "invalid input [" (if you normally don't handle square brackets). With the "ANY" rule, this will change to "unexpected ANY". Which one helps better to diagnose the error is pretty clear. Note also: the catch-all rule must not be used in your parser, or it might not actually show an error.
The section "Changing the Prediction Mode"
This is mostly fine, still I have a few comments:
- You may want to make it clearer that BailErrorStrategy and prediction mode are two different optimisation measures:
- The bail-out strategy is quicker because it throws an exception on error that is not caught in the runtime (all other parse exceptions are caught and forwarded as error callbacks via the error listeners). That way the parser will not make any recovery attempt and will directly stop the parsing process. Important here is that you have to catch the ParseCancellationException in your code or it might bubble up further and crash your application (if there's no global catch-all exception handler by default, like in C++). But with that special exception, parsing is stopped as quickly as possible if there's an error.
- The prediction mode allows simpler parsing if your grammar permits it. You will get faster results in case of correct input. If you mostly have to deal with correct input then the two-step approach (first SLL, then LL prediction mode) is actually faster. SLL can report a syntax error even if there's none (because of the simpler handling), in which case you have to confirm with LL prediction mode whether there's really an error. That means if you mostly have errors in the input, then directly jumping to LL (without first trying SLL) is the better option. But also in this case you can get away with SLL, if your grammar has no ambiguities.
- Your last sentence in this section makes no sense. Parse strategy and prediction mode have nothing to do with input sizes. Both are mostly related to error handling.
The sections "Profiling the parser" and "Profiling the lexer" are a great addition to this article. Maybe you can go into more detail in a separate blog post about all the values you get from profiling? It would be really helpful to understand their meaning, the usual values you get, the limits that are still ok, etc.
The section "Caching"
- This seems to be a wishy-washy section, as it doesn't provide much info to actually solve a performance problem. If you had shown an example of, say, how to save the parse state for each line (aka incremental parsing), it would have been an absolutely great addition (because incremental parsing is currently not supported by ANTLR4-based parsers, and the GitHub patch which would add this is still pending IIRC).
- Also there you mention the AST again, which is not helping with performance and is rather noise, since ANTLR4 doesn't generate an AST.
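The two-step SLL/LL approach described above boils down to a simple fallback pattern. The sketch below shows only the control flow with hypothetical `parse_sll`/`parse_ll` stand-ins; with a real ANTLR4 parser, the fast path would set `PredictionMode.SLL` plus `BailErrorStrategy` (which raises `ParseCancellationException`), and the fallback would reset the token stream and retry with `PredictionMode.LL`:

```python
class ParseCancellationException(Exception):
    """Stand-in for the exception thrown by the bail-out strategy."""

def parse_sll(text):
    # Hypothetical fast path. To keep the sketch self-contained, it
    # simulates SLL "false positives" on inputs containing '?' and
    # bails out on genuinely bad input containing '!'.
    if "?" in text or "!" in text:
        raise ParseCancellationException(text)
    return f"tree({text})"

def parse_ll(text):
    # Hypothetical slow path: full LL prediction with normal recovery.
    if "!" in text:
        raise SyntaxError(f"real syntax error in {text!r}")
    return f"tree({text})"

def parse(text):
    """Two-stage parse: try the fast SLL mode first; only when it
    reports an error (which may be spurious) confirm with full LL."""
    try:
        return parse_sll(text)
    except ParseCancellationException:
        return parse_ll(text)

print(parse("a + b"))   # fast path succeeds
print(parse("a + ?"))   # SLL false positive, LL accepts the input
```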
Some additional optimisations and good style
Below are some thoughts I collected over time, which I wanted to make into an article about what you should do to improve your grammar, what you better should not do, and what is good grammar style. Maybe you want to enhance your article to become the resource I had in mind originally?
⦁ It's not so obvious when you only look at your grammar, but with the right consolidation of rule sub parts you can greatly improve speed. That idea helped me to remove a bottleneck in my grammar where many keywords were mentioned as alternatives for an identifier (the well known keyword-as-identifier problem). Look at the following rules:
indexTypeClause:
    USING_SYMBOL indexType
    | TYPE_SYMBOL indexType
;
indexTypeClauseOpt:
    (USING_SYMBOL | TYPE_SYMBOL) indexType
;
The first shows a typical case where different keywords define alternatives with a common tail. For the first rule you get this ATN:
⦁ Avoid leading optional values. What this means is that you should not start a rule or alt with an optional value. Check this rule:
simpleIdentifier:
    identifier (dotIdentifier dotIdentifier?)?
;
That results in this ATN:
You can see there's a check for both optional dotIdentifiers. If there's no dot then no further check for the second optional part is done. You could also write that as:
simpleIdentifier:
    (identifier DOT_SYMBOL)? (identifier DOT_SYMBOL)? identifier
;
However, the second form requires 2 checks instead of one, with some optional parts and one more cached path in the DFA cache:
Not so much a performance hog, but it can save you from ambiguity trouble, especially when this rule is used in another rule where an identifier could also be matched.
⦁ Speaking of ambiguities: there's a case that can get you into trouble which is sometimes hard to solve: rules that can match nothing:
test:
    identifier*
;
You should rather make the call to it optional instead. Never combine both: a rule with a potential empty match and an optional call. However, ANTLR4 will warn about that.
That's what I had to add from my experience.
First of all, let me say that I am extremely grateful for your answer. It is amazing, very useful, and I appreciate the time you have dedicated to it: thank you!
- At least from my standpoint the discussion of different runtimes is something I would not want in a posting about performance optimisation. Of course you will see big differences in execution, depending on the used target language (and if you mention it at all then it should definitely state what the fastest runtimes are: C++ and C#). But first, you can rarely change the runtime at will, so it is usually not an option for performance tuning, and second, every other optimisation you can do (mostly grammar tuning) applies to all runtimes equally.
I see your point, and I agree that very often there are constraints that prevent switching to a different runtime. However, for some of our clients this was the option that made the most sense, because they were at the beginning of their projects, so they did not have a lot of investment in code using the parse tree, and they had made the choice of the runtime based on the language some developer preferred, without giving it much thought. In those cases, changing runtime was possible and made a lot of difference. Perhaps if we make it easier to navigate through the article it will be less annoying to have these rarely used options.
- The "Some Example Numbers" section is a bit strange, as if you want to prove that Python is slower than Kotlin. But that's not the subject of this article. I believe the vast majority of your readers will believe that Python is slower. It just makes the entire article longer with no benefit for the reader.
Well, you would be surprised by what people do not know :D
The section "Ensuring you are parsing the whole input"
- Tbh. I don't understand why this section exists in this blog post. EOF at a rule's end doesn't affect performance at all.
I will double-check that with Gabriele
Very good point, we will add your refined explanation, if that is ok with you.
Some additional optimisations and good style
Below are some thoughts I collected over time, which I wanted to make into an article about what you should do to improve your grammar, what you better should not do and what is good grammar style. Maybe you want to enhance your article to become the resource I had in mind originally?
We would love that! Thank you! Obviously we would make it clear that you are the author of these contributions
How did you get this visualization of the ATN?
These comments have been extremely useful for us. We would like to add an acknowledgment section to the article to thank you. We are very thankful for what you have taught us and for all the time you have invested in this. We hope that not just us, but many others, can benefit from your work.
Thank you for your great contributions, in particular Mike.
We just published the updated version of the articles, addressing your feedback and including your contributions. It is available here:
https://tomassetti.me/improving-the-performance-of-an-antlr-parser/
Talking about ASTs in the context of ANTLR4 based parsers is a bit misleading. IMO there's enough confusion about parse trees vs. ASTs, e.g. on Stackoverflow. ASTs are neither needed to optimise parser performance, nor are they even created by ANTLR4 based parsers.
> Avoid leading optional values. What this means is that you should not start a rule or alt with an optional value. Check this rule:
Hi! It seems like this part is written incorrectly, or at least the suggested samples are not correct.
Consider the following samples and their ATNs:
trailing: ID (dotId dotId?)?;
<trailing.png>
leading: (idDot idDot?)? ID;
<leading.png>
Their ATNs are almost the same except for the extra ε in the second case. And these rules are equivalent unlike yours.
Your second sample
simpleIdentifier:
    (identifier DOT_SYMBOL)? (identifier DOT_SYMBOL)? identifier
;
is not equivalent to the first:
simpleIdentifier:
    identifier (dotIdentifier dotIdentifier?)?
;
Considering the name of the rule, "Avoid leading optional values", it should be:
simpleIdentifier:
    (identifierDot identifierDot?)? identifier
;
As I wrote before, ATNs for such rules are almost the same and it hardly affects performance.