If your ANTLR parser is too slow, try writing one by hand

1,501 views
Skip to first unread message

Adam

unread,
Mar 20, 2021, 9:05:37 AM3/20/21
to antlr-discussion
Hello,

I'd like to share these two nice articles:


They helped me realize that parsing is not magic, and if you have performance issues with ANTLR, you can actually write a decent parser for any sensible language by hand.

In my case, a recursive-descent parser and a simple lexer written in Python took not many more lines of code than ANTLR grammar with some code that produced AST out of it -- but it's around 50 times faster.

Of course, there are downsides: you don't have a nice and readable grammar, some cases can be tricky to handle, and it's easier to make a mistake. But in general, writing a parser is no different than writing any other program. It can also help you understand your grammar and the process of parsing itself much better.

If anyone is interested, here is the parser for my language, Pyxell, which is a mix of Python and C++: https://github.com/adamsol/Pyxell/blob/master/src/parser.py. Even though it may look complicated at first glance, it has a simple structure. Actually, you can have a basic working expression parser in an hour, and then it's quite easy to extend such a parser to your needs. I think the performance gain is worth it.

Mike Lischke

unread,
Mar 20, 2021, 9:59:40 AM3/20/21
to 'rtm...@googlemail.com' via antlr-discussion
I'd like to share these two nice articles:


They helped me realize that parsing is not magic, and if you have performance issues with ANTLR, you can actually write a decent parser for any sensible language by hand.

In my case, a recursive-descent parser and a simple lexer written in Python took not many more lines of code than ANTLR grammar with some code that produced AST out of it -- but it's around 50 times faster.

Sure, that's how it all started, with handwritten parsers, before parser generators had been invented. Likewise, you can write all your code in assembler and it will certainly be a thousand times faster, but productivity will probably suffer by doing that ;-)


Message has been deleted

rtm...@googlemail.com

unread,
Mar 20, 2021, 7:48:51 PM3/20/21
to antlr-discussion
I wrote the lexer/parser for my first language by hand so I knew how to. After that, no thanks. Flex+Bison is much better than doing it by hand, and Antlr adds much extra nice flexibility on top. Parsing indeed isn't magic, but it is just dull so let the machine do the boring stuff.

jan

eric vergnaud

unread,
Mar 20, 2021, 11:42:25 PM3/20/21
to antlr-discussion
May I add that if you need that parser in more than 1 language then the productivity issue and the compatibility concern will be show stoppers.

rtm...@googlemail.com

unread,
Mar 21, 2021, 8:18:24 AM3/21/21
to antlr-discussion
> a simple lexer written in Python took not many more lines of code than ANTLR grammar with some code that produced AST out of it -- but it's around 50 times faster

have we still got this python performance problem then?

jan

On Saturday, March 20, 2021 at 1:05:37 PM UTC Adam wrote:

Mike Lischke

unread,
Mar 21, 2021, 8:53:54 AM3/21/21
to 'rtm...@googlemail.com' via antlr-discussion

> a simple lexer written in Python took not many more lines of code than ANTLR grammar with some code that produced AST out of it -- but it's around 50 times faster 

have we still got this python performance problem then?

Recently we had a discussion about target performance and some interesting numbers in this ANTLR4 issue: https://github.com/antlr/antlr4/issues/3111


eric vergnaud

unread,
Mar 21, 2021, 10:09:01 AM3/21/21
to antlr-discussion
Python has a very well known performance problem i.e. pure Python is on average 30 times slower than Java. This often comes as a total surprise to Python developers because a lot of their computer intensive stuff relies on native libraries wrapped in Python (such as numpy)

The Python ANTLR runtimes perform roughly the same (that is: not great). I’m not sure it’s possible to significantly improve this, and I’d encourage anyone who has a performance requirement to not use Python in the first place.

JavaScript is faster but still too slow. I think differently about it because it’s the web language, so developers have no other option. I’m considering moving parts of the runtime to web assembly, hoping it will improve performance. Anyone willing to help with that is welcome.

rtm...@googlemail.com

unread,
Mar 21, 2021, 3:04:00 PM3/21/21
to antlr-discussion
It's the relative not absolute speeds that concern me, sure python's slow but what @Adam is saying is a hand-written lexer/parser in python is ~50 times faster than the antlr equivalent also in python. They're both python. That's (prima facie) nuts.

I don't have time to look into it now but something's not right.

jan

rtm...@googlemail.com

unread,
Mar 21, 2021, 3:33:33 PM3/21/21
to antlr-discussion
Ah, thanks Mike, didn't see this initially. Checking it out.

ta

jan

eric vergnaud

unread,
Mar 21, 2021, 10:03:41 PM3/21/21
to antlr-discussion
I don’t find this ~50 ratio surprising at all, although I'd be interested in knowing whether it was measured before or after warmup.

In order to execute the algorithm, the runtime populates dozens of thousands of dicts, for which it has to constantly compute hash codes.
A hand-written parser doesn’t have to do any of that, it relies on hard coded switches (or if/elif in Python). 

I’m pretty sure you’d observe a similar hit between an antlr and a hand written parser in Java or C++.

The primary goals of ANTLR are ease of use (left recursion is far from trivial to implement manually, ease of change (it takes minutes to alter a grammar and regenerate parsers) and cross-target compatibility.
Another important ANTLR feature is that parsing is always successful, which makes it possible to handle complex error management tasks separately from the complex parsing task.
These come with a cost i.e. performance hit, which is generally acceptable, except when performance matters more than $.

rtm...@googlemail.com

unread,
Mar 22, 2021, 3:00:39 PM3/22/21
to antlr-discussion
I don't understand this properly, if you can fill me in that would be helpful.

> In order to execute the algorithm, the runtime populates...

I understood this was to deal with ambiguities or multiple branches in the grammar.    By that, if the grammar is unambiguous they should be just a single track through any legal program, so large structures don't have to be built and that high cost doesn't *have* to be paid if you invested time in cleaning up the grammar.


> left recursion is far from trivial to implement manually

I thought that left recursion was removed by a pretty standard transformation, and unrelated to building additional data structures?

I clearly don’t know too much about the innards of antlr.

thanks

jan

Adam Sołtysik

unread,
Mar 23, 2021, 7:50:24 PM3/23/21
to antlr-di...@googlegroups.com
> I don’t find this ~50 ratio surprising at all, although I'd be interested in knowing whether it was measured before or after warmup.

It was measured without warm-up. Subsequent runs on the same file are indeed faster, something like 5 times, so my parser becomes only ~10 times faster. It's for rather small files, around 5 kB, with some real-life code. When I tested much larger files with repeated instructions, the performance difference was 10-15 times in the first run and it wouldn't improve after warming up.

However, a compiler needs to parse a file only once, so the warm-up isn't of much help in this case. And even if we assume the ratio of 10 for large files, it's still significant enough -- consider e.g. parsing a 500 kB library in 1 second vs 10 seconds. Though, of course, Python is not the best tool for parsing files of this size. Anyway, what annoyed me was a 5 kB file taking 0.5 seconds to parse ;-)

> The primary goals of ANTLR are ease of use (left recursion is far from trivial to implement manually [...]

The second article that I've linked to (about Pratt parsers) shows that it's actually surprisingly easy to parse expressions with correct precedence and associativity without any left recursion elimination.




--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/f54adadf-8476-4fce-8eb2-ba3ef00d4017n%40googlegroups.com.

rtm...@googlemail.com

unread,
Mar 24, 2021, 11:39:32 AM3/24/21
to antlr-discussion
I'm with you up to here

> The second article that I've linked to (about Pratt parsers) shows that it's actually surprisingly easy to parse expressions with correct precedence and associativity without any left recursion elimination.

Parsing is a solved problem (someone else solved it by writing a parser generator, and that's good enough for me). Learning a new parsing method is uninteresting to me, and un-useful also as it won't pay the bills. There are far more interesting - and genuinely unsolved - problems out there.

All that said, I am concerned too about the Antlr speed issue. I tried chugging through the ALL(*) paper but it's over my head, and I don't have time now. I'm sure (based on nothing at all, of course!) that this overhead can be eliminated.

jan
Reply all
Reply to author
Forward
0 new messages