Performance of lexer generated for python2

70 views
Skip to first unread message

Reshma Godse

unread,
Dec 31, 2014, 7:09:36 AM12/31/14
to antlr-di...@googlegroups.com
Hi,

I am working on the grammar to parse input file of our application. I found that the lexer generated for python2 takes around 3+ minutes to tokenize a 600 line file. The parser's performance has improved a lot after setting PredictionMode to 'SLL'. But the lexer is becoming bottleneck. Is there any way to improve performance of lexer in python2?

Thanks for help!
Reshma

Terence Parr

unread,
Jan 2, 2015, 12:29:12 PM1/2/15
to antlr-di...@googlegroups.com
I think a fairly literal translation was made from the Java code to Python and so it is likely to be less efficient because it might not use the fast part of Python, if that makes sense. That sounds kind of slow; hopefully we can improve speed in the future.
Ter

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 

Sam Harwell

unread,
Jan 2, 2015, 2:36:09 PM1/2/15
to antlr-di...@googlegroups.com

Hi Reshma,

 

The translation to Python did not preserve the semantics of certain performance-critical aspects of the Java code. However, since the final result was a correct parse tree, the author of the Python target ruled that any resulting impact on runtime performance did not constitute a bug, and will not be modified. The following issue and pull request contain information about one of these cases, where it was determined that the Python target is known to exhibit exponential complexity (O(k^n), k is a function of the grammar and n is the input size).

 

https://github.com/antlr/antlr4-python2/issues/23

https://github.com/antlr/antlr4-python2/pull/24

 

If you are putting your code into production with unknown inputs, you should take one of the following steps to make sure your application does not encounter these performance issues:

 

1.       Use one of the Java releases (either the reference release or my memory-optimized fork)

2.       Use the C# release (which is derived from the memory-optimized Java fork)

3.       Build your own copy of the Python runtime and manually apply the applicable patches submitted by @bdkearns (this will address some, but not all of the known performance issues in the Python target)

 

Sam

Reshma Godse

unread,
Jan 5, 2015, 2:50:02 AM1/5/15
to antlr-di...@googlegroups.com
Hello Sam,

Thank you very much for your help. Your options will help us in deciding next set of actions.

Thanks,
Reshma

Eric Vergnaud

unread,
Jan 24, 2015, 12:26:38 AM1/24/15
to antlr-di...@googlegroups.com
Hi Reshma,

I am the author of the Python and JavaScript targets.
I believe some of the comments in this thread are misleading.
The approach for improving performance is test-driven, and is on-going - see #36.
Anyone interested in improving performance based on facts rather than opinions is welcome.

Eric
 


Reply all
Reply to author
Forward
0 new messages