Help Identifying Python2 Bottlenecks


Mitchell

Oct 20, 2017, 1:12:45 PM
to antlr-discussion
Hi, I'm interested in using ANTLR's Python2 runtime for real-time parsing of files being edited in an IDE, but I'm running into a performance problem that may be attributable to the Python2 runtime (v4.7). As a test case, I'm using the simple Lua grammar (https://github.com/antlr/grammars-v4/tree/master/lua) in an unmodified state.

My Python test script is very simple:

import sys

from antlr4 import *
from LuaLexer import LuaLexer
from LuaParser import LuaParser
from LuaListener import LuaListener

class PythonScanner(LuaListener):
    # A no-op listener: the tree walk itself is what is being timed.
    pass

if __name__ == '__main__':
    lexer = LuaLexer(FileStream(sys.argv[1]))
    tokens = CommonTokenStream(lexer)
    parser = LuaParser(tokens)
    tree = parser.chunk()  # 'chunk' is the start rule of the Lua grammar
    walker = ParseTreeWalker()
    scanner = PythonScanner()
    walker.walk(scanner, tree)

I ran this script on a file of about 650 lines of Lua code, and it took 3.514 seconds. For me, this is unacceptably slow; the Java runtime processes the same file on the order of 10 times faster.
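For anyone who wants to reproduce this, a profile like the one below can be collected with the standard-library cProfile module. This is just a minimal sketch (shown in Python 3 syntax; the profiled function and the entry count are placeholders), not how I necessarily invoked the profiler:

```python
import cProfile
import io
import pstats

def profile_top(func, n=15):
    """Profile func() and return the top-n entries sorted by internal time."""
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()
    # Render the stats into a string, sorted the same way as the dump below.
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats('tottime').print_stats(n)
    return buf.getvalue()

# Example: profile some stand-in work instead of the full parse.
report = profile_top(lambda: sorted(range(100000), key=str))
print(report)
```

In my case `func` would be a closure that runs the lexer, parser, and tree walk from the script above.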

The first handful of lines from the Python profiler output are as follows:

         6417204 function calls (5663872 primitive calls) in 3.514 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
188161/3322    0.461    0.000    1.948    0.001 ParserATNSimulator.py:1130(closure_)
   212496    0.224    0.000    0.224    0.000 ATNConfig.py:20(__init__)
196172/3322    0.188    0.000    1.952    0.001 ParserATNSimulator.py:1088(closureCheckingStopState)
   243660    0.169    0.000    0.704    0.000 ParserATNSimulator.py:1360(getEpsilonTarget)
546577/260252    0.143    0.000    0.273    0.000 {hash}
   886067    0.136    0.000    0.136    0.000 {isinstance}
    56518    0.112    0.000    0.397    0.000 ATNConfigSet.py:68(add)
   142028    0.106    0.000    0.259    0.000 ParserATNSimulator.py:1351(<lambda>)
    56518    0.081    0.000    0.199    0.000 ATNConfigSet.py:93(getOrAdd)
   436871    0.074    0.000    0.257    0.000 {method 'get' of 'dict' objects}
37387/26948    0.071    0.000    0.175    0.000 PredictionContext.py:545(getCachedPredictionContext)
     5663    0.061    0.000    0.304    0.000 LexerATNSimulator.py:127(execATN)
     1818    0.057    0.000    2.000    0.001 ParserATNSimulator.py:654(computeReachSet)
   188158    0.057    0.000    0.062    0.000 ParserATNSimulator.py:1271(canDropLoopEntryEdgeInLeftRecursiveRule)
95095/42607    0.050    0.000    0.058    0.000 PredictionContext.py:133(__eq__)

Unfortunately, nothing jumps out at me that demonstrates the bottleneck. At first glance, the runtime appears to be already pretty optimized; it's just inherently slow. I did notice that the "ATNConfig" class in ATNConfig.py could benefit from a "__slots__" attribute. After adding one I was able to bring its cumulative time down from 0.224 s to 0.184 s. However, the total run time is still over 3 seconds.
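For illustration, the kind of "__slots__" change I mean looks like the toy class below (the field names here are illustrative, not the actual ATNConfig attributes):

```python
class Config(object):
    # With __slots__, instances carry no per-instance __dict__,
    # which saves memory and makes attribute access slightly faster.
    # That matters for a class instantiated ~200k times per parse.
    __slots__ = ('state', 'alt', 'context')

    def __init__(self, state, alt, context):
        self.state = state
        self.alt = alt
        self.context = context

c = Config(1, 2, None)
assert not hasattr(c, '__dict__')  # no instance dict is allocated
```

The trade-off is that you can no longer attach arbitrary attributes to instances, so it only works if the class's fields are fixed, which seems to be the case here.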

I did a bit of research earlier on this topic of Python runtime slowness, and saw mention that optimizing the grammar may help. However, the Lua grammar is quite simple, and I'm not sure how it could be optimized further.

I hope someone can advise. I have some background in Python and can delve deeper if someone can point me in the direction of where to look. Thanks.

Eric Vergnaud

Dec 5, 2017, 10:22:52 AM
to antlr-discussion
Hi,
Python is on average 30 times slower than Java (when it's not just wrapping C++ code), so you can't expect the ANTLR runtime to do any better.