Similar problem here.
I am using Python 3.9 with ANTLR 4.10.
There are two issues in the generated code.
(Note: PyPI only has 4.10, not 4.10.1.)
I first hit this with other code, but to confirm it I used the 01-Hello example.
Issue 1:
The import line for the HelloParser and HelloLexer is incorrect.
The line
Fix: should read
from typing import TextIO
Issue 2:
The generated code writes a run of binary data into a StringIO buffer, wrapped in a function serializedATN().
I wonder why it isn't just a plain joined string (e.g. "".join(lines)). Although a StringIO buffer is used, the result is indexed like an array rather than read like a file, so the buffer gains nothing.
def serializedATN():
    with StringIO() as buf:
        buf.write("\3\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964\3\5")
        buf.write("\b\4\2\t\2\3\2\3\2\3\2\3\2\2\2\3\2\2\2\2\6\2\4\3\2\2\2")
        buf.write("\4\5\7\3\2\2\5\6\7\4\2\2\6\3\3\2\2\2\2")
        return buf.getvalue()
could read
def serializedATN():
    buf = (
        "\3\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964\3\5",
        "\b\4\2\t\2\3\2\3\2\3\2\3\2\2\2\3\2\2\2\2\6\2\4\3\2\2\2",
        "\4\5\7\3\2\2\5\6\7\4\2\2\6\3\3\2\2\2\2")
    return "".join(buf)
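As a quick sanity check, the two forms can be compared directly. This is only a sketch using the literals from the Hello example; the function names serialized_atn_stringio and serialized_atn_join are made up for the comparison and are not what ANTLR generates.

```python
from io import StringIO


def serialized_atn_stringio():
    # Same shape as the generated code: write chunks into a StringIO buffer.
    with StringIO() as buf:
        buf.write("\3\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964\3\5")
        buf.write("\b\4\2\t\2\3\2\3\2\3\2\3\2\2\2\3\2\2\2\2\6\2\4\3\2\2\2")
        buf.write("\4\5\7\3\2\2\5\6\7\4\2\2\6\3\3\2\2\2\2")
        return buf.getvalue()


def serialized_atn_join():
    # Proposed alternative: keep the chunks in a tuple and join them.
    parts = (
        "\3\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964\3\5",
        "\b\4\2\t\2\3\2\3\2\3\2\3\2\2\2\3\2\2\2\2\6\2\4\3\2\2\2",
        "\4\5\7\3\2\2\5\6\7\4\2\2\6\3\3\2\2\2\2",
    )
    return "".join(parts)


# Both variants should yield the identical str.
assert serialized_atn_stringio() == serialized_atn_join()
```

Both return a str, so whichever form is used, the caller still gets characters rather than integers.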
Because this is a string buffer, getvalue() returns a str, and that str is passed to the ATNDeserializer().deserialize(data) method.
This method (like its Java counterpart) expects an array of integers, hence it fails to parse.
Fix:
return tuple(ord(x) for x in buf.getvalue())
or even better
return tuple(ord(x) for x in "".join(buf))
According to the docs, the Java library has overloaded functions for string and int, but Python does not support overloaded functions.
Should deserialize() use duck typing and handle the conversion internally?
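A rough sketch of what such duck typing could look like. The helper name normalize_atn_data is hypothetical and this is not the runtime's actual code; it only illustrates accepting either a str or an iterable of ints and handing back a tuple of ints.

```python
from typing import Iterable, Tuple, Union


def normalize_atn_data(data: Union[str, Iterable[int]]) -> Tuple[int, ...]:
    """Hypothetical helper: return the serialized ATN as a tuple of ints."""
    if isinstance(data, str):
        # A str is converted one code point at a time.
        return tuple(ord(ch) for ch in data)
    # Anything else is assumed to already be an iterable of ints.
    return tuple(int(value) for value in data)


print(normalize_atn_data("\3\5"))   # str input
print(normalize_atn_data([3, 5]))   # list-of-int input
```

With something like this at the top of deserialize, the generated code could pass either form without being patched.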
Issue 3:
Even if I patch the generated code, it does not work: "Exception: Could not deserialize ATN with version 3 (expected 4)."
(You can see above that it IS 3.)
The generated code writes 3 as the first value. Surely that makes this an ANTLR 3 ATN structure rather than an ANTLR 4 one?
Am I calling ANTLR wrong? The docs say nothing about . Why is it generating a version 3 ATN?
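Judging only by the error message, the runtime appears to compare the leading element of the serialized data against the version it expects. The sketch below uses a hypothetical helper, serialized_version, to read that leading value from either a str or a sequence of ints, so the generated file can be inspected before loading it. One thing worth ruling out, as an assumption rather than a confirmed diagnosis, is that the tool which generated the code is a different release from the installed runtime.

```python
def serialized_version(data):
    """Hypothetical helper: return the first element of the serialized data as an int."""
    first = data[0]
    # A str yields a one-character str; a list or tuple of ints yields an int.
    return ord(first) if isinstance(first, str) else int(first)


# Leading characters of the Hello example's serialized data.
hello_prefix = "\3\u608b\ua72a"
print(serialized_version(hello_prefix))
```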
Issue 4:
I've not actually seen this yet, but surely if the character " (double quote) happens to appear in the binary data then it will break, causing a syntax error.
Or is that handled by encoding any dodgy characters as \xxx?
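For what it's worth, escaping does make this safe in principle. The sketch below is not what the ANTLR generator does (I have not checked); escape_for_literal is a made-up function showing that if the quote, the backslash, and every non-printable character are written as escapes, the literal parses back to the original data.

```python
import ast


def escape_for_literal(text):
    """Hypothetical: render text as the body of a double-quoted Python literal."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Outside the BMP: needs the 8-digit escape form.
            out.append("\\U{:08x}".format(code))
        elif ch in ('"', "\\") or code < 0x20 or code > 0x7E:
            # Quote, backslash, control and non-ASCII characters are escaped.
            out.append("\\u{:04x}".format(code))
        else:
            out.append(ch)
    return "".join(out)


raw = 'a"b\\c\x03\u608b'
literal = '"' + escape_for_literal(raw) + '"'
# Parsing the literal gives back exactly the original characters.
assert ast.literal_eval(literal) == raw
```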
Issue 5:
The type hint for the data parameter to the deserialize method in ATNDeserializer.py is wrong.
Fix:
It is currently an 'int' but it should be Iterable[int] (from typing import Iterable)
How to proceed?
I can patch the generated Python each time I build, but how do I correct the code generator so that I don't have to?
Is there going to be a new (corrected) PyPI package any time soon?