I generated a Python 3 parser for the Java target using antlr-4.7-complete.jar and the Python3.g4 grammar from the ANTLR grammars repository. In my own listener I override enterStmt, enterClassDef, and enterFuncDef, along with the corresponding exit methods.
My listener exposes the following method, which runs the parser and walks the resulting tree:
@Override
public Map<ConstructId, Construct> getConstructs() throws FileAnalysisException {
    byte[] array = {};
    try {
        final String env = isVirtualEnv ? "vulas-virtualenv/" : "";
        final String path = env + this.module.replace(".", "/") + ".py";
        array = Files.readAllBytes(Paths.get(path));
    } catch (IOException e) {
        e.printStackTrace();
    }
    final ByteBuffer bytes = ByteBuffer.wrap(array);
    final CodePointBuffer buffer = CodePointBuffer.withBytes(bytes);
    final CodePointCharStream stream = CodePointCharStream.fromBuffer(buffer);
    final Python3Lexer lexer = new Python3Lexer(stream);
    final CommonTokenStream tokens = new CommonTokenStream(lexer);
    final Python3Parser parser = new Python3Parser(tokens);
    final ParseTree classTree = parser.classdef();
    final ParseTreeWalker walker = new ParseTreeWalker();
    walker.walk(this, classTree);
    constructs.putAll(stmts);
    return this.constructs;
}
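To make the file lookup in the method above easier to check in isolation, here is a minimal sketch of the module-to-path step only. The class name `ModulePathSketch` and the method `toPath` are hypothetical names for illustration; the directory `vulas-virtualenv` and the dot-to-slash replacement come from the method above. It uses `Paths.get` with several segments instead of string concatenation, which is a small swap on my part, not what the original code does.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the listener shown above.
class ModulePathSketch {

    // Maps a dotted module name such as "subdir.test3" to "subdir/test3.py",
    // optionally prefixed with the "vulas-virtualenv" directory.
    static Path toPath(String module, boolean isVirtualEnv) {
        final String env = isVirtualEnv ? "vulas-virtualenv" : "";
        // Paths.get skips empty segments, so an empty env adds no prefix.
        return Paths.get(env, module.replace('.', '/') + ".py");
    }
}
```

For example, `ModulePathSketch.toPath("subdir.test3", false)` yields the relative path `subdir/test3.py`, resolved against the working directory when the file is read.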
The following input does not parse properly; the only result is an error message printed in red on the console (no exception is thrown).
import test2
import subdir.test3
#comment
var8 = 8
var9 = 0
print('useless print 1')
print('useless print 2')
def outFunc():
return var8
var10 = None
var11 = var10
var12 = ''
var13 = var12
class class0:
var0 = ''
def __init__(self):
var1 = 0
def fun1():
return var0
def fun2():
print('useless print 3')
return var1
class class3:
def fun1():
return var0
class class2:
var0 = ''
def __init__(self):
var1 = 0
def fun1():
return var0
def fun2():
print('useless print 4')
return var1
print('useless print 5')
print('useless print 6')
My listener keeps a stack of the classes it is currently inside, and enterStmt uses it to skip statements nested in a class or function:
public void enterStmt(Python3Parser.StmtContext ctx) {
    System.out.println("stmt");
    if (classes.size() > 0 || isInsideFunc) {
        return;
    }
    System.out.println("stmt2");
}
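To show the behaviour I expect from this bookkeeping, here is a minimal sketch that does not depend on ANTLR. The class `ScopeTrackerSketch`, its string-based callbacks, and the `demo` method are hypothetical stand-ins for the real listener methods, and it uses a depth counter where my listener uses the `isInsideFunc` flag. It only illustrates the intended filtering, assuming the walker really does deliver enter/exit events for top-level statements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the listener; callbacks take plain strings
// instead of ANTLR context objects.
class ScopeTrackerSketch {
    private final Deque<String> classes = new ArrayDeque<>();
    private final List<String> topLevelStmts = new ArrayList<>();
    private int funcDepth = 0; // assumption: a counter instead of isInsideFunc

    void enterClassDef(String name) { classes.push(name); }
    void exitClassDef() { classes.pop(); }
    void enterFuncDef() { funcDepth++; }
    void exitFuncDef() { funcDepth--; }

    // Records a statement only when no class or function is open.
    void enterStmt(String text) {
        if (!classes.isEmpty() || funcDepth > 0) {
            return;
        }
        topLevelStmts.add(text);
    }

    // Replays the events a walker would fire for a small module.
    static List<String> demo() {
        final ScopeTrackerSketch t = new ScopeTrackerSketch();
        t.enterStmt("var8 = 8");
        t.enterStmt("class class0");   // the class statement itself is top level
        t.enterClassDef("class0");
        t.enterStmt("var0 = ''");      // skipped: inside class0
        t.enterStmt("def __init__");   // skipped: inside class0
        t.enterFuncDef();
        t.enterStmt("var1 = 0");       // skipped: inside a function
        t.exitFuncDef();
        t.exitClassDef();
        t.enterStmt("print('useless print 5')");
        return new ArrayList<>(t.topLevelStmts);
    }
}
```

In this sketch `demo()` records three statements: `var8 = 8`, the `class class0` statement, and the final `print`. That is the output I would expect "stmt2" to correspond to in the real listener.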
Only "stmt" is ever printed, never "stmt2", and only for statements inside classes. For the file above, the only output I get is:
line 1:0 mismatched input 'import' expecting 'class'
If I remove the import statements, the same error is reported for the statement after the comment, which is then the first statement the parser encounters.
Other test files that contain nothing outside of class definitions parse properly. Real-world Python code across an entire project does not look like that, though, so the parser is not very usable in this state. I do not know what is causing this.