Python3 grammar does not parse code outside of a class

Grayden Hormes

Jun 13, 2017, 8:15:10 PM
to antlr-discussion

I generated a Python 3 parser for the Java target using antlr-4.7-complete.jar and the python3.g4 from the grammar repository. In my own listener I override enterStmt, enterClassDef, and enterFuncDef, along with the corresponding exit methods.


My listener exposes this method, which reads the module file, parses it, and walks the resulting tree:


@Override
 public Map<ConstructId, Construct> getConstructs() throws FileAnalysisException {
   byte[] array = {};
   
   try {
     final String env = isVirtualEnv ? "vulas-virtualenv/" : "";
     final String path = env + this.module.replace(".", "/") + ".py";
     array = Files.readAllBytes(Paths.get(path));
   } catch (IOException e) { 
     e.printStackTrace(); 
   }
   
   final ByteBuffer bytes = ByteBuffer.wrap(array);
   final CodePointBuffer buffer = CodePointBuffer.withBytes(bytes);
   final CodePointCharStream stream = CodePointCharStream.fromBuffer(buffer);
   
   final Python3Lexer lexer = new Python3Lexer(stream);
   final CommonTokenStream tokens = new CommonTokenStream(lexer);
   final Python3Parser parser = new Python3Parser(tokens);
   
   final ParseTree classTree = parser.classdef();
   
   final ParseTreeWalker walker = new ParseTreeWalker();
   
   walker.walk(this, classTree);
   constructs.putAll(stmts);
   
   return this.constructs;
 }

The following input does not parse properly; it only prints a red error message (not an exception).

import test2
import subdir.test3

#comment

var8 = 8
var9 = 0

print('useless print 1')
print('useless print 2')

def outFunc():
    return var8

var10 = None
var11 = var10

var12 = ''
var13 = var12

class class0:
	var0 = ''

	def __init__(self):
		var1 = 0

	def fun1():
		return var0

	def fun2():
		print('useless print 3')
		return var1

    class class3:
        def fun1():
            return var0

class class2:
	var0 = ''

	def __init__(self):
		var1 = 0

	def fun1():
		return var0

	def fun2():
		print('useless print 4')
		return var1

print('useless print 5')
print('useless print 6')

I keep a stack of classes in my enterStmt method.

  public void enterStmt(Python3Parser.StmtContext ctx) {
    System.out.println("stmt");
    if (classes.size() > 0 || isInsideFunc) {
      return;
    }
    
    System.out.println("stmt2");
  }


Only stmt is ever printed, and it is obviously only printed for the statements inside classes. The only output for the above file I get is:


line 1:0 mismatched input 'import' expecting 'class'


If I remove the import statements it complains about the statement after the comment, which is the first statement it encounters.

Other test inputs that contain nothing outside the classdefs parse properly, but that is not how real-world Python looks across an entire project, so the parser is not very usable in this state. I do not know what is causing this.

Eric Vergnaud

Jun 17, 2017, 12:32:37 PM
to antlr-discussion
If you want to parse a file, why are you trying to parse a classdef?
Have you tried parser.file_input()?
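
A minimal sketch of what that change might look like, assuming the Python3Lexer, Python3Parser, and Python3BaseListener classes generated from python3.g4 are on the classpath and that the grammar's top-level rule is file_input. The class name ParseWholeFile and the anonymous listener are hypothetical and only there for illustration; the one substantive difference from the code in the question is calling file_input() instead of classdef().

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

// Hypothetical driver class; requires the ANTLR 4.7 runtime and the
// classes generated from python3.g4 (Python3Lexer, Python3Parser,
// Python3BaseListener), none of which are defined here.
public class ParseWholeFile {
    public static void main(String[] args) throws IOException {
        // Read the .py file given as the first argument (decoded as UTF-8).
        CharStream stream = CharStreams.fromPath(Paths.get(args[0]));

        Python3Lexer lexer = new Python3Lexer(stream);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Python3Parser parser = new Python3Parser(tokens);

        // Start at the rule that matches a whole module, so imports,
        // assignments, and functions outside a class are accepted.
        ParseTree tree = parser.file_input();

        // Walk the tree; enterStmt fires for every stmt node, both
        // top-level and nested inside classes and functions.
        ParseTreeWalker.DEFAULT.walk(new Python3BaseListener() {
            @Override
            public void enterStmt(Python3Parser.StmtContext ctx) {
                System.out.println("stmt: " + ctx.getText());
            }
        }, tree);
    }
}
```

Starting from classdef makes the parser expect the very first token to be 'class', which matches the "mismatched input 'import' expecting 'class'" message in the question. Classes are still reached through file_input, since a classdef is one of the statements a module can contain.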