Listener performance - Building a linter for C - style50

64 views
Skip to first unread message

tob...@cs50.net

unread,
Aug 26, 2016, 1:16:57 AM8/26/16
to antlr-di...@googlegroups.com
Hi,

I'm building a linter in C using the Java ANTLR target.
https://github.com/tobiasbueschel/style50

Currently, we are using a MainListener class that extends CBaseListener to check whether the parsed C code adheres to our style guide. However, as the class is getting too big, I'd like to split each style rule group to its own classes and thereby write more focused functions in order to improve testing & maintainability.


To do so, I see two options:
  1. Create classes for each style group and write methods to check each style rule - these methods would then be invoked through the tree walk in the MainListener i.e. all @ "overriding" would happen in MainListener.

    public class Variable{

    public static void checkVarNameLength(CParser.VarNameContext ctx){
    if(15 > ctx.getTextLength()){
    CStyleListener.addError(new StyleError(ctx.getText(), line, col));
    }
    }
    }
public class MainListener{

public static ArrayList<StyleErrors> errors;

static{
    errors
= new ArrayList<StyleErrors();
}

@Override
public static void enterDirectDeclarator(CParser.DirectDeclaratorContext ctx) {
    Variable.checkVarNameLength(ctx);
}

 
Start walking the tree using MainListener:

this.mainListener = new MainListener(tokens, 0);
ParseTreeWalker.DEFAULT.walk(mainListener, tree);


2. Create separate listeners for each style group and combine @Override & method implementations into one class such as:

public class VariableListener{

public ArrayList<StyleErrors> errors;

public 
VariableListener () {
    this.errors 
= new ArrayList<StyleErrors();
}

@Override
public static void enterDirectDeclarator(CParser.DirectDeclaratorContext ctx) {
    Variable.checkVarNameLength(ctx);
}

public void checkVarNameLength(CParser.VarNameContext ctx){
if(15 > ctx.getTextLength()){
this.errors.add(new StyleError(ctx.getText(), line, col));
}
}

And for each listener walk the AST:
 

ParseTreeWalker walker = new ParseTreeWalker();
listeners.forEach(listener -> walker.walk(listener, tree));
 
I have seen that Tailor, a popular linter for Swift, uses this approach: https://github.com/sleekbyte/tailor/tree/master/src/main/java/com/sleekbyte/tailor/listeners
 
For option 2, I would need to create a listener for each style group (Comments, Conditions, Switches, Functions, Variables, Indentation, Whitespace, Pointers, Loops). Hence, I am thinking that it would cause the CLI program to be dramatically slower as it would need to walk the tree not only once but 9+ times (for each SubListener) that I create.
With that said, I would prefer option two from a design / architecture point of view, but I am curious whether anyone has some recommendations in this regard?

Is option 2 indeed noticeably slower? And is there a best-practice / recommended architecture for such implementations with ANTLR?

Thanks & best.   

 
 
 
 

Gerald Rosenberg

unread,
Aug 26, 2016, 4:38:51 PM8/26/16
to antlr-discussion
On Thursday, August 25, 2016 at 10:16:57 PM UTC-7, tob...@cs50.net wrote:
 
To do so, I see two options:
  1. Create classes for each style group and write methods to check each style rule - these methods would then be invoked through the tree walk in the CStyleListener i.e. all @ "overriding" would happen in CStyleListener.
 
2. Create separate listeners for each style group and combine @Override & method implementations into one class such as:


Is option 2 indeed noticeably slower? And is there a best-practice / recommended architecture for such implementations with ANTLR?

Whether it would be "noticeably" different is subjective.  In general terms, on a warm-VM, the different is likely to be very small.

As an empirical point of comparison, I use an approach similar to your option 1. I have tested the performance difference between doing a single tree-walk, calling a context-type specific class for each node, versus performing multiple sequential walks, each walk only calling context classes for a subset of the nodes.  With all relevant things being equal, the performance difference was not really measurable: well under 10ms (typically just 2 or 3ms) for parse trees of ~30K+ nodes and ~60 node types largely independent of the number of walks implemented.

Not a rigorous test, to be sure, but well sufficient to support the design preference to use as many tree-walks as desirable to separate operational concerns and best position for revision and maintainability. 

As between your options 1 and 2, and any others, there does not appear to be any one best practice.

FWIW, GenPackage is a generator of context-type specific class files, and relating support classes, that eases implementation of an option 1 style approach.

Mike Lischke

unread,
Aug 27, 2016, 4:20:14 AM8/27/16
to antlr-di...@googlegroups.com

Am 26.08.2016 um 22:38 schrieb Gerald Rosenberg <gbrose...@gmail.com>:

As an empirical point of comparison, I use an approach similar to your option 1. I have tested the performance difference between doing a single tree-walk, calling a context-type specific class for each node, versus performing multiple sequential walks, each walk only calling context classes for a subset of the nodes.  With all relevant things being equal, the performance difference was not really measurable: well under 10ms (typically just 2 or 3ms) for parse trees of ~30K+ nodes and ~60 node types largely independent of the number of walks implemented.

What I do often is to create local listeners, which only handle a small subpart of the entire parse tree. I then use them as local walkers, e.g. when I have to parse out an identifier with varying parts in a larger context, I create a local identifier listener and let it walk over the identifier subtree only (e.g. in one of the listener methods that handle the larger context). That way you don't have to walk over the full tree multiple times and avoid trouble at the same time if the same rule needs to be handled differently depending on the context.

tob...@cs50.net

unread,
Sep 6, 2016, 4:18:19 PM9/6/16
to antlr-di...@googlegroups.com
Thanks for both replies! Really appreciate it!

I decided to forge ahead with the notion of having multiple listeners and regarding performance it does not seem to make a noticeable difference! I also further extracted some of the logic (style checking) that was happening inside the listeners and moved it to individual handler classes. The result can be found here: https://github.com/tobiasbueschel/style50

Thanks again & best,
Tobias

Mike Lischke

unread,
Sep 7, 2016, 3:14:10 AM9/7/16
to antlr-di...@googlegroups.com
> I decided to forge ahead with the notion of having multiple listeners and regarding performance it does not seem to make a noticeable difference! I also further extracted some of the logic (style checking) that was happening inside the listeners and moved it to individual handler classes. The result can be found here: https://github.com/cs50/style50/tree/master/src/c



To be honest: I didn't expect much speed win with the multi listener approach, as walking the tree is a very cheap operation. It's mostly to structure the post parsing process better, which often is actually much more work than creating the parser.

Mike
--
www.soft-gems.net

Reply all
Reply to author
Forward
0 new messages