Sablecc generates a slow parser for some AST rules

32 views
Skip to first unread message

Sebastian Biallas

unread,
Oct 22, 2013, 5:00:32 AM10/22/13
to sab...@googlegroups.com
Hi all,

I have the following small grammar:

-------------------
Helpers

letter = ['a' .. 'z'] | ['A' .. 'Z'];                      
digit = ['0' .. '9'];
identifier = letter (letter|digit)*;

Tokens

whitespace = (' '|9|10|13)+;
begin = 'begin';
end = 'end';
semi = ';';
identifier = identifier;

Ignored Tokens

whitespace;

Productions

start = block*;

block {-> block} =
begin
   tok_semi+
end
{-> New block.tok([tok_semi.tok])}
;

tok_semi {-> tok} =
   tok semi  {-> tok.tok}
;

tok {-> tok} = identifier {-> New tok.identifier(identifier)};


Abstract Syntax Tree

start = block*;

block = {tok} tok*;

tok = {identifier} identifier;
-------------------

And the following test program:
-------------------
public class ParseTest {
    public static void main(String[] args) {
        String input = "begin\n";
        for (int i = 0; i < 30000; i++) {
            input += "bla;\n";
        }
        input += "end\n";
       
        Parser parser = new Parser(new Lexer(new PushbackReader(new StringReader(input), 1024)));
        try {
            Lexer l = new Lexer(new PushbackReader(new StringReader(input), 1024));
            long begin = System.nanoTime();
            while (true) {
                Token t = l.next();
                if (t instanceof EOF) {
                    break;
                }
            }
            long end = System.nanoTime();
            System.out.println("lexing: " + (end - begin) / 1000 / 1000 + " ms");
            begin = System.nanoTime();
            Start start = parser.parse();
            end = System.nanoTime();
            System.out.println("lexing+parsing: " + (end - begin) / 1000 / 1000 + " ms");
        } catch (ParserException e) {
            e.printStackTrace();
        } catch (LexerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
-------------------

Running this program results in:

lexing: 150 ms
lexing+parsing: 4846 ms

which is really slow (I'd expect the parsing time in the same order of magnitude as the lexing time).

The culprit here seems to be the following operation in /* reduce ANonTerminal$TokSemi */:
      listNode3.addAll(listNode1);

This line is executed with listNode1 being [bla], [bla bla], [bla bla bla] up to 30,000 times "bla". Unfortunately, addAll() is an O(n) operation here.

Is there a way to fix this problem? Is this an inherent problem of sablecc or my grammar?

Sebastian

Etienne Gagnon

unread,
Oct 22, 2013, 7:16:30 PM10/22/13
to sab...@googlegroups.com
Hi Sebastian,

You are right that list handling is unnecessarily slow in SableCC 3
generated code. This will be fixed in SableCC 4.

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org
Reply all
Reply to author
Forward
0 new messages