Hi all,
I have the following small grammar:
-------------------
Helpers
letter = ['a' .. 'z'] | ['A' .. 'Z'];
digit = ['0' .. '9'];
identifier = letter (letter|digit)*;
Tokens
whitespace = (' '|9|10|13)+;
begin = 'begin';
end = 'end';
semi = ';';
identifier = identifier;
Ignored Tokens
whitespace;
Productions
start = block*;
block {-> block} =
begin
tok_semi+
end
{-> New block.tok([tok_semi.tok])}
;
tok_semi {-> tok} =
tok semi {-> tok.tok}
;
tok {-> tok} = identifier {-> New tok.identifier(identifier)};
Abstract Syntax Tree
start = block*;
block = {tok} tok*;
tok = {identifier} identifier;
-------------------
And the following test program:
-------------------
public class ParseTest {
public static void main(String[] args) {
String input = "begin\n";
for (int i = 0; i < 30000; i++) {
input += "bla;\n";
}
input += "end\n";
Parser parser = new Parser(new Lexer(new PushbackReader(new StringReader(input), 1024)));
try {
Lexer l = new Lexer(new PushbackReader(new StringReader(input), 1024));
long begin = System.nanoTime();
while (true) {
Token t = l.next();
if (t instanceof EOF) {
break;
}
}
long end = System.nanoTime();
System.out.println("lexing: " + (end - begin) / 1000 / 1000 + " ms");
begin = System.nanoTime();
Start start = parser.parse();
end = System.nanoTime();
System.out.println("lexing+parsing: " + (end - begin) / 1000 / 1000 + " ms");
} catch (ParserException e) {
e.printStackTrace();
} catch (LexerException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
-------------------
Running this program results in:
lexing: 150 ms
lexing+parsing: 4846 ms
which is really slow (I'd expect the parsing time in the same order of magnitude as the lexing time).
The culprit here seems to be the following operation in /* reduce ANonTerminal$TokSemi */:
listNode3.addAll(listNode1);
This line is executed with listNode1 being [bla], [bla bla], [bla bla bla] up to 30,000 times "bla". Unfortunately, addAll() is an O(n) operation here.
Is there a way to fix this problem? Is this an inherent problem of sablecc or my grammar?
Sebastian