Questions about Max K found when profiling an ANTLR grammar

156 views
Skip to first unread message

Emmanuel Oga

unread,
Jan 13, 2021, 3:07:33 AM1/13/21
to antlr-discussion
Hello,

I wrote an ANTLR grammar for a markup-like language that uses several lexer modes. I used the IntelliJ idea ANTLR profiler and found that the "Max K" metric reported a big number (in the order of the hundreds), so I gradually modified the grammar until I got it down to max 10 tokens.

My understanding is that max K refers to how many look ahead tokens the ANTLR algorithm required to parse the input. I'm wondering what's a good rule of thumb for this metric... should it be kept below a certain threshold?

I think most handwritten parsers usually use 1 or 2 look ahead tokens, so I'm thinking if someone wanted to write a parser for my language by hand for some reason (maybe even me :-) it would be a bit nightmarish to manually look ahead up to 10 tokens to disambiguate ... is this a correct assessment?

If I get it right it would be more convenient to try to modify the grammar such that look ahead is never higher than, say, 2 or 3 tokens, which would simplify hand-writing a parser following ANTLR grammar somewhat closely. 

Thank you,

E

rtm...@googlemail.com

unread,
Jan 22, 2021, 5:22:24 AM1/22/21
to antlr-discussion
I'm rather rusty on this but if one were to hand-write write a recursive descent parser, the lookahead it could cope with would be arbitrarily large, but you would pay for it heavily in backtracking. I'd not consider needing large lookaheads a good thing at all - I recall an RD parser that when it met ternary expressions (C-style   x ? y : z) and they were heavily nested, it would parse them but take literally minutes. I guess the backtracking went O(n^e) where e was the nesting depth of the ternary exprs.

If the lookahead is done directly, I'd expect the memory to blow up exponentially instead of time.

So no, not a good thing. If possible, have a lookahead of just one token. I think that means a unique opening word/phrase per structure, if you can.

again, I'm rusty on this but I think that's right

cheers

jan
Reply all
Reply to author
Forward
0 new messages