C++ target is very slow in the official CSS3 example


小冶

Sep 23, 2024, 2:24:05 AM
to antlr-discussion
Using the latest ANTLR 4.13.2, with the official example repository's CSS3 grammar (https://github.com/antlr/grammars-v4/tree/master/css3), I generated a C++ parser to parse a simple CSS file of about 200 lines. Without adding any custom listener logic, on an M1 MacBook Pro, it takes 1 second for the first run! I'll consider that as a warm-up, but even after that, it takes 0.14 seconds each time! In contrast, my own homemade approach relying purely on string find only takes 0.002 seconds. Why is there such a performance difference? Is it because I'm using it incorrectly?
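The timings above separate a cold first run from later runs. ANTLR fills its prediction cache lazily while parsing, so a slower first parse is expected. A minimal sketch of how the two numbers could be measured separately follows. `Timing`, `secondsFor`, and `timeRuns` are hypothetical helper names, not part of the ANTLR runtime, and `parseOnce` stands in for whatever code lexes and parses the CSS file.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper type (not part of ANTLR): holds the two timings of interest.
struct Timing {
    double firstSeconds;       // cold run, includes any one-time warm-up cost
    double medianWarmSeconds;  // upper median of the later runs, 0 if there were none
};

// Runs the callable once and returns the elapsed wall-clock time in seconds.
static double secondsFor(const std::function<void()>& run) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Times one cold run, then `warmRuns` further runs of the same callable.
// `parseOnce` is an assumption: it should build the lexer and parser and parse the input.
Timing timeRuns(const std::function<void()>& parseOnce, int warmRuns) {
    Timing result{secondsFor(parseOnce), 0.0};
    std::vector<double> warm;
    for (int i = 0; i < warmRuns; ++i) {
        warm.push_back(secondsFor(parseOnce));
    }
    if (!warm.empty()) {
        std::sort(warm.begin(), warm.end());
        result.medianWarmSeconds = warm[warm.size() / 2];
    }
    return result;
}
```

Reporting the median of several warm runs rather than a single one makes the steady-state figure less sensitive to scheduling noise.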

Ken Domino

Sep 23, 2024, 7:46:13 AM
to antlr-discussion
As with many of the grammars in the grammars-v4 repo, you never know what to expect.

The problem is that the css3 grammar is ambiguous. To debug the grammar, you can use "trperf" for a quick explanation of the ambiguity or "trparse --ambig" for a slow but detailed explanation of the ambiguity. These tools are from the Trash toolkit (https://github.com/kaby76/Trash). For "trparse --ambig", I had to pare down the input to a few hundred lines; otherwise it would take too long.

To name a few ambiguities, there are decisions 55 in "ruleSet", 60 in "declaration", 98 in "fontFaceDeclaration", and 112 in "ws".

"ws" is particularly egregious because it is the most time-consuming expression. Using "trparse --ambig" and Bash diff, I found these rules to be a problem.

expr
    : term (operator_? term)*
    ;

term
    : number ws           # knownTerm
    | percentage ws       # knownTerm
    | dimension ws        # knownTerm
    | String_ ws          # knownTerm
    | UnicodeRange ws     # knownTerm
    | ident ws            # knownTerm
    | var_                # knownTerm
    | url ws              # knownTerm
    | hexcolor            # knownTerm
    | calc                # knownTerm
    | function_           # knownTerm
    | unknownDimension ws # unknownTerm
    | dxImageTransform    # badTerm
    ;

operator_
    : '/' ws   # goodOperator
    | Comma ws # goodOperator
    | Space ws # goodOperator
    | '=' ws   # badOperator // IE filter and DXImageTransform function
    ;

For partial input

abbr[title] {
  border-bottom: 1px dotted;
}

we see two parses for "1px",

examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws (Space " "))) (term (ident (Ident "dotted")) (ws)))
examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws)) (operator_ (Space " ") (ws)) (term (ident (Ident "dotted")) (ws)))

(dotnet trparse -- --ambig examples/xxx.css | dotnet trxgrep -- ' //expr[.//Dimension[text()="1px"]]' | dotnet trtree -- -a)

A space should not be an operator!
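To see why the Space alternative in "operator_" causes the ambiguity, here is a minimal sketch that counts the distinct parses of a token string under a toy model of the rules above. It is an illustration, not ANTLR's algorithm. The assumptions: 'T' stands for any term head such as a Dimension or Ident, 'S' stands for one Space token, "ws" is modelled as zero or more Space tokens, and the only operator considered is the Space one. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the rules above (an assumption, not the real css3 grammar):
//   expr      : term (operator_? term)*
//   term      : T ws
//   operator_ : S ws        (only when spaceOp is true)
//   ws        : S*
static long countTail(const std::string& t, std::size_t i, bool spaceOp);

// Counts the parses of "term (operator_? term)*" that start at i and end at the end of t.
static long countTerm(const std::string& t, std::size_t i, bool spaceOp) {
    if (i >= t.size() || t[i] != 'T') return 0;
    long total = 0;
    std::size_t j = i + 1;
    while (true) {  // let the term's ws absorb 0, 1, 2, ... spaces
        total += countTail(t, j, spaceOp);
        if (j < t.size() && t[j] == 'S') ++j; else break;
    }
    return total;
}

// Counts the parses of "(operator_? term)*" that start at i and end at the end of t.
static long countTail(const std::string& t, std::size_t i, bool spaceOp) {
    if (i == t.size()) return 1;              // zero further iterations
    long total = countTerm(t, i, spaceOp);    // next term without an operator
    if (spaceOp && t[i] == 'S') {             // next term after a Space operator
        std::size_t j = i + 1;
        while (true) {  // let the operator's ws absorb 0, 1, 2, ... spaces
            total += countTerm(t, j, spaceOp);
            if (j < t.size() && t[j] == 'S') ++j; else break;
        }
    }
    return total;
}

// Number of distinct parse trees for the whole token string.
long countParses(const std::string& tokens, bool spaceOp) {
    return countTerm(tokens, 0, spaceOp);
}
```

For "1px dotted", modelled as "TST", countParses("TST", true) returns 2, matching the two trees shown above, while countParses("TST", false) returns 1. In this model, dropping the Space alternative from "operator_" leaves a single parse because "ws" already absorbs the space after each term.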

Ken

小冶

Sep 24, 2024, 5:16:15 AM
to antlr-discussion
Thanks for replying! I'll try modifying the grammar and then test again.