How to rewrite expression grammar for faster parser generation

JohnHadj

unread,

Mar 11, 2013, 5:25:14 AM3/11/13

to pe...@googlegroups.com

I have a grammar that includes expressions (like a + b * c) with up to 16 levels of priority for operators. With one level pegjs generates a parser in around 100mS (IE9). With 16 levels pegjs takes over 100 Seconds to generate a parser (three orders of magnitude slower). I include a portion of the grammar below. Can anyone help by suggesting how I might rewrite the grammar to reduce the parser generation time to something reasonable?

expression = prio0
prio0 = prio1 prio0op prio0 / prio1
prio1 = prio2 prio1op prio1 / prio2
prio2 = prio3 prio2op prio2 / prio3
prio3 = prio4 prio3op prio3 / prio4
prio4 = prio5 prio4op prio4 / prio5
prio5 = prio6 prio5op prio5 / prio6
prio6 = prio7 prio6op prio6 / prio7
prio7 = prio8 prio7op prio7 / prio8
prio8 = prio9 prio8op prio8 / prio9
prio9 = prioA prio9op prio9 / prioA
prioA = prioB prioAop prioA / prioB
prioB = prioC prioBop prioB / prioC
prioC = prioE prioCop prioC / prioD
prioD = prioE prioDop prioD / prioE
prioE = prioF prioEop prioE / prioF
prioF = prefixed prioFop prioF / prefixed
prefixed = preOp primary / primary

Thanks

John

Shinya Hayakawa

unread,

Mar 11, 2013, 6:32:54 AM3/11/13

to pe...@googlegroups.com

Hi,
Is pegjs actually the cause of that problem?
I tried by adding the following, but I got a result in a moment.

prio0op = '0op'
prio1op = '1op'
prio2op = '2op'
prio3op = '3op'
prio4op = '4op'
prio5op = '5op'
prio6op = '6op'
prio7op = '7op'
prio8op = '8op'
prio9op = '9op'
prioAop = 'Aop'
prioBop = 'Bop'
prioCop = 'Cop'
prioDop = 'Dop'
prioEop = 'Eop'
prioFop = 'Fop'
preOp = 'preOp'
primary = .*

> --
> You received this message because you are subscribed to the Google Groups "PEG.js: Parser Generator for JavaScript" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pegjs+un...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

John Hadjioannou

unread,

Mar 11, 2013, 10:03:49 AM3/11/13

to pe...@googlegroups.com

Thank you for your input. I think it's a bit more complex. I have the
whole grammar below. It takes 27 seconds for pegjs to generate the grammar
(chrome, faster than IE9). When I replace the 17 expression lines with 1
line the time reduces to 71mS. I wonder whether I am hitting some point on
the complexity curve that is making times increase exponentially?

/* hCode synatx v0.1 */
start = enclosedSequence __

enclosedSequence = beginSymbol sequence endSymbol / beginSymbolQuick
sequence endSymbolQuick
sequence = clause (altSymbol clause)* / clause (altSymbolQuick clause)*
clause = guardedClause / unguardedClause
guardedClause = unguardedClause guardSymbol unguardedClause /
unguardedClause guardSymbolQuick unguardedClause
unguardedClause = phrase (goOnSymbol phrase)*
phrase = expression (andAlsoSymbol expression)*

expression = prio0

/* remove this for timing tests */

prio0 = prio1 prio0op prio0 / prio1
prio1 = prio2 prio1op prio1 / prio2
prio2 = prio3 prio2op prio2 / prio3
prio3 = prio4 prio3op prio3 / prio4
prio4 = prio5 prio4op prio4 / prio5
prio5 = prio6 prio5op prio5 / prio6
prio6 = prio7 prio6op prio6 / prio7
prio7 = prio8 prio7op prio7 / prio8
prio8 = prio9 prio8op prio8 / prio9
prio9 = prioA prio9op prio9 / prioA
prioA = prioB prioAop prioA / prioB
prioB = prioC prioBop prioB / prioC
prioC = prioE prioCop prioC / prioD
prioD = prioE prioDop prioD / prioE
prioE = prioF prioEop prioE / prioF
prioF = prefixed prioFop prioF / prefixed

/* end of remove */

/* add this for timing tests *
prio0 = prefixed prio0op prio0 / prefixed
* end of add */

prefixed = preOp primary / primary

primary = literal / enclosedSequence

/* Constructor */

beginSymbol = "BEGIN"i __
endSymbol = "END"i __

beginSymbolQuick = "(" __
endSymbolQuick = ")" __

altSymbol = "ELSE"i __
altSymbolQuick = "|" __

guardSymbol = "THEN"i __
guardSymbolQuick = "?" __

goOnSymbol = ";" __
andAlsoSymbol = "," __

/* Literal */ /* This section to be completed */

literal
= dummyLiteral

/* Operator */ /* This section to be completed */

preOp = dummyPrefixOperator
prio0op = dummySymbol0
prio1op = dummySymbol1
prio2op = dummySymbol2
prio3op = dummySymbol3
prio4op = dummySymbol4
prio5op = dummySymbol5
prio6op = dummySymbol6
prio7op = dummySymbol7
prio8op = dummySymbol8
prio9op = dummySymbol9
prioAop = dummySymbolA
prioBop = dummySymbolB
prioCop = dummySymbolC
prioDop = dummySymbolD
prioEop = dummySymbolE
prioFop = dummySymbolF

/* Identifier */

identifier = !reservedWord name:identifierName {return name}
identifierName = s:identifierStarter f:identifierFollower* __ { return
ids+idf.join("")}
identifierStarter = [a-zA-Z_]
identifierFollower = (identifierStarter / [0-9])

/* Reserved words, white space, comments
****************************************************************************
/

reservedWord
= beginSymbol
/ endSymbol
/ altSymbol
/ guardSymbol

anyCharacter
= .

EOF
= !.

whiteSpace
= [\t\v\f
\u00A0\uFEFF\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u20
06\u2007\u2008\u2009\u200A\u202F\u205F\u3000]

lineTerminator
= [\n\r\u2028\u2029]
__
= ( whiteSpace / lineTerminator )*

/* temporary placemarkers */
dummySymbol0 = "#0" __
dummySymbol1 = "#1" __
dummySymbol2 = "#2" __
dummySymbol3 = "#3" __
dummySymbol4 = "#4" __
dummySymbol5 = "#5" __
dummySymbol6 = "#6" __
dummySymbol7 = "#7" __
dummySymbol8 = "#8" __
dummySymbol9 = "#9" __
dummySymbolA = "#a" __
dummySymbolB = "#b" __
dummySymbolC = "#c" __
dummySymbolD = "#d" __
dummySymbolE = "#e" __
dummySymbolF = "#f" __

dummyPrefixOperator = "##" __

dummyLiteral = [0-9]+ __

Thanks
John

David Majda

unread,

Mar 17, 2013, 3:12:39 PM3/17/13

to jo...@minster.co.uk, pegjs

Hi,

2013/3/11 John Hadjioannou <jo...@minster.co.uk>:

> Thank you for your input. I think it's a bit more complex. I have the
> whole grammar below.

Could you please resend your grammar in an attachment or paste it as a
Gist on GitHub so that the whitespace isn't mangled?

--
David Majda
Entropy fighter
http://majda.cz/

JohnHadj

unread,

Mar 17, 2013, 4:18:34 PM3/17/13

to pe...@googlegroups.com

Hi David

Here is the link to a Gist of the syntax:

https://gist.github.com/JohnHadj/5183408

As is, there are 16 levels of priority in an expression, the parser takes 27 seconds to generate (chrome). If I replace the prio0 to prioF with the alternative prio0 it takes 71mS. If there is a way of rewriting this so that it is fast with 16 levels I'd really appreciate knowing.

Thanks

John

David Majda

unread,

Aug 17, 2013, 4:26:26 AM8/17/13

to jo...@minster.co.uk, pegjs

2013/3/17 JohnHadj <jo...@minster.co.uk>

Hi David

Here is the link to a Gist of the syntax:

https://gist.github.com/JohnHadj/5183408

I filed an issue about your problem on GitHub and I'll look at it at some point when I'll focus on perfromance issues:

https://github.com/dmajda/pegjs/issues/201

Thanks and sorry that my reply took so long.

John Hadjioannou

unread,

Apr 24, 2014, 4:23:06 PM4/24/14

to pe...@googlegroups.com

Hi David

Has this performance issue made it onto your radar?

David Majda

unread,

Apr 26, 2014, 10:39:02 AM4/26/14

to jo...@minster.co.uk, pegjs

2014-04-24 22:23 GMT+02:00 John Hadjioannou <jo...@minster.co.uk>:

Has this performance issue made it onto your radar?

Yes, I plan to look at it as part the 0.9.0 release.

Reply all

Reply to author

Forward