Weekend changes: position tracking redesign and lexer-level rules simplification

David Majda

Dec 2, 2012, 12:14:16 PM

to pegjs

Hi,

Over the weekend, two big changes landed in PEG.js's master branch:
position tracking redesign and lexer-level rules simplification.

Position tracking redesign
--------------------------

I changed the "line" and "column" variables, which contained the current
line and column, into functions. This allows position data to be computed
lazily, with no significant performance penalty when the data is not
needed. As a result, position tracking no longer needs to be enabled
explicitly (the "trackLineAndColumn" option is gone) and the code
generator got somewhat simpler.
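
A minimal sketch of the lazy approach in plain JavaScript follows. It is not the code PEG.js actually generates; the `makePositionTracker` helper and its `advance` method are hypothetical names used only for illustration, and only "\n" is treated as a line break. The parser just moves an offset forward, and the line and column are computed by scanning the input only when someone asks for them.

```javascript
// Hypothetical illustration, not the generated PEG.js parser code.
function makePositionTracker(input) {
  let pos = 0; // current offset into the input, advanced by the parser

  // Walk the input from the start up to `pos`, counting line breaks.
  // This cost is paid only when line() or column() is actually called.
  function compute() {
    let line = 1;
    let column = 1;
    for (let i = 0; i < pos; i++) {
      if (input.charAt(i) === "\n") {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
    return { line: line, column: column };
  }

  return {
    advance: function (n) { pos += n; },          // cheap: no bookkeeping
    line: function () { return compute().line; }, // lazy
    column: function () { return compute().column; }
  };
}

const tracker = makePositionTracker("ab\ncd");
tracker.advance(4); // the parser has consumed "ab\nc"
console.log(tracker.line(), tracker.column()); // 2 2
```

A grammar that never calls `line()` or `column()` pays nothing beyond updating the offset, which is why no explicit option is needed to switch tracking on.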

Commits: https://github.com/dmajda/pegjs/compare/da9ab1bf17...28860e88df

Lexer-level rules simplification
--------------------------------

This feature introduces a new prefix operator: "$". Applied to an
expression, it replaces the expression's match result, whatever it
was, with the string the expression matched in the input. For example,
while a parser generated from this grammar:

start = "a"+

will return ["a", "a", "a"] given input "aaa", a parser generated from
this grammar:

start = $"a"+

will return just "aaa".
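
The difference between the two results can be sketched with a small hand-written matcher. This is only an illustration of the semantics described above, not PEG.js internals; `oneOrMore`, `parsePlain` and `parseText` are hypothetical names, and unlike a generated parser the sketch does not check that the whole input was consumed.

```javascript
// Hypothetical helper: match one or more copies of `literal` starting at `pos`.
// Returns the individual matches plus the start and end offsets, or null on failure.
function oneOrMore(input, pos, literal) {
  const results = [];
  let end = pos;
  while (input.startsWith(literal, end)) {
    results.push(literal);
    end += literal.length;
  }
  return results.length > 0 ? { results: results, start: pos, end: end } : null;
}

// start = "a"+    the result is the array of individual matches
function parsePlain(input) {
  const m = oneOrMore(input, 0, "a");
  return m && m.results;
}

// start = $"a"+   the result is the slice of the input that was matched
function parseText(input) {
  const m = oneOrMore(input, 0, "a");
  return m && input.substring(m.start, m.end);
}

console.log(parsePlain("aaa")); // [ 'a', 'a', 'a' ]
console.log(parseText("aaa"));  // aaa
```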

This is useful mostly for "lexical" rules at the bottom of many
grammars. For example, instead of:

identifier = first:[a-zA-Z_] rest:[a-zA-Z0-9_]* {
return first + rest.join("");
}

you can now just write:

identifier = $([a-zA-Z_] [a-zA-Z0-9_]*)

The feature also opens room for some interesting optimizations.
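
As a guess at what one such optimization could look like (the message does not say which ones are meant), the sketch below contrasts the two identifier rules in plain JavaScript. Both function names are hypothetical. The first version builds an intermediate array and joins it, as the action-based rule does; the second only remembers where the match ends and slices the input once.

```javascript
// Action style: collect the pieces, then glue them back together.
function identifierWithAction(input) {
  if (!/^[a-zA-Z_]/.test(input)) return null;
  const first = input.charAt(0);
  const rest = [];
  let pos = 1;
  while (pos < input.length && /[a-zA-Z0-9_]/.test(input.charAt(pos))) {
    rest.push(input.charAt(pos)); // one array element per character
    pos++;
  }
  return first + rest.join("");
}

// "$" style: no intermediate results are needed, only the end offset.
function identifierWithText(input) {
  if (!/^[a-zA-Z_]/.test(input)) return null;
  let pos = 1;
  while (pos < input.length && /[a-zA-Z0-9_]/.test(input.charAt(pos))) {
    pos++;
  }
  return input.substring(0, pos);
}

console.log(identifierWithAction("foo_1 bar")); // foo_1
console.log(identifierWithText("foo_1 bar"));   // foo_1
```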

While I am pretty sure about the feature itself, I am not 100%
convinced about the syntax -- the current one resembles shell/Makefile
variables too much for my taste. I welcome any suggestions.

Commits: https://github.com/dmajda/pegjs/compare/4e46a6e46e...c54483bb17

--
David Majda
Entropy fighter
http://majda.cz/

José Luis Millán

Dec 5, 2012, 3:42:12 AM

to da...@majda.cz, pegjs

Hi David,

That's a cool feature!

In fact, why is it useful to return an array with the split input? I guess this is not a functional choice but a practical one, due to PEG.js's design.

I would find it 100% more useful to retrieve the input 'String' for each rule instead of an 'Array', by default and always. I've never used such an Array; I always need to join() it.

Regards

--
José Luis Millán

David Majda

Dec 10, 2012, 3:36:57 PM

to José Luis Millán, pegjs

2012/12/5 José Luis Millán <jmi...@aliax.net>:

> I would find it 100% more useful to retrieve the input 'String' for each
> rule instead of an 'Array', by default and always. I've never used such an
> Array; I always need to join() it.

It is true that in lexer-level rules, one usually cares about the raw
text more than about the structure. But in syntax-level rules, one cares
more about the structured representation of the input.

For example, if you have a rule like

HttpHeaders = HttpHeader*

you really want to get an array of something (object, array, ...) that
represents an HTTP header (a name-value pair). The same goes for
statement lists, variable declarations and many other constructs.

PEG.js is built on the assumption that a grammar usually has few
lexer-level rules but many syntax-level rules. This is the reason why
it is optimized for the latter type and why I am only now adding
constructs (the "$" operator, the "text" function) that make
building lexer-level rules easier.

José Luis Millán

Dec 10, 2012, 4:06:05 PM

to David Majda, pegjs

Crystal-clear.

Thanks again for the explanation.

--
José Luis Millán
