Example for whitespace/comments


Rich Brown

Oct 1, 2011, 12:32:23 AM
to PEG.js: Parser Generator for JavaScript
My grammar allows whitespace and comments, which should be ignored any
place they occur. Their definitions might be:

WHITESPACE = [ \t\n\r]+
COMMENT = --[^\n\r]*

How can these be handled/specified in PEG.js? Many thanks.

Rich Brown

Oct 1, 2011, 3:27:38 PM
to PEG.js: Parser Generator for JavaScript
Update to the above: for example, how would one extend the sample
arithmetic expression grammar at http://pegjs.majda.cz/online to allow
comments or whitespace between any of the terms of the expression? I
would want to accept these examples:

a) 2*(3+4)

b) 2 * ( 3 + 4 )

c) 2 * ( 3 + 4 ) -- this is a comment

d) -- comment line followed by expression on next line
2 * ( 3 + 4 )

etc. Many thanks!

Rich Brown
Hanover, NH

Zak Greant

Oct 1, 2011, 4:43:19 PM
to pe...@googlegroups.com

Hey Rich,

Case b can be handled pretty easily by adding a rule for matching zero or more whitespace characters and then putting it where whitespace may occur.

eg. I'd define my whitespace rule as follows:
_
= [ \t\r\n]*

then change the primary and integer rules as follows:

primary
= integer
/ _ "(" additive:additive ")" _ { return additive; }

integer "integer"
= _ digits:[0-9]+ _ { return parseInt(digits.join(""), 10); }


Case c is easy to handle if you only have a single line - just change the start rule as follows:

start
= additive:additive ("--" [^\r\n]*)? {return additive}


The last case requires more work, as you need to deal with multiple lines. Personally, I'd clean up comments and whitespace in one pass and then parse in another pass.

Here's a rough pass that doesn't cover all the cases - it should get you started though.

/* Parse zero or more lines */
start
= line*

line
= additive:additive comment? {return additive} / comment {return null}

additive
= left:multiplicative "+" right:additive { return left + right; }
/ multiplicative

multiplicative
= left:primary "*" right:multiplicative { return left * right; }
/ primary

primary
= integer
/ _ "(" additive:additive ")" _ { return additive; }

integer "integer"
= _ digits:[0-9]+ _ { return parseInt(digits.join(""), 10); }

comment
= "--" [^\n\r]* EOL

EOL
= "\r\n" / "\n" / "\r" / !.

_
= [ \n\r\t]*


Cheers!
--zak
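
The two-pass idea mentioned above (clean up comments first, then parse) could be sketched in plain JavaScript roughly as follows. This is only an illustration, not part of PEG.js: the function name `stripLineComments` is made up here, and the sketch assumes that "--" never occurs inside a string literal or any other token.

```javascript
// Pass 1: remove "--" comments so that a grammar which only handles
// whitespace can deal with the rest.
// Assumption: "--" never appears inside a string literal or other token.
function stripLineComments(source) {
  // Drop everything from "--" up to, but not including, the line terminator.
  return source.replace(/--[^\n\r]*/g, "");
}

const input =
  "-- leading comment\n\n2 * -- two\n( 3 -- three\n+ 4 ) -- and four";
const cleaned = stripLineComments(input);

// Line terminators are kept, so line numbers in later error messages
// still match the original text.
console.log(JSON.stringify(cleaned)); // "\n\n2 * \n( 3 \n+ 4 ) "
```

The cleaned text contains only digits, operators, and whitespace, so it can then be handed to a parser whose grammar places `_` around its terminals.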

Rich Brown

Oct 1, 2011, 11:27:23 PM
to PEG.js: Parser Generator for JavaScript
Hi zak,

Thanks for the speedy response with the good advice.

> The last case requires more work, as you need to deal with multiple lines.
> Personally, I'd clean up comments and whitespace in one pass and then parse in another pass.

Yeah... That's the tricky part. SNMP MIB files are not particularly
fussy about line-endings: the file creators sometimes put a lot on
each line, sometimes they spread a definition across multiple lines. I
was hoping to use a parser generator to "straighten it out" so that I
could chunk them into rational bits.

I think I see from your example how to remove whitespace: it's
"simply" a matter of placing a '_' in strategic places in the
grammar.

The comments will be harder, I think. They start with "--" and
continue to the end of the line. Can you give me any advice on a
strategy for doing this? To continue the example (and just to make it
hard :-), I would want to handle this case along with the ones above.

e) -- a leading comment followed by a blank line

2 * -- two is the number to multiply the sum...
( 3 -- ... of three
+ 4 ) -- and four

Many thanks and best regards,

Rich

JohnHadj

Oct 2, 2011, 3:18:22 AM
to PEG.js: Parser Generator for JavaScript
Hi

What I have done is something like this:

start
= __ c:clauseSequence {return c}

/* non-terminals, no need for __ */
clauseSequence
= .......

......

/* terminals */

beginSymbol = s:"BEGIN" __ {return s}
endSymbol = s:"END" __ {return s}

goOnSymbol = s:";" __ {return s}

boolNullSymbol = s:"NULL" __ {return s}
boolFalseSymbol = s:"FALSE" __ {return s}

/* white space and comments */
__
= ( whiteSpace / lineTerminator / enclosedComment / lineComment )*

whiteSpace
= [\t\v\f \u00A0\uFEFF]

lineTerminator
= [\n\r]

enclosedComment
= "/*" (!"*/" anyCharacter)* "*/"

lineComment
= "//" (!lineTerminator anyCharacter)*

anyCharacter
= .
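
To make the pattern concrete (every terminal swallows the whitespace and comments that follow it, and the start rule swallows any that lead), here is a small hand-written JavaScript sketch applied to the arithmetic example from earlier in the thread. It is not PEG.js output and not anyone's actual grammar: all function names are hypothetical, and it uses "--" line comments in place of the `//` and `/* */` forms above.

```javascript
// Hand-written sketch: each token match is followed by skipTrivia(),
// mirroring the trailing `__` on each terminal rule.
function evaluate(source) {
  let pos = 0;

  // Equivalent of `__`: zero or more whitespace characters or "--" comments.
  function skipTrivia() {
    for (;;) {
      const ch = source[pos];
      if (ch === " " || ch === "\t" || ch === "\r" || ch === "\n") {
        pos++;
      } else if (source.startsWith("--", pos)) {
        // A comment runs up to, but not including, the line terminator.
        while (pos < source.length && source[pos] !== "\n" && source[pos] !== "\r") pos++;
      } else {
        return;
      }
    }
  }

  // Match a literal token, then swallow trailing trivia.
  function token(text) {
    if (!source.startsWith(text, pos)) return false;
    pos += text.length;
    skipTrivia();
    return true;
  }

  function integer() {
    const start = pos;
    while (pos < source.length && source[pos] >= "0" && source[pos] <= "9") pos++;
    if (pos === start) throw new SyntaxError("Expected integer at offset " + pos);
    const value = parseInt(source.slice(start, pos), 10);
    skipTrivia();
    return value;
  }

  function primary() {
    if (token("(")) {
      const value = additive();
      if (!token(")")) throw new SyntaxError('Expected ")" at offset ' + pos);
      return value;
    }
    return integer();
  }

  function multiplicative() {
    const left = primary();
    return token("*") ? left * multiplicative() : left;
  }

  function additive() {
    const left = multiplicative();
    return token("+") ? left + additive() : left;
  }

  skipTrivia(); // like the leading `__` in the start rule
  const result = additive();
  if (pos !== source.length) throw new SyntaxError("Unexpected input at offset " + pos);
  return result;
}

console.log(evaluate("2 * -- two\n( 3 -- three\n+ 4 ) -- and four")); // 14
```

Because trivia is consumed in exactly two kinds of places (once at the start, then after every terminal), the non-terminal functions never need to mention whitespace or comments, which is the same property the grammar above gets from its `__` rule.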

Rich Brown

Oct 2, 2011, 10:04:08 AM
to PEG.js: Parser Generator for JavaScript
Hi John,

> What I have done is something like this:
> ...

Thanks for your info. I think that everything I've learned here will
let me do what I need to.

Best regards,

Rich