Parsing of heredoc syntax / matching captured strings

Guilherme Vieira

Mar 4, 2013, 5:29:58 PM
to pe...@googlegroups.com
Hi,

I'm trying to parse PHP code and I'm bumping into issues with heredocs. I'm wondering how I can parse them. I need a way to match a string that has been previously matched.

Is this possible with PEG.js?

--
Atenciosamente / Sincerely,
Guilherme Prá Vieira

Zak Greant

Mar 5, 2013, 6:36:02 PM
to pe...@googlegroups.com
Dear Guilherme,

On 2013-03-04, at 14:29, Guilherme Vieira <n2.ni...@gmail.com> wrote:
...
> I'm trying to parse PHP code and I'm bumping into issues with heredocs. I'm wondering how I can parse them. I need a way to match a string that has been previously matched.



You can use the & { predicate } expression (a semantic predicate) for this (see http://pegjs.majda.cz/documentation for more details).

Here's a basic example:

{ Array.prototype.f = function( glue ){ return this.join( glue || '' ); } }

start
= heredoc:( s:('<<<' hid __) l:line* e:(hid ';') {return {start:s.f(), content:l.f(), end:e.f()} })
& {return heredoc.start.slice(3,-1) == heredoc.end.slice(0,-1) && -1 == heredoc.content.indexOf( "\n" + heredoc.end );}
{return heredoc;}

__ = "\n"
chr = [^\n]
hid = c:[a-z]i+ {return c.f();}
line = c:chr* __ {return c.f() + "\n";}
lines = l:line+ {return l.f();}


Cheers!
--zak

David Majda

Mar 12, 2013, 3:49:30 PM
to Guilherme Vieira, pe...@googlegroups.com
Hi,

2013/3/4 Guilherme Vieira <n2.ni...@gmail.com>

> I'm trying to parse PHP code and I'm bumping into issues with heredocs. I'm wondering how I can parse them. I need a way to match a string that has been previously matched.
>
> Is this possible with PEG.js?

Have a look at the DelimitedBodyVariable rule in this parser to see how I solved it:


A cleaner solution would be to use some kind of backreferences, but PEG.js doesn't support them (at least not yet -- I am considering adding them).

P.S.: Zak's solution mentioned in the other reply to your mail won't work, because the "*" operator is greedy and does not backtrack, so the "line*" expression in his grammar will match all remaining lines of the input regardless of whether the terminator was encountered or not.
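
To illustrate the point about "*": the following is a minimal JavaScript sketch of PEG-style matching, not PEG.js internals. The helper names (lit, line, star, seq, not) are hypothetical, and the delimiter is hard-coded as EOT for brevity. It shows a greedy line* swallowing the terminator line, while a loop guarded by a negative lookahead stops in front of it.

```javascript
// Toy PEG-style matchers (illustrative only, not PEG.js internals).
// Each matcher takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);

// One line: everything up to and including the next "\n".
const line = (input, pos) => {
  const nl = input.indexOf("\n", pos);
  return nl === -1 ? -1 : nl + 1;
};

// PEG "*": repeat greedily; once it stops it never gives anything back.
const star = (p) => (input, pos) => {
  for (;;) {
    const next = p(input, pos);
    if (next === -1 || next === pos) return pos;
    pos = next;
  }
};

// Sequence: every part must match in order.
const seq = (...ps) => (input, pos) => {
  for (const p of ps) {
    pos = p(input, pos);
    if (pos === -1) return -1;
  }
  return pos;
};

// PEG "!": succeeds without consuming input only if p fails here.
const not = (p) => (input, pos) => (p(input, pos) === -1 ? pos : -1);

const input = "<<<EOT\nhello\nworld\nEOT;\n";
const openTag = lit("<<<EOT\n");
const closeTag = lit("EOT;");

// line* eats "EOT;\n" as an ordinary line, so the closer has nothing left to match.
const greedy = seq(openTag, star(line), closeTag);

// (!closeTag line)* stops in front of the terminator line.
const guarded = seq(openTag, star(seq(not(closeTag), line)), closeTag);

console.log(greedy(input, 0));  // -1 (no match)
console.log(guarded(input, 0)); // 23 (position just after "EOT;")
```

Note that if the terminator were the very last thing in the input, with no trailing newline, the greedy version would happen to succeed, because the final line would fail to match. The failure shows up as soon as anything follows the heredoc.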

--
David Majda
Entropy fighter
http://majda.cz/

lon...@gmail.com

Feb 25, 2016, 3:03:18 PM
to PEG.js: Parser Generator for JavaScript, n2.ni...@gmail.com
This is a nice approach in that the opening and closing tokens for the "HERE" document are not fixed (only the "<<<" keyword is), so the user can choose their own.

Note that if you run the parser actions in Node.js under "use strict", you must also add a `{ var bodyTerminal; }` initializer at the start of the grammar, so that the bodyTerminal variable is declared and shared between parser action blocks.
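
For readers who want to see how the initializer fits together with the rest of a grammar, here is a minimal sketch. It is not the DelimitedBodyVariable rule referred to earlier in the thread. Only the bodyTerminal name comes from the note above. The rule names (Heredoc, Opener, BodyLine, Closer, Identifier) are hypothetical. It also assumes a PEG.js version that supports the `$` text operator and lets a semantic predicate read labels that precede it in the same sequence.

```pegjs
{
  // Declared in the initializer so that every action and predicate below
  // shares the same variable, also under "use strict".
  var bodyTerminal;
}

Heredoc
  = Opener body:BodyLine* Closer
    { return { delimiter: bodyTerminal, content: body.join("") }; }

// Remember the identifier the user chose after "<<<".
Opener
  = "<<<" id:Identifier "\n" { bodyTerminal = id; }

// A body line is any line that does not begin with the closing sequence.
// The negative lookahead keeps the greedy "*" from consuming the closer.
BodyLine
  = !Closer text:$([^\n]*) "\n" { return text + "\n"; }

// The closer matches only if its identifier equals the remembered one.
Closer
  = id:Identifier ";" &{ return id === bodyTerminal; }

Identifier
  = $([a-z]i+)
```

Two caveats about this sketch: the assignment in Opener is a side effect that is not undone if the parser later backtracks out of Heredoc, and a body line that itself begins with the identifier followed by ";" would be treated as the closer.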
