Grammar for extracting special blocks in text

pegguy

Mar 8, 2013, 11:25:49 AM
to pe...@googlegroups.com
Hello again,

My example is, of course, simplified. Given the following text, I want to extract all lines beginning with '#':
ab
#cd
ef
#gh


My grammar:
start = ( ( ! block . ) / block ) +

block = Result:( "#" identifier ) { return( Result ) }                                        // 1
// block = Result:( "#" identifier ) { return( "Block: " + Result ) }                         // 2
// block = Result:( "#" identifier ) { return( "Block: " + Result[0] + Result[1].join("") ) } // 3

identifier = [a-zA-Z_]+


The result:
[
  [
    [
      "",
      "a"
    ] 
  ]

  [
    [
      "",
      "b"
    ] 
  ]

  ...

  [
    "#",   // hint, not in output: Result[0]
    [
      "g", // hint, not in output: Result[1][0]
      "h"  // hint, not in output: Result[1][1]
    ] 
  ]
]


How can I avoid all the output that is unnecessary in my case?
When I use the commented-out line 2 of my grammar, I get
[
  ...
  "Block: #,g,h"
]
or, with the commented-out line 3,
[
  ...
  "Block: #gh"
]

And that "Block ..." is the only part I need.

Any hints/suggestions?


Many thanks and greetings
pegguy







Guilherme Vieira

Mar 8, 2013, 1:35:44 PM
to weike...@aol.com, pe...@googlegroups.com
Hi,

You're getting unnecessary data because your start rule consumes unnecessary stuff but doesn't discard it, returning it all in an array.

The unnecessary data is in the (!block .) expression, so you can make it just return nothing instead of the matched data. Then, in the start rule action, you can filter out the undefined values in the array, like this:

start = data:( ( ! block . ) { return; } / block ) + { return data.filter(function (d) { return d !== undefined; }); }

block = Result:( "#" identifier ) { return( Result ) }

identifier = [a-zA-Z_]+

Results:
[
   [
      "#",
      [
         "a",
         "b"
      ]
   ],
   [
      "#",
      [
         "c",
         "b"
      ]
   ]
]
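
To illustrate the filtering step outside of a parser, here is a minimal plain-JavaScript sketch. It assumes the original input from the question ("ab", "#cd", "ef", "#gh" on separate lines) and hand-builds the array that the `data` label would hold; the variable names `data` and `blocks` are only for illustration.

```javascript
// Hypothetical stand-in for the value of `data` before filtering: one entry
// per alternative matched by ( (!block .) { return; } / block )+
// for the input "ab\n#cd\nef\n#gh".
const data = [
  undefined, undefined, undefined,            // "a", "b", "\n" discarded by { return; }
  ["#", ["c", "d"]],                          // first block
  undefined, undefined, undefined, undefined, // "\n", "e", "f", "\n"
  ["#", ["g", "h"]],                          // second block
];

// Same filter as in the start rule's action: keep only the block results.
const blocks = data.filter(function (d) { return d !== undefined; });

console.log(JSON.stringify(blocks));
```

Only the two block entries survive; every character consumed by the (!block .) alternative contributes an undefined that the filter drops.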

I hope this helps.

-- 
Atenciosamente / Sincerely,
Guilherme Prá Vieira




Guilherme Vieira

Mar 8, 2013, 1:43:07 PM
to weike...@aol.com, pe...@googlegroups.com
Oh, and also, if you want the full strings after the hash character, you can do this:

start = data:( ( ! block . ) { return; } / block ) + { return data.filter(function (d) { return d !== undefined; }); }

block = "#" Result:identifier { return Result; }

identifier = c:[a-zA-Z_]+ { return c.join(''); }
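
As a rough illustration of why the join is needed, the following plain-JavaScript sketch reproduces the two "Block: ..." strings from the question. It assumes the unprocessed match has the shape shown in the original output (a string followed by an array of characters); the names `result`, `implicit` and `explicit` are made up for this example.

```javascript
// Shape of the value produced by  "#" identifier  when
// identifier = [a-zA-Z_]+ has no action: a string plus an array of characters.
const result = ["#", ["g", "h"]];

// Concatenating a string with an array calls the array's toString(),
// which joins the elements with commas at every nesting level.
const implicit = "Block: " + result;

// Joining the character array explicitly yields the identifier as one string.
const explicit = "Block: " + result[0] + result[1].join("");

console.log(implicit); // Block: #,g,h
console.log(explicit); // Block: #gh
```

Putting the join into the identifier rule itself, as above, means every rule that uses identifier already receives a plain string.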

Zak Greant

Mar 8, 2013, 5:21:33 PM
to pe...@googlegroups.com
Also, in some cases like this, people may find this small function helpful – it recursively joins arrays that contain a mix of strings and arrays.

Array.prototype.rjoin = function( glue ){
    glue = glue || ''; // join with an empty string by default

    return this.map( function( item ){
        // recurse into nested arrays, leave strings untouched
        return (item instanceof Array) ? item.rjoin( glue ) : item;
    }).join( glue );
};
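
For anyone who would rather not extend Array.prototype, the same idea can be sketched as a standalone function. This is only a variant under that assumption; the name `rjoinArray` is hypothetical and not part of PEG.js.

```javascript
// Free-function variant of the recursive join; rjoinArray is a made-up name.
function rjoinArray(items, glue) {
  glue = glue || ""; // join with an empty string by default
  return items.map(function (item) {
    // Recurse into nested arrays, pass strings through unchanged.
    return Array.isArray(item) ? rjoinArray(item, glue) : item;
  }).join(glue);
}

console.log(rjoinArray(["#", ["g", "h"]]));      // #gh
console.log(rjoinArray(["#", ["g", "h"]], "-")); // #-g-h
```

Inside a grammar action this could be called on the raw match, for example on the ["#", ["g", "h"]] structure from the first post.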



Cheers!
--zak

