subrule of subrule immediately fails

Terrence X Brannon

Oct 23, 2007, 2:31:03 PM

to recde...@perl.org

The following grammar (piece) works fine:

geoloc: geoloc_ and(?) geoloc_ | geoloc_
geoloc_: city | state | country | area

but per
http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules

I want distinct geoloc values, so I did this:

geoloc: geoloc1 and(?) geoloc2 | geoloc_
geeloc1: geoloc_
geeloc2: geoloc_
geoloc_: city | state | country | area

but then the autotrace immediately fails:

--
Terrence Brannon - SID W049945
614-213-2475 (office)
614-213-3426 (fax)
818-359-0893 (cell)

Ted Zlatanov

Oct 23, 2007, 4:40:31 PM

to terrence....@jpmchase.com, recde...@perl.org

On Tue, 23 Oct 2007 14:31:03 -0400 terrence....@jpmchase.com wrote:

txb> The following grammar (piece) works fine:
txb> geoloc: geoloc_ and(?) geoloc_ | geoloc_
txb> geoloc_: city | state | country | area

txb> but per
txb> http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules

txb> I want distinct geoloc values, so I did this:

txb> geoloc: geoloc1 and(?) geoloc2 | geoloc_
txb> geeloc1: geoloc_
txb> geeloc2: geoloc_
txb> geoloc_: city | state | country | area

txb> but then the autotrace immediately fails:

Any chance you can post a complete example with sample input that fails?

Thanks
Ted

Terrence Brannon

Oct 24, 2007, 5:17:11 AM

to terrence....@jpmchase.com, recde...@perl.org

On 10/23/07, Ted Zlatanov <t...@lifelogs.com> wrote:

>
> Any chance you can post a complete example with sample input that fails?
>

What was I thinking? My goodness, par for the course when submitting a
bug. Anyway, my reduced test case works just fine, which hopefully
means that it's on my end of things. So I will just do some more
head-scratching on my end for now. Here is my test case that worked
flawlessly:

use strict;
use warnings;

use Data::Dumper;

use Parse::RecDescent;

# Generate a parser from the specification in $grammar:

my $grammar = << 'EOGRAMMAR';
store: name geoloc

name: "trader joe's" | "whole foods"

geoloc: geoloc1 and(?) geoloc2 | geoloc_

geoloc1: geoloc_
geoloc2: geoloc_

geoloc_: city | state | country | area

and: 'and'

EOGRAMMAR

$::RD_AUTOACTION = q { [\%item] } ;

my $parser = new Parse::RecDescent ($grammar);

my $r = $parser->store("trader joe's los angeles california");

warn Dumper $r;

Ted Zlatanov

Oct 24, 2007, 3:08:23 PM

to Terrence Brannon, terrence....@jpmchase.com, recde...@perl.org

On Wed, 24 Oct 2007 05:17:11 -0400 "Terrence Brannon" <meta...@gmail.com> wrote:

TB> On 10/23/07, Ted Zlatanov <t...@lifelogs.com> wrote:
>>
>> Any chance you can post a complete example with sample input that fails?
>>

TB> my $grammar = << 'EOGRAMMAR';
TB> store: name geoloc

TB> name: "trader joe's" | "whole foods"

TB> geoloc: geoloc1 and(?) geoloc2 | geoloc_
TB> geoloc1: geoloc_
TB> geoloc2: geoloc_
TB> geoloc_: city | state | country | area

TB> and: 'and'

TB> EOGRAMMAR

I think actions may be the answer for your original problem, which was
to distinguish the two positions (so you created the geoloc1 and geoloc2
rules). An action like this:

{ $return = { item1 => $item[1], item2 => $item[3] }; }

would give you back a hash with the entries for your matched items named
appropriately. I don't know why you had subrule problems, sorry.
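
As a rough sketch of the action idea (not the actual grammar from this thread): the city and state rules below are stand-ins invented for the example, and no autoaction is set, so every rule without an explicit action returns its last matched item.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# city and state are hypothetical placeholder rules for illustration.
my $grammar = <<'EOGRAMMAR';
geoloc:  geoloc_ and(?) geoloc_
             { $return = { item1 => $item[1], item2 => $item[3] } }
      |  geoloc_
             { $return = { item1 => $item[1] } }
geoloc_: city | state
city:    'los angeles' | 'taipei'
state:   'california'
and:     'and'
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# $item[2] holds the result of and(?), so the second location is $item[3].
my $r = $parser->geoloc('los angeles and california');
print "first: $r->{item1}, second: $r->{item2}\n" if $r;
```

The two positions come back under distinct hash keys without needing the geoloc1 and geoloc2 wrapper rules.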

You could also consider the <leftop> directive, which could set up
"A and B and C" parsing for you, unlike your current rules, which only
accommodate one "and". You can use actions again to return the right
things by name.
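
A minimal sketch of the <leftop> approach, again with invented city and state rules. Note that here the 'and' is required between locations, which differs from the optional and(?) above. As I understand the documentation, when the operator is a quoted literal only the operand values are returned.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# city and state are hypothetical placeholder rules for illustration.
my $grammar = <<'EOGRAMMAR';
geolocs: <leftop: geoloc_ 'and' geoloc_>
geoloc_: city | state
city:    'los angeles' | 'taipei'
state:   'california'
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The directive yields an array reference with one entry per location.
my $places = $parser->geolocs('los angeles and california and taipei');
print join(', ', @$places), "\n" if $places;
```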

Hope this helps.
Ted

Terrence X Brannon

Oct 30, 2007, 1:41:22 PM

to recde...@perl.org

Note: I'm sorry for the long disclaimer in these emails. I have put in
requests to GMANE and Nabble to archive this list, so hopefully my future
posts (while at $dayjob) will be easier for others to trim and reply to.
With that said, I continue...

Ok, I'm wondering what would be the best way to test for a match against a
grammar where there may be up to 25 "junk" characters preceding the match.
We know that PRD does not do backtracking:

http://search.cpan.org/dist/Parse-RecDescent-FAQ/FAQ.pm#Answer_by_Randal_L._Schwartz

And we know that greediness is difficult to control:

http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#ON-GOING_ISSUES_AND_FUTURE_DIRECTIONS

But given all that, and the toy grammar below, which matches "sir george",
how could we modify it to match "io;ajwer;i324 sir george"?

One possible strategy is to simply feed 25 successive substrings, chopping
off one character at a time.
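
A sketch of that substring strategy, assuming a hypothetical helper named match_after_junk. To keep it runnable without the grammar, a regex stands in for the parser; with Parse::RecDescent the code ref could be something like sub { $p->name_types($_[0]) }, since a start rule returns undef on failure.

```perl
use strict;
use warnings;

# Try the matcher at each offset from 0 up to $max_junk, dropping one
# leading character per attempt. Returns (offset, result) for the first
# success, or an empty list if nothing matched.
sub match_after_junk {
    my ($try, $string, $max_junk) = @_;
    my $limit = length($string) < $max_junk ? length($string) : $max_junk;
    for my $offset (0 .. $limit) {
        my $result = $try->(substr($string, $offset));
        return ($offset, $result) if defined $result;
    }
    return;
}

# Hypothetical stand-in for the parser: succeeds only when the text
# starts (after optional whitespace) with a title and a first name.
my $try = sub {
    $_[0] =~ /\A\s*(sir|his holiness)\s+(george|john)/ ? [$1, $2] : undef;
};

my ($offset, $result) = match_after_junk($try, "io;ajwer;i324 sir george", 25);
print "matched after $offset junk characters: @$result\n" if defined $offset;
```

One caveat of this approach is that it can also start a match in the middle of a word, so the grammar or the helper may need a word-boundary check.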

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

use lib '.' ;
use Parse::RecDescent;

{
    #last;
    #$::RD_WARN++;
    #$::RD_HINT++;
    $::RD_TRACE++;
}

$::RD_AUTOACTION = q { [@item] } ;

my $G = << 'EOGRAMMAR' ;

name_types: royal

royal: title firstname of(?)

title: 'sir' | 'his holiness'

firstname: 'george' | 'john'

of: 'of' place

place: 'kent'

EOGRAMMAR

my $p = Parse::RecDescent->new($G) ;

my $string = "sir george";

my $parser = 'name_types' ;   # start rule; the grammar defines name_types, not name

my $r = $p -> $parser ( $string ) ;

warn Dumper $r;

Ted Zlatanov

Oct 31, 2007, 8:07:25 AM

to terrence....@jpmchase.com, recde...@perl.org

On Tue, 30 Oct 2007 13:41:22 -0400 terrence....@jpmchase.com wrote:

txb> But given all that, and the toy grammar below, which matches "sir george",
txb> how could we modify it to match "io;ajwer;i324 sir george"

Here's my solution:

1) treat the whole input as a line made of items
2) each item can be a word or a name (we try to match a 'name' first)
3) a word is any number of non-space characters

The key is to walk through the input, looking for names. I tried a few
other inputs and they seemed to work the way you want.

Ted

#!/usr/bin/perl

use warnings;
use strict;

use Data::Dumper;
use Parse::RecDescent;

{
    #last;
    #$::RD_WARN++;
    #$::RD_HINT++;
    $::RD_TRACE++;
}

$::RD_AUTOACTION = q { [@item] } ;

my $G = << 'EOGRAMMAR' ;

line: item(s)
item: name_types | word

word: /\S+/

name_types: royal

royal: title firstname of(?)

title: 'sir' | 'his holiness'

firstname: 'george' | 'john'

of: 'of' place

place: 'kent'

EOGRAMMAR

my $p = Parse::RecDescent->new($G) ;

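my @strings = ("hello there sir george",
               "io;ajwer;i324 sir george",
               "welcome his holiness john of kent");

my $parser = 'line' ;

foreach my $string (@strings)
{
    # The archived message is cut off here. The loop body below is an
    # assumed completion modeled on the earlier script in this thread.
    my $r = $p->$parser($string);
    warn Dumper $r;
}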

Terrence X Brannon

Oct 31, 2007, 1:24:51 PM

to recde...@perl.org

I will be adding this to the FAQ shortly, but thought I would mention
how excellent Data::Match

http://search.cpan.org/~kstephens/Data-Match-0.06/Match.pm

has been for my work.

So, I have these deeply nested parse trees thanks to my use of an
Autoaction:

$::RD_AUTOACTION = q { [@item] } ;

I first inquired about various tools for spelunking in such trees:
http://perlmonks.org/?node_id=646560

And am very happy so far with Data::Match. Here is a sample parse tree
from my parse:

$VAR1 = [
  'simple_house',
  [
    'pre_simple',
    [
      'geoloc',
      [
        'geoloc_',
        [
          'place',
          [
            'city',
            'taipei'
          ]
        ]
      ]
    ]
  ],
  [
    'house',
    'house'
  ],
  []
];

Now, my goal is to get at the inner-most array in $VAR1->[1]:

[
  'city',
  'taipei'
]

And with Data::Match, I can do so in a very definitional fashion:

my $match = match
  (
   # The parse tree
   $self->{parse_result}[1],

   # The Data::Match pattern match template
   FIND (
     COLLECT (
       'x',
       [
         EXPR(q{! ref}),
         EXPR(q{! ref})
       ]
     )
   )
  );

The pattern is basically saying: "match an array ref consisting of 2
elements where each element is not a reference of any sort". Since
the strings 'city' and 'taipei' both fulfill that criterion, that
array is what matches.
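
For comparison, the same search can be hand-rolled in plain Perl. This is only a sketch of the idea, not how Data::Match works internally, and find_leaf_pairs is a name made up for the example.

```perl
use strict;
use warnings;

# Recursively collect every array ref that has exactly two elements,
# neither of which is a reference.
sub find_leaf_pairs {
    my ($node) = @_;
    return () unless ref $node eq 'ARRAY';
    return ($node) if @$node == 2 && !grep { ref } @$node;
    return map { find_leaf_pairs($_) } @$node;
}

# The parse tree from the message above.
my $tree = [
    'simple_house',
    [ 'pre_simple', [ 'geoloc', [ 'geoloc_', [ 'place', [ 'city', 'taipei' ] ] ] ] ],
    [ 'house', 'house' ],
    [],
];

# Search only the second element, as in the Data::Match call.
my @pairs = find_leaf_pairs($tree->[1]);
print "@$_\n" for @pairs;
```

The declarative Data::Match template states the shape being looked for, while this version spells out the traversal by hand.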

Terrence X Brannon

Nov 2, 2007, 3:22:02 PM

to recde...@perl.org

Large parts of my current grammar consist of alternation lists, which are
easier to enter in a file as follows:

+the(?) /bahamas?/ #bahamas
chile
cuba
/d(a|e)nmark/

The items preceded with a plus are non-terminals and are entered literally.
The items preceded with a slash are regexps and are entered literally.
The other items get single quote marks put around them.

And all items are thrown into an alternation, such as

country: the(?) /bahamas?/ # bahamas
       | 'chile'
       | 'cuba'
       | /d(a|e)nmark/

This saves me from typing quote marks.
Sorting the file also makes it easy to check for duplicate entries, which
are very possible when entering as many records as I am.
And of course it is nice to have the main grammar much smaller.
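
A sketch of the conversion described above, using a hypothetical helper named rule_from_lines. It assumes the plain items contain no quote characters, and it leaves any trailing # comment on a '+' line in place, as in the example rule.

```perl
use strict;
use warnings;

# Build one alternation rule from the lines of a data file.
#   +...  non-terminals: emitted as-is, minus the leading plus
#   /.../ regexps: emitted as-is
#   other items: wrapped in single quotes
sub rule_from_lines {
    my ($rule, @lines) = @_;
    my @alternatives;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # skip blank lines
        if ($line =~ s/^\+//) {
            push @alternatives, $line;
        }
        elsif ($line =~ m{^/}) {
            push @alternatives, $line;
        }
        else {
            push @alternatives, "'$line'";
        }
    }
    return "${rule}: " . join("\n  | ", @alternatives) . "\n";
}

my $country = rule_from_lines(
    'country',
    '+the(?) /bahamas?/ #bahamas',
    'chile',
    'cuba',
    '/d(a|e)nmark/',
);
print $country;
```

In practice the lines would come from reading the data file, and the returned text would be appended to the main grammar string.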

I'm starting to like Class::Base for all my OOP work, so I would probably
have an API like:

my $o = Parse::RecDescent::Slurp->new(base => 'path/to/data/files/');
for my $rule (qw(city country state)) {
    # Rule name and data file name are the same unless a 2nd argument
    # gives the rule an explicit name.
    $grammar .= sprintf "\n%s\n", $o->slurp($rule);
}

my $p = Parse::RecDescent->new($grammar);

Or maybe I should provide the grammar to the constructor and have slurp
automatically tack the rules on at the end...

At any rate, I don't like how Class::Base takes named parameters for the
constructor but positional parameters for the methods. I think I will see
what PerlMonks like for their OO work.

Just brainstorming for now anyway...