The following grammar (piece) works fine:
geoloc: geoloc_ and(?) geoloc_ | geoloc_
geoloc_: city | state | country | area
but per
http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules
I want distinct geoloc values, so I did this:
geoloc: geoloc1 and(?) geoloc2 | geoloc_
geeloc1: geoloc_
geeloc2: geoloc_
geoloc_: city | state | country | area
but then the autotrace immediately fails:
2|tok_of_ty| |" bahrain kuwait"
2|tok_of_ty|Trying subrule: [geoloc] |
3| geoloc |Trying rule: [geoloc] |
3| geoloc |Trying production: [geoloc1 and |
| |geoloc2] |
3| geoloc |Trying subrule: [geoloc1] |
3| geoloc |<<Didn't match subrule: [geoloc1]>> |
--
Terrence Brannon - SID W049945
txb> The following grammar (piece) works fine:
txb> geoloc: geoloc_ and(?) geoloc_ | geoloc_
txb> geoloc_: city | state | country | area
txb> but per
txb> http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules
txb> I want distinct geoloc values, so I did this:
txb> geoloc: geoloc1 and(?) geoloc2 | geoloc_
txb> geeloc1: geoloc_
txb> geeloc2: geoloc_
txb> geoloc_: city | state | country | area
txb> but then the autotrace immediately fails:
txb> 2|tok_of_ty| |" bahrain kuwait"
txb> 2|tok_of_ty|Trying subrule: [geoloc] |
txb> 3| geoloc |Trying rule: [geoloc] |
txb> 3| geoloc |Trying production: [geoloc1 and |
txb> | |geoloc2] |
txb> 3| geoloc |Trying subrule: [geoloc1] |
txb> 3| geoloc |<<Didn't match subrule: [geoloc1]>> |
Any chance you can post a complete example with sample input that fails?
Thanks
Ted
>
> Any chance you can post a complete example with sample input that fails?
>
What was I thinking? My goodness, par for the course when submitting a
bug. Anyway, my reduced test case works just fine, which hopefully
means that it's on my end of things. So I will just do some more
head-scratching on my end for now. Here is my test case that worked
flawlessly:
use strict;
use warnings;
use Data::Dumper;
use Parse::RecDescent;
# Generate a parser from the specification in $grammar:
my $grammar = << 'EOGRAMMAR';
store: name geoloc
name: "trader joe's" | "whole foods"
geoloc: geoloc1 and(?) geoloc2 | geoloc_
geoloc1: geoloc_
geoloc2: geoloc_
geoloc_: city | state | country | area
and: 'and'
city: 'los angeles' | 'new york'
state: 'california' | 'new york'
country: 'united states'
area: 'north' | 'south' | 'east' | 'west'
EOGRAMMAR
$::RD_AUTOACTION = q { [\%item] } ;
my $parser = Parse::RecDescent->new($grammar);
my $r = $parser->store("trader joe's los angeles california");
warn Dumper $r;
TB> On 10/23/07, Ted Zlatanov <t...@lifelogs.com> wrote:
>>
>> Any chance you can post a complete example with sample input that fails?
>>
TB> my $grammar = << 'EOGRAMMAR';
TB> store: name geoloc
TB> name: "trader joe's" | "whole foods"
TB> geoloc: geoloc1 and(?) geoloc2 | geoloc_
TB> geoloc1: geoloc_
TB> geoloc2: geoloc_
TB> geoloc_: city | state | country | area
TB> and: 'and'
TB> city: 'los angeles' | 'new york'
TB> state: 'california' | 'new york'
TB> country: 'united states'
TB> area: 'north' | 'south' | 'east' | 'west'
TB> EOGRAMMAR
I think actions may be the answer for your original problem, which was
to distinguish the two positions (so you created the geoloc1 and geoloc2
rules). An action like this:
{ $return = { item1 => $item[1], item2 => $item[3] }; }
would give you back a hash with the entries for your matched items named
appropriately. I don't know why you had subrule problems, sorry.
You could also consider the <leftop> directive, which could set up
"A and B and C" parsing for you, unlike your current rules, which only
accommodate one "and". You can use actions again to return the right
things by name.
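Here is a rough sketch of both ideas, assuming a cut-down version of the test grammar above. The rule name geolist and the reduced city/state lists are made up for the example; only the action and the <leftop> directive are the point.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Cut-down grammar: an action names the two positions, and a separate
# (hypothetical) geolist rule uses <leftop> for "A and B and C".
my $grammar = <<'EOGRAMMAR';
geoloc:  geoloc_ and(?) geoloc_
             { $return = { item1 => $item[1], item2 => $item[3] } }
      |  geoloc_
             { $return = { item1 => $item[1] } }
geolist: <leftop: geoloc_ 'and' geoloc_>
geoloc_: city | state
and:     'and'
city:    'los angeles' | 'new york'
state:   'california'
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The action returns a hash ref keyed by position.
my $pair = $parser->geoloc("los angeles and california");
print "first: $pair->{item1}, second: $pair->{item2}\n";

# <leftop> returns an array ref of the matched operands.
my $list = $parser->geolist("los angeles and california and new york");
print join(", ", @$list), "\n";
```

No autoaction is set here, so each plain rule simply returns the value of its last matched item.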
Hope this helps.
Ted
Ok, I'm wondering what would be the best way to test for a match against a
grammar where there may be up to 25 "junk" characters preceding the match.
We know that PRD does not do backtracking:
http://search.cpan.org/dist/Parse-RecDescent-FAQ/FAQ.pm#Answer_by_Randal_L._Schwartz
And we know that greediness is difficult to control:
http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#ON-GOING_ISSUES_AND_FUTURE_DIRECTIONS
But given all that, and the toy grammar below, which matches "sir george",
how could we modify it to match "io;ajwer;i324 sir george"?
One possible strategy is to simply feed 25 successive substrings, chopping
off one character at a time.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use lib '.' ;
use Parse::RecDescent;
{
#last;
#$::RD_WARN++;
#$::RD_HINT++;
$::RD_TRACE++;
}
$::RD_AUTOACTION =
q { [@item] } ;
my $G = << 'EOGRAMMAR' ;
name: name_types eofile { $return = $item[1] }
eofile: /^\Z/
name_types: royal
royal: title firstname of(?)
title: 'sir' | 'his holiness'
firstname: 'george' | 'john'
of: 'of' place
place: 'kent'
EOGRAMMAR
my $p = Parse::RecDescent->new($G) ;
my $string = "sir george";
my $parser = 'name' ;
my $r = $p -> $parser ( $string ) ;
warn Dumper $r;
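The chop-one-character-at-a-time strategy mentioned above could be sketched roughly like this. It is only a sketch: match_after_junk is a made-up helper name, the grammar is a trimmed copy of the toy grammar, and the action on royal is added here just to get a readable string back.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Trimmed copy of the toy grammar; the royal action is an addition.
my $grammar = <<'EOGRAMMAR';
name:      royal eofile { $return = $item[1] }
eofile:    /^\Z/
royal:     title firstname { $return = "$item[1] $item[2]" }
title:     'sir' | 'his holiness'
firstname: 'george' | 'john'
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Hypothetical helper: try the parse at offsets 0 .. $max_junk and
# return the first offset that parses, together with the result.
sub match_after_junk {
    my ($text, $max_junk) = @_;
    my $limit = $max_junk < length($text) ? $max_junk : length($text);
    for my $skip (0 .. $limit) {
        my $candidate = substr($text, $skip);
        my $result    = $parser->name($candidate);
        return ($skip, $result) if defined $result;
    }
    return;    # nothing matched within the junk allowance
}

my ($skipped, $name) = match_after_junk("io;ajwer;i324 sir george", 25);
print "skipped $skipped characters, matched '$name'\n" if defined $skipped;
```

This costs up to 26 parse attempts per input, which is probably fine for short strings but wasteful for long ones.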
txb> But given all that, and the toy grammar below, which matches "sir george",
txb> how could we modify it to match "io;ajwer;i324 sir george"
Here's my solution:
1) treat the whole input as a line made of items
2) each item can be a word or a name (we try to match a 'name' first)
3) a word is any number of non-space characters
The key is to walk through the input, looking for names. I tried a few
other inputs and they seemed to work the way you want.
Ted
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Parse::RecDescent;
{
#last;
#$::RD_WARN++;
#$::RD_HINT++;
$::RD_TRACE++;
}
$::RD_AUTOACTION =
q { [@item] } ;
my $G = << 'EOGRAMMAR' ;
line: item(s)
item: name | word
name: name_types eofile { $return = $item[1] }
eofile: /^\Z/
word: /\S+/
name_types: royal
royal: title firstname of(?)
title: 'sir' | 'his holiness'
firstname: 'george' | 'john'
of: 'of' place
place: 'kent'
EOGRAMMAR
my $p = Parse::RecDescent->new($G) ;
my @strings = ("hello there sir george",
"io;ajwer;i324 sir george",
"welcome his holiness john of kent");
my $parser = 'line' ;
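foreach my $string (@strings)
{
# loop body missing from the archive; assumed to mirror the earlier script
my $r = $p -> $parser ( $string ) ;
warn Dumper $r;
}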
http://search.cpan.org/~kstephens/Data-Match-0.06/Match.pm
has been for my work.
So, I have these deeply nested parse trees thanks to my use of an
Autoaction:
$::RD_AUTOACTION = q { [@item] } ;
I first inquired about various tools for spelunking in such trees:
http://perlmonks.org/?node_id=646560
And am very happy so far with Data::Match. Here is a sample parse tree
from my parse:
$VAR1 = [
  'simple_house',
  [
    'pre_simple',
    [
      'geoloc',
      [
        'geoloc_',
        [
          'place',
          [
            'city',
            'taipei'
          ]
        ]
      ]
    ]
  ],
  [
    'house',
    'house'
  ],
  []
];
Now, my goal is to get at the inner-most array in $VAR1->[1]:
[
'city',
'taipei'
]
And with Data::Match, I can do so in a very definitional fashion:
my $match = match
(
  # The parse tree
  $self->{parse_result}[1],
  # The Data::Match pattern match template
  FIND (
    COLLECT (
      'x',
      [
        EXPR(q{! ref}),
        EXPR(q{! ref})
      ]
    )
  )
);
The pattern is basically saying: "match an array ref consisting of 2
elements where each element is not a reference of any sort". Since
the strings 'city' and 'taipei' both fulfill that criterion, that
array is what matches.
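For readers without Data::Match installed, the same criterion can be spelled out as a plain recursive walk. This is not how Data::Match works internally, just a hand-rolled sketch of the rule described above; find_leaf_pairs is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every array ref that holds exactly two
# elements, neither of which is a reference.
sub find_leaf_pairs {
    my ($node) = @_;
    return unless ref($node) eq 'ARRAY';
    return $node if @$node == 2 && !grep { ref } @$node;
    return map { find_leaf_pairs($_) } @$node;
}

# Same shape as the sample parse tree above.
my $tree = [
    'simple_house',
    [ 'pre_simple', [ 'geoloc', [ 'geoloc_', [ 'place', [ 'city', 'taipei' ] ] ] ] ],
    [ 'house', 'house' ],
    [],
];

# Search only the second element, as in the Data::Match call.
my @pairs = find_leaf_pairs($tree->[1]);
print join(' => ', @$_), "\n" for @pairs;    # prints "city => taipei"
```

Searching the whole tree instead of $tree->[1] would also pick up the [ 'house', 'house' ] pair, which is why the search is restricted to the one subtree.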
+the(?) /bahamas?/ #bahamas
chile
cuba
/d(a|e)nmark/
The items preceded with a plus are non-terminals and are entered literally
The items preceded with a slash are regexps and are entered literally
The other items get single quote marks put around them
And all items are thrown into an alternation, such as
country: the(?) /bahamas?/ # bahamas
| 'chile'
| 'cuba'
| /d(a|e)nmark/
This saves me from typing double quotes.
And sorting the file makes it easy to check for duplicate entries. Which
is very possible when entering as many records as I am.
And of course it is nice to have the main grammar much smaller.
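The conversion rules above are simple enough to sketch in a few lines of plain Perl. The function name rule_from_lines is made up, and the sketch assumes that no plain entry contains a single quote.

```perl
use strict;
use warnings;

# Hypothetical helper: turn data-file lines into one alternation rule.
sub rule_from_lines {
    my ($rule, @lines) = @_;
    my @alternatives;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        next unless length $line;     # skip blank lines
        if ($line =~ s/^\+//) {
            push @alternatives, $line;       # non-terminal: literal, plus stripped
        }
        elsif ($line =~ m{^/}) {
            push @alternatives, $line;       # regexp: literal
        }
        else {
            push @alternatives, "'$line'";   # anything else: single-quoted
        }
    }
    return "$rule: " . join("\n  | ", @alternatives) . "\n";
}

my $rule_text = rule_from_lines(
    'country', '+the(?) /bahamas?/', 'chile', 'cuba', '/d(a|e)nmark/'
);
print $rule_text;
```

With those four lines the result is the country rule shown above, minus the trailing comment.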
I'm starting to like Class::Base for all my OOP work, so I would probably
have an API like:
my $o = Parse::RecDescent::Slurp->new(base => 'path/to/data/files/');
for my $rule (qw(city country state)) {
# rule and data file name are the same unless a 2nd arg gives the rule an explicit name
$grammar .= "\n" . $o->slurp($rule) . "\n";
}
my $p = Parse::RecDescent->new($grammar);
or maybe I should provide the grammar to the constructor and have slurp
automatically tack the rules on at the end...
At any rate, I don't like how Class::Base takes named parms for the
constructor but positional parms for the methods... I think I will see
what perlmonks like for their OO work.
Just brainstorming for now anyway...