possible bugs in Exegesis 5 code for matching patterns

Steve Tolkin

Sep 20, 2002, 5:07:53 PM

to perl6-l...@perl.org

There is a discussion thread about Exegesis 5
http://www.perl.com/pub/a/2002/08/22/exegesis5.html at
http://developers.slashdot.org/developers/02/08/23/1232230.shtml?tid=145
but its signal-to-noise ratio is too low, with side tracks into
Monty Python and the like.

In section "Smarter alternatives" there is this code:
{ @$appendline =~ s/<in_marker>/</;
I think this needs a backslash in front of the < symbol,
and a space after in_marker, i.e. it should be:
{ @$appendline =~ s/<in_marker>/\<<sp>/;

That's a small issue. But the following is more
important because it strikes at the ease of
using inheritance of grammars (i.e. patterns).
In
http://www.perl.com/pub/a/2002/08/22/exegesis5.html?page=5#different_diffs
we see the code:
rule fileinfo {
<out_marker><3> $oldfile:=(\S+) $olddate:=[\h* (\N+?) \h*?] \n
<in_marker><3> $newfile:=(\S+) $newdate:=[\h* (\N+?) \h*?] \n
}
....
rule out_marker { \+ <sp> }
rule in_marker { - <sp> }

The <sp> means a single literal space.
So I think <out_marker><3> matches "+ + + "
rather than "+++", which is what is really needed
to match a unified diff. The same goes for <in_marker><3>.
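Here is a quick way to see the difference, sketched in Python's `re` module. It assumes that `rule out_marker { \+ <sp> }` translates to the regex `\+ ` (a plus sign and exactly one space); this is an illustration, not the Perl 6 rule itself. Repeating the whole marker three times then requires a space after every plus sign.

```python
import re

# Assumed translation of: rule out_marker { \+ <sp> }
OUT_MARKER = r"\+ "  # a plus sign followed by exactly one literal space

# <out_marker><3>: the whole marker, trailing space included, repeated 3 times.
REPEATED = re.compile(rf"(?:{OUT_MARKER}){{3}}")


def matches_at_start(text: str) -> bool:
    """Return True if the repeated marker matches at the start of text."""
    return REPEATED.match(text) is not None


print(matches_at_start("+ + + old.txt"))  # True: the spaced-out form
print(matches_at_start("+++ old.txt"))    # False: the form a diff header uses
```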

Or am I missing something?
If these are bugs, what would be the best way to
fix the code while retaining as much reuse as possible?

Hopefully helpfully yours,
Steve
--
Steven Tolkin steve....@fmr.com 617-563-0516
Fidelity Investments 82 Devonshire St. V8D Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.

Smylers

Sep 21, 2002, 5:07:42 AM

to perl6-l...@perl.org

Steve Tolkin wrote:

> { @$appendline =~ s/<in_marker>/</;
>
> I think this needs a backslash in front of the < symbol, and a space
> after in_marker, i.e. it should be:
>
> { @$appendline =~ s/<in_marker>/\<<sp>/;

Isn't the replacement part of a substitution still a string?
If the replacement were a rule, you could write
things like:

s:e/ \* / <[aeiou]> /;

That would replace asterisks with 'any' vowel, without specifying which
vowel to use, which makes no sense at all. (Well, not unless it creates
a superposition, but surely Damian can't have intended to introduce
superpositions like this into the core language? Can he?)

So, since pointy brackets aren't special in strings, the < doesn't need a
backslash; similarly, spaces should just be written as spaces.

> rule fileinfo {
> <out_marker><3> $oldfile:=(\S+) $olddate:=[\h* (\N+?) \h*?] \n
> <in_marker><3> $newfile:=(\S+) $newdate:=[\h* (\N+?) \h*?] \n
> }
> ....
> rule out_marker { \+ <sp> }
> rule in_marker { - <sp> }
>
> The <sp> means a single literal space. So I think <out_marker><3>
> means look for "+ + + " rather than "+++" which is what is really
> needed to match a Unified diff.

Yes, that looks right to me.

> If these are bugs, then what would be the best way to
> fix the code while retaining as much reuse as possible.

This is one way:

rule out_marker_symbol { \+ }
rule in_marker_symbol { - }

rule out_marker { <out_marker_symbol> <sp> }
rule in_marker { <in_marker_symbol> <sp> }

rule fileinfo {
<out_marker_symbol><3> $oldfile:=(\S+) $olddate:=[\h* (\N+?) \h*?] \n
<in_marker_symbol><3> $newfile:=(\S+) $newdate:=[\h* (\N+?) \h*?] \n
}
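The same factoring can be tried out in Python, where each rule becomes a regex string and rules are composed by concatenation. This is an assumed translation rather than Perl 6:

- `\h` and `\N` are spelled `[ \t]` and `[^\n]`.
- Named groups stand in for `$oldfile:=` and friends.
- The literal space after the three symbols is an assumption about the header layout, added so the filename capture starts in the right place.

```python
import re

# Assumed Python translation of the refactored rules; names mirror the
# Perl 6 rules, but the regex text is illustrative, not from the Exegesis.
OUT_MARKER_SYMBOL = r"\+"
IN_MARKER_SYMBOL = r"-"

# Full markers (symbol plus one space), still reusable by other rules.
OUT_MARKER = OUT_MARKER_SYMBOL + " "
IN_MARKER = IN_MARKER_SYMBOL + " "

# fileinfo repeats only the bare symbol three times. The single space after
# the symbols is an assumption about the header layout.
FILEINFO = re.compile(
    rf"(?:{OUT_MARKER_SYMBOL}){{3}} (?P<oldfile>\S+)"
    r"[ \t]*(?P<olddate>[^\n]+?)[ \t]*?\n"
    rf"(?:{IN_MARKER_SYMBOL}){{3}} (?P<newfile>\S+)"
    r"[ \t]*(?P<newdate>[^\n]+?)[ \t]*?\n"
)

header = "+++ old.txt 2002-09-20\n--- new.txt 2002-09-21\n"
m = FILEINFO.match(header)
print(m.group("oldfile"), m.group("newfile"))  # old.txt new.txt
```

Because the symbol is factored out, a derived grammar that changes only `OUT_MARKER_SYMBOL` or `IN_MARKER_SYMBOL` would change both the header pattern and the full markers together.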

[While we're on the subject of typos, it looks like the final definition
of C<fileinfo> in the exegesis has C<$newfile> where C<$oldfile> is
meant.]

But it'd probably be easier just to do:

rule fileinfo {
\+\+\+ $oldfile:=(\S+) $olddate:=[\h* (\N+?) \h*?] \n
--- $newfile:=(\S+) $newdate:=[\h* (\N+?) \h*?] \n
}

Repeating the symbols isn't a terrible cop-out, since there's no
reason why the format has to use the same symbols in both places.

Smylers
