Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How to find all the strings in a long that are at most of n different characters of a test string?

0 views
Skip to first unread message

Peng Yu

unread,
Nov 19, 2008, 6:34:17 PM11/19/08
to
Hi,

Suppose I have a long string $a, and a test string $b.

I want to fine all the substrings in $a, whose length is the same as
$b with at most n mismatches.

For example, string 'abcdef' and string 'aacdxf' have two mismatches
at the 2nd character and the 5th character.

I'm wondering if this can be done easily in perl. Can I use regular
expression to solve this problem?

Thanks,
Peng

s...@netherlands.com

unread,
Nov 19, 2008, 8:55:07 PM11/19/08
to

Not easily, and probably fairly slow.
You need some heuristic algorithym.

sln

----------------------------

use strict;
use warnings;


my $str = 'aacdxfo sdfbsabcrxfodfbdfb';
my $pattern = 'abcdef';
my $misses = 2;

my @tmp = split '',$pattern;
my $pstr;
for (@tmp) {
$pstr .= "(?:$_|(.))";
}
$pstr = '('.$pstr.')';
my $rxpattern = qr/$pstr/i;

print @tmp,"\n",$rxpattern,"\n\n";

while ($str =~ /$rxpattern/g )
{
my $cnt = 0;
for my $i (2..(@tmp+1)) {
last if (($cnt += defined( $-[$i])) > $misses);
}
if ($cnt > $misses) {
pos($str) = $-[0]+1;
} else {
print "$cnt bad chars, but found a close match: '$1'\n";
}
}

__END__

abcdef
(?i-xsm:((?:a|(.))(?:b|(.))(?:c|(.))(?:d|(.))(?:e|(.))(?:f|(.))))

2 bad chars, but found a close match: 'aacdxf'
2 bad chars, but found a close match: 'abcrxf'


Peng Yu

unread,
Nov 19, 2008, 9:52:40 PM11/19/08
to
On Nov 19, 7:55 pm, s...@netherlands.com wrote:

Hi,

I can not find what '$-[$i]' means in your code on online perl
documentation. Would you please let me know its meaning? Where is its
documentation?

Thanks,
Peng

Thanks,
Peng

John W. Krahn

unread,
Nov 20, 2008, 1:19:01 AM11/20/08
to
Peng Yu wrote:
> On Nov 19, 7:55 pm, s...@netherlands.com wrote:
>>
>> use strict;
>> use warnings;
>>
>> my $str = 'aacdxfo sdfbsabcrxfodfbdfb';
>> my $pattern = 'abcdef';
>> my $misses = 2;
>>
>> my @tmp = split '',$pattern;
>> my $pstr;
>> for (@tmp) {
>> $pstr .= "(?:$_|(.))";}
>>
>> $pstr = '('.$pstr.')';
>> my $rxpattern = qr/$pstr/i;
>>
>> print @tmp,"\n",$rxpattern,"\n\n";
>>
>> while ($str =~ /$rxpattern/g )
>> {
>> my $cnt = 0;
>> for my $i (2..(@tmp+1)) {
>> last if (($cnt += defined( $-[$i])) > $misses);
>> }
>> if ($cnt > $misses) {
>> pos($str) = $-[0]+1;
>> } else {
>> print "$cnt bad chars, but found a close match: '$1'\n";
>> }
>> }
>>
>> __END__
>
> I can not find what '$-[$i]' means in your code on online perl
> documentation. Would you please let me know its meaning? Where is its
> documentation?

You can find out about the @- array in:

perldoc perlvar

John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Tad J McClellan

unread,
Nov 19, 2008, 11:15:09 PM11/19/08
to
Peng Yu <Peng...@gmail.com> wrote:


> I can not find what '$-[$i]' means in your code


It is accessing the array named @-


> on online perl
> documentation. Would you please let me know its meaning? Where is its
> documentation?


perldoc perlvar


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Peng Yu

unread,
Nov 20, 2008, 8:54:27 AM11/20/08
to

It seems that the perldoc is not easy to understand by a newbie. While
it explains @-, the perldoc keeps on referring to others things that I
don't understand. Can somebody help explain @- for me with more
examples?

BTW, I'd think that the perldoc be made more readable for newbies. I'm
not sure whether other newbies would have the same feeling.

Thanks,
Peng

Pilcrow

unread,
Nov 20, 2008, 9:07:32 AM11/20/08
to
On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <Peng...@gmail.com>
wrote:

Not too hard the simple way.
-------------------------------------------------------------------
#!/usr/bin/perl
use strict; use warnings;

my $a =
"abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx";
my $b = "aacdxf";

my @a = split //,$a;
my @b = split //,$b;
my $limit = 2;
my $cnt;
my $lenb = length $b;
my @substrings = ();

print "\nmatching '$a' against '$b'\n";

OUTER:
for (my $i = 0; $i <= $#a-$lenb+1; $i++) {
$cnt = 0;
for (my $j = 0; $j <= $#b; $j++) {
$cnt++ unless $a[$i+$j] eq $b[$j];
next OUTER if $cnt > $limit;
}
my $sub = substr($a,$i,$lenb);
# push @substrings, $sub; # alternate output
print "match '$sub' at offset $i\n";
}

__END__


matching
'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx'
against 'aacdxf'
match 'abcdef' at offset 0
match 'bacdef' at offset 6
match 'aacdxf' at offset 21
match 'aacdef' at offset 28
match 'bacdef' at offset 38
match 'aacdxf' at offset 53
match 'aacdef' at offset 60
match 'aacdxf' at offset 66
match 'aacdfx' at offset 72

---------------------------------------------------------------
Sorry about the line wrap
--
"Usenet is like a herd of performing elephants with diarrhea:
massive, difficult to redirect, awe-inspiring, entertaining, and a
source of mind-boggling amounts of excrement when you least
expect it." (Gene Spafford, 1992).

Peng Yu

unread,
Nov 20, 2008, 9:28:29 AM11/20/08
to
On Nov 20, 8:07 am, Pilcrow <pilcr...@gmail.com> wrote:
> On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <PengYu...@gmail.com>

If $a could be very long, say, of the order of MB, will this approach
be much slower than the code given by sln?

Thanks,
Peng

Charlton Wilbur

unread,
Nov 20, 2008, 9:52:46 AM11/20/08
to
>>>>> "PY" == Peng Yu <Peng...@gmail.com> writes:

PY> BTW, I'd think that the perldoc be made more readable for
PY> newbies. I'm not sure whether other newbies would have the same
PY> feeling.

There's a tradeoff between making the perldocs more readable for newbies
and making them more information-dense for experts. There are other
sources of documentation for newbies, but precious few for experts, and
so I think the perldocs are at a very sweet spot.

Further, you can learn more about Perl by following up on everything in
the documentation that you don't understand. Indeed, that's how many of
us learned as much as we did.

Charlton

--
Charlton Wilbur
cwi...@chromatico.net

Peng Yu

unread,
Nov 20, 2008, 11:23:24 AM11/20/08
to
On Nov 20, 8:52 am, Charlton Wilbur <cwil...@chromatico.net> wrote:

> >>>>> "PY" == Peng Yu <PengYu...@gmail.com> writes:
>
>     PY> BTW, I'd think that the perldoc be made more readable for
>     PY> newbies. I'm not sure whether other newbies would have the same
>     PY> feeling.
>
> There's a tradeoff between making the perldocs more readable for newbies
> and making them more information-dense for experts.  There are other
> sources of documentation for newbies, but precious few for experts, and
> so I think the perldocs are at a very sweet spot.
>
> Further, you can learn more about Perl by following up on everything in
> the documentation that you don't understand.  Indeed, that's how many of
> us learned as much as we did.

I total understand what you mean.

At this moment, I only be interested in understanding how @- works. I
tried to read that perldoc page, but I feel it is hard to follow.
Since that section of perldocs is not very readable to newbies, I'm
wondering if there is any more introductory material that explains it
with more examples.

I tried to look the more introductory materials on @- online but I
don't find any.

I think as I understand perl more, I shall be able to read perldoc
more easily.

Thanks,
Peng

John W. Krahn

unread,
Nov 20, 2008, 1:06:33 PM11/20/08
to
Pilcrow wrote:
> On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <Peng...@gmail.com>
> wrote:
>>
>> Suppose I have a long string $a, and a test string $b.
>>
>> I want to fine all the substrings in $a, whose length is the same as
>> $b with at most n mismatches.
>>
>> For example, string 'abcdef' and string 'aacdxf' have two mismatches
>> at the 2nd character and the 5th character.
>>
>> I'm wondering if this can be done easily in perl. Can I use regular
>> expression to solve this problem?
>
> Not too hard the simple way.
> -------------------------------------------------------------------
> #!/usr/bin/perl
> use strict; use warnings;
>
> my $a =
> "abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx";
> my $b = "aacdxf";
>
> my @a = split //,$a;
> my @b = split //,$b;
> my $limit = 2;
> my $cnt;
> my $lenb = length $b;
> my @substrings = ();
>
> print "\nmatching '$a' against '$b'\n";
>
> OUTER:
> for (my $i = 0; $i <= $#a-$lenb+1; $i++) {
> $cnt = 0;
> for (my $j = 0; $j <= $#b; $j++) {
> $cnt++ unless $a[$i+$j] eq $b[$j];
> next OUTER if $cnt > $limit;
> }
> my $sub = substr($a,$i,$lenb);
> # push @substrings, $sub; # alternate output
> print "match '$sub' at offset $i\n";
> }

Or more simpler as:

#!/usr/bin/perl
use strict;
use warnings;

my $a =
'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx';
my $b = 'aacdxf';

my $limit = 2;


my $lenb = length $b;

print "\nmatching '$a' against '$b'\n";

while ( $a =~ /(?=(.{$lenb}))/sog ) {
next if ( $1 ^ $b ) =~ tr/\0//c > $limit;
print "match '$1' at offset $-[0]\n";
}

__END__

A. Sinan Unur

unread,
Nov 20, 2008, 4:46:27 PM11/20/08
to
Peng Yu <Peng...@gmail.com> wrote in
news:8f31f2be-e332-4d6d...@o2g2000yqd.googlegroups.com:

> On Nov 20, 12:19 am, "John W. Krahn" <some...@example.com> wrote:
>> Peng Yu wrote:
>> > On Nov 19, 7:55 pm, s...@netherlands.com wrote:
>>

...

>> > I can not find what '$-[$i]' means in your code on online perl
>> > documentation. Would you please let me know its meaning? Where is
>> > its documentation?
>>
>> You can find out about the @- array in:
>>
>> perldoc perlvar
>
> It seems that the perldoc is not easy to understand by a newbie. While
> it explains @-, the perldoc keeps on referring to others things that I
> don't understand. Can somebody help explain @- for me with more
> examples?

I looked at the perldoc perlvar page to remind myself what it said. I am
unable to figure out how one could make it clearer:

<blockquote>
This array holds the offsets of the beginnings of the last
successful submatches in the currently active dynamic scope.
$-[0] is the offset into the string of the beginning of the
entire match. The *n*th element of this array holds the offset
of the *n*th submatch, so $-[1] is the offset where $1 begins,
$-[2] the offset where $2 begins, and so on.
</blockquote>

Consider the example below:

C:\DOCUME~1\asu1\LOCALS~1\Temp> cat t.pl
#!/usr/bin/perl

use strict;
use warnings;

my $s = '123abcABC';

if ( $s =~ /^([0-9]+([0-9]))[a-z]+([A-Z]([A-Z]+))$/ ) {
print "@-\n";
print "$_\n" for $1, $2, $3, $4;
}

__END__

In this case, a successful match would produce

$1 = 123 (offset = 0)
$2 = 3 (offset = 2)
$3 = ABC (offset = 6)
$4 = BC (offset = 7)

The entire match starts at offset 0, so we have @- = (0, 0, 2, 6, 7).

Sinan
--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Dr.Ruud

unread,
Nov 20, 2008, 6:14:52 PM11/20/08
to
Peng Yu schreef:

> Suppose I have a long string $a, and a test string $b.
>
> I want to fine all the substrings in $a, whose length is the same as
> $b with at most n mismatches.

Use substr() and the "^" operator and y/\0//c.


> For example, string 'abcdef' and string 'aacdxf' have two mismatches
> at the 2nd character and the 5th character.

$ perl -wle'
$r = "abcdef" ^ "aacdxf";
print $r =~ y/\0//c
'
2

--
Affijn, Ruud

"Gewoon is een tijger."

s...@netherlands.com

unread,
Nov 20, 2008, 6:52:15 PM11/20/08
to

Yes, this is by far the fastest way. It benches 1/3 of the time
of using arrays and substr. Although, traversing with substr's and
using the bitwise ^ strings with tr///c might be faster, if only
by a little, its worth investigating. I personally think using the
regex engine for this is faster.

This is a prime example where bit-wise operators rule over string
comparisons... pure math.

If the problem were to spill over to multiple character comparisons
this won't work but the OP didn't say that.

Fast indeed!

I looked over this pretty good. I got a question about that look-ahead
zero-width assertion.

I noticed the capture group is in the assertion and it only advances the
regexp ordinal position by one.

Example:

/(.{$lenb})/ advances 6
/(?=(.{$lenb}))/ advances 1

I looked in perlre,perlretut and there is only one mention of capture within
a "non-capture zero-width assertion" extended expression elements.
That was just a side from some example on (?>..), and that just mentioned,
the ordinal position is adjusted with a (?=(regex)) analogy.

Do you know why this is?


sln

Dr.Ruud

unread,
Nov 20, 2008, 7:27:35 PM11/20/08
to
Peng Yu schreef:

Yes, a regular expression is also possible. First write code to create
the regex string, for your example it can be done like this:

perl -wle'
my ($b, @b) = "aacdxf";
for my $i (0 .. 4) {
substr(my $bi = $b, $i, 1, ".");
for my $j ($i+1 .. 5) {
substr(my $bj = $bi, $j, 1, ".");
push @b, $bj;
}
}
my $re = join("\n | ", @b);
$re = qr/
( $re
)
/x;
print $re;
'
(?x-ism:
( ..cdxf
| .a.dxf
| .ac.xf
| .acd.f
| .acdx.
| a..dxf
| a.c.xf
| a.cd.f
| a.cdx.
| aa..xf
| aa.d.f
| aa.dx.
| aac..f
| aac.x.
| aacd..
)
)

s...@netherlands.com

unread,
Nov 20, 2008, 7:33:41 PM11/20/08
to

Follow-up:

Below is a modified version that chops the time in half.

The fastest solution conforming to the original problem statement
is from John Krahn. Because you can't fight math when posing
1-dimensional problems like this.

However, I'm posting this modified version incase the problem set
expands to multi-character heuristics.

The modifications include staying away from the @- array altogether.
In my opinion, it is just a eval{} in disguise anyway, don't know
don't care. Position is set every time. The pos() function is free,
its virtually no overhead. Reading @-, the patter/sub-pattern matching
position array, on the otherhand, incurrs 5-10 times the overhead.
Its to be avoided at all costs on performance related regexp issues.

Staying away from the @- array but keeping the algorithym means doing
something thats not supposed to be done, but it works:

$i = 1; # is not a reference
$$i # $1

I didn't see any side affects except you have to manage the pos() each time.
See: FAQ 7.29: How can I use a variable as a variable name?

That being said Expanding notation to multi-character substrings
and to add some intelligence (possibly) to dynamically generated regexp's,
might be a help in the future.

For instance, you could expand the definition of matching items like so:

@terms = (
# string - required - generated rxterm
# ----------------------------------------
'Merry ' , 1, # (?:Merry )
'C', , 0, # (?:C|(.))
'h', , 0, # (?:h|(.))
'r', , 0, # (?:r|(.))
'i', , 0, # (?:i|(.))
's', , 0, # (?:s|(.))
't', , 0, # (?:t|(.))
'mas', , 0, # (?:mas|(.{3}))
);

But then your getting into heuristics.
Good luck!

sln

--------------------------------

use strict;
use warnings;

use Benchmark ':hireswallclock';

my ($t0,$t1,$tdif);

my $pattern = 'aacdxf';
my $misses = 2;

my @tmp = split '',$pattern;
my $pstr;
for (@tmp) {
$pstr .= "(?:$_|(.))";
}
$pstr = '('.$pstr.')';

my $rxpattern = qr/$pstr/i;
my $extent = @tmp+1;
my ($i,$cnt,$lastpos) = (0,0,0);

print @tmp,"\n",$rxpattern,"\n\n";

my $fname = 'c:\temp\5MEG_FILE.txt';
open my $fh, $fname or die "can't open $fname...";

$t0 = new Benchmark;

while (<$fh>)
{
chomp;
$lastpos = 0;

while ( /$rxpattern/g )
{
$cnt = 0;
for $i (2..$extent) {
last if (($cnt += defined( $$i)) > $misses);
}
if ($cnt <= $misses) {


print "$cnt bad chars, but found a close match: '$1'\n";
}

pos() = ++$lastpos;
}
}
$t1 = new Benchmark;

close $fh;

$tdif = timediff($t1, $t0);
print "the code took:",timestr($tdif),"\n";

__END__

Peng Yu

unread,
Nov 20, 2008, 9:40:39 PM11/20/08
to
On Nov 20, 3:46 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:

Thank you the example. I see what it means. In general, I think the
document can be made easier to read with more examples.

I agree that adding more examples would make the manual to long to be
read by experts. But I think that this can be solved by having two
sets of document---one is with examples, the other is without
examples.

Thanks,
Peng

Jürgen Exner

unread,
Nov 20, 2008, 10:06:24 PM11/20/08
to
Peng Yu <Peng...@gmail.com> wrote:
>On Nov 20, 3:46 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> Peng Yu <PengYu...@gmail.com> wrote innews:8f31f2be-e332-4d6d...@o2g2000yqd.googlegroups.com:
>> > It seems that the perldoc is not easy to understand by a newbie.

It is a reference, concise and accurate (as much as possible), it is not
and never will be a tutorial.

>> > While
>> > it explains @-, the perldoc keeps on referring to others things that I
>> > don't understand. Can somebody help explain @- for me with more
>> > examples?

Then you will have to look for a tutorial or dig down into those other
things. Very often it is not possible or not desirable to explain
something without making references to other concepts. And @- is
certainly not the right place to explain grouping in regular
expressions.

>> I looked at the perldoc perlvar page to remind myself what it said. I am
>> unable to figure out how one could make it clearer:

>Thank you the example. I see what it means. In general, I think the


>document can be made easier to read with more examples.


But there are examples in that section:
<quote>
After a match against some variable $var:

"$`" is the same as "substr($var, 0, $-[0])"
"$&" is the same as "substr($var, $-[0], $+[0] - $-[0])"
"$'" is the same as "substr($var, $+[0])"
"$1" is the same as "substr($var, $-[1], $+[1] - $-[1])"
"$2" is the same as "substr($var, $-[2], $+[2] - $-[2])"
"$3" is the same as "substr $var, $-[3], $+[3] - $-[3])"
</quote>


>I agree that adding more examples would make the manual to long to be
>read by experts. But I think that this can be solved by having two
>sets of document---one is with examples, the other is without
>examples.

Go ahead and write the fivehundredandnintysecond tutorial about Perl.
You will find that either you are repeating yourself all over the place
or you start using references all over the place because Perl's concept
are so intertwined and cannot be explained stand-alone.

Not to mention the ensuing maintenance nightmare when you have to fix
something all over the place.

jue

s...@netherlands.com

unread,
Nov 20, 2008, 10:52:15 PM11/20/08
to
On Thu, 20 Nov 2008 19:06:24 -0800, Jürgen Exner <jurg...@hotmail.com> wrote:

>Peng Yu <Peng...@gmail.com> wrote:
>>On Nov 20, 3:46 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>>> Peng Yu <PengYu...@gmail.com> wrote innews:8f31f2be-e332-4d6d...@o2g2000yqd.googlegroups.com:
>>> > It seems that the perldoc is not easy to understand by a newbie.
>
>It is a reference, concise and accurate (as much as possible), it is not
>and never will be a tutorial.

What are you defending, a reference?? A reference is and has always been
another word for a POS! It is just a list of prototypes.
A reference has been and always will be nothing of significance at all.
Used by somebody who already knows it all to adjust for typo's and syntax.
I hope those people don't actually write the referece.

>
>>> > While
>>> > it explains @-, the perldoc keeps on referring to others things that I
>>> > don't understand. Can somebody help explain @- for me with more
>>> > examples?
>
>Then you will have to look for a tutorial or dig down into those other
>things. Very often it is not possible or not desirable to explain
>something without making references to other concepts. And @- is
>certainly not the right place to explain grouping in regular
>expressions.

Why the f*ck not? Can't explain something you have no idea where you
learned it from, then blowhard to someone else needing answer's? Get off man..

>
>>> I looked at the perldoc perlvar page to remind myself what it said. I am
>>> unable to figure out how one could make it clearer:
>
>>Thank you the example. I see what it means. In general, I think the
>>document can be made easier to read with more examples.
>
>
>But there are examples in that section:
><quote>
> After a match against some variable $var:
>
> "$`" is the same as "substr($var, 0, $-[0])"
> "$&" is the same as "substr($var, $-[0], $+[0] - $-[0])"
> "$'" is the same as "substr($var, $+[0])"
> "$1" is the same as "substr($var, $-[1], $+[1] - $-[1])"
> "$2" is the same as "substr($var, $-[2], $+[2] - $-[2])"
> "$3" is the same as "substr $var, $-[3], $+[3] - $-[3])"
></quote>
>
>
>>I agree that adding more examples would make the manual to long to be
>>read by experts. But I think that this can be solved by having two
>>sets of document---one is with examples, the other is without
>>examples.
>
>Go ahead and write the fivehundredandnintysecond tutorial about Perl.
>You will find that either you are repeating yourself all over the place
>or you start using references all over the place because Perl's concept
>are so intertwined and cannot be explained stand-alone.
>

You my friend should not be involved in Perl. You have cactus up your
ass, every move you make when reading posts on this group drive's that
POINT home brother!!!

>Not to mention the ensuing maintenance nightmare when you have to fix
>something all over the place.
>
>jue

Go fly a friggin kite friend..


sln

Pilcrow

unread,
Nov 20, 2008, 11:38:01 PM11/20/08
to
On Thu, 20 Nov 2008 10:06:33 -0800, "John W. Krahn"
<som...@example.com> wrote:

Very elegant and fast, but you don't need all the modifiers /sog to the
regex, just /g will do.


s...@netherlands.com

unread,
Nov 20, 2008, 11:54:40 PM11/20/08
to

Actually, he needs 'so' as well. If this is used in a multi-lined
way, the '.' needs the 's' modifier if it is possible a '\n' is encounterred
and could be part of the pattern.
The 'o' suggests to not recompile, which might happen, don't know, anything is
possible. By including both, no harm, ho foul.

sln

Peng Yu

unread,
Nov 21, 2008, 12:08:13 AM11/21/08
to
On Nov 20, 9:06 pm, Jürgen Exner <jurge...@hotmail.com> wrote:

I didn't understand the above examples because it refers to $-[0],
which was exactly what I wanted to understand.

> >I agree that adding more examples would make the manual to long to be
> >read by experts. But I think that this can be solved by having two
> >sets of document---one is with examples, the other is without
> >examples.
>
> Go ahead and write the fivehundredandnintysecond tutorial about Perl.
> You will find that either you are repeating yourself all over the place
> or you start using references all over the place because Perl's concept
> are so intertwined and cannot be explained stand-alone.
>
> Not to mention the ensuing maintenance nightmare when you have to fix
> something all over the place.

I agree your points. But I think that the perldoc shall have as less
cyclic references as possible. If the example that you mentioned in
the perldoc is not cyclic referred, it would easier to read.

Thanks,
Peng

Sherm Pendley

unread,
Nov 21, 2008, 12:58:00 AM11/21/08
to
Peng Yu <Peng...@gmail.com> writes:

> On Nov 20, 9:06 pm, Jürgen Exner <jurge...@hotmail.com> wrote:
>>
>> But there are examples in that section:
>> <quote>
>>             After a match against some variable $var:
>>
>>             "$`" is the same as "substr($var, 0, $-[0])"
>>             "$&" is the same as "substr($var, $-[0], $+[0] - $-[0])"
>>             "$'" is the same as "substr($var, $+[0])"
>>             "$1" is the same as "substr($var, $-[1], $+[1] - $-[1])"
>>             "$2" is the same as "substr($var, $-[2], $+[2] - $-[2])"
>>             "$3" is the same as "substr $var, $-[3], $+[3] - $-[3])"
>> </quote>
>
> I didn't understand the above examples because it refers to $-[0],
> which was exactly what I wanted to understand.

Arrays are referred to with @, but array *elements* with $. So, given
the array @-, its first element is $-[0].

sherm--

--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

ilovelinux

unread,
Nov 21, 2008, 6:16:49 AM11/21/08
to
On 21 nov, 04:54, s...@netherlands.com wrote:
> >>#!/usr/bin/perl
> >>use strict;
> >>use warnings;
>
> >>my $a =
> >>'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx';
> >>my $b = 'aacdxf';
>
> >>my $limit = 2;
> >>my $lenb = length $b;
>
> >>print "\nmatching '$a' against '$b'\n";
>
> >>while ( $a =~ /(?=(.{$lenb}))/sog ) {
> >>     next if ( $1 ^ $b ) =~ tr/\0//c > $limit;
> >>     print "match '$1' at offset $-[0]\n";
> >>     }
>
> >>__END__
>
> >Very elegant and fast, but you don't need all the modifiers /sog to the
> >regex, just /g will do.
>
> Actually, he needs 'so' as well. If this is used in a multi-lined
> way, the '.' needs the 's' modifier if it is possible a '\n' is encounterred
> and could be part of the pattern.
> The 'o' suggests to not recompile, which might happen, don't know, anything is
> possible. By including both, no harm, ho foul.

Harm will be done using the 'o'nce option, when the code is
transferred to a subroutine and the regexp will be called with $b's of
different length, because the regexp will be compiled with the first
$lenb encountered and never change.

Better put the regexp into a variable and use this in the while ():

my $re = qr/(?=(.{$lenb}))/s;
while ( $a =~ /$re/g ) {

HTH

A. Sinan Unur

unread,
Nov 21, 2008, 6:51:34 AM11/21/08
to
Peng Yu <Peng...@gmail.com> wrote in
news:255734ce-9bcf-4445...@y18g2000yqn.googlegroups.com:

> On Nov 20, 9:06 pm, Jürgen Exner <jurge...@hotmail.com> wrote:

<snip>

>> But there are examples in that section:
>> <quote>
>>             After a match against some variable $var:
>>
>>             "$`" is the same as "substr($var, 0, $-[0])"
>>             "$&" is the same as "substr($var, $-[0], $+[0] -
> $-[0])"
>>             "$'" is the same as "substr($var, $+[0])"
>>             "$1" is the same as "substr($var, $-[1], $+[1] -
> $-[1])"
>>             "$2" is the same as "substr($var, $-[2], $+[2] -
> $-[2])"
>>             "$3" is the same as "substr $var, $-[3], $+[3] -
> $-[3])"
>> </quote>
>
> I didn't understand the above examples because it refers to $-[0],
> which was exactly what I wanted to understand.

Are you joking? How can one explain what @- is without explaining the
meaning of its elements?

You do know that $x[1] refers to the second element of @x, right?

> I agree your points. But I think that the perldoc shall have as less
> cyclic references as possible. If the example that you mentioned in
> the perldoc is not cyclic referred, it would easier to read.

I don't see such cyclic references in the documentation for @-.

Peter J. Holzer

unread,
Nov 23, 2008, 4:47:35 AM11/23/08
to
On 2008-11-21 05:08, Peng Yu <Peng...@gmail.com> wrote:
> On Nov 20, 9:06 pm, Jürgen Exner <jurge...@hotmail.com> wrote:
>> Peng Yu <PengYu...@gmail.com> wrote:
>> >On Nov 20, 3:46 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> >> I looked at the perldoc perlvar page to remind myself what it said. I am
>> >> unable to figure out how one could make it clearer:

Did you quote from a newer version than 5.10.0? It's slightly different
in my docs (only checked 5.8.8 and 5.10.0).

>> >Thank you the example. I see what it means. In general, I think the
>> >document can be made easier to read with more examples.
>>
>> But there are examples in that section:
>> <quote>
>>             After a match against some variable $var:
>>
>>             "$`" is the same as "substr($var, 0, $-[0])"
>>             "$&" is the same as "substr($var, $-[0], $+[0] - $-[0])"
>>             "$'" is the same as "substr($var, $+[0])"
>>             "$1" is the same as "substr($var, $-[1], $+[1] - $-[1])"
>>             "$2" is the same as "substr($var, $-[2], $+[2] - $-[2])"
>>             "$3" is the same as "substr $var, $-[3], $+[3] - $-[3])"
>> </quote>
>
> I didn't understand the above examples because it refers to $-[0],
> which was exactly what I wanted to understand.

If it didn't contain $-[0], it wouldn't be an example for the use of
$-[0], would it?

> I agree your points. But I think that the perldoc shall have as less
> cyclic references as possible. If the example that you mentioned in
> the perldoc is not cyclic referred, it would easier to read.

I don't see any cyclic references in the section about @-. $-[0] is
explained in the very first sentence:

@- $-[0] is the offset of the start of the last successful match.

To understand the examples, it helps to know what $`, $&, etc. are.
These are explained elsewhere in the same document, and none of them
is explained by referring to @-. No cycles there.

hp

A. Sinan Unur

unread,
Nov 23, 2008, 8:44:02 AM11/23/08
to
"Peter J. Holzer" <hjp-u...@hjp.at> wrote in
news:slrngii9lo.3q...@hrunkner.hjp.at:

>>> >On Nov 20, 3:46 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid>
>>> >wrote:
>>> >> I looked at the perldoc perlvar page to remind myself what it
>>> >> said. I am unable to figure out how one could make it clearer:
>
> Did you quote from a newer version than 5.10.0? It's slightly
> different in my docs (only checked 5.8.8 and 5.10.0).

C:\Documents and Settings\asu1> perl -v

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 5 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 1004 [287188] provided by ActiveState
http://www.ActiveState.com
Built Sep 3 2008 13:16:37

s...@netherlands.com

unread,
Nov 23, 2008, 1:54:21 PM11/23/08
to

I never used the /o modifier. Tested it a long time ago and found it
squirley. I only use qr//.

Yes, you are right, the 'o' option lasts the life of the script.
This defies all logic, where is this regexp stored. Is it related to
variables, even lexical ones? This has to be a runtime association on
the first $pattern 'named' variable, even to nested scope lexicals?
I guess its possible.

Over the "life of the script" is all thats mentioned. I think this has
strictly limited use, basically a one time usage.

Docs:
"If $pattern won't be changing over the lifetime of the script,
we can add the //o modifier, which directs perl to only perform
variable substitutions once"

Thanks!

sln

----------------

my $pattern = 'this';
print getmatch_o('this is a test',$pattern)."\n";
$pattern = 'is';
print getmatch_o('this is a test',$pattern)."\n";

sub getmatch_o
{
my ($target,$pattern) = @_;
print $pattern."\n";
my $ttt = $pattern;
my $ggg = $target;
return $1 if ($ggg =~ /($ttt)/o);

return '';
}
__END__

this
this
is
this


s...@netherlands.com

unread,
Nov 23, 2008, 2:00:48 PM11/23/08
to
On Thu, 20 Nov 2008 23:52:15 GMT, s...@netherlands.com wrote:

>On Thu, 20 Nov 2008 10:06:33 -0800, "John W. Krahn" <som...@example.com> wrote:
>

[snip]


>>while ( $a =~ /(?=(.{$lenb}))/sog ) {

[snip]


>
>I noticed the capture group is in the assertion and it only advances the
>regexp ordinal position by one.
>
>Example:
>
>/(.{$lenb})/ advances 6
>/(?=(.{$lenb}))/ advances 1
>
>I looked in perlre,perlretut and there is only one mention of capture within
>a "non-capture zero-width assertion" extended expression elements.
>That was just a side from some example on (?>..), and that just mentioned,
>the ordinal position is adjusted with a (?=(regex)) analogy.
>
>Do you know why this is?
>
>
>sln

Does anybody know?

sln

0 new messages