Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[perl #24704] Regex replace looses chars

0 views
Skip to first unread message

perlbug-...@perl.org

unread,
Dec 19, 2003, 4:16:26 PM12/19/03
to bugs-bi...@netlabs.develooper.com
# New Ticket Created by perl-...@ton.iguana.be
# Please include the string: [perl #24704]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=24704 >

This is a bug report for perl from perl-...@ton.iguana.be,
generated with the help of perlbug 1.34 running under perl v5.8.0.


-----------------------------------------------------------------
[Please enter your report here]

perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*", $_)'
65x
Ax
4178

This is normal.

perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*", $_)'
65x
A
4100

Where did the second x go ????

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl v5.8.0:

Configured by ton at Tue Nov 12 01:56:18 CET 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld
uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=define
usemymalloc=y, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -fomit-frame-pointer',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

---
@INC for perl v5.8.0:
/usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/5.8.0
/usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
.

---
Environment for perl v5.8.0:
HOME=/home/ton
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
PERL_BADLANG (unset)
SHELL=/bin/bash

Tels

unread,
Dec 20, 2003, 3:56:11 PM12/20/03
to via RT, perl5-...@perl.org
-----BEGIN PGP SIGNED MESSAGE-----

Moin,

>[Please enter your report here]
>
>perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*", $_)'
>65x
>Ax
>4178
>
>This is normal.
>
>perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*", $_)'
>65x
>A
>4100
>
>Where did the second x go ????

Confirmed on v5.8.1 and v5.8.2.

Cheers,

Tels

- --
Signed on Sat Dec 20 21:55:18 2003 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"Call me Justin, Justin Case."

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBP+S3encLPEOTuEwVAQGE7Af+PdAqt3NEBJHj3JXMYeXy6Abjs21BPNQM
g0VWYWXn+8zXi5qvpLEPFP1cUOqVTVaqC4C/T4gLI60utdNdr9JUZT47nftJk9Bz
8Ct47wf5tiOSzJt4qjFKJ8bhIkA0GcDNwiEGLXkoCM1GdtyAbeGKs9KfZSyuLews
4CcPrSiViPdK7WPr/aV5CX3fF3mfQ7sIQ/hI3hU+EKQmoOlpH0AI246IwHw9MhF5
5xm8ikikGAh3nIyDt65Da6m2IKPaT0SZNEv06hr7mkDG8jO8EkdvvbqhZVZ/kwAn
PXm1C8QbCotslQ5R/x7OA7EEnXmmqebmdBmpzP62SOoO6Hyqyi9rMw==
=2GmU
-----END PGP SIGNATURE-----

Sadahiro Tomoyuki

unread,
Dec 21, 2003, 12:30:39 AM12/21/03
to perl5-...@perl.org

On Sun, 21 Dec 2003 14:14:53 +0900
SADAHIRO Tomoyuki <bqw1...@nifty.com> wrote:

> I tested with various numbers of bytes before and after "65",
> where initial value of $_ would be "ab65z", "abc65xyz", and so on.
>
> The offset value is set to $numOfCharsBefore65. In this case,
> if $numOfCharsBefore65 >= $numOfCharsAfter65, this problem occurs.

Sorry, "if $numOfCharsBefore65 <= $numOfCharsAfter65" is correct.
~~~~

> #!perl
> for my $numOfCharsBefore65 (1..5) {
> for my $numOfCharsAfter65 (1..5) {
> my $prefix = substr("abcdefghijklm", 0, $numOfCharsBefore65);
> my $suffix = substr("nopqrstuvwxyz", -$numOfCharsAfter65);
> print "$numOfCharsBefore65,$numOfCharsAfter65\t";
>
> $_ = $prefix.'65'.$suffix;
> substr($_,0,$numOfCharsBefore65,''); # remove $prefix
> print "$_\t"; # should be '65'.$suffix
> s/65/chr/e;
> print "$_\t";
> print unpack("H*", $_), "\n";
> }
> }
> __END__
>
> 1,1 65z A 4100
> 1,2 65yz Az 417a00
> 1,3 65xyz Ayz 41797a00
> 1,4 65wxyz Axyz 4178797a00
> 1,5 65vwxyz Awxyz 417778797a00
> 2,1 65z Az 417a
> 2,2 65yz A z 41007a
> 2,3 65xyz Az z 417a007a
> 2,4 65wxyz Ayz z 41797a007a
> 2,5 65vwxyz Axyz z 4178797a007a
> 3,1 65z Az 417a
> 3,2 65yz Ayz 41797a
> 3,3 65xyz A yz 4100797a
> 3,4 65wxyz Az yz 417a00797a
> 3,5 65vwxyz Ayz yz 41797a00797a
> 4,1 65z Az 417a
> 4,2 65yz Ayz 41797a
> 4,3 65xyz Axyz 4178797a
> 4,4 65wxyz A xyz 410078797a
> 4,5 65vwxyz Az xyz 417a0078797a
> 5,1 65z Az 417a
> 5,2 65yz Ayz 41797a
> 5,3 65xyz Axyz 4178797a
> 5,4 65wxyz Awxyz 417778797a
> 5,5 65vwxyz A wxyz 41007778797a

----
Regards,
SADAHIRO Tomoyuki

Sadahiro Tomoyuki

unread,
Dec 21, 2003, 12:14:53 AM12/21/03
to perl5-...@perl.org

> -----------------------------------------------------------------
> [Please enter your report here]
>
> perl -le '$_="65x"; print; s/65/chr/e; print; print unpack("H*", $_)'
> 65x
> Ax
> 4178
>
> This is normal.
>
> perl -le '$_="x65x"; s/x//; print; s/65/chr/e; print; print unpack("H*", $_)'
> 65x
> A
> 4100
>
> Where did the second x go ????
>
> [Please do not change anything below this line]

Instead of s/x///, substr($_,0,1,"") causes the same result.
It seems to be related to an offset OK string.

perl -le '$_="x65x"; substr($_,0,1,""); print; s/65/chr/e; print; print unpack("H*", $_)'
65x
A
4100

I tested with various numbers of bytes before and after "65",


where initial value of $_ would be "ab65z", "abc65xyz", and so on.

The offset value is set to $numOfCharsBefore65. In this case,
if $numOfCharsBefore65 >= $numOfCharsAfter65, this problem occurs.

#!perl

Marty Pauley

unread,
Dec 21, 2003, 5:39:50 PM12/21/03
to perl5-...@perl.org
The problem here is that the subst context sb_s is pointing to parts of
the PV, but the PVIV upgrade moves the text in the PV, so the subst
context sb_s is still pointing to where it used to be.

A simple (but not great) fix is to remove the OOK before we mess with
the sv.


--- pp_hot.c.orig 2003-12-21 03:35:10.000000000 +0000
+++ pp_hot.c 2003-12-21 22:32:56.000000000 +0000
@@ -2172,6 +2172,7 @@
RETURN;
}

+ SvOOK_off(TARG);
if (CALLREGEXEC(aTHX_ rx, s, strend, orig, 0, TARG, NULL,
r_flags | REXEC_CHECKED))
{

If you're feeling paranoid, a safer place to unOOK is much closer to the
start of pp_subst, just before we take the first pointer to the inside
of the PV.


--- pp_hot.c.orig 2003-12-21 03:35:10.000000000 +0000
+++ pp_hot.c 2003-12-21 22:34:10.000000000 +0000
@@ -1981,6 +1981,7 @@
DIE(aTHX_ PL_no_modify);
PUTBACK;

+ SvOOK_off(TARG);
s = SvPV(TARG, len);
if (!SvPOKp(TARG) || SvTYPE(TARG) == SVt_PVGV)
force_on_match = 1;


--
Marty

Nicholas Clark

unread,
Dec 21, 2003, 5:57:34 PM12/21/03
to perl5-...@perl.org
On Sun, Dec 21, 2003 at 10:39:50PM +0000, Marty Pauley wrote:
> If you're feeling paranoid, a safer place to unOOK is much closer to the
> start of pp_subst, just before we take the first pointer to the inside
> of the PV.

What do I do if I'm feeling really paranoid, and wondering if there are a
whole class of bugs like these?

[taking pointers to the insides of SVs that are then moved around because
of upgrading. Hack the core to create all scalars as OOK, and see which
tests fail?]

Nicholas Clark

Marty Pauley

unread,
Dec 21, 2003, 6:40:10 PM12/21/03
to perl5-...@perl.org
On Sun Dec 21 22:57:34 2003, Nicholas Clark wrote:
> On Sun, Dec 21, 2003 at 10:39:50PM +0000, Marty Pauley wrote:
> > If you're feeling paranoid, a safer place to unOOK is much closer to the
> > start of pp_subst, just before we take the first pointer to the inside
> > of the PV.
>
> What do I do if I'm feeling really paranoid, and wondering if there are a
> whole class of bugs like these?

I think there might be a few more like this. We could also get similar
problems when we grow the sv and cause the pv to change altogether. We
should get a segmentation fault in that case.

The thing that worried me the most about this one was that there were
cases that hit the bug but still 'worked': when the offset was larger
than the remainder of the string, the move wouldn't overwrite the old
part of the string.

> [taking pointers to the insides of SVs that are then moved around because
> of upgrading. Hack the core to create all scalars as OOK, and see which
> tests fail?]

That might be a good start. :-)

If we had a version of Move that filled the source with rubbish, we
might be able to find some more bugs. And if we changed sv_backoff and
sv_grow to release memory when they could, we might get some nice
segmentation faults to indicate bugs.

--
Marty

Nicholas Clark

unread,
Dec 21, 2003, 6:46:23 PM12/21/03
to perl5-...@perl.org

Strictly we'd want a version of Copy that filled the source with rubbish
(or bit flipped it it or something), as Copy is for non-overlapping
cases. Move would have to be more clever and selectively trash only the
non-overlapping parts.

But it's a suitably evil idea. If no-one gets there first I'll try it,
once I acquire some copious free time.

Nicholas Clark

Andreas J Koenig

unread,
Dec 22, 2003, 2:51:25 AM12/22/03
to SADAHIRO Tomoyuki, perl5-...@perl.org
If I rewrite the test to not use chr() but something that existed in
perl4, say

$_="65x"; s/65/$_-65/e; $res1 = $_;
$_="x65x"; s/x//; s/65/$_-65/e; $res2 = $_;
print $res1 eq $res2 ? "ok\n" : "not ok res1[$res1]res2[$res2]\n";

then I see the same result in perl4 as in bleed:

not ok res1[0x]res2[0]

--
andreas

0 new messages