This is a bug report for perl from bqw1...@nifty.com,
generated with the help of perlbug 1.34 running under perl v5.8.3.
-----------------------------------------------------------------
[Please enter your report here]
I'm noticed by a recent talk on utf8 upgrading here,
that SvPV should precede SvUTF8 or DO_UTF8
in order to cope with overloaded objects correctly.
As I look up pp.c briefly,
pp_length, pp_uc, pp_ucfirst, pp_lc, pp_lcfirst have
SvUTF8 or DO_UTF8 without preceding SvPV (or the like).
#!perl
package RefString;
sub new { my $p = shift; bless [@_], $p }
use overload
'""' => sub { shift->[0] },
'fallback' => 1;
package main;
# output a string as a sequence of hexadecimal codepoints.
sub strhex ($) { join " ", map sprintf("%04X", $_), unpack 'U*', shift }
# FF21-FF23 are Fullwidth latin capital A, B, C
# FF41-FF43 are Fullwidth latin small A, B, C
my $ov1 = new RefString pack('U*', 0xFF41,0xFF42,0xFF43);
print "length(): before SvPV : ";
print length($ov1), "\n";
print "length(): after SvPV : ";
print length($ov1), "\n";
print "length(): after SvPV : ";
print length($ov1), "\n";
my $ov2 = new RefString pack('U*', 0xFF41,0xFF42,0xFF43);
print "uc(): before SvPV : ";
print strhex uc($ov2), "\n";
print "uc(): after SvPV : ";
print strhex uc("$ov2"), "\n";
my $ov3 = new RefString pack('U*', 0xFF41,0xFF42,0xFF43);
print "ucfirst(): before SvPV : ";
print strhex ucfirst($ov3), "\n";
print "ucfirst(): after SvPV : ";
print strhex ucfirst("$ov3"), "\n";
my $ov4 = new RefString pack('U*', 0xFF21,0xFF22,0xFF23);
print "lc(): before SvPV : ";
print strhex lc($ov4), "\n";
print "lc(): after SvPV : ";
print strhex lc("$ov4"), "\n";
my $ov5 = new RefString pack('U*', 0xFF21,0xFF22,0xFF23);
print "lcfirst(): before SvPV : ";
print strhex lcfirst($ov5), "\n";
print "lcfirst(): after SvPV : ";
print strhex lcfirst("$ov5"), "\n";
__END__
Perl 5.8.3 and perl-current print like this.
I think outputs before SvPV should be same as those after SvPV.
length(): before SvPV : 9
length(): after SvPV : 3
length(): after SvPV : 3
uc(): before SvPV : FF41 FF42 FF43
uc(): after SvPV : FF21 FF22 FF23
ucfirst(): before SvPV : FF41 FF42 FF43
ucfirst(): after SvPV : FF21 FF42 FF43
lc(): before SvPV : FF21 FF22 FF23
lc(): after SvPV : FF41 FF42 FF43
lcfirst(): before SvPV : FF21 FF22 FF23
lcfirst(): after SvPV : FF41 FF22 FF23
Thank you.
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.8.3:
Summary of my perl5 (revision 5 version 8 subversion 3) configuration:
Platform:
osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cl', ccflags ='-nologo -Gf -W3 -MD -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
optimize='-MD -DNDEBUG -O1',
cppflags='-DWIN32'
ccversion='', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='link', ldflags ='-nologo -nodefaultlib -release -libpath:"c:\usr\perl\lib\CORE" -machine:x86'
libpth=C:\PROGRA~1\MICROS~4\VC98\lib
libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
gnulibc_version='undef'
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -release -libpath:"c:\usr\perl\lib\CORE" -machine:x86'
Locally applied patches:
---
@INC for perl v5.8.3:
C:/usr/perl/lib
C:/usr/perl/site/lib
.
---
Environment for perl v5.8.3:
HOME (unset)
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=C:\PROGRA~1\MICROS~4\COMMON\MSDEV98\BIN;C:\PROGRA~1\MICROS~4\VC98\BIN;C:\PROGRA~1\MICROS~4\COMMON\TOOLS\WIN95;C:\PROGRA~1\MICROS~4\COMMON\TOOLS;\SYSTEM;C:\FJUTY;C:\WINDOWS;C:\WINDOWS\COMMAND;;C:\USR\BIN;C:\USR\PERL\BIN;;C:\USR\EXPAT\LIBS;;C:\USR\ICU\BIN;C:\USR\ICU\LIB;
PERL_BADLANG (unset)
SHELL (unset)
Do you think you found all the cases in pp.c? Would you be willing to
look through the other pp*.c files and perhaps do??.c also?
I see do_print has it backward.
I'm afraid they are not all in pp.c. If I had more time...
Certainly do_print has SvUTF8/DO_UTF8 before SvPV.
But sv_utf8_upgrade_flags and sv_utf8_downgrade don't cope with
(a mortal copy of) a overloaded reference since it isn't POK.
Therefore whether SvUTF8/DO_UTF8 cope with an overloaded reference
correctly or not, parhaps no difference should not appear at present.
(If automatic upgrade/downgrade on utf8 were croaked, this part
would be removed. I don't know how people feels about it...)
I found another related bug in pp_substr.
When I run the following codelet,
the first substr($s,0,0,$ov) does not turn on UTF8 flag of $s.
#!perl
package RefString;
sub new { my $p = shift; bless [@_], $p }
use overload '""' => sub { shift->[0] }, 'fallback' => 1;
package main;
{
my $ov = new RefString "\x{10000}";
my $s = "abc";
substr($s,0,0,$ov);
print $s eq "\x{10000}abc" # $s is not in utf8!
? "ok" : "not ok", "\n";
}
{
my $ov = new RefString "\x{10000}";
my $s = "abc";
substr($s,0,0,$ov);
print $s eq "\x{10000}abc" # $s is in utf8
? "ok" : "not ok", "\n";
}
__END__
not ok
ok
In the bad case, DO_UTF8(repl_sv) was true
but SvCUR(repl_sv) was *zero* where repl_sv was ROK.
ROK scalars with UTF8 turned on would be quite problematic, I think...
This is a proposed patch.
--- pp.c~ Wed Feb 18 04:50:20 2004
+++ pp.c Wed Mar 17 23:29:56 2004
@@ -2958,7 +2958,7 @@
if (num_args > 3) {
repl_sv = POPs;
repl = SvPV(repl_sv, repl_len);
- repl_is_utf8 = DO_UTF8(repl_sv) && SvCUR(repl_sv);
+ repl_is_utf8 = DO_UTF8(repl_sv) && repl_len;
}
len = POPi;
}
Regards,
SADAHIRO Tomoyuki
The theory was that it would be at least better than how things were before,
where only pp_stringify would keep the utf8 flag via sv_copypv.
> > Do you think you found all the cases in pp.c? Would you be willing to
> > look through the other pp*.c files and perhaps do??.c also?
>
> I'm afraid they are not all in pp.c. If I had more time...
I tried to see all SvUTF8() and DO_UTF8() in *.c files
at the top level perl directory.
I found do_vop() also has such DO_UTF8,
so this problem influences pp_bit_and, pp_bit_or, pp_bit_xor.
#!perl
package RefString;
sub new { my $p = shift; bless [@_], $p }
use overload '""' => sub { shift->[0] }, 'fallback' => 1;
package main;
#U+FF = "\xC3\xBF" in UTF8;
#0xC3 = 0b11000011
#0x0F = 0b00001111
my $ov1 = new RefString pack('U*', 0xFF);
printf "%02X\n", ord($ov1 & chr(0xF)); # '03' = 0xC3 & 0xF
printf "%02X\n", ord($ov1 & chr(0xF)); # '0F' = 0xFF & 0xF
my $ov2 = new RefString pack('U*', 0xFF);
printf "%02X\n", ord($ov2 | chr(0xF)); # 'CF' = 0xC3 | 0xF
printf "%02X\n", ord($ov2 | chr(0xF)); # 'FF' = 0xFF | 0xF
my $ov3 = new RefString pack('U*', 0xFF);
printf "%02X\n", ord($ov3 ^ chr(0xF)); # 'CC' = 0xC3 ^ 0xF
printf "%02X\n", ord($ov3 ^ chr(0xF)); # 'F0' = 0xFF ^ 0xF
__END__
Regards,
SADAHIRO Tomoyuki