
5.8.5 RC2


Nicholas Clark

Jul 9, 2004, 6:45:42 AM
to perl5-...@perl.org
Beeches are trees of the Genus Fagus, family Fagaceae, including about
ten species in Europe, Asia, and North America. The leaves are entire or
sparsely toothed. The fruit is a small, sharply-angled nut, borne in
pairs in spiny husks. The beech most commonly grown as an ornamental or
shade tree is the European beech (Fagus sylvatica).

The southern beeches belong to a different but related genus,
Nothofagus. They are found in Australia, New Zealand, New Guinea, New
Caledonia and South America.

(From Wikipedia)

It turned out that not all was rosy in the garden. Gtk2 failed its regression
tests due to changes made to Perl_sv_utf8_upgrade_flags, which Rafael found
after I'd uploaded RC1 to PAUSE. So there's now RC2:

http://opensource.fotango.com/~nclark/perl-5.8.5-RC2.tar.bz2

(or s/bz2$/gz/ if you really want a 25% larger download.)

coming soon to a CPAN mirror near you as

ftp://ftp.cpan.org/pub/CPAN/authors/id/N/NW/NWCLARK/perl-5.8.5-RC1.tar.bz2

Once it's propagated round the CPAN mirrors I'll make an announcement
on use.perl

The plan still is not to need another release candidate.

No plan ever survives contact with the enemy.
-- Field Marshal Helmuth Carl Bernard von Moltke


Nicholas Clark

Greg Matheson

Jul 11, 2004, 2:00:10 AM
to perl5-...@perl.org
On Fri, 09 Jul 2004, Nicholas Clark wrote:

> http://opensource.fotango.com/~nclark/perl-5.8.5-RC2.tar.bz2

It builds on Win98 with MinGW-3.1.0 but only with the old tweaks to
ExtUtils::MakeMaker. Running in a Cygwin shell, the tests hung for me
at comp/hints.

--
Greg Matheson, Taiwan

Nick Ing-Simmons

Jul 12, 2004, 5:24:14 AM
to ni...@ing-simmons.net, ni...@ccl4.org, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>
>>I am getting fails with Tk804.??? - which also seems to be UTF-8
>>related:
>>
>>Breakpoint 5, Perl_warner (my_perl=0x814f008, err=41,
>> pat=0x81405d8 "Use of uninitialized value%s%s") at util.c:1349
>>1349 va_start(args, pat);
>>(gdb) bt
>>#0 Perl_warner (my_perl=0x814f008, err=41,
>> pat=0x81405d8 "Use of uninitialized value%s%s") at util.c:1349
>>#1 0x080c445b in Perl_report_uninit (my_perl=0x814f008) at sv.c:611
>>#2 0x080cadbc in Perl_sv_2pv_flags (my_perl=0x814f008, sv=0x843743c,
>> lp=0xbfffe998, flags=2) at sv.c:3207
>>#3 0x080cb3c8 in Perl_sv_pvn_force_flags (my_perl=0x814f008, sv=0x843743c,
>> lp=0xbfffe998, flags=2) at sv.c:7475
>>#4 0x080cbd35 in Perl_sv_utf8_upgrade_flags (my_perl=0x814f008, sv=0x843743c,
>> flags=2) at sv.c:3495
>>#5 0x080d131b in Perl_sv_2pvutf8 (my_perl=0x814f008, sv=0x843743c,
>> lp=0xbfffe9f4) at sv.c:3357
>>#6 0x080d20c7 in Perl_sv_2pvutf8_nolen (my_perl=0x814f008, sv=0x843743c)
>> at sv.c:3340
>>#7 0x4030875f in Tcl_GetStringFromObj (objPtr=Variable "objPtr" is not available.
>>) at objGlue.c:499
>>#8 0x40308a7e in Tcl_GetString (objPtr=0x843743c) at objGlue.c:578
>>#9 0x40309a40 in ForceList (my_perl=0x814f008, interp=0x0, sv=0x843743c)
>> at objGlue.c:627
>>#10 0x40309f44 in MaybeForceList (my_perl=0x814f008, interp=0x0, sv=Variable "sv" is not available.
>>)
>> at objGlue.c:768
>>#11 0x4030a036 in Tcl_ListObjGetElements (interp=0x0, listPtr=0x843743c,
>>
>>Will look at it tomorrow...
>
>Further data - it seems SV is pPOK but not POK:
>
>(gdb) call Perl_sv_dump(my_perl,(SV *) 0x8437488)
>SV = PVMG(0x8435a60) at 0x8437488
> REFCNT = 1
> FLAGS = (pPOK,UTF8)
> IV = 0
> NV = 0
> PV = 0x8438380 "Helvetica -12 bold"\0
>
>That looks like one of Tk's own MAGICal vars, so may be my own fault.

Maybe, not sure. It turns out that this "upgrade" call is being done
by the mg_get() entry of the MAGIC vtable in question.
I can make _that_ message go away if I only do the upgrade
if (!SvUTF8(sv))
{
...
}

which is simple enough, but surely upgrade could do that itself?

With that change all is still not right,

#0 Perl_vcroak (my_perl=0x84374d0,
pat=0x81403a0 "Modification of a read-only value attempted",
args=0xbfffe8e4) at util.c:1146
#1 0x080ad74d in Perl_croak (my_perl=0x814f008,
pat=0x81403a0 "Modification of a read-only value attempted") at util.c:1250
#2 0x080c6b85 in Perl_sv_force_normal_flags (my_perl=0x814f008, sv=0x84374d0,
flags=0) at sv.c:4287
#3 0x080c6c95 in Perl_sv_force_normal (my_perl=0x814f008, sv=0x84374d0)
at sv.c:4308
#4 0x080cb4a9 in Perl_sv_pvn_force_flags (my_perl=0x814f008, sv=0x84374d0,
lp=0xbfffe988, flags=2) at sv.c:7464
#5 0x080cbd35 in Perl_sv_utf8_upgrade_flags (my_perl=0x814f008, sv=0x84374d0,
flags=2) at sv.c:3495
#6 0x080d131b in Perl_sv_2pvutf8 (my_perl=0x814f008, sv=0x84374d0,
lp=0xbfffe9e4) at sv.c:3357
#7 0x080d20c7 in Perl_sv_2pvutf8_nolen (my_perl=0x814f008, sv=0x84374d0)
at sv.c:3340
#8 0x4030877f in Tcl_GetStringFromObj (objPtr=Variable "objPtr" is not available.
) at objGlue.c:506

(gdb) call Perl_sv_dump(my_perl,sv)
SV = PVMG(0x8435aa8) at 0x84374d0
REFCNT = 1
FLAGS = (GMG,SMG,RMG,READONLY,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x84383c8 "Helvetica -12 bold"\0 [UTF8 "Helvetica -12 bold"]
CUR = 18
LEN = 19
MAGIC = 0x84383f8
MG_VIRTUAL = 0x40393008
MG_TYPE = PERL_MAGIC_ext(~)
MG_FLAGS = 0x02
REFCOUNTED
MG_OBJ = 0x84374dc
SV = PV(0x8436ed4) at 0x84374dc
REFCNT = 1
FLAGS = ()
PV = 0x84383e0 ""\0
CUR = 0
LEN = 13

It isn't going to modify it other than possibly setting the SvPOK bit.
The string is ALREADY marked as UTF8. So SvPVutf8_nolen(sv) doesn't
have to do anything, just hand me back the SvPVX!

The essence of the problem seems to be that this is an ASCII string and so
is invariant under UTF-8, but Tk doesn't want to have to replicate
the code and the scan of the string. So it calls upgrade to get a UTF-8
string - the ASCII would do.

Once it has a UTF-8 string, Tk can look it up in its internals and find out
what the magic "get" should return. (Which in this case is just
going to be the string! Later on, a lookup of the same font string, which could
have contained non-ASCII, may return an "actual" font that has been bound.)

This has worked fine since 5.8.0 ...

What exactly has changed in perl5.8.5 ?


Nick Ing-Simmons

Jul 12, 2004, 6:24:21 AM
to ni...@ing-simmons.net, ni...@ccl4.org, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>That looks like one of Tk's own MAGICal vars, so may be my own fault.
>
>Maybe, not sure. It turns out that this "upgrade" call is being done
>by the mg_get() entry of the MAGIC vtable in question.

There is some weirdness going on here. My "GET" routine does this:

TclObj_get (my_perl=0x814f008, sv=Variable "sv" is not available.
) at objGlue.c:1368
1368 SvPOK_on(sv);

And it "takes":

(gdb) call Perl_sv_dump(my_perl,sv)
SV = PVMG(0x8435a88) at 0x84374b0
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)

But then restore_magic() does:

2658 if (SvGMAGICAL(sv))
(gdb)
2659 SvFLAGS(sv) &= ~(SVf_IOK|SVf_NOK|SVf_POK);

And turns it off again!

I thought the point of GET vtable entries was to turn that bit on?


Nicholas Clark

Jul 12, 2004, 6:32:06 AM
to Nick Ing-Simmons, perl5-...@perl.org

I'm never too sure about how it all works (particularly given that it's
not documented well) but as far as I knew magic ends up with the private
OK flags on, but the public flags off. (So that things like SvPOK() are
not true, and SvPV() always calls into sv_2pv())
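
The public/private flag behaviour described here can be pictured with a small standalone model. This is only a sketch, not perl's code: the `ToySV` type, the `F_*` bit values and the `toy_*` functions are all invented for illustration, and perl's real flags live in sv.h with different values. It assumes that the routine restoring state after a get strips the public flag whenever get magic is attached, as the gdb trace of restore_magic() above suggests.

```c
#include <stdint.h>

/* Hypothetical flag bits; perl's real values are in sv.h and differ. */
enum {
    F_POK  = 1u << 0,  /* public: string is valid, fast path allowed */
    F_POKP = 1u << 1,  /* private: a string buffer exists */
    F_GMG  = 1u << 2   /* value has "get" magic attached */
};

typedef struct {
    uint32_t flags;
    const char *pv;
    int get_calls;      /* how many times the get hook has run */
} ToySV;

/* Stand-in for a magic GET routine: it claims the string is valid. */
static void toy_get_hook(ToySV *sv) {
    sv->get_calls++;
    sv->flags |= F_POK | F_POKP;
}

/* Stand-in for mg_get() plus the restore step: run the hook, then strip
 * the public flag again so the next reader cannot skip the hook. */
static void toy_mg_get(ToySV *sv) {
    toy_get_hook(sv);
    if (sv->flags & F_GMG)
        sv->flags &= ~(uint32_t)F_POK;
}

/* Stand-in for SvPV(): the fast path is taken only when F_POK is set. */
static const char *toy_pv(ToySV *sv) {
    if (sv->flags & F_POK)
        return sv->pv;
    toy_mg_get(sv);
    return sv->pv;
}
```

In this model a value with get magic never keeps `F_POK` after a read, so every call to `toy_pv()` goes through the hook again, which is the behaviour being questioned in the thread.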

Nicholas Clark

Nick Ing-Simmons

Jul 12, 2004, 7:54:29 AM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> writes:
>> But then restore_magic() does:
>>
>> 2658 if (SvGMAGICAL(sv))
>> (gdb)
>> 2659 SvFLAGS(sv) &= ~(SVf_IOK|SVf_NOK|SVf_POK);
>>
>> And turns it off again!
>>
>> I thought the point of GET vtable entries was to turn that bit on?
>
>I'm never too sure about how it all works (particularly given that it's
>not documented well) but as far as I knew magic ends up with the private
>OK flags on, but the public flags off. (So that things like SvPOK() is
>not true, and SvPV() always calls into sv_2pv())
>
Ok, but as one half-informed porter to another, I thought that was normal
state, but once mg_get() had been called and if GET routine turned on
SvPOK() then SvPV() just used the value.

Anyway, it seems the root of my problem is two changes:

1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)

(The Magic-al stuff was probably a red-herring.)

I can work round (1) by always making a copy, but this is sad
as all the shared string optimizations go out the window.
"-text" is still "-text" even if SvUTF8 is set, but now if
perl code does:

$widget->xs_method(-text => "Something")

XS cannot call SvPVutf8_nolen() on the -text

This is a giant leap backwards IMHO.

2. is_utf8_string(s,len) can no longer be passed s=NULL with len=0

s = SvPV(objPtr, len);
if (!is_utf8_string(s,len)) // THIS NOW CORE dumps if s is NULL
{
}


I can work round (2) by writing:

if (len && !is_utf8_string(s,len))

But so could the perl source...
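
For illustration, a validator that tolerates a NULL pointer with zero length might look like the sketch below. It is a toy, not perl's is_utf8_string(): the name `toy_is_utf8` is invented, it checks only lead and continuation byte structure, and it does not reject overlong forms or surrogates. It assumes the same convention as the perl function that a length of 0 means "use strlen".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy structural UTF-8 check; see the caveats in the text above. */
static bool toy_is_utf8(const unsigned char *s, size_t len) {
    if (s == NULL)
        return len == 0;                 /* empty is valid; never touch NULL */
    if (len == 0)
        len = strlen((const char *)s);   /* "len 0 means strlen" convention */

    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;                    /* continuation bytes expected */
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;               /* stray continuation or bad lead */
        if (extra > len - i - 1)
            return false;                /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}
```

Putting the NULL test inside the function means callers do not each need their own `len &&` guard.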

So a fairly small Tk patch gets it working again (p4 diff attached),
but is a pain.

Nick Ing-Simmons

Jul 12, 2004, 8:01:14 AM
to ni...@ing-simmons.net, ni...@ccl4.org, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>
>2. is_utf8_string(s,len) can no longer be passed s=NULL with len=0
>
> s = SvPV(objPtr, len);
> if (!is_utf8_string(s,len)) // THIS NOW CORE dumps if s is NULL
> {
> }
>
>
>I can work round (2) by writing:
>
> if (len && !is_utf8_string(s,len))
>
>But so could the perl source...

Patch to not call strlen on NULL attached.

patch

Nick Ing-Simmons

Jul 12, 2004, 8:39:14 AM
to ni...@ing-simmons.net, ni...@ccl4.org, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>
>Patch to not call strlen on NULL attached.

And now in blead as change 23083 if you prefer.


>--- utf8.c.ship 2004-07-12 12:58:25.155675522 +0100
>+++ utf8.c 2004-07-12 12:58:46.657245954 +0100
>@@ -232,7 +232,7 @@
> U8* send;
> STRLEN c;
>
>- if (!len)
>+ if (!len && s)
> len = strlen((char *)s);
> send = s + len;

h...@crypt.org

Jul 12, 2004, 10:00:13 AM
to Nick Ing-Simmons, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
:Anyway, it seems the root of my problem is two changes:

:
:1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)

I think it would be reasonable for this to complain only if the SV would
actually need to be changed, if delaying the test doesn't impose a large
overhead.

Hugo

Nicholas Clark

Jul 12, 2004, 10:42:39 AM
to Nick Ing-Simmons, perl5-...@perl.org
With this:

On Mon, Jul 12, 2004 at 01:01:14PM +0100, Nick Ing-Simmons wrote:

> Patch to not call strlen on NULL attached.

> --- utf8.c.ship 2004-07-12 12:58:25.155675522 +0100


> +++ utf8.c 2004-07-12 12:58:46.657245954 +0100
> @@ -232,7 +232,7 @@
> U8* send;
> STRLEN c;
>
> - if (!len)
> + if (!len && s)
> len = strlen((char *)s);
> send = s + len;
>


On Mon, Jul 12, 2004 at 12:54:29PM +0100, Nick Ing-Simmons wrote:

and this:

> ==== //depot/Tkutf8/objGlue.c#53 - /home/p4work/Tkutf8/objGlue.c ====
> @@ -410,11 +410,11 @@

I get

Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------------
t/browseentry-subclassing.t 0 11 2 2 100.00% 2
t/browseentry2.t 0 11 6 12 200.00% 1-6
t/create.t 0 11 528 326 61.74% 366-528
t/fileselect.t 0 11 5 6 120.00% 3-5
t/magic.t 0 11 1 2 200.00% 1
(3 subtests UNEXPECTEDLY SUCCEEDED), 2 tests and 23 subtests skipped.
Failed 5/47 test scripts, 89.36% okay. 174/2054 subtests failed, 91.53% okay.


whereas I had all tests (plus IIRC 4 unexpected) passing with 5.8.4 on the
same machine.

I'm not convinced that these utf8 changes are actually stable yet.
I've been trying to find a way to make the core happy, but I can't even get
Tk's t/autoload.t to pass. Something keeps creating a "Use of uninitialized
value" warning and I can't see what's causing it, as the specific code:

char *
Perl_sv_2pvutf8(pTHX_ register SV *sv, STRLEN *lp)
{
sv_utf8_upgrade(sv);
return SvPV(sv,*lp);
}

hasn't changed.

Nicholas Clark

Nick Ing-Simmons

Jul 12, 2004, 10:47:12 AM
to ni...@ing-simmons.net, ni...@ccl4.org, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>
>>Patch to not call strlen on NULL attached.
>
>And now in blead as change 23083 if you prefer.

The composite patch attached IS NOT FOR APPLICATION, but does make
unmodified Tk pass its tests.

But the lack of some readonly croaks makes perl fail several of its tests.

Root cause of Tk's problems seems to be asking for UTF8 for
READONLY ASCII strings now causes this die.

This is a cascade effect as "force_normal" now considers READONLY
non-normal, and utf8_upgrade now calls force_normal even if
string is SvPOK and SvUTF8.

The other snag the patch "fixes" is an infinite redraw loop
caused by upgrade's calling SvSETMAGIC.

What happens is:
An SV gets changed which is being "watched" by Tk using Magic.
SET handler queues a redraw.
Redraw asks for UTF8 for SV via SvPVutf8_nolen()
that calls sv_utf8_upgrade() even though it is UTF8 because
it isn't SvPOK as mg_get() turned flag off again.
Upgrade calls SvSETMAGIC
SET queues a redraw.
(Chain of calls returns)
Redraw asks for UTF8

IMHO utf8_upgrade is just changing representation not the value
so it has no business calling SvSETMAGIC.

Change 23084 to blead removes those calls and doesn't break
any tests.
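
The redraw loop described above can be reproduced in a standalone model. This is a sketch under stated assumptions, not Tk or perl code: `Watched`, `toy_set_magic`, `toy_upgrade` and `toy_run_event_loop` are invented names, and the event loop is capped at `max_passes` so the runaway case terminates.

```c
#include <stdbool.h>

typedef struct {
    bool utf8;            /* representation flag */
    int  redraws_queued;  /* pending redraw requests */
} Watched;

/* Stand-in for the SET magic Tk attaches: any "set" queues a redraw. */
static void toy_set_magic(Watched *w) {
    w->redraws_queued++;
}

/* Stand-in for sv_utf8_upgrade(). With fire_set_magic true it behaves like
 * the version complained about in the thread: it fires SET magic even
 * though only the representation changed. */
static void toy_upgrade(Watched *w, bool fire_set_magic) {
    w->utf8 = true;
    if (fire_set_magic)
        toy_set_magic(w);
}

/* Drain the redraw queue for at most max_passes passes. Each redraw asks
 * for the UTF-8 form of the value. Returns the number of redraws done. */
static int toy_run_event_loop(Watched *w, bool fire_set_magic, int max_passes) {
    int done = 0;
    while (w->redraws_queued > 0 && done < max_passes) {
        w->redraws_queued--;
        toy_upgrade(w, fire_set_magic);
        done++;
    }
    return done;
}
```

With `fire_set_magic` false one assignment produces one redraw. With it true every redraw queues another, so the loop only stops at the cap, which corresponds to the 100% CPU behaviour.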

patch

Nicholas Clark

Jul 12, 2004, 12:23:39 PM
to Nick Ing-Simmons, perl5-...@perl.org
On Mon, Jul 12, 2004 at 03:47:12PM +0100, Nick Ing-Simmons wrote:

> The composite patch attached IS NOT FOR APPLICATION, but does make
> unmodified Tk pass its tests.
>
> But the lack of some readonly croaks makes perl fail several of its tests.
>
> Root cause of Tk's problems seems to be asking for UTF8 for
> READONLY ASCII strings now causes this die.
>
> This is a cascade effect as "force_normal" now considers READONLY
> non-normal, and utf8_upgrade now calls force_normal even if
> string is SvPOK and SvUTF8.
>
> The other snag the patch "fixes" is an infinite redraw loop
> caused by upgrade's calling SvSETMAGIC.
>
> What happens is:
> An SV gets changed which is being "watched" by Tk using Magic.
> SET handler queues a redraw.
> Redraw asks for UTF8 for SV via SvPVutf8_nolen()
> that calls sv_utf8_upgrade() even though it is UTF8 because
> it isn't SvPOK as mg_get() turned flag off again.
> Upgrade calls SvSETMAGIC
> SET queues a redraw.
> (Chain of calls returns)
> Redraw asks for UTF8
>
> IMHO utf8_upgrade is just changing representation not the value
> so it has no business calling SvSETMAGIC.

But this wasn't the only thing needed to make things work. Hmmm.

Alternatively just removing the changes to Perl_sv_utf8_upgrade_flags
alone makes Tk pass all its tests, but utftaint and on substr test then
fail. But these are all bugs that were in 5.8.4

I'm still confused by what's changed in sv_utf8_upgrade_flags that now
causes sv_2pv_something-or-other to fail with a readonly error.
Given that (as far as I can tell) the call that fails is after
sv_utf8_upgrade_flags is called.

Nicholas Clark

--- sv.c.orig Thu Jul 8 14:29:51 2004
+++ sv.c Mon Jul 12 16:36:44 2004
@@ -3447,17 +3447,18 @@ Perl_sv_utf8_upgrade_flags(pTHX_ registe
U8 *s, *t, *e;
int hibit = 0;

- if (sv == &PL_sv_undef)
+ if (!sv)
return 0;
+
if (!SvPOK(sv)) {
STRLEN len = 0;
- (void) SvPV_force(sv,len);
+ (void) sv_2pv_flags(sv,&len, flags);
+ if (!SvPOK(sv))
+ return len;
}

- if (SvUTF8(sv)) {
- SvSETMAGIC(sv);
+ if (SvUTF8(sv))
return SvCUR(sv);
- }

if (SvREADONLY(sv) && SvFAKE(sv)) {
sv_force_normal(sv);
@@ -3492,7 +3493,6 @@ Perl_sv_utf8_upgrade_flags(pTHX_ registe
/* Mark as UTF-8 even if no hibit - saves scanning loop */
SvUTF8_on(sv);
}
- SvSETMAGIC(sv);
return SvCUR(sv);
}

Nick Ing-Simmons

Jul 12, 2004, 12:49:51 PM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org

utf8.c patch + attached patch with _original_ Tk passes all perl's
tests and all Tk's tests for me.

But it does rather enforce Tk's view of what constitutes a "change"
to an SV.


patch

Nick Ing-Simmons

Jul 12, 2004, 12:53:53 PM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> writes:
>
>But this wasn't the only thing needed to make things work. Hmmm.
>
>Alternatively just removing the changes to Perl_sv_utf8_upgrade_flags
>alone makes Tk pass all its tests, but utftaint and on substr test then
>fail. But these are all bugs that were in 5.8.4
>
>I'm still confused by what's changed in sv_utf8_upgrade_flags that now
>causes sv_2pv_something-or-other to fail with a readonly error.
>Given that (as far as I can tell) the call that fails is after
>sv_utf8_upgrade_flags is called.

gcc/gdb did that to me as well.

The problem is that SvPV_force() uses the enhanced(?) force_normal()
which considers READONLY an error.

Rafael Garcia-Suarez

Jul 12, 2004, 1:02:29 PM
to Nick Ing-Simmons, perl5-...@perl.org
Nick Ing-Simmons wrote:
> utf8.c patch + attached patch with _original_ Tk passes all perl's
> tests and all Tk's tests for me.
>
> But it does rather enforce Tk's view of what constitutes a "change"
> to an SV.

I haven't read the whole thread yet, but I don't understand the purpose
of removing the SvSETMAGIC calls at the end of your patch?

Yitzchak Scott-Thoennes

Jul 12, 2004, 1:10:40 PM
to Nick Ing-Simmons, ni...@ccl4.org, perl5-...@perl.org
On Mon, Jul 12, 2004 at 12:54:29PM +0100, Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
> Nicholas Clark <ni...@ccl4.org> writes:
> >> But then restore_magic() does:
> >>
> >> 2658 if (SvGMAGICAL(sv))
> >> (gdb)
> >> 2659 SvFLAGS(sv) &= ~(SVf_IOK|SVf_NOK|SVf_POK);
> >>
> >> And turns it off again!
> >>
> >> I thought the point of GET vtable entries was to turn that bit on?
> >
> >I'm never too sure about how it all works (particularly given that it's
> >not documented well) but as far as I knew magic ends up with the private
> >OK flags on, but the public flags off. (So that things like SvPOK() is
> >not true, and SvPV() always calls into sv_2pv())
> >
> Ok, but as one half-informed porter to another, I thought that was normal
> state, but once mg_get() had been called and if GET routine turned on
> SvPOK() then SvPV() just used the value.

You generally shouldn't call SvPV more than once in an operation
(whether pp or xs). Once mg_get is done, use SvPV_nomg. It needs to
be POKp only so that the next operation can call SvPV and have it get
down into sv_2pv_flags.

Nick Ing-Simmons

Jul 12, 2004, 1:12:44 PM
to h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org
<h...@crypt.org> writes:
>Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
>:Anyway, it seems the root of my problem is two changes:
>:
>:1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)
>
>I think it would be reasonable for this to complain only if the SV would
>actually need to be changed,

What constitutes a change?

That last problem I had before patches I sent was a READONLY
SV that was SvIOK.

XS code asked for SvPVutf8_nolen() of it (for font lookup so it
could display it).

e.g. something like:

$button->configure(-text => 10);

Is that "unreasonable"?

To get that it has to be sv_upgraded to SvPVIV so there is a SvPVX
slot and that slot populated and then scanned for high-bit chars
(none found at least in _my_ locale) and SvUTF8 flag set.

But one can't use SvPV_force() on an SvREADONLY these days,
hence the mess in latest patch. The merits of using SvPV_force()
rather than SvPV() are unclear in this case, but that seemed
to be a key part of the change.

However once all that is done the SV is still same "value".
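
The distinction being argued here, changing the value versus caching another representation of it, can be sketched with a toy scalar. Everything in this block is hypothetical: `ToyScalar`, `toy_pv_force` and `toy_pv` are not perl APIs, and returning NULL merely stands in for the "Modification of a read-only value attempted" croak.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    long iv;          /* the value itself */
    bool readonly;    /* the value must not change */
    bool has_pv;      /* is a string representation cached? */
    char pv[32];      /* cached string form */
} ToyScalar;

/* Strict policy: building the string counts as a modification, so a
 * read-only scalar without a cached string is refused. */
static const char *toy_pv_force(ToyScalar *sv) {
    if (!sv->has_pv) {
        if (sv->readonly)
            return NULL;   /* models the read-only croak */
        snprintf(sv->pv, sizeof sv->pv, "%ld", sv->iv);
        sv->has_pv = true;
    }
    return sv->pv;
}

/* Relaxed policy: caching a representation is allowed because the value
 * is left untouched. */
static const char *toy_pv(ToyScalar *sv) {
    if (!sv->has_pv) {
        snprintf(sv->pv, sizeof sv->pv, "%ld", sv->iv);
        sv->has_pv = true;
    }
    return sv->pv;
}
```

Under the strict policy a read-only 10 passed as `-text` cannot be displayed at all. Under the relaxed one it stringifies to "10" while `iv` and `readonly` stay exactly as they were.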

Nick Ing-Simmons

Jul 12, 2004, 1:19:58 PM
to rgarci...@mandrakesoft.com, Nick Ing-Simmons, perl5-...@perl.org

The value hasn't changed (only the representation) so why call SvSETMAGIC?
(Removing them alone breaks no tests in either maint-5.8 or blead.)

It causes a loop if Tk is using MAGIC to trigger redraw, and
redraw calls SvPVutf8 and that does SvSETMAGIC. CPU sticks nicely
at 100% busy.

e.g.

$parent->Label(-textvariable => \$foo); # adds 'U' magic to $foo
$foo = 42; # U magic's SET re-draws the label.

Nick Ing-Simmons

Jul 12, 2004, 1:31:57 PM
to stho...@efn.org, ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
Yitzchak Scott-Thoennes <stho...@efn.org> writes:
>On Mon, Jul 12, 2004 at 12:54:29PM +0100, Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
>> Nicholas Clark <ni...@ccl4.org> writes:
>> >> But then restore_magic() does:
>> >>
>> >> 2658 if (SvGMAGICAL(sv))
>> >> (gdb)
>> >> 2659 SvFLAGS(sv) &= ~(SVf_IOK|SVf_NOK|SVf_POK);
>> >>
>> >> And turns it off again!
>> >>
>> >> I thought the point of GET vtable entries was to turn that bit on?
>> >
>> >I'm never too sure about how it all works (particularly given that it's
>> >not documented well) but as far as I knew magic ends up with the private
>> >OK flags on, but the public flags off. (So that things like SvPOK() is
>> >not true, and SvPV() always calls into sv_2pv())
>> >
>> Ok, but as one half-informed porter to another, I thought that was normal
>> state, but once mg_get() had been called and if GET routine turned on
>> SvPOK() then SvPV() just used the value.
>
>You generally shouldn't call SvPV more than once in an operation
>(whether pp or xs).

What counts as an "operation" gets more than a little involved when Tk
can do callbacks to perl, which can then call back to Tk.

>Once mg_get is done, use SvPV_nomg. It needs to
>be POKp only so that the next operation can call SvPV and have it get
>down into sv_2pv_flags.

Once upon a time, mg_get() cleared SvPOK flag and MAGIC's GET
set it again. So that on return from SvGETMAGIC(sv) the magical calls
had been done and flag allowed SvPV to effectively become SvPV_nomg.
(I know this because Tk's GET routines had to set SvPOK_on() or
things didn't work.)

That is once upon a time the rule was "You should only call
SvGETMAGIC() once per operation" not SvPV().
Tk has explicit SvGETMAGIC() calls which were essential in older
perls.

Pushing the SvGETMAGIC() call into SvPV has muddled things.

But if true then Tk has a need for

SvPVutf8_nomg_nolen()

Which can be called repeatedly
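
The "get magic once, then read as often as needed" pattern can be modelled as below. This is a sketch only: `ToyVal` and the `toy_*` functions are invented, and `toy_pv_nomg` is a stand-in for the kind of accessor being asked for, not an existing perl macro.

```c
#include <stddef.h>

typedef struct ToyVal ToyVal;
struct ToyVal {
    const char *pv;
    void (*get_hook)(ToyVal *);  /* NULL when no get magic is attached */
    int hook_calls;              /* how often the hook has been invoked */
};

/* Explicit "get magic" step, meant to run once per operation. */
static void toy_get_magic(ToyVal *v) {
    if (v->get_hook) {
        v->hook_calls++;
        v->get_hook(v);
    }
}

/* Magic-free accessor: safe to call repeatedly, never runs the hook. */
static const char *toy_pv_nomg(const ToyVal *v) {
    return v->pv;
}

/* Convenience accessor that bundles the magic call with the read. */
static const char *toy_pv(ToyVal *v) {
    toy_get_magic(v);
    return toy_pv_nomg(v);
}

/* Example hook: pretends to look up the current value. */
static void toy_font_hook(ToyVal *v) {
    v->pv = "Helvetica -12 bold";
}
```

Code that calls `toy_get_magic()` itself and then uses only `toy_pv_nomg()` triggers the hook once, whereas each `toy_pv()` call triggers it again.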

Nicholas Clark

Jul 12, 2004, 1:52:14 PM
to Nick Ing-Simmons, perl5-...@perl.org
On Mon, Jul 12, 2004 at 05:49:51PM +0100, Nick Ing-Simmons wrote:

> But it does rather enforce Tk's view of what constitutes a "change"
> to an SV.
>
>

> --- sv.c.ship 2004-07-12 14:21:07.000000000 +0100
> +++ sv.c 2004-07-12 17:34:44.010848315 +0100
> @@ -3354,7 +3354,9 @@


> char *
> Perl_sv_2pvutf8(pTHX_ register SV *sv, STRLEN *lp)
> {

> + if (!SvUTF8(sv)) {
> sv_utf8_upgrade(sv);
> + }
> return SvPV(sv,*lp);
> }

Independent of the other change hunk, is the above test an optimisation and
worth putting in?

> @@ -3449,13 +3451,23 @@
>
> if (sv == &PL_sv_undef)


> return 0;
> +
> if (!SvPOK(sv)) {
> STRLEN len = 0;

> + int readOnly;
> + if (SvREADONLY(sv) && SvFAKE(sv)) {
> + sv_force_normal(sv);
> + }
> + if ((readOnly = SvREADONLY(sv))) {
> + SvREADONLY_off(sv);
> + }
> (void) SvPV_force(sv,len);
> + if (readOnly) {
> + SvREADONLY_on(sv);
> + }
> }
>
> if (SvUTF8(sv)) {
> - SvSETMAGIC(sv);
> return SvCUR(sv);
> }
>
> @@ -3492,7 +3504,6 @@


> /* Mark as UTF-8 even if no hibit - saves scanning loop */
> SvUTF8_on(sv);
> }
> - SvSETMAGIC(sv);
> return SvCUR(sv);
> }
>

The following change seems less aggressive, and appears to make Tk pass all
its tests for me.

==== //depot/perl/sv.c#753 - /Users/nick/p4perl/perl/sv.c ====
--- /tmp/tmp.14763.0 Mon Jul 12 18:46:25 2004
+++ /Users/nick/p4perl/perl/sv.c Mon Jul 12 18:45:49 2004
@@ -3941,7 +3941,13 @@ Perl_sv_utf8_upgrade_flags(pTHX_ registe
return 0;


if (!SvPOK(sv)) {
STRLEN len = 0;
- (void) SvPV_force(sv,len);

+ if (SvREADONLY(sv) && (SvPOKp(sv) || SvIOKp(sv) || SvNOKp(sv))) {
+ (void) sv_2pv_flags(sv,&len, flags);
+ if (SvUTF8(sv))
+ return len;
+ } else {
+ (void) SvPV_force(sv,len);
+ }
}

if (SvUTF8(sv)) {

I applied it to blead as 23085. Unmodified Tk passes all tests for me on
blead now (Unless I've screwed something up), and perl passes all its tests.

Are you able to verify that blead is now good?

Nicholas Clark

Yitzchak Scott-Thoennes

Jul 12, 2004, 1:59:22 PM
to Nick Ing-Simmons, perl5-...@perl.org
On Mon, Jul 12, 2004 at 06:52:14PM +0100, Nicholas Clark <ni...@ccl4.org> wrote:
> On Mon, Jul 12, 2004 at 05:49:51PM +0100, Nick Ing-Simmons wrote:
>
> > But it does rather enforce Tk's view of what constitutes a "change"
> > to an SV.
> >
> >
>
> > --- sv.c.ship 2004-07-12 14:21:07.000000000 +0100
> > +++ sv.c 2004-07-12 17:34:44.010848315 +0100
> > @@ -3354,7 +3354,9 @@
> > char *
> > Perl_sv_2pvutf8(pTHX_ register SV *sv, STRLEN *lp)
> > {
> > + if (!SvUTF8(sv)) {
> > sv_utf8_upgrade(sv);
> > + }
> > return SvPV(sv,*lp);
> > }
>
> Independent of the other change hunk, is the above test an optimisation and
> worth putting in?

Testing UTF8 *before* calling sv_2pv_flags is wrong, because qr// or
overloaded objects won't have it set correctly yet.
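
The ordering problem can be shown with a toy object whose encoding flag only becomes meaningful once it has been stringified. This is a sketch with invented names (`ToyObj`, `toy_stringify`, `toy_check_before`, `toy_check_after`), assuming an overload-like hook that sets the flag as a side effect of producing the string.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool is_object;   /* stringifies through an overload-like hook */
    bool utf8;        /* only meaningful once pv has been produced */
    const char *pv;
} ToyObj;

/* Stand-in for stringifying an overloaded object: produces the string
 * and only then sets the encoding flag. */
static const char *toy_stringify(ToyObj *o) {
    if (o->is_object && o->pv == NULL) {
        o->pv = "caf\xC3\xA9";   /* UTF-8 encoded result */
        o->utf8 = true;
    }
    return o->pv;
}

/* Reads the flag first, then stringifies: sees a stale flag. */
static bool toy_check_before(ToyObj *o) {
    bool flag = o->utf8;
    toy_stringify(o);
    return flag;
}

/* Stringifies first, then reads the flag: sees the real answer. */
static bool toy_check_after(ToyObj *o) {
    toy_stringify(o);
    return o->utf8;
}
```

In this model `toy_check_before()` reports "not UTF-8" for an object whose string form is in fact UTF-8, so a guard placed before the stringification would skip or repeat work based on the wrong answer.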

Nick Ing-Simmons

Jul 12, 2004, 3:29:20 PM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> writes:
>> --- sv.c.ship 2004-07-12 14:21:07.000000000 +0100
>> +++ sv.c 2004-07-12 17:34:44.010848315 +0100
>> @@ -3354,7 +3354,9 @@
>> char *
>> Perl_sv_2pvutf8(pTHX_ register SV *sv, STRLEN *lp)
>> {
>> + if (!SvUTF8(sv)) {
>> sv_utf8_upgrade(sv);
>> + }
>> return SvPV(sv,*lp);
>> }
>
>Independent of the other change hunk, is the above test an optimisation and
>worth putting in?

I think so.

>
>The following change seems less aggressive, and appears to make Tk pass all
>its tests for me.
>
>==== //depot/perl/sv.c#753 - /Users/nick/p4perl/perl/sv.c ====
>--- /tmp/tmp.14763.0 Mon Jul 12 18:46:25 2004
>+++ /Users/nick/p4perl/perl/sv.c Mon Jul 12 18:45:49 2004
>@@ -3941,7 +3941,13 @@ Perl_sv_utf8_upgrade_flags(pTHX_ registe
> return 0;
> if (!SvPOK(sv)) {
> STRLEN len = 0;
>- (void) SvPV_force(sv,len);
>+ if (SvREADONLY(sv) && (SvPOKp(sv) || SvIOKp(sv) || SvNOKp(sv))) {
>+ (void) sv_2pv_flags(sv,&len, flags);
>+ if (SvUTF8(sv))
>+ return len;
>+ } else {
>+ (void) SvPV_force(sv,len);
>+ }
> }
>
> if (SvUTF8(sv)) {
>

That looks as though it should work - it seems to me that it is
use of SvPV_force that is main snag, and that avoids it for the
Tk care-abouts.

>I applied it to blead as 23085. Unmodified Tk passes all tests for me on
>blead now (Unless I've screwed something up), and perl passes all its tests.
>
>Are you able to verify that blead is now good?

Takes little while to build Tk - let you know soon.

>
>Nicholas Clark

Nick Ing-Simmons

Jul 12, 2004, 3:47:36 PM
to stho...@efn.org, Nick Ing-Simmons, perl5-...@perl.org

Hmm, this is more of the same. In Tk's case SvGETMAGIC has already
been called so yes they have. It had to do that so the SvXOK flags
were valid.

But I suppose if your model of
one SvPV is correct then what you say is true, and we need
a whole slew of new macros:

SvPVutf8_nomg
SvPVutf8_nomg_nolen
SvPVbytes_nomg
SvPVbytes_nomg_nolen
SvIV_nomg
SvNV_nomg
SvRV_nomg

Then those must be used after which ever of the original
macros did the

if (SvGMAGICAL(sv))
mg_get(sv);

I liked the old way:

if (SvGMAGICAL(sv))
mg_get(sv);
if (SvIOK(sv))
its_an_integer();
else if (sv_isobject(sv))
its_an_object(sv);
else if (SvNOK(sv))
its_a_real();
else {
// treat as string
}

Nick Ing-Simmons

Jul 12, 2004, 4:04:01 PM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org

Tk804 did not build on blead (my default Multiplicity/Debugging).
Seems SvMAGIC() now needs a my_perl in scope as it does an assert and
so is going to Perl_croak(aTHX_ ...).

But adding yet another dTHX fixes that.

It then passes all its tests (well all the ones I expect to
pass with SuSE9.1 KDE and font set).

So your fix works.

>
>Nicholas Clark

Nick Ing-Simmons

Jul 13, 2004, 4:01:01 AM
to ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> writes:
>
>The following change seems less aggressive, and appears to make Tk pass all
>its tests for me.
>
>I applied it to blead as 23085. Unmodified Tk passes all tests for me on
>blead now (Unless I've screwed something up), and perl passes all its tests.
>
>Are you able to verify that blead is now good?

And I can now verify //depot/maint-5.8/... is good as well.

Indeed it is executing a perl5.8.4 built Tk to send this mail ;-)


h...@crypt.org

Jul 13, 2004, 5:03:35 AM
to Nick Ing-Simmons, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> wrote:

:<h...@crypt.org> writes:
:>Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
:>:Anyway, it seems the root of my problem is two changes:
:>:
:>:1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)
:>
:>I think it would be reasonable for this to complain only if the SV would
:>actually need to be changed,
:
:What constitutes a change?
:
:That last problem I had before patches I sent was a READONLY
:SV that was SvIOK.

Ah yes, I'd count that as a change - I was thinking only of the case
where the _only_ change required was to set the UTF8 flag.

Hugo

Sadahiro Tomoyuki

Jul 13, 2004, 11:37:52 AM
to Nick Ing-Simmons, ni...@ccl4.org, perl5-...@perl.org
>
> The other snag the patch "fixes" is an infinite redraw loop
> caused by upgrade's calling SvSETMAGIC.
>
> What happens is:
> An SV gets changed which is being "watched" by Tk using Magic.
> SET handler queues a redraw.
> Redraw asks for UTF8 for SV via SvPVutf8_nolen()
> that calls sv_utf8_upgrade() even though it is UTF8 because
> it isn't SvPOK as mg_get() turned flag off again.
> Upgrade calls SvSETMAGIC
> SET queues a redraw.
> (Chain of calls returns)
> Redraw asks for UTF8
>
> IMHO utf8_upgrade is just changing representation not the value
> so it has no business calling SvSETMAGIC.
>
> Change 23084 to blead removes those calls and doesn't break
> any tests.

SvPV_force followed by SvSETMAGIC, that's introduced by Change 22842,
is an imitation of do_chop().

The removal of SvSETMAGIC causes POK flag of a tainted value
to be set on, as a side effect of sv_utf8_upgrade.
Perhaps such is weird (but I don't know why any tainted string
is private...)

[perl-current at Change 23094]
>perl -MDevel::Peek -Twe "$a=$0;utf8::upgrade($a);Dump($a);"
SV = PVMG(0x167ce8c) at 0x155b758
REFCNT = 1
FLAGS = (GMG,SMG,POK,pPOK,UTF8)


IV = 0
NV = 0

PV = 0x16743ac "-e"\0 [UTF8 "-e"]
CUR = 2
LEN = 3
MAGIC = 0x16747ec
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
MG_LEN = 1

If private PV is allowed to be handled directly,
isn't coercion into POK necessary for private-POK scalars?

--- sv.c~ Tue Jul 13 02:47:40 2004
+++ sv.c Tue Jul 13 23:06:16 2004
@@ -3939,9 +3939,9 @@



if (sv == &PL_sv_undef)
return 0;

- if (!SvPOK(sv)) {
+ if (!SvPOKp(sv)) {
STRLEN len = 0;
- if (SvREADONLY(sv) && (SvPOKp(sv) || SvIOKp(sv) || SvNOKp(sv))) {
+ if (SvREADONLY(sv) && (SvIOKp(sv) || SvNOKp(sv))) {
(void) sv_2pv_flags(sv,&len, flags);
if (SvUTF8(sv))
return len;
#End of patch

Regards,
SADAHIRO Tomoyuki

Nick Ing-Simmons

Jul 14, 2004, 6:17:46 PM
to bqw1...@nifty.com, ni...@ccl4.org, Nick Ing-Simmons, perl5-...@perl.org
SADAHIRO Tomoyuki <bqw1...@nifty.com> writes:
>>
>> IMHO utf8_upgrade is just changing representation not the value
>> so it has no business calling SvSETMAGIC.
>>
>> Change 23084 to blead removes those calls and doesn't break
>> any tests.
>
>SvPV_force followed by SvSETMAGIC, that's introduced by Change 22842,
>is an imitation of do_chop().

But do_chop() is definitely changing the SV - Tk should redraw
a string which is chopped.

>
>The removal of SvSETMAGIC causes POK flag of a tainted value
>to be set on, as a side effect of sv_utf8_upgrade.
>Perhaps that is weird (but I don't know why any tainted string
>is private...)

There seems to have been a shift in either how POKp and MAGIC works
or how some of us think they do.
TAINTEDness is another whole can of worms, which happens to
be implemented via a degenerate kind of MAGIC.

It isn't clear that setting POK on a tainted value is "bad",
provided that the per-op "we touched something suspect" flag
has been set, so we can die if value is to be used
for something "dangerous". i.e. tainting isn't about making value
undef, it is about giving system/open/unlink etc info that something
untrusted has happened in building their args.

That tainted MAGIC converted SvPOKp to SvPOK used to be
expected behaviour...

>
>[perl-current at Change 23094]
> >perl -MDevel::Peek -Twe "$a=$0;utf8::upgrade($a);Dump($a);"
>SV = PVMG(0x167ce8c) at 0x155b758
> REFCNT = 1
> FLAGS = (GMG,SMG,POK,pPOK,UTF8)
> IV = 0
> NV = 0
> PV = 0x16743ac "-e"\0 [UTF8 "-e"]
> CUR = 2
> LEN = 3
> MAGIC = 0x16747ec
> MG_VIRTUAL = &PL_vtbl_taint
> MG_TYPE = PERL_MAGIC_taint(t)
> MG_LEN = 1
>
>If private PV is allowed to be handled directly,
>isn't coercion into POK necessary for private-POK scalars?
>
>--- sv.c~ Tue Jul 13 02:47:40 2004
>+++ sv.c Tue Jul 13 23:06:16 2004
>@@ -3939,9 +3939,9 @@
>
> if (sv == &PL_sv_undef)
> return 0;
>- if (!SvPOK(sv)) {
>+ if (!SvPOKp(sv)) {

Not all SvPOK are SvPOKp, so it would have to be

if (!SvPOK(sv) && !SvPOKp(sv))

Nicholas Clark

Jul 20, 2004, 6:23:42 AM7/20/04
to h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org

The problem seems to be that like utf8 vs Unicode, the perl core wants to
have 2 different meanings associated with the readonly flag

One is "this SV is physically not to be modified"
Other is "this value is not to be modified, but the representation can be
changed, cached string conversions added, etc"

I'm not sure what to do about this. Or even if any solution is possible.
Apart from not making this mistake in parrot.

Nicholas Clark

Nick Ing-Simmons

Jul 20, 2004, 7:23:29 AM7/20/04
to ni...@ccl4.org, Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> writes:
>On Tue, Jul 13, 2004 at 10:03:35AM +0100, h...@crypt.org wrote:
>> Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
>> :<h...@crypt.org> writes:
>> :>Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
>> :>:Anyway, it seems the root of my problem is two changes:
>> :>:
>> :>:1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)
>> :>
>> :>I think it would be reasonable for this to complain only if the SV would
>> :>actually need to be changed,
>> :
>> :What constitutes a change?
>> :
>> :That last problem I had before patches I sent was a READONLY
>> :SV that was SvIOK.
>>
>> Ah yes, I'd count that as a change - I was thinking only of the case
>> where the _only_ change required was to set the UTF8 flag.

What I am complaining about is inconsistency:

Apparently I shouldn't do this:

char *s = SvPVutf8_nolen(sv); // Can't do that to SvREADONLY

But this is perfectly allowed:
(void) SvPV_nolen(sv); // sv_upgrade of READONLY allowed. e.g. print 10
s = SvPVutf8_nolen(sv); // Now SV has SvPOK we can set SvUTF8

>
>The problem seems to be that like utf8 vs Unicode, the perl core wants to
>have 2 different meanings associated with the readonly flag
>
>One is "this SV is physically not to be modified"

Which is new to me - I really didn't think we had that.
One has been able to say

$widget->configure(-text => 10);

for years, and that '10' has been a READONLY SvIV all/most of that time.

Calling SvPV on it has never been a problem.
This has even worked for SVs where SvLEN is 0 - indicating that
char * isn't owned by perl, and has worked with the share_pvn()
things.

But calling SvPVutf8 on it "is" seen as a problem, as it seems setting
SvUTF8_on for the SvPVX we just put there after upgrading to SVt_PVIV is
something you can't do to a READONLY. (The SvPVX itself doesn't need
to change in such cases; it is still "\x31\x30\x00".)

Thus it seems messing with a flag bit is considered more
of a change than wholesale change of data structure.
So IMHO you should be able to call SvPVutf8 on anything you can
call SvPV() on i.e. that sv_upgrade(sv,SVt_PV) will allow.
Which includes READONLY IVs and NVs.

I wouldn't even mind that much if the SvUTF8 bit was left unset (and so
we re-scan next time we are asked) so long as it returns the
valid char * I asked for.
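
A minimal Perl-level sketch of the representation point above, rather than the XS case itself: for a string made only of ASCII (invariant) characters, upgrading flips the SvUTF8 flag and leaves the bytes alone. It assumes only the core `utf8::` functions and the `bytes` module; the variable names are illustrative and not from the thread.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without switching on byte semantics

my $number = 10;
my $plain  = "$number";      # byte string "10", SvUTF8 off
my $wide   = $plain;
utf8::upgrade($wide);        # same characters, SvUTF8 on

my $flag_plain = utf8::is_utf8($plain) ? 1 : 0;
my $flag_wide  = utf8::is_utf8($wide)  ? 1 : 0;

# The buffer holds the same two bytes, "\x31\x30", either way.
my $bytes_plain = bytes::length($plain);
my $bytes_wide  = bytes::length($wide);

print "flag: $flag_plain -> $flag_wide, bytes: $bytes_plain -> $bytes_wide\n";
```

Only the flag differs between the two scalars, which is why returning the existing char * for such a value looks harmless.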

Ton Hospel

Jul 20, 2004, 7:49:03 AM7/20/04
to perl5-...@perl.org
In article <20040720112...@llama.elixent.com>,

Nick Ing-Simmons <ni...@ing-simmons.net> writes:
> Which is new to me - I really didn't think we had that.
> One has been able to say
>
> $widget->configure(-text => 10);
>
> for years, and that '10' has been a READONLY SvIV all/most of that time.
>
> Calling SvPV on it has never been a problem.

bug 20661....

h...@crypt.org

Jul 20, 2004, 8:01:48 AM7/20/04
to perl5-...@perl.org
Nicholas Clark <ni...@ccl4.org> wrote:

:On Tue, Jul 13, 2004 at 10:03:35AM +0100, h...@crypt.org wrote:
:> Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
:> :<h...@crypt.org> writes:
:> :>Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
:> :>:Anyway, it seems the root of my problem is two changes:
:> :>:
:> :>:1. SvPVutf8_nolen(sv) no longer works on an SvREADONLY(sv)
:> :>
:> :>I think it would be reasonable for this to complain only if the SV would
:> :>actually need to be changed,
:> :
:> :What constitutes a change?
:> :
:> :That last problem I had before patches I sent was a READONLY
:> :SV that was SvIOK.
:>
:> Ah yes, I'd count that as a change - I was thinking only of the case
:> where the _only_ change required was to set the UTF8 flag.
:
:The problem seems to be that like utf8 vs Unicode, the perl core wants to
:have 2 different meanings associated with the readonly flag
:
:One is "this SV is physically not to be modified"
:Other is "this value is not to be modified, but the representation can be
:changed, cached string conversions added, etc"

Yes, and it's worse than that: historically we cache all the representations
of a value, and remember which is canonical. With the introduction of
Unicode support that's no longer true: we only keep one of the two variants
of the string. So you can't get the up/downgraded representation of the
string without making a potentially irreversible change.

:I'm not sure what to do about this. Or even if any solution is possible.
:Apart from not making this mistake in parrot.

I think this probably classes as a Jenga problem. But it'd be helpful if
we could define what we mean by readonly - I'd be tempted to go for
something like:
- readonly means nothing should change the value
- the SV can be upgraded
- alternative representations can be cached
- even if the PV is canonical we can upgrade to utf8
- if the PV is canonical we cannot downgrade unless we can do so reversibly

So, what would break by that definition?

Hugo

Ton Hospel

Jul 20, 2004, 8:39:39 AM7/20/04
to perl5-...@perl.org
In article <200407201201...@zen.crypt.org>,

h...@crypt.org writes:
> - readonly means nothing should change the value
> - the SV can be upgraded
> - alternative representations can be cached
> - even if the PV is canonical we can upgrade to utf8
> - if the PV is canonical we cannot downgrade unless we can do so reversibly
>
> So, what would break by that definition?
>
If the SV can be upgraded, it can pollute another string with unicode
if it's later used in an append, which means that the result has different
semantics. Possibly triggerable with code like:

for ("") { # Or whatever it takes to get a readonly string
print substr("ã$_", 0, 1) =~ /\w/ ? 1 : 0; # will print 0
foo($_); # Upgrades the SV
print substr("ã$_", 0, 1) =~ /\w/ ? 1 : 0; # will print 1
}

(I think it was a mistake to conflate the "encoded in utf8" flag with a
"has unicode semantics" flag. As I understand it parrot fortunately did
not make that mistake).
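
A self-checking variant of the pollution scenario, as a sketch. It assumes default regex semantics (no `use feature 'unicode_strings'`, no `use locale`), and `utf8::upgrade` stands in for the hypothetical `foo($_)` that upgrades its argument. A non-empty ASCII tail is used instead of the empty string to keep the case unambiguous.

```perl
use strict;
use warnings;

my $latin1 = "\xE3";           # a-tilde as a single byte, SvUTF8 off
my $tail   = "x";

# Both operands are byte strings, so the result is a byte string.
my $before = ($latin1 . $tail) =~ /^\w/ ? 1 : 0;

utf8::upgrade($tail);          # stands in for foo($_) upgrading its argument
my $joined = $latin1 . $tail;  # one UTF8 operand upgrades the whole result
my $after  = $joined =~ /^\w/ ? 1 : 0;

print "before=$before after=$after\n";
```

The same source text for the left operand gives a different /\w/ answer purely because of the flag on the right operand.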

Nick Ing-Simmons

Jul 20, 2004, 12:16:23 PM7/20/04
to perl5-...@ton.iguana.be, perl5-...@perl.org

And status to date seems to indicate that upgrading things _is_ allowed.
That bug seems to say the "+0" should NOT be upgraded to a number.
Now, to me +0 "looks like a number" already, so in that case
I see no real harm.

However, the Tk (and UTF8) case that is problematic isn't giving a
numeric representation to a string, but the converse - getting a string
representation of a number.


Nick Ing-Simmons

Jul 20, 2004, 2:29:53 PM7/20/04
to h...@crypt.org, perl5-...@perl.org
<h...@crypt.org> writes:
>:One is "this SV is physically not to be modified"
>:Other is "this value is not to be modified, but the representation can be
>:changed, cached string conversions added, etc"
>
>Yes, and it's worse than that: historically we cache all the representations
>of a value, and remember which is canonical. With the introduction of
>Unicode support that's no longer true: we only keep one of the two variants
>of the string. So you can't get the up/downgraded representation of the
>string without making a potentially irreversible change.

UTF8 upgrade is reversible. (You can always convert back - until
someone adds a character that isn't representable in the "bytes" encoding
in use - but by definition that is not the case at point of upgrade.)
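
The reversibility claim can be sketched at the Perl level, assuming the core `utf8::upgrade` / `utf8::downgrade` functions and `bytes::length`; the sample string is an arbitrary illustration.

```perl
use strict;
use warnings;
use bytes ();

my $original = "caf\xE9";             # Latin-1 bytes, SvUTF8 off
my $string   = $original;

utf8::upgrade($string);               # e-acute is now stored as two bytes
my $upgraded_bytes = bytes::length($string);

my $ok = utf8::downgrade($string);    # back to one byte per character
my $restored_bytes = bytes::length($string);

# Once a character above 0xFF has been appended, downgrade cannot succeed.
my $mixed         = $original . chr(400);
my $fail_ok       = 1;                # return false instead of dying
my $can_downgrade = utf8::downgrade($mixed, $fail_ok) ? 1 : 0;

print "bytes: $upgraded_bytes -> $restored_bytes, mixed ok: $can_downgrade\n";
```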

[
Off topic?
My view is that SvUTF8 is (should be?) a flag that just tells you about
representation. It should have no semantic side effects.
Semantics should belong to the hint bit that is set by 'use utf8'.
That is how uc($foo) decides if ñ is alpha by Unicode-ism or
not because default C locale denies existence of high bits etc.
]

>
>:I'm not sure what to do about this. Or even if any solution is possible.
>:Apart from not making this mistake in parrot.
>
>I think this probably classes as a Jenga problem. But it'd be helpful if
>we could define what we mean by readonly - I'd be tempted to go for
>something like:
> - readonly means nothing should change the value
> - the SV can be upgraded
> - alternative representations can be cached
> - even if the PV is canonical we can upgrade to utf8

I am not sure what you mean by "canonical" there.
If you mean all the bytes are "invariant" (i.e. ASCII) then in this
case I don't mind if SvUTF8 flag gets set or not - provided
SvPVutf8() returns the pointer and doesn't warn/die or otherwise
whinge.

In such a case it would be nice to have a second hint bit to say
that all chars are invariant but I don't think we have a spare.

> - if the PV is canonical we cannot downgrade unless we can do so reversibly

I think we need that "canonical" defining too.

>
>So, what would break by that definition?

AFAIK it would break nothing. Also AFAIK that is what perl5.8.5 does
(after the various patches and re-works that Gtk and Tk provoked).

>
>Hugo

Nick Ing-Simmons

Jul 20, 2004, 2:35:35 PM7/20/04
to perl5-...@ton.iguana.be, perl5-...@perl.org
Ton Hospel <perl5-...@ton.iguana.be> writes:
>In article <200407201201...@zen.crypt.org>,
> h...@crypt.org writes:
>> - readonly means nothing should change the value
>> - the SV can be upgraded
>> - alternative representations can be cached
>> - even if the PV is canonical we can upgrade to utf8
>> - if the PV is canonical we cannot downgrade unless we can do so reversibly
>>
>> So, what would break by that definition?
>>
>If the SV can be upgraded, it can pollute another string with unicode
>if it's later used in an append, which means that the result has different
>semantics.

SvUTF8 bit on SV should not affect semantics, semantics
are the job of the lexical 'use utf8' hint bit.

>Possibly triggerable with code like:
>
> for ("") { # Or whatever it takes to get a readonly string
> print substr("ã$_", 0, 1) =~ /\w/ ? 1 : 0; # will print 0
> foo($_); # Upgrades the SV
> print substr("ã$_", 0, 1) =~ /\w/ ? 1 : 0; # will print 1
> }
>
>(I think it was a mistake to conflate the "encoded in utf8" flag with a
> "has unicode semantics" flag. As I understand it parrot fortunately did
> not make that mistake).

I thought Jarkko had undone that brain damage in perl5.8.
But maybe I am just fantasizing...


Chip Salzenberg

Jul 20, 2004, 2:39:53 PM7/20/04
to h...@crypt.org, perl5-...@perl.org
According to Hugo van der Sanden:
> Historically we cache all the representations of a value, and
> remember which is canonical.

Remembering which one is canonical? I don't remember that.
If we did remember that, what would it be for C<$!> ?
--
Chip Salzenberg - a.k.a. - <ch...@pobox.com>
Poetry mode ... is the default.

Nick Ing-Simmons

Jul 20, 2004, 3:28:32 PM7/20/04
to ch...@pobox.com, h...@crypt.org, perl5-...@perl.org
Chip Salzenberg <ch...@pobox.com> writes:
>According to Hugo van der Sanden:
>> Historically we cache all the representations of a value, and
>> remember which is canonical.
>
>Remembering which one is canonical? I don't remember that.
>If we did remember that, what would it be for C<$!> ?

That is an easy one: the int.

Chip Salzenberg

Jul 20, 2004, 3:55:14 PM7/20/04
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org
According to Nick Ing-Simmons:

> Chip Salzenberg <ch...@pobox.com> writes:
> >According to Hugo van der Sanden:
> >> Historically we cache all the representations of a value, and
> >> remember which is canonical.
> >
> >Remembering which one is canonical? I don't remember that.

I still don't. Perl 5 has no concept of _canonical_ representations,
just _valid_ ones. SvIOK() and SvPOK() are booleans; there's no way
to do the conceptual comparison SvIOK() > SvPOK().

Now let us voyage from the Code Mines to the Happy Land of Hypotheticals:

> >If we did remember that, what would it be for C<$!> ?
>
> That is an easy one. the int.

If any given bit of code knew that conversions were customized (via
MAGIC), then yes, that's true.

However: If custom conversions aren't relevant (probably because any
such conversion was already performed before the tests are made), then
both the IV and the PV of $! should be considered canonical: the IV is
not atoi() of the PV, and the PV is not sprintf("%d") of the IV.

Yitzchak Scott-Thoennes

Jul 20, 2004, 4:54:20 PM7/20/04
to Chip Salzenberg, Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org
On Tue, Jul 20, 2004 at 03:55:14PM -0400, Chip Salzenberg <ch...@pobox.com> wrote:
> According to Nick Ing-Simmons:
> > Chip Salzenberg <ch...@pobox.com> writes:
> > >According to Hugo van der Sanden:
> > >> Historically we cache all the representations of a value, and
> > >> remember which is canonical.
> > >
> > >Remembering which one is canonical? I don't remember that.
>
> I still don't. Perl 5 has no concept of _canonical_ representations,
> just _valid_ ones. SvIOK() and SvPOK() are booleans; there's no way
> to do the conceptual comparison SvIOK() > SvPOK().

I think this is what he's referring to. In the first case, the ending IV is
canonical; in the second, it is not.

$ perl -MDevel::Peek -we'$x=1.5; $x*=2.0; print do{use integer;$x+3}; Dump $x'
SV = PVNV(0xa07ee08) at 0xa06ccbc
REFCNT = 1
FLAGS = (IOK,NOK,pIOK,pNOK)
IV = 3
NV = 3
PV = 0
6
$ perl -MDevel::Peek -we'$x=1.5; $x*=2.1; print do{use integer;$x+3}; Dump $x'
SV = PVNV(0xa07ee08) at 0xa06ccbc
REFCNT = 1
FLAGS = (NOK,pIOK,pNOK)
IV = 3
NV = 3.15
PV = 0
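
The same distinction can be checked without reading a dump. This is a sketch that assumes the core B module's `FLAGS` method and its `SVf_IOK` / `SVp_IOK` constants; `int_flags` is a hypothetical helper, not anything in the core.

```perl
use strict;
use warnings;
use B ();

# Report which of the public (IOK) and private (pIOK) integer flags are set.
sub int_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @set;
    push @set, 'IOK'  if $flags & B::SVf_IOK();
    push @set, 'pIOK' if $flags & B::SVp_IOK();
    return join ',', @set;
}

my $exact = 1.5;
$exact *= 2.0;                              # 3.0: the NV is a whole number
{ use integer; my $sum = $exact + 3 }       # integer add caches an IV

my $inexact = 1.5;
$inexact *= 2.1;                            # 3.15: truncating to 3 is lossy
{ use integer; my $sum = $inexact + 3 }

my $exact_flags   = int_flags(\$exact);
my $inexact_flags = int_flags(\$inexact);
print "exact: $exact_flags, inexact: $inexact_flags\n";
```

When the cached IV equals the NV, both flags are set; when it was obtained by truncation, only the private flag is.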

> Now let us voyage from the Code Mines to the Happy Land of Hypotheticals:
>
> > >If we did remember that, what would it be for C<$!> ?
> >
> > That is an easy one. the int.
>
> If any given bit of code knew that conversions were customized (via
> MAGIC), then yes, that's true.
>
> However: If custom conversions aren't relevant (probably because any
> such conversion was already performed before the tests are made), then
> both the IV and the PV of $! should be considered canonical: the IV is
> not atoi() of the PV, and the PV is not sprintf("%d") of the IV.

I don't understand to what you refer. According to my understanding,
magic vars have no specific canonical values; whatever the set routine
looks at is effectively canonical. For $! this is the int.

$x = "$!"; $! = $x loses information, while $x = 0+$!; $! = $x doesn't.
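
A small sketch of that asymmetry, assuming the core Errno module for a portable errno constant (ENOENT is an arbitrary choice):

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

$! = ENOENT;
my $as_number = 0 + $!;       # the errno integer
my $as_string = "$!";         # the strerror() text for that errno

$! = 0;
$! = $as_number;              # numeric round trip restores errno
my $after_numeric = 0 + $!;

{
    no warnings 'numeric';    # the message text is not a number
    $! = $as_string;          # numifies to 0, so the errno is lost
}
my $after_string = 0 + $!;

print "numeric round trip: $after_numeric, string round trip: $after_string\n";
```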

Ton Hospel

Jul 20, 2004, 5:04:18 PM7/20/04
to perl5-...@perl.org
In article <2004072018...@llama.ing-simmons.net>,

Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>
> SvUTF8 bit on SV should not affect semantics, semantics
> are the job of the lexical 'use utf8' hint bit.
>
> ..snip..
>
> I thought Jarkko had undone that brain damage in perl5.8.
> But maybe I am just fantasizing...

But it *does* affect semantics:

#!/usr/bin/perl -wl
use Devel::Peek;
$a = $b = "é";
chop($b .= chr(400));
print "equal" if $a eq $b;
print "$a is a letter: ", $a =~ /\w/ ? 1 : 0;
print "$b is a letter: ", $b =~ /\w/ ? 1 : 0;
Dump($a);
Dump($b);

equal
é is a letter: 0
é is a letter: 1
SV = PV(0x81624dc) at 0x8175458
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x8163ac0 "\351"\0
CUR = 1
LEN = 2
SV = PV(0x8162464) at 0x8175470
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x81692c0 "\303\251"\0 [UTF8 "\x{e9}"]
CUR = 2
LEN = 8

Chip Salzenberg

Jul 20, 2004, 5:12:11 PM7/20/04
to Yitzchak Scott-Thoennes, Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org
According to Yitzchak Scott-Thoennes:

> $ perl -MDevel::Peek -we'$x=1.5; $x*=2.1; print do{use integer;$x+3}; Dump $x'
> SV = PVNV(0xa07ee08) at 0xa06ccbc
> REFCNT = 1
> FLAGS = (NOK,pIOK,pNOK)
> IV = 3
> NV = 3.15
> PV = 0

Thanks for the demonstration. I see that the p*OK flags are being
used to maintain "cached OK" bits as distinct from "canonical OK".
This is new to me...

<history>

Last I recall the p*OK flags were only important when looking at the
results of magic operations, i.e. after mg_get. I believe the *OK
flags would be translated down (up?) to their p*OK equivalents, so
that the presence of *OK wouldn't inadvertently avert future mg_get()
calls.

</history>


> I don't understand to what you refer. According to my understanding,
> magic vars have no specific canonical values; whatever the set routine
> looks at is effectively canonical.

I was talking about post-get examination of flags, and using the
term 'canonical' in an apparently inappropriate way. "Never Mind."

Nick Ing-Simmons

Jul 20, 2004, 5:22:19 PM7/20/04
to stho...@efn.org, Chip Salzenberg, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org
Yitzchak Scott-Thoennes <stho...@efn.org> writes:
>On Tue, Jul 20, 2004 at 03:55:14PM -0400, Chip Salzenberg <ch...@pobox.com> wrote:
>> According to Nick Ing-Simmons:
>> > Chip Salzenberg <ch...@pobox.com> writes:
>> > >According to Hugo van der Sanden:
>> > >> Historically we cache all the representations of a value, and
>> > >> remember which is canonical.
>> > >
>> > >Remembering which one is canonical? I don't remember that.
>>
>> I still don't. Perl 5 has no concept of _canonical_ representations,
>> just _valid_ ones. SvIOK() and SvPOK() are booleans; there's no way
>> to do the conceptual comparison SvIOK() > SvPOK().
>
>I think this is what he's referring to. In the first case, the ending IV is
>canonical; in the second, it is not.

Just to be clear: for $! _IN PARTICULAR_ the 'int errno' is the one true
source of _the_ value. That is all I meant.

For other "dual vars" it isn't as clear.
BUT - for the case that caused all the grief recently.

$widget->configure(-text => 10);

The dump could be :

FLAGS = (IOK,NOK,POK,UTF8)
IV = 10
NV = 10
PV = "\x31\x30"

And NV, IV and PV (with or without UTF8) are identical in meaning
and inter-convertible without loss.

(If NV = 0.1 (or is that 0.09999999999999999999999997?) things
get a little fuzzier...)

>>
>> However: If custom conversions aren't relevant (probably because any
>> such conversion was already performed before the tests are made), then
>> both the IV and the PV of $! should be considered canonical: the IV is
>> not atoi() of the PV, and the PV is not sprintf("%d") of the IV.

Fine. But if we sv_utf8_upgrade() the PV we can get the bytes form
back without loss.

>
>I don't understand to what you refer. According to my understanding,
>magic vars have no specific canonical values; whatever the set routine
>looks at is effectively canonical. For $! this is the int.
>
>$x = "$!"; $! = $x loses information, while $x = 0+$!; $! = $x doesn't.

Use of a (non-READONLY) MAGIC var as an example is leading us astray.

The question(s) are:
Are we allowed to sv_upgrade() a SvREADONLY?
[Historically I think yes]
Are we allowed to return char * to ASCII string that results from IV -> PV
upgrade from SvPVutf8()?
[Why not it _is_ UTF-8]
Should we set SvUTF8_on in such a case?
[Don't really care, but chances are once SvPVutf8 has been called
SV is being considered in a UTF8-ish way so will save future scans.]
Should we turn SvIOK_off()?
[No one has suggested we should, but can happen with MAGICals it seems.]

Chip Salzenberg

Jul 20, 2004, 5:24:54 PM7/20/04
to Ton Hospel, perl5-...@perl.org
According to Ton Hospel:
> $a = $b = "é";
> chop($b .= chr(400));
> print "equal" if $a eq $b;
> print "$a is a letter: ", $a =~ /\w/ ? 1 : 0;
> print "$b is a letter: ", $b =~ /\w/ ? 1 : 0;

That's the case that had me going on a couple months ago.

In my opinion, which I expressed at length, C<$b .= chr(400)> -- or
with any other operation that converts bytes to Unicode -- should
die() if the string contains any non-ASCII characters (i.e. anything
with the high bit set) ... UNLESS the programmer has specifically told
Perl which locale $b came from. Currently Perl ass_u_mes Latin-1.

Jarkko's answer, paraphrased:

"I see your point, but it would hurt too much."

At which point, confronted with the immovable object, I stopped
pretending I was an unstoppable force, gave up, and learned to love
the bomb.
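
A sketch of what the stricter rule could look like as user-level code. `strict_concat` is a hypothetical helper written for illustration; it is not a core function and not the proposal's actual implementation, and it checks only the two-operand concatenation case.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate two strings, but refuse to let Perl
# silently treat high-bit bytes as Latin-1 during an implicit upgrade.
sub strict_concat {
    my ($left, $right) = @_;
    if (utf8::is_utf8($left) xor utf8::is_utf8($right)) {
        # Exactly one side is a byte string; that side would be upgraded.
        my $byte_string = utf8::is_utf8($left) ? $right : $left;
        die "implicit upgrade of non-ASCII byte string\n"
            if $byte_string =~ /[^\x00-\x7F]/;
    }
    return $left . $right;
}

my $safe  = strict_concat("abc", chr(400));    # ASCII bytes: allowed
my $error = eval { strict_concat("\xE9", chr(400)); 1 } ? '' : $@;

print "safe length: ", length($safe), "\n";
print "error: $error";
```

Pure-ASCII byte strings pass because their meaning is the same under any of the usual encodings; only the high-bit case forces a guess.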

Chip Salzenberg

Jul 20, 2004, 6:26:52 PM7/20/04
to Nick Ing-Simmons, stho...@efn.org, h...@crypt.org, perl5-...@perl.org
According to Nick Ing-Simmons:

> The question(s) are:
> Are we allowed to sv_upgrade() a SvREADONLY?
> [Historically I think yes]

Definitely not. That's because upgrading causes a semantic change,
most obviously in the behavior of /\w/.

> Are we allowed to return char * to ASCII string that results from IV -> PV
> upgrade from SvPVutf8()?
> [Why not it _is_ UTF-8]

Sure.

> Should we set SvUTF8_on in such a case?
> [Don't really care, but chances are once SvPVutf8 has been called
> SV is being considered in a UTF8-ish way so will save future scans.]

I'd say "no", again because of /\w/. Consider:

$a = 4;
call_xs_that_does_SvPVutf8($a);
$a .= "\xE4"; # or something else that's /\w/ in Unicode
$a =~ /\w/ ? print("insane") : print("sane")

> Should we turn SvIOK_off()?
> [No one has suggested we should, but can happen with MAGICals it seems.]

No, the int is still "right".

Yitzchak Scott-Thoennes

Jul 20, 2004, 6:37:26 PM7/20/04
to Chip Salzenberg, Ton Hospel, perl5-...@perl.org

I thought the amicable resolution to this was to recommend
encoding::warnings at appropriate points in the doc, and possibly
to make it a core module. I see neither has been done.

Jarkko Hietaniemi

Jul 20, 2004, 6:38:24 PM7/20/04
to Perl 5 Porters, Chip Salzenberg
> In my opinion, which I expressed at length, C<$b .= chr(400)> -- or
> with any other operation that converts bytes to Unicode -- should
> die() if the string contains any non-ASCII characters (i.e. anything
> with the high bit set) ... UNLESS the programmer has specifically told
> Perl which locale $b came from. Currently Perl ass_u_mes Latin-1.
>
> Jarkko's answer, paraphrased:
>
> "I see your point, but it would hurt too much."
>
> At which point, confronted with the immovable object, I stopped
> pretending I was an unstoppable force, gave up, and learned to love
> the bomb.

Why do I have this sudden lust for a black glove... what I think these
days is that if someone feels like fixing this, feel free, go ahead,
hasta la vista, and so forth. I just would like to point out from the
roadside the bleached bones of the discussions that went before you.

--
Jarkko Hietaniemi <j...@iki.fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

Chip Salzenberg

Jul 20, 2004, 7:47:19 PM7/20/04
to Perl 5 Porters
According to Jarkko Hietaniemi:
> Chip:

> > Jarkko's answer, paraphrased:
> > "I see your point, but it would hurt too much."
>
> Why do I have this sudden lust for a black glove...

Hm. Terminator? Evil Michael Jackson? Help me out here.

> what I think these days is that if someone feels like fixing this,
> feel free, go ahead, hasta la vista, and so forth. I just would
> like to point out from the roadside the bleached bones of the
> discussions that went before you.

I'm doing just that: pointing to one of the more recent piles of
bones, and incidentally observing that some of them are mine.

John Peacock

Jul 20, 2004, 8:35:32 PM7/20/04
to Chip Salzenberg, Perl 5 Porters
Chip Salzenberg wrote:
> According to Jarkko Hietaniemi:
>>> At which point, confronted with the immovable object, I stopped
>>> pretending I was an unstoppable force, gave up, and learned to love
>>> the bomb.
>>
>>Why do I have this sudden lust for a black glove...
>
> Hm. Terminator? Evil Michael Jackson? Help me out here.
>

http://www.imdb.com/title/tt0057012/


HTH

John

--
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4720 Boston Way
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747

h...@crypt.org

Jul 21, 2004, 7:51:04 AM7/21/04
to Nick Ing-Simmons, perl5-...@perl.org
Nick Ing-Simmons <ni...@ing-simmons.net> wrote:
:<h...@crypt.org> writes:
:>I think this probably classes as a Jenga problem. But it'd be helpful if
:>we could define what we mean by readonly - I'd be tempted to go for
:>something like:
:> - readonly means nothing should change the value
:> - the SV can be upgraded
:> - alternative representations can be cached
:> - even if the PV is canonical we can upgrade to utf8
:
:I am not sure what you mean by "canonical" there.

I meant SVf_POK, rather than only SVp_POK; ie, "canonical" in the sense
that we may use this to derive other representations eg if you try to
get the numerical value.

Hugo

Nick Ing-Simmons

Jul 21, 2004, 1:12:38 PM7/21/04
to perl5-...@ton.iguana.be, perl5-...@perl.org
Ton Hospel <perl5-...@ton.iguana.be> writes:
>In article <2004072018...@llama.ing-simmons.net>,
> Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>
>> SvUTF8 bit on SV should not affect semantics, semantics
>> are the job of the lexical 'use utf8' hint bit.
>>
>> ..snip..
>>
>> I thought Jarkko had undone that brain damage in perl5.8.
>> But maybe I am just fantasizing...
>
>But it *does* affect semantics:

I know - but *should* it?
i.e. is this a bug?

Seems to me such cases are seldom wanted, and if the word-ness
depended on 'use utf8' state rather than SvUTF8 everyone
would be happier.

Nick Ing-Simmons

Jul 21, 2004, 1:14:13 PM7/21/04
to ch...@pobox.com, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org, Yitzchak Scott-Thoennes
Chip Salzenberg <ch...@pobox.com> writes:
>According to Yitzchak Scott-Thoennes:
>> $ perl -MDevel::Peek -we'$x=1.5; $x*=2.1; print do{use integer;$x+3}; Dump $x'
>> SV = PVNV(0xa07ee08) at 0xa06ccbc
>> REFCNT = 1
>> FLAGS = (NOK,pIOK,pNOK)
>> IV = 3
>> NV = 3.15
>> PV = 0
>
>Thanks for the demonstration. I see that the p*OK flags are being
>used to maintain "cached OK" bits as distinct from "canonical OK".
>This is new to me...

Me too.

Nick Ing-Simmons

Jul 21, 2004, 1:19:13 PM7/21/04
to ch...@pobox.com, perl5-...@perl.org, Ton Hospel
Chip Salzenberg <ch...@pobox.com> writes:
>According to Ton Hospel:
>> $a = $b = "é";
>> chop($b .= chr(400));
>> print "equal" if $a eq $b;
>> print "$a is a letter: ", $a =~ /\w/ ? 1 : 0;
>> print "$b is a letter: ", $b =~ /\w/ ? 1 : 0;
>
>That's the case that had me going on a couple months ago.
>
>In my opinion, which I expressed at length, C<$b .= chr(400)> -- or
>with any other operation that converts bytes to Unicode -- should
>die() if the string contains any non-ASCII characters (i.e. anything
>with the high bit set) ... UNLESS the programmer has specifically told
>Perl which locale $b came from.

I could live with that.

>Currently Perl ass_u_mes Latin-1.

It is inconsistent in that assumption though.
The \w ness of a non SvUTF8 string is C locale not Latin-1 locale.

Mike Guy

Jul 21, 2004, 1:25:37 PM7/21/04
to perl5-...@perl.org
Yitzchak Scott-Thoennes <stho...@efn.org> wrote

> $x = "$!"; $! = $x loses information, while $x = 0+$!; $! = $x doesn't.

It loses more than information:

DB<5> $!=13

DB<6> x $x = 0+$!; $! = $x; $!
0 'Permission denied'
DB<7> x $x = "$!"; $! = $x; $!
Argument "Permission denied" isn't numeric in scalar assignment at (eval 12)[/home/mjtg/perl-5.8.1-RC4/lib/perl5db.pl:618] line 2.
eval '($@, $!, $^E, $,, $/, $\\, $^W) = @saved;package main; $^D = $^D | $DB::db_stop;
$x = "$!"; $! = $x; $!;

;' called at /home/mjtg/perl-5.8.1-RC4/lib/perl5db.pl line 618
DB::eval called at /home/mjtg/perl-5.8.1-RC4/lib/perl5db.pl line 3314
DB::DB called at -e line 1
0 ''
DB<8> x $!
0 ''
DB<9>


That's why the int value is canonical.


Mike Guy

Chip Salzenberg

Jul 21, 2004, 1:46:38 PM7/21/04
to Nick Ing-Simmons, perl5-...@ton.iguana.be, perl5-...@perl.org
According to Nick Ing-Simmons:

> Ton Hospel <perl5-...@ton.iguana.be> writes:
> >In article <2004072018...@llama.ing-simmons.net>,
> > Nick Ing-Simmons <ni...@ing-simmons.net> writes:
> >> SvUTF8 bit on SV should not affect semantics, semantics
> >> are the job of the lexical 'use utf8' hint bit.
> >
> >But it *does* affect semantics:
>
> I know - but *should* it?
> i.e. is this a bug?

I think the consensus is that (1) it's a bug that (2) is less painful
than the fix. It's an inevitable result of conflicting requirements.
Larry wanted chr(255) and chr(256) to be directly mixable, and he
wanted /\w/ to DWIM. The latter is a bug in the presence of the
former. Larry didn't realize this, or he didn't think it mattered.
In any case, we're stuck with it now.

> If the word-ness depended on 'use utf8' state rather than SvUTF8
> everyone would be happier.

That's a reasonable idea, and one I sympathize with, seeing as how I
invented C<use locale>. [*] But the new Conventional Wisdom is that
Unicode-ness is an attribute of the data, not the code.

As a result of this philosophy, a given bit of code (e.g. /\w/) will
behave differently depending on what kind of data you give it. So
in some sense Unicode data are like tainted data ... if you let it
touch your data you're asking for trouble ... except that Perl will
*protect* you from tainted data. Not so Unicode. Oops.

[*] I guess the current Powers That Be would have been happier to see
a locale (or locale-like) data attribute, rather than a pragma.

Chip Salzenberg

Jul 21, 2004, 1:48:28 PM7/21/04
to Nick Ing-Simmons, perl5-...@perl.org, Ton Hospel
According to Nick Ing-Simmons:
> Chip Salzenberg <ch...@pobox.com> writes:
> >Currently Perl ass_u_mes Latin-1.
>
> It is inconsistent in that assumption though.
> The \w ness of a non SvUTF8 string is C locale not Latin-1 locale.

Point.

Nick Ing-Simmons

Jul 21, 2004, 1:58:44 PM7/21/04
to ch...@pobox.com, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org, stho...@efn.org
Chip Salzenberg <ch...@pobox.com> writes:
>According to Nick Ing-Simmons:
>> The question(s) are:
>> Are we allowed to sv_upgrade() a SvREADONLY?
>> [Historically I think yes]
>
>Definitely not. That's because upgrading causes a semantic change,
>most obviously in the behavior of /\w/.

Please note I am distinguishing
sv_upgrade() and sv_utf8_upgrade()

This is about:
sv_upgrade($read_only_sv,SVt_PVIV);

e.g. This behaviour:
#!perl
use Devel::Peek;
my $a = \4; # or use constant
Dump($$a);
my $b = $$a.'x';
Dump($$a);
__END__

nick@llama:/home/p4work/mail> perl /tmp/d
SV = IV(0x8162e34) at 0x8161278
  REFCNT = 2
  FLAGS = (IOK,READONLY,pIOK)
  IV = 4
SV = PVIV(0x81502f8) at 0x8161278
  REFCNT = 2
  FLAGS = (IOK,POK,READONLY,pIOK,pPOK)
  IV = 4
  PV = 0x815a470 "4"\0
  CUR = 1
  LEN = 2

>> Are we allowed to return char * to ASCII string that results from IV -> PV
>> upgrade from SvPVutf8()?
>> [Why not it _is_ UTF-8]
>
>Sure.

Good.

>
>> Should we set SvUTF8_on in such a case?
>> [Don't really care, but chances are once SvPVutf8 has been called
>> SV is being considered in a UTF8-ish way so will save future scans.]
>
>I'd say "no", again because of /\w/. Consider:
>
> $a = 4;
> call_xs_that_does_SvPVutf8($a);
> $a .= "\xE4"; # or something else that's /\w/ in Unicode
> $a =~ /\w/ ? print("insane") : print("sane")

Hmm, '4' is \w so I will pretend you wrote /\w$/

But, while the semantic-mis-link is there, if instead it reads:

  $a = 4;
  $a .= "\xE4"; # or something else that's /\w/ in Unicode
  call_xs_that_does_SvPVutf8($a);
  $a =~ /\w$/ ? print("insane") : print("sane")

You still get insane. To fix that SvPVutf8() would have to malloc
a "mortal PV" and return that to its caller leaving SV untouched.

The snag here is if XS is working entirely in UTF-8
as Tk804 and Gtk are, then we are forever scanning to
see if any high bits, if yes malloc & copy, call XS, free.

Chip Salzenberg

Jul 21, 2004, 2:25:33 PM
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
According to Nick Ing-Simmons:
> This is about:
> sv_upgrade($read_only_sv,SVt_PVIV);

Oh.[1] IMO, the answer is still "No": sv_upgrade(SVt_PVIV) isn't just
setting SVp_IOK, it's also setting SVf_IOK, and that's an externally
visible change.[2]

[1] "Never mind."
[2] It's visible via the published C API, as opposed to unscrupulous C
hacking. Therefore a well-behaved XS module could see it.

Chip Salzenberg

Jul 21, 2004, 2:27:38 PM
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
According to Nick Ing-Simmons:
> FLAGS = (IOK,POK,READONLY,pIOK,pPOK)

I should note that the presence of IOK is not visible only from C.
It also influences Perl builtin behavior, e.g. C<&>, C<|>, and C<~>,
which will do bitwise string operations only if none of the numeric
OK bits are set.

This is an additional reason why sv_upgrade() on a readonly value
should be forbidden.
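
A minimal sketch of that builtin behaviour, assuming the C<bitwise>
feature is not enabled. The two C<|> expressions see the same variables
with the same values; the only difference is that the left operand has
been used as a number in between. Variable names are illustrative only.

```perl
use strict;
use warnings;

my ($x, $y) = ("12", "3");

# Neither operand has been used as a number: bytewise string OR.
my $string_or = $x | $y;

# Using $x in numeric context caches an integer value in the scalar
# and sets its numeric OK flags.
my $unused = $x + 0;

# Same variables, same values, but now a numeric OR.
my $numeric_or = $x | $y;

print "string or:  $string_or\n";    # 32
print "numeric or: $numeric_or\n";   # 15
```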

Chip Salzenberg

Jul 21, 2004, 2:30:49 PM
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
According to Nick Ing-Simmons:

> $a = 4;
> $a .= "\xE4"; # or something else that's /\w/ in Unicode
> call_xs_that_does_SvPVutf8($a);
> $a =~ /\w$/ ? print("insane") : print("sane")
>
> You still get insane. To fix that SvPVutf8() would have to malloc
> a "mortal PV" and return that to its caller leaving SV untouched.

Yes.

> The snag here is if XS is working entirely in UTF-8
> as Tk804 and Gtk are, then we are forever scanning to
> see if any high bits, if yes malloc & copy, call XS, free.

If you don't care whether the answer is right, I can get it to you as
fast as you want.

Nick Ing-Simmons

Jul 21, 2004, 4:22:52 PM
to ch...@pobox.com, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org, stho...@efn.org
Chip Salzenberg <ch...@pobox.com> writes:
>According to Nick Ing-Simmons:
>> FLAGS = (IOK,POK,READONLY,pIOK,pPOK)

That isn't a proposed change, but copy/paste of output of
the code fragment I posted run on perl5.8.5.
(Similar on SuSE's 5.8.3)

>
>I should note that the presence of IOK is not visible only from C.
>It also influences Perl builtin behavior, e.g. C<&>, C<|>, and C<~>,
>which will do bitwise string operations only if none of the numeric
>OK bits are set.

True enough - but that is a problem with turning ON IOK.
All my examples were already IOK, they demonstrate turning on POK.


>
>This is an additional reason why sv_upgrade() on a readonly value
>should be forbidden.

But it never has been before and it will, (I promise) break a lot of
pure perl code (as well as XS code) if you change that.

It can happen in pure perl when a READONLY is aliased so this
might be a better example:

#!perl
use Devel::Peek;
foreach (1)
{
    Dump($_);
    my $b = $_.'';
    Dump($_);
}
foreach ('1')
{
    Dump($_);
    my $b = $_+0;
    # bitfields on $_ now have changed semantics
    Dump($_);
}
__END__


Yields:

SV = IV(0x8162e34) at 0x816126c
  REFCNT = 2
  FLAGS = (IOK,READONLY,pIOK)
  IV = 1
SV = PVIV(0x81502f8) at 0x816126c
  REFCNT = 2
  FLAGS = (IOK,POK,READONLY,pIOK,pPOK)
  IV = 1
  PV = 0x815a470 "1"\0
  CUR = 1
  LEN = 2
SV = PV(0x814fffc) at 0x81612b4
  REFCNT = 2
  FLAGS = (POK,READONLY,pPOK)
  PV = 0x816c388 "1"\0
  CUR = 1
  LEN = 2
SV = PVIV(0x8150308) at 0x81612b4
  REFCNT = 2
  FLAGS = (IOK,POK,READONLY,pIOK,pPOK)
  IV = 1
  PV = 0x816c388 "1"\0
  CUR = 1
  LEN = 2


Changing API/internals so that sv_upgrade(sv,SVt_PVIV) on a read only
is NO LONGER allowed will cause a pile of grief.

Assuming the status quo is accepted ...

I am using it to demonstrate that we already, and have for years/ever
allowed sv_upgrade() of READONLY, and so that sets a precedent of
changing the representation while leaving value intact.

Hence asking for SvPVutf8() of a READONLY SVt_IV should be allowed.
[This was original start of this thread.]
(And indeed is - once more - in 5.8.5 that shipped.)

But I am also making the case that, by extension:

if SvUTF8 was *just* a representational thing
(and I accept that it isn't in current perl), then having
SvPVutf8 "adjust" that flag would be in the spirit of the
way sv_upgrade() "adjusts" IOK/POK.

Nick Ing-Simmons

Jul 21, 2004, 4:58:51 PM
to ch...@pobox.com, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org, stho...@efn.org
Chip Salzenberg <ch...@pobox.com> writes:
>According to Nick Ing-Simmons:
>> $a = 4;
>> $a .= "\xE4"; # or something else that's /\w/ in Unicode
>> call_xs_that_does_SvPVutf8($a);
>> $a =~ /\w$/ ? print("insane") : print("sane")
>>
>> You still get insane. To fix that SvPVutf8() would have to malloc
>> a "mortal PV" and return that to its caller leaving SV untouched.
>
>Yes.
>
>> The snag here is if XS is working entirely in UTF-8
>> as Tk804 and Gtk are, then we are forever scanning to
>> see if any high bits, if yes malloc & copy, call XS, free.
>
>If you don't care whether the answer is right, I can get it to you as
>fast as you want.

And what is point you are trying to make there?

The point I am trying to make is that current guts penalize
"modern" modules that WANT to use UTF-8 and Unicode semantics.


package Speech::Synthesis;
use utf8; # Not enough, it seems
# Can those of us that want to move forward have
use unicode_yes_I_am_non_american_and_I_mean_it;

my %pronounce = (rôle => 'rəʊl',
cwm => 'kuːm');
...

That is much more readable (given the editor/fonts) than:

my (%pronounce) = ("r\364le", "r\x{259}\x{28a}l",
'cwm', "ku\x{2d0}m");

The semantics of the lexical pragma would be to apply Unicode semantics
to all strings even if perl has "helpfully" downgraded to/left them in
latin-1. So that (e.g.) chr(0364) (o circumflex) is alphabetic!

Maybe it is better expressed as:

# Not locale's CTYPE, not POSIX/C CTYPE, but Unicode's CTYPE
no locale qw(Unicode);

Chip Salzenberg

Jul 21, 2004, 5:07:52 PM
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
According to Nick Ing-Simmons:
> Chip:

> > I should note that the presence of IOK is not visible only from C.
> > It also influences Perl builtin behavior, e.g. C<&>, C<|>, and C<~>,
>
> True enough - but that is a problem with turning ON IOK.
> All my examples were already IOK, they demonstrate turning on POK.

I see. Sorry I missed that.

> I am using it to demonstrate that we already, and have for years/ever
> allowed sv_upgrade() of READONLY, and so that sets a precedent of
> changing the representation while leaving value intact.
>
> Hence asking for SvPVutf8() of a READONLY SVt_IV should be allowed.

I agree now. (Assuming I understand, which I think I do.)

> if SvUTF8 was *just* a representational thing
> (and I accept that it isn't in current perl), then having
> SvPVutf8 "adjust" that flag would be in the spirit of the
> way sv_upgrade() "adjusts" IOK/POK.

This makes sense too.

Chip Salzenberg

Jul 21, 2004, 5:18:54 PM
to Nick Ing-Simmons, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
According to Nick Ing-Simmons:

> Chip Salzenberg <ch...@pobox.com> writes:
> >According to Nick Ing-Simmons:
> >> The snag here is if XS is working entirely in UTF-8
> >> as Tk804 and Gtk are, then we are forever scanning to
> >> see if any high bits, if yes malloc & copy, call XS, free.
> >
> >If you don't care whether the answer is right, I can get it to you as
> >fast as you want.
>
> And what is point you are trying to make there?

I think that the two of us (at least) agree that:
A UTF-8 upgrade is a semantic change.
SvPVutf8 is performing such an upgrade.
That upgrade is therefore a bug.

SvPVutf8 could be fixed in two ways:

(1) It could cache the utf8 representation without changing the
SvPVX() byte string. This would require either a change to
one of the XPV structures (unlikely) or a new MAGIC structure
(in the same manner as locale collation data).

(2) It could store the utf8 translation in malloc'd memory that is
not cached, which would be freed at end of scope.

You described (2) and suggested it was not acceptable because it's
slower than the status quo. I'm making the point that correctness
is more important than incremental performance loss. But (1) is
both correct _and_ fast, so I don't know why nobody's tried it.

Oh well. >>TODO I suppose.
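
Option (2) can be approximated from pure Perl as a sketch. This is not
how SvPVutf8 is implemented; the helper name utf8_copy_of is
hypothetical, and an ordinary copy stands in for the "mortal PV". The
point is only that the caller's scalar keeps its byte representation, so
/\w$/ still answers as it did before the call.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 encoding of a string's characters
# as a fresh scalar, without touching the caller's scalar.
sub utf8_copy_of {
    my ($copy) = @_;        # list assignment copies the value
    utf8::encode($copy);    # the copy now holds UTF-8 octets
    return $copy;
}

my $str = 4;
$str .= "\xE4";

my $octets = utf8_copy_of($str);

# The original was not upgraded, so its regex behaviour is unchanged.
my $upgraded = utf8::is_utf8($str) ? "yes" : "no";
my $verdict  = ($str =~ /\w$/) ? "insane" : "sane";

print "original upgraded: $upgraded\n";
print "verdict: $verdict\n";
```

The cost named in the thread is visible here too: every call makes and
discards a copy.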

> The semantics of the lexical pragma would be to apply Unicode semantics
> to all strings even if perl has "helpfully" downgraded to/left them in
> latin-1. So that (e.g.) chr(0364) (o circumflex) is alphabetic!

That's an interesting idea, but orthogonal to my point. You seem to
be still fighting the battle about Unicode strings having different
semantics from byte strings, but I've stopped worrying about it (and
learned to love the bomb).

Nick Ing-Simmons

Jul 21, 2004, 5:50:42 PM
to ch...@pobox.com, h...@crypt.org, Nick Ing-Simmons, perl5-...@perl.org, stho...@efn.org
Chip Salzenberg <ch...@pobox.com> writes:
>>
>> And what is point you are trying to make there?
>
>I think that the two of us (at least) agree that:
> A UTF-8 upgrade is a semantic change.
> SvPVutf8 is performing such an upgrade.
> That upgrade is therefore a bug.
>
>SvPVutf8 could be fixed in two ways:
>
> (1) It could cache the utf8 representation without changing the
> SvPVX() byte string. This would require either a change to
> one of the XPV structures (unlikely) or a new MAGIC structure
> (in the same manner as locale collation data).
>
> (2) It could store the utf8 translation in malloc'd memory that is
> not cached, which would be freed at end of scope.

(3) SvUTF8 could be made non-semantic, hence rendering changing
it a non-bug the same way setting SvPOK on an SvIOK is a non-bug.

>
>You described (2) and suggested it was not acceptable because it's
>slower than the status quo. I'm making the point that correctness
>is more important than incremental performance loss. But (1) is
>both correct _and_ fast, so I don't know why nobody's tried it.

(1) still has to search for, and possibly malloc, the MAGIC.
But now it doesn't get free()d so every string takes more than twice
as much memory as before.

>
>Oh well. >>TODO I suppose.
>
>> The semantics of the lexical pragma would be to apply Unicode semantics
>> to all strings even if perl has "helpfully" downgraded to/left them in
>> latin-1. So that (e.g.) chr(0364) (o circumflex) is alphabetic!
>
>That's an interesting idea, but orthogonal to my point.

Yeah, I am drifting - but it is still to do with my current project
- multi-lingual speech synthesis - which is a perl/Tk app
which needs to take strings of "words" (in latin-1 mainly) and manipulate
them and IPA phonetics (near "\x{2xx}") and display / pattern-match
both.

It works quite well for American english or really odd languages
outside latin1 - but English with accents or French or Spanish
which are latin1 seem to keep falling into the 0x80..0xFF hole
but I haven't located exactly what is going on yet.

>You seem to
>be still fighting the battle about Unicode strings having different
>semantics from byte strings,

I just want a way to get Unicode strings...

Nicholas Clark

Jul 21, 2004, 6:23:23 PM
to Nick Ing-Simmons, ch...@pobox.com, h...@crypt.org, perl5-...@perl.org, stho...@efn.org
On Wed, Jul 21, 2004 at 10:50:42PM +0100, Nick Ing-Simmons wrote:

> > (1) It could cache the utf8 representation without changing the
> > SvPVX() byte string. This would require either a change to
> > one of the XPV structures (unlikely) or a new MAGIC structure
> > (in the same manner as locale collation data).

It doesn't necessarily need a new MAGIC structure as it could piggyback
off the existing utf8 offset cache structure ('w' magic, IIRC)

Also, one of my TODOs is to rewrite the utf8 caching to be optional.
And see if it's viable to replace its current position/substr pair of
pairs with arbitrary numbers of position pairs.

> > (2) It could store the utf8 translation in malloc'd memory that is
> > not cached, which would be freed at end of scope.
>
> (3) SvUTF8 could be made non-semantic, hence rendering changing
> it a non-bug the same way setting SvPOK on an SvIOK is a non-bug.
>
> >
> >You described (2) and suggested it was not acceptable because it's
> >slower than the status quo. I'm making the point that correctness
> >is more important than incremental performance loss. But (1) is
> >both correct _and_ fast, so I don't know why nobody's tried it.
>
> (1) still has to search for, and possibly malloc, the MAGIC.
> But now it doesn't get free()d so every string takes more than twice
> as much memory as before.

It can get freed every time the string is modified, which may well be
faster than constant realloc()ing.

This also sounds like a job for the use less; pragma.
Which in turn really needs lexical pragmata. Which is a TODO.
(and partly TODOne, I believe - might be more news after OSCON)

Nicholas Clark

Ton Hospel

Jul 21, 2004, 6:48:59 PM
to perl5-...@perl.org
In article <2004072117...@llama.ing-simmons.net>,

Nick Ing-Simmons <ni...@ing-simmons.net> writes:
> Ton Hospel <perl5-...@ton.iguana.be> writes:
>>In article <2004072018...@llama.ing-simmons.net>,
>> Nick Ing-Simmons <ni...@ing-simmons.net> writes:
>>>
>>> SvUTF8 bit on SV should not affect semantics, semantics
>>> are the job of the lexical 'use utf8' hint bit.
>>>
>>> ..snip..
>>>
>>> I thought Jarkko had undone that brain damage in perl5.8.
>>> But maybe I am just fantasizing...
>>
>>But it *does* affect semantics:
>
> I know - but *should* it?
No, I don't think it should.

> i.e. is this a bug?
It *is* intentional. I consider it a design mistake, but I'm not a
core developer, I'm only from the peanut gallery.
(hopefully that will save me from the black glove)

Though even as a normal perl programmer it affects me since I can't
just safely do things like utf8::downgrade($_[0]) if I write general
libraries.
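
A minimal sketch of that hazard, assuming default regex semantics. The
routine byte_length_of is a hypothetical library function; because @_
aliases the caller's variables, downgrading $_[0] silently changes how
the caller's own string matches afterwards.

```perl
use strict;
use warnings;

# Hypothetical library routine. @_ aliases the caller's variables, so
# downgrading $_[0] changes the caller's scalar, not a private copy.
sub byte_length_of {
    utf8::downgrade($_[0]);
    return length $_[0];
}

my $name = "\xE4";
utf8::upgrade($name);       # the caller holds it in UTF-8 form

my $before = ($name =~ /\w/) ? "word" : "not word";
byte_length_of($name);
my $after  = ($name =~ /\w/) ? "word" : "not word";

print "before the call: $before\n";   # word
print "after the call:  $after\n";    # not word
```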

>
> Seems to me such cases are seldom wanted, and if the word-ness
> depended on 'use utf8' state rather than SvUTF8 everyone
> would be happier.
>

Actually, I'd prefer that it depended on locale, which is the way we
normally modify the meaning of things like \w, and utf8-locales is
how it works in other languages. I think being global is actually more
useful than lexical.

Though I could also live with it being lexical or an extra flag (that seems
to be what parrot does)

Jim Cromie

Jul 22, 2004, 11:56:04 AM
to david nicol, h...@crypt.org, Perl 5 Porters
david nicol wrote:

>On Tue, 2004-07-20 at 07:01, h...@crypt.org wrote:
>
>>Yes, and it's worse than that: historically we cache all the representations
>>of a value, and remember which is canonical. With the introduction of
>>Unicode support that's no longer true: we only keep one of the two variants
>>of the string. So you can't get the up/downgraded representation of the
>>string without making a potentially irreversible change.
>>
>>:I'm not sure what to do about this. Or even if any solution is possible.
>>:Apart from not making this mistake in parrot.
>>
>>
>
>
>What if the PMAW (Perl's Magical Autoconverting Wondervariable) was
>extended to have three slots in it rather than two, for unicode/native
>support, instead of cramming both UTF8 and non-UTF8 into the same
>character strings? Conversion/unconversion difficulties would
>go away, or at least be simplified, and there wouldn't be that much
>memory cost since in situations where converting happens apparently you
>have to keep around both versions anyway as-is.
>
>Alternately, a protocol for carrying around a pointer to the
>non-canonical character string could be devised. We have character
>string types that maintain offsets into the malloced area, so maybe
>a protocol for storing a pointer to the pre-converted string at the
>beginning of the block instead of in another slot in the structure would
>make sense and maintain more backwards compatibility by not suddenly
>making all SVs another pointer larger.
>
>

Ooh, with a subject like that, I feel 'qualified' to add $.02


Rather than extend the SV related structures, could we instead:

. double the storage size attached to the PV
. put the converted form of the string at PV[len]

now both forms are available, at the cost of some slightly gross
substringing to get the added form. (I avoid saying which is which)

. utf flag becomes 'offset into PV by CUR+2', ie past the \0


Taking Nick Clark's comments:

> Also, one of my TODOs is to rewrite the utf8 caching to be optional.
> And see if it's viable to replace its current position/substr pair of
> pairs with arbitrary numbers of position pairs.


IIUC, that is geared toward potentially unlimited interleaving of
stringlets into strings, some of the stringlets being utf-8, others
being their downconverted counterparts, and others being straight ascii
usable in both forms. Ropes (ie string bundles) come to mind.

one last wild (and completely unsubstantiated) conjecture:

could an extended-flags byte be embedded (profitably) into the char*
refd by xpv.PVX ? if so, it could distinguish cached vs canonical ?
more flags good.

