> In hindsight, the reason was pretty obvious. TomC's talk of normalization in
> the other thread made things click for me: neither hv_name_set (and related
> functions) nor gv_init downgraded input when inserting new names. I think
> this was a conscious decision early on, because all of those functions
> internally call share_hek(), which does downgrade whenever possible.
> What I hadn't noticed earlier was that the hash passed to share_hek wasn't
> updated along with the pv/len. Now I'm downgrading before calculating the
> hash, and things are apparently working fine. So huzzah.
That's a bug in Perl_share_hek() then. As it's taking it upon itself to change
which string is stored, it ought to be calculating the hash corresponding to
the change it unilaterally made.
> The second bug was something I mentioned in a reply to the previous report;
> trying to load swashes too late (i.e. after a croak from the tokenizer)
> resulted in a 'do FILE' in utf8_heavy.pl dying from a compilation error.
> I still have no clue why. But a workaround (that doesn't suck as much as
> switching the isIDFIRST checks to something else) is simply loading the XIDS
> swash early on. Putting something like
> if (UTF) {
> bool tmpbool;
> tmpbool = is_utf8_xidfirst((U8*)"");
> }
> near the top of yylex() should make do for that; I haven't traveled far
> enough into SWASH territory to figure out a less ham-fisted solution.
That's ugly. It would be nice to work out why it's a problem. I appreciate
that you don't have enough time to be that person.
> - Source filters still have the UTF-8ness of their returned SVs ignored.
Yes. Source filters are bugging me as part of the "what *does* C<use utf8;>
mean?" question.
Nicholas Clark