GSOC Status Report, Week 12

Brian Fraser

Aug 19, 2011, 11:26:14 PM

to tpf-gsoc...@googlegroups.com, Perl5 Porters Mailing List, Florian Ragwitz, Father Chrysostomos, Zefram, Karl Williamson

Howdy all.

Apologies for the delay; some family issues came up, and I've been tangled up in them.

Not much happened last week. I organized the commits, caught a couple of warnings I had missed, did some reviewing (a handful of spots I had cleaned up in toke.c were checking only XIDS (XID_Start) and ignoring XIDC (XID_Continue)), and kept trying to figure out that do FILE failure.

About the latter: preloading the swash breaks t/comp/utf.t, and I'm not sure why. There's a way to get both cases more or less working, but it basically piles more hacks on top of an already not-too-pleasing solution, so there's nothing close to a resolution yet. I guess we'll deal with it afterwards. I might be able to tie it in with the swallow_bom() work now, so it could turn out for the best eventually, even if it's somewhat discouraging right now.

About swallow_bom(): I've been giving Nicholas' suggestion of replacing the custom filter with an encoding layer some thought. That seems like a winner to me, but I seem to recall reading that PerlIO and Encode don't handle BOM'd streams all that well. Admittedly, it was a post from 5-6 years ago that I can't track down right now, but would that be a worry here?
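For reference, this is roughly the job the BOM handling has to do, whichever of the custom filter or an encoding layer ends up doing it: look at the first few bytes, recognize a byte order mark, skip it, and decode the rest accordingly. The sketch below is in Python purely for illustration; it is not Perl's swallow_bom() and not a PerlIO layer. The helper names sniff_bom and decode_source are hypothetical, and falling back to Latin-1 when no BOM is present is an assumption standing in for "treat the source as bytes".

```python
import codecs

# Order matters: the 4-byte UTF-32 marks must be tested before the 2-byte
# UTF-16 ones, because BOM_UTF32_LE (FF FE 00 00) begins with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_source(data: bytes, default: str = "latin-1") -> str:
    """Swallow a leading BOM, then decode the remainder with the encoding it names.

    The Latin-1 default (hypothetical) just maps each byte to one character.
    """
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

Note that the explicit-endianness codecs ("utf-16-le" and friends) do not strip a BOM themselves, which is why the sketch slices it off before decoding; a layer that gets this step wrong would leave a stray U+FEFF at the start of the first token.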

I also did some tinkering with normalization; with lexicals, as expected, it's really rather trivial to implement (it only requires two calls to Unicode::Normalize::Etc(), one when storing and one when fetching), but that's low-hanging fruit.
Having given GVs and stashes some thought, I've come to realize that Nicholas' original assessment was spot-on, my initial optimism be damned. Not only do you have to normalize strings passed in for lookup, but since we don't enforce a normalization form by default, you can't rely on the stash keys being properly normalized either!
And what do you do about, say, labels? Or package names? What if a package has a different normalization form?
Plus, if I type in $::ni\x{F1}o and later do keys %::, I want that name to come out as I typed it, not as "nin\x{303}o", so it's WASUTF8 all over again there.
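To make the store/fetch point and the keys %:: problem concrete, here is a minimal Python sketch of a name table that normalizes on both store and fetch, as in the lexicals case above. NormalizingTable and its methods are hypothetical stand-ins, not Perl's pad or stash API, and unicodedata.normalize stands in for Unicode::Normalize. It shows both sides: the composed and decomposed spellings of "niño" reach the same slot, but keys() can only hand back the normalized spelling, so under NFD a name typed as ni\x{F1}o comes back as nin\x{303}o.

```python
import unicodedata


class NormalizingTable:
    """Hypothetical symbol-table stand-in that normalizes names to one
    Unicode normalization form both when storing and when fetching."""

    def __init__(self, form: str = "NFC"):
        self.form = form  # "NFC", "NFD", "NFKC" or "NFKD"
        self._slots = {}

    def _key(self, name: str) -> str:
        return unicodedata.normalize(self.form, name)

    def store(self, name: str, value) -> None:
        self._slots[self._key(name)] = value

    def fetch(self, name: str):
        # The lookup string must be normalized too, or differently
        # composed spellings of the same name would miss.
        return self._slots.get(self._key(name))

    def keys(self):
        # Names come back in normalized form, not as originally typed.
        return list(self._slots)
```

Without the normalization, a plain dict keyed by {"ni\u00f1o": 1} does not find "nin\u0303o" at all; with it, lookups agree but the original spelling is lost unless it is stored separately, much as with WASUTF8.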

Is there a way out of this that doesn't require the core to normalize by default?