Brian Fraser
Aug 19, 2011, 11:26:14 PM
to tpf-gsoc...@googlegroups.com, Perl5 Porters Mailing List, Florian Ragwitz, Father Chrysostomos, Zefram, Karl Williamson
Howdy all.
Apologies for the delay; some family issues came up, and I've been tangled up in those.
Not much happened last week. I organized the commits, caught a couple of warnings I had missed, reviewed things a bit (a handful of spots I cleaned up in toke.c were checking only XIDS and ignoring XIDC), and kept trying to figure out the do FILE failure.
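For anyone who hasn't run into the XIDS/XIDC split: an identifier's first character must be XID_Start, and every following character must be XID_Continue, which is a strictly larger set (it adds digits and combining marks, among others). Below is a minimal sketch of the difference, assuming Python's str.isidentifier() as a stand-in for the Unicode property lookups; the function names are hypothetical and this is not the toke.c code.

```python
def is_id_start(ch):
    # A one-character string is an identifier when the character is
    # XID_Start (or "_", which Python special-cases).
    return ch.isidentifier()

def is_id_continue(ch):
    # Prefixing a known start character turns the check into
    # "is ch XID_Continue?".
    return ("a" + ch).isidentifier()

def scan_ident(s):
    """Correct check: XIDS for the first character, XIDC for the rest."""
    if not s or not is_id_start(s[0]):
        return False
    return all(is_id_continue(ch) for ch in s[1:])

def scan_ident_buggy(s):
    """The mistake: XIDS applied to every character."""
    return bool(s) and all(is_id_start(ch) for ch in s)

name = "cafe\u0301"  # "café" written with a combining acute accent
print(scan_ident(name))        # True
print(scan_ident_buggy(name))  # False: U+0301 is XID_Continue, not XID_Start
```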
About the latter: preloading the swash breaks t/comp/utf.t, and I'm not sure why. There's a way to get both cases working, more or less, but it amounts to piling more hacks on top of an already not-too-pleasing solution, so there's nothing close to a resolution yet. I guess we'll deal with it afterwards. I might be able to tie it in with the swallow_bom() work now, so it could turn out for the best eventually, even if it's somewhat discouraging right now.
About swallow_bom(): I've been giving some thought to Nicholas' suggestion of replacing the custom filter with an encoding layer. That seems like a winner to me, but I seem to recall reading that PerlIO and Encode don't handle BOM'd streams all that well. Admittedly, it was a 5-6 year old post which I can't seem to track down right now, but would that be a worry here?
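Whatever form it takes, a filter or a layer, the BOM handling amounts to sniffing the first few bytes, picking a decoder, and dropping the BOM itself. Here's a rough sketch of just that step, assuming the whole buffer is available at once (a real layer would only look at its first buffer). sniff_bom() and decode_source() are hypothetical names, and falling back to Latin-1 when there is no BOM is an assumption, not what perl necessarily does.

```python
import codecs

# BOMs checked longest-first, so the UTF-32LE BOM (FF FE 00 00) is not
# mistaken for the shorter UTF-16LE BOM (FF FE) that it begins with.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(head):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_source(data, default="latin-1"):
    """Strip any BOM and decode; `default` (an assumption) is used when none is found."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)

print(decode_source(codecs.BOM_UTF16_LE + "my $x;".encode("utf-16-le")))  # my $x;
```

The explicit "-le"/"-be" codec names are deliberate: once the BOM has been consumed, the decoder must not go looking for one again.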
I also did some tinkering with normalization. With lexicals, as expected, it's really rather trivial to implement (it only requires two calls to Unicode::Normalize::Etc(), one when storing and one when fetching), but that's low-hanging fruit.
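To make the "two calls" concrete, here's a toy sketch: a dict stands in for the pad, and Python's unicodedata.normalize() stands in for the Unicode::Normalize call. NormalizingPad is a hypothetical name, and NFC as the default form is an assumption.

```python
import unicodedata

class NormalizingPad:
    """Hypothetical name table that normalizes names on store and on fetch."""

    def __init__(self, form="NFC"):
        self.form = form
        self._slots = {}

    def _norm(self, name):
        return unicodedata.normalize(self.form, name)

    def store(self, name, value):
        self._slots[self._norm(name)] = value   # normalization call #1

    def fetch(self, name):
        return self._slots[self._norm(name)]    # normalization call #2

pad = NormalizingPad()
pad.store("ni\u00f1o", 1)         # precomposed n-tilde
print(pad.fetch("nin\u0303o"))    # 1: the decomposed spelling finds the same slot
```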
Having given GVs and stashes some thought, I've come to realize that Nicholas' original assessment was spot-on, my original optimism be damned. Not only do you have to normalize strings passed in for lookup, but since we don't enforce a normalization form by default, you can't rely on the stash keys being properly normalized either!
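A toy illustration of why normalizing only the lookup key isn't enough, assuming a plain dict stands in for a stash whose key was stored in whatever form the source happened to use. Both lookup functions are hypothetical; the linear scan is only there to show what it costs when the stored keys can't be trusted, not as a proposal.

```python
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

# No normalization form is enforced, so the key keeps the decomposed
# spelling the source used.
stash = {"nin\u0303o": 42}

def lookup_normalizing_key_only(stash, name):
    return stash.get(nfc(name))

def lookup_normalizing_both(stash, name):
    target = nfc(name)
    for key, value in stash.items():   # every stored key must be normalized too
        if nfc(key) == target:
            return value
    return None

print(lookup_normalizing_key_only(stash, "ni\u00f1o"))  # None
print(lookup_normalizing_both(stash, "ni\u00f1o"))      # 42
```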
And what do you do about, say, labels? Or package names? What if a package has a different normalization form?
Plus, if I type in $::ni\x{F1}o and later do keys %::, I want that same spelling to come out, not "nin\x{303}o"; so it's the WASUTF8 problem all over again.
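One way to picture "WASUTF8 all over again": look entries up by their normalized form, but remember how each name was first written so that listing the keys gives that spelling back. This is roughly in the spirit of the HVhek_WASUTF8 flag, which lets a hash key that was stored downgraded come back out as UTF-8. It's a toy sketch with a hypothetical class name and a dict for the stash, and NFD is chosen as the internal form only to make the difference visible.

```python
import unicodedata

class SpellingPreservingStash:
    """Hypothetical: look up by normalized name, remember the first spelling seen."""

    def __init__(self, form="NFD"):
        self.form = form
        self._entries = {}   # normalized name -> (original spelling, value)

    def _norm(self, s):
        return unicodedata.normalize(self.form, s)

    def store(self, name, value):
        key = self._norm(name)
        original = self._entries[key][0] if key in self._entries else name
        self._entries[key] = (original, value)

    def fetch(self, name):
        return self._entries[self._norm(name)][1]

    def keys(self):
        return [original for original, _ in self._entries.values()]

st = SpellingPreservingStash()
st.store("ni\u00f1o", 1)              # as typed: precomposed
st.store("nin\u0303o", 2)             # same name, other spelling: updates the value
print(st.keys() == ["ni\u00f1o"])     # True: the original spelling comes back out
print(st.fetch("nin\u0303o"))         # 2
```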
Is there a way out of this that doesn't require the core to normalize by default?
..And that's about it for the report, unfortunately. Back to rebasing.