Message from discussion GSOC Status Report, Week 12

MIME-Version: 1.0
Date: Sat, 20 Aug 2011 00:26:14 -0300
Message-ID: <CA+nL+nb=Tr74Krazsba0yRyheoAiwuP7hcxN5eTVgyVhdFoDjg@mail.gmail.com>
Subject: GSOC Status Report, Week 12
From: Brian Fraser <frase...@gmail.com>
To: tpf-gsoc-students@googlegroups.com, 
	Perl5 Porters Mailing List <perl5-port...@perl.org>
Cc: Florian Ragwitz <r...@debian.org>, Father Chrysostomos <spr...@cpan.org>, Zefram <zef...@fysh.org>, 
	Karl Williamson <pub...@khwilliamson.com>
Content-Type: multipart/alternative; boundary=0015175cd2a6837ade04aae768ff

--0015175cd2a6837ade04aae768ff
Content-Type: text/plain; charset=UTF-8

Howdy all.

Apologies for the delay; some family issues came up, so I've been tangled up
in that.
Not much happened last week. I organized the commits, caught a couple of
warnings that I had missed, reviewed things a bit (a handful of things I
cleaned up in toke.c were only paying attention to XIDS (XID_Start), not
XIDC (XID_Continue)), and kept trying to figure out that do FILE failure.
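
To illustrate the XIDS/XIDC distinction, here is a minimal sketch in Python
rather than Perl or C, and it is not the logic in toke.c. It leans on
str.isidentifier(), which follows the Unicode XID_Start/XID_Continue
properties (Python additionally accepts "_" as a start character). The helper
names are made up for this example.

```python
def is_xid_start(ch: str) -> bool:
    # A one-character string is an identifier only if that character
    # is allowed to *start* one (XID_Start, plus "_" in Python).
    return ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Prefix a known-good start character so that only the
    # "continue" rule is exercised for ch.
    return ("a" + ch).isidentifier()


if __name__ == "__main__":
    # n, n-with-tilde, a digit, and COMBINING TILDE
    for ch in ["n", "\u00f1", "7", "\u0303"]:
        print(f"U+{ord(ch):04X} start={is_xid_start(ch)} "
              f"continue={is_xid_continue(ch)}")
```

A digit or a combining mark such as U+0303 is fine in the middle of a name
but not at the front, so code that only checks the start property would
treat such characters differently from code that checks both.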

About the latter: preloading the swash breaks t/comp/utf.t, and I'm not sure
why. There's a way to get both cases working, somewhat, but it's basically
just piling more hacks on top of the original, not-too-pleasing solution,
so there's nothing close to a resolution yet. I guess we'll deal with it
afterwards. I might be able to tie it in with the swallow_bom part now, so it
could be for the best eventually, even if it's somewhat discouraging right now.

About swallow_bom(): I've been giving some thought to Nicholas' suggestion of
changing the custom filter to an encoding layer. That seems like a winner to
me, but I seem to recall reading that PerlIO & Encode don't handle BOM'd
streams all too well. Admittedly, it was a 5-6 year old post which I can't
seem to track down right now, but would that be a worry here?
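
As a rough analogy for the "let the decoding layer eat the BOM" idea, here is
a small Python sketch. It says nothing about how PerlIO or Encode behave; it
only shows the difference between a decoder that passes the BOM through as
U+FEFF and one that consumes it. The read_source() helper and the sample
source text are invented for the example.

```python
import codecs
import io


def read_source(raw: bytes, encoding: str) -> str:
    # TextIOWrapper stands in for a decoding layer stacked on a byte stream.
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding,
                          newline="") as stream:
        return stream.read()


# A tiny "source file" that starts with a UTF-8 byte order mark.
source = codecs.BOM_UTF8 + "print 'ni\u00f1o';\n".encode("utf-8")

plain = read_source(source, "utf-8")         # BOM survives as U+FEFF
stripped = read_source(source, "utf-8-sig")  # BOM is consumed by the codec

if __name__ == "__main__":
    print(repr(plain[:1]), repr(stripped[:1]))
```

With the plain "utf-8" codec the consumer still has to skip U+FEFF itself,
which is roughly the job a custom filter does; with "utf-8-sig" the layer
handles it and input without a BOM is returned unchanged.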

I also did some tinkering with normalization. With lexicals, as expected,
it's really rather trivial to implement (it only requires two calls to
Unicode::Normalize::Etc(), one when storing and one when fetching), but
that's low-hanging fruit.
Having given GVs and stashes some thought, I've come to realize that Nicholas'
original assessment was spot-on, my original optimism be damned. Not only do
you have to normalize strings passed in for lookup, but since we don't
enforce a normalization form by default, you can't rely on the stash keys
being properly normalized either!
And what do you do about, say, labels? Or package names? What if a package
uses a different normalization form?
Plus, if I type in $::ni\x{F1}o and later do keys %::, I want that to come
out, not "nin\x{303}o". So it's WASUTF8 all over again there.
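
The two problems above can be sketched in Python (not Perl, and not how
stashes are actually implemented). The NormalizingTable class, its method
names, and the choice of NFD as the internal key form are all assumptions
made for illustration.

```python
import unicodedata


def nfd(name: str) -> str:
    return unicodedata.normalize("NFD", name)


composed = "ni\u00f1o"     # as typed: n, i, U+00F1, o
decomposed = "nin\u0303o"  # the same name in NFD: n, i, n, U+0303, o

# Problem 1: existing keys were never normalized, so normalizing only
# the string passed in for lookup misses them.
legacy = {composed: "value"}
found_in_legacy = nfd(composed) in legacy  # False


class NormalizingTable:
    """Hypothetical symbol table: normalize on store and on fetch,
    but remember the spelling that was first written."""

    def __init__(self):
        self._slots = {}  # NFD key -> (original spelling, value)

    def store(self, name, value):
        key = nfd(name)
        spelling = self._slots.get(key, (name, None))[0]
        self._slots[key] = (spelling, value)

    def fetch(self, name):
        return self._slots[nfd(name)][1]

    def names(self):
        # Problem 2: hand back what the user typed, not the key form.
        return [spelling for spelling, _ in self._slots.values()]


table = NormalizingTable()
table.store(composed, 42)
```

Here table.fetch(decomposed) finds the entry stored under the composed
spelling, and table.names() returns the composed spelling rather than the
internal NFD key. The cost is carrying the original spelling next to every
key, which is the same kind of extra bookkeeping the WASUTF8 comparison
points at.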

Is there a way out of this that doesn't require the core to normalize by
default?

...And that's about it for the report, unfortunately. Back to rebasing.

--0015175cd2a6837ade04aae768ff--