Combining UTF-16 output with :crlf is awkward


Jan Dubois

Mar 28, 2006, 3:07:51 PM
to Perl5 Porters, Nick Ing-Simmons
I just noticed (in a mailing list posting by someone else) that adding
the :crlf layer after a Unicode layer turns back on the "Wide character
in print" warnings. You can get rid of them by turning the PERLIO_F_UTF8
bit on the :crlf layer on too:

open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $filename or die;
print $fh "\x{feff}";

But this isn't very intuitive. I wonder if either PerlIOCrlf_pushed()
should "inherit" the flag from the lower layer, or if PerlIO_isutf8()
should walk the layer stack?

Cheers,
-Jan
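
As an illustration of the first option (inheriting the flag when the layer is pushed), here is a minimal C sketch. It is a toy model under stated assumptions, not perlio.c: `ToyLayer`, `toy_push`, `toy_top_is_utf8`, `toy_demo` and the `TOY_F_*` bits are all invented for this example, and the model assumes the "is this handle UTF-8?" check looks only at the top layer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and are not the PerlIO API. */
enum { TOY_F_UTF8 = 0x1, TOY_F_CRLF = 0x2 };

typedef struct ToyLayer {
    struct ToyLayer *next;   /* layer below this one, NULL at the bottom */
    unsigned flags;
} ToyLayer;

/* Push `layer` on top of `below`, starting with the given flags.
 * If `inherit` is non-zero, copy the UTF-8 bit up from the layer below. */
static ToyLayer *toy_push(ToyLayer *layer, ToyLayer *below,
                          unsigned flags, int inherit)
{
    assert(layer != NULL);
    layer->next = below;
    layer->flags = flags;
    if (inherit && below != NULL && (below->flags & TOY_F_UTF8))
        layer->flags |= TOY_F_UTF8;
    return layer;
}

/* In this model the UTF-8 check only inspects the top layer. */
static int toy_top_is_utf8(const ToyLayer *top)
{
    assert(top != NULL);
    return (top->flags & TOY_F_UTF8) != 0;
}

/* Build raw -> encoding -> crlf and report whether the top layer is UTF-8. */
static int toy_demo(int inherit)
{
    ToyLayer raw, enc, crlf;
    toy_push(&raw, NULL, 0, 0);
    toy_push(&enc, &raw, TOY_F_UTF8, 0);  /* an encoding layer sets the bit itself */
    toy_push(&crlf, &enc, TOY_F_CRLF, inherit);
    return toy_top_is_utf8(&crlf);
}
```

In this model `toy_demo(0)` reports a non-UTF-8 top layer, which corresponds to the situation where the warning comes back, while `toy_demo(1)` reports UTF-8 without any extra layer being named by the caller.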


Nick Ing-Simmons

Mar 29, 2006, 12:37:34 PM
to ja...@activestate.com, Nick Ing-Simmons, Perl5 Porters
Jan Dubois <ja...@ActiveState.com> writes:
>I just noticed (in a mailing list posting by someone else) that adding
>the :crlf layer after a Unicode layer turns back on the "Wide character
>in print" warnings. You can get rid of them by turning the PERLIO_F_UTF8
>bit on the :crlf layer on too:
>
> open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $filename or die;
> print $fh "\x{feff}";
>
>But this isn't very intuitive. I wonder if either PerlIOCrlf_pushed()
>should "inherit" the flag from the lower layer,

That would be my preference I think.
Perhaps that should be the default behaviour for a layer?

>or if PerlIO_isutf8()
>should walk the layer stack?

The snag with that (apart from the overhead) is that something would have
to "know" which layers do or do not affect UTF8-ness. So it is better for
the layer code to do it, since each layer should know this about itself.

>
>Cheers,
>-Jan

Jan Dubois

Apr 6, 2006, 9:37:21 PM
to Nick Ing-Simmons, Perl5 Porters
On Wed, 29 Mar 2006, Nick Ing-Simmons wrote:
> Jan Dubois <ja...@ActiveState.com> writes:
> >I just noticed (in a mailing list posting by someone else) that adding
> >the :crlf layer after a Unicode layer turns back on the "Wide character
> >in print" warnings. You can get rid of them by turning the PERLIO_F_UTF8
> >bit on the :crlf layer on too:
> >
> > open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $filename or die;
> > print $fh "\x{feff}";
> >
> >But this isn't very intuitive. I wonder if either PerlIOCrlf_pushed()
> >should "inherit" the flag from the lower layer,
>
> That would be my preference I think.

Does the attached patch look right to you?

> Perhaps that should be the default behaviour for a layer?

Probably, except for the :encoding, :raw and :utf8 layers. Any other
exceptions?

If you think the patch below is the right way to do it, then I can
try to add it to all the other PerlIOXxxx_pushed() functions too.
Or is there anything else that needs to be done?

Cheers,
-Jan

--- perlio.c.orig Wed Apr 05 07:47:13 2006
+++ perlio.c Thu Apr 06 18:06:24 2006
@@ -4192,6 +4192,21 @@
* buffer */
} PerlIOCrlf;

+/* Inherit the PERLIO_F_UTF8 flag from previous layer.
+ * Otherwise the :crlf layer would always revert back to
+ * raw mode.
+ */
+static void
+S_inherit_utf8_flag(PerlIO *f)
+{
+ PerlIO *g = PerlIONext(f);
+ if (PerlIOValid(g)) {
+ if (PerlIOBase(g)->flags & PERLIO_F_UTF8) {
+ PerlIOBase(f)->flags |= PERLIO_F_UTF8;
+ }
+ }
+}
+
IV
PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab)
{
@@ -4209,17 +4224,19 @@
* any given moment at most one CRLF-capable layer being enabled
* in the whole layer stack. */
PerlIO *g = PerlIONext(f);
- while (g && *g) {
+ while (PerlIOValid(g)) {
PerlIOl *b = PerlIOBase(g);
if (b && b->tab == &PerlIO_crlf) {
if (!(b->flags & PERLIO_F_CRLF))
b->flags |= PERLIO_F_CRLF;
+ S_inherit_utf8_flag(g);
PerlIO_pop(aTHX_ f);
return code;
}
g = PerlIONext(g);
}
}
+ S_inherit_utf8_flag(f);
return code;
}
End of Patch.

Nick Ing-Simmons

Apr 7, 2006, 3:12:50 PM
to ja...@activestate.com, Nick Ing-Simmons, Perl5 Porters
Jan Dubois <ja...@ActiveState.com> writes:
>On Wed, 29 Mar 2006, Nick Ing-Simmons wrote:
>> Jan Dubois <ja...@ActiveState.com> writes:
>> >I just noticed (in a mailing list posting by someone else) that adding
>> >the :crlf layer after a Unicode layer turns back on the "Wide character
>> >in print" warnings. You can get rid of them by turning the PERLIO_F_UTF8
>> >bit on the :crlf layer on too:
>> >
>> > open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $filename or die;
>> > print $fh "\x{feff}";
>> >
>> >But this isn't very intuitive. I wonder if either PerlIOCrlf_pushed()
>> >should "inherit" the flag from the lower layer,
>>
>> That would be my preference I think.
>
>Does the attached patch look right to you?
>
>> Perhaps that should be the default behaviour for a layer?
>
>Probably, except for the :encoding,

Certainly. That always sets UTF-8 on the perl side - that is its job.

>:raw and :utf8 layers.

They don't really exist ;-) Trying to push them results in manipulation
of other layers. So yes, they are exceptions too.

>Any other
>exceptions?

Well something like the gzip layer (not core) is a bit different.
As the zipped side is octets it should probably complain if downstream
was expecting UTF-8.

So I am now wondering whether only "buffering layers" should do this,
and whether a global default is premature.
So perhaps PerlIO_buf should have this added, leaving the others alone?
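
One way to picture that per-layer policy is the following C sketch. It is only a toy model under assumptions: `ToyKind`, `ToyLayer` and `toy_push` are hypothetical names, the gzip-like layer is reduced to "refuse the push if the layer below is flagged UTF-8", and none of this is taken from perlio.c or from any real gzip layer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the PerlIO API. */
enum { TOY_F_UTF8 = 0x1 };

typedef enum {
    TOY_BUFFERING,     /* buffer-style layer: characters pass through unchanged */
    TOY_OCTET_FILTER   /* gzip-like layer: its lower side is always octets */
} ToyKind;

typedef struct ToyLayer {
    struct ToyLayer *next;   /* layer below, NULL at the bottom */
    ToyKind kind;
    unsigned flags;
} ToyLayer;

/* Push `layer` above `below`. Returns 0 on success, -1 if refused. */
static int toy_push(ToyLayer *layer, ToyLayer *below, ToyKind kind)
{
    int below_utf8;
    assert(layer != NULL);
    below_utf8 = below != NULL && (below->flags & TOY_F_UTF8);

    /* An octet-producing filter cannot feed a layer that expects UTF-8. */
    if (kind == TOY_OCTET_FILTER && below_utf8)
        return -1;

    layer->next = below;
    layer->kind = kind;
    /* A freshly pushed layer starts with the bit clear, so the buffering
     * case only ever needs to set it, never to clear it. */
    layer->flags = (kind == TOY_BUFFERING && below_utf8) ? TOY_F_UTF8 : 0;
    return 0;
}
```

Here only the buffering kind inherits the bit; the filter kind never does, and its push fails rather than silently mixing octets into a UTF-8 stream.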

>
>If you think the patch below is the right way to do it,

I was a little worried that S_inherit_utf8_flag only ever sets and never
clears the flag. Then I realized that at the point of call nothing else
would have set it.

>then I can
>try to add it to all the other PerlIOXxxx_pushed() functions too.
>Or is there anything else that needs to be done?

I would not be surprised if some sequence of binmode-ing on layers
needs some more work. But this seems like a good start.

>
>Cheers,
>-Jan
>
>--- perlio.c.orig Wed Apr 05 07:47:13 2006
>+++ perlio.c Thu Apr 06 18:06:24 2006
>@@ -4192,6 +4192,21 @@
> * buffer */
> } PerlIOCrlf;
>
>+/* Inherit the PERLIO_F_UTF8 flag from previous layer.
>+ * Otherwise the :crlf layer would always revert back to
>+ * raw mode.
>+ */
>+static void
>+S_inherit_utf8_flag(PerlIO *f)
>+{
>+ PerlIO *g = PerlIONext(f);
>+ if (PerlIOValid(g)) {
>+ if (PerlIOBase(g)->flags & PERLIO_F_UTF8) {
>+ PerlIOBase(f)->flags |= PERLIO_F_UTF8;

>+ } else {

Rafael Garcia-Suarez

Sep 21, 2006, 10:53:06 AM
to Perl5 Porters
Jan Dubois wrote:
> On Wed, 29 Mar 2006, Nick Ing-Simmons wrote:
> > Jan Dubois <ja...@ActiveState.com> writes:
> > >I just noticed (in a mailing list posting by someone else) that adding
> > >the :crlf layer after a Unicode layer turns back on the "Wide character
> > >in print" warnings. You can get rid of them by turning the PERLIO_F_UTF8
> > >bit on the :crlf layer on too:
> > >
> > > open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $filename or die;
> > > print $fh "\x{feff}";
> > >
> > >But this isn't very intuitive. I wonder if either PerlIOCrlf_pushed()
> > >should "inherit" the flag from the lower layer,
> >
> > That would be my preference I think.
>
> Does the attached patch look right to you?
>
> > Perhaps that should be the default behaviour for a layer?
>
> Probably, except for the :encoding, :raw and :utf8 layers. Any other
> exceptions?
>
> If you think the patch below is the right way to do it, then I can
> try to add it to all the other PerlIOXxxx_pushed() functions too.
> Or is there anything else that needs to be done?
>
> Cheers,
> -Jan
>
> --- perlio.c.orig Wed Apr 05 07:47:13 2006
> +++ perlio.c Thu Apr 06 18:06:24 2006

Thanks, applied as change #28879.
