Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Thu, 25 Oct 2012 03:15:12 +0100
Local: Wed, Oct 24 2012 10:15 pm
Subject: Re: Why "Wide character in print"?
Quoth Eli the Bearded <*...@eli.users.panix.com>:
> In comp.lang.perl.misc, tcgo <tomeu...@gmail.com> wrote:
> > And it gives me a "warning" message: "Wide character in print at
> > ./unicode line 4". After adding "binmode(STDOUT, ":utf8");" the
> > warning disappears, but why was it showing before of adding the
> > binmode?
> <snip>
> The explanation in perldiag is a good start:
> =item Wide character in %s
> (S utf8) Perl met a wide character (>255) when it wasn't expecting
> But that (and the docs for binmode()) doesn't address why the warning
> echo "some binary stream with U+2639 in it" | \

[You should set $/ = \1024 or something else appropriate before using <>
on a binary file. By default <> reads newline-delimited lines, and there
is no particular reason for newlines to occur in sensible places in a
binary file. Of course, if the file is small enough it may be better to
read the whole thing and skip the while loop altogether.]

If you are dealing with :raw streams then your data needs to be in
use Encode "encode";
my $u2639 = encode "UTF-8", "\x{2639}";
s/U\+2639/$u2639/g;
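
To put the pieces above together, here is a minimal sketch of the whole patching loop. It assumes in-memory filehandles standing in for the real binary files, and patch_stream is a hypothetical helper name, not something from the original post. It combines the $/ = \1024 advice with the encode-then-substitute approach.

```perl
use strict;
use warnings;
use Encode "encode";

# Hypothetical helper: copy $in to $out, replacing the ASCII text
# "U+2639" with the UTF-8 bytes of the character U+2639.
sub patch_stream {
    my ($in, $out) = @_;
    binmode $in,  ":raw";
    binmode $out, ":raw";

    # Bytes, not characters: this is what belongs in a :raw stream.
    my $u2639 = encode "UTF-8", "\x{2639}";

    local $/ = \1024;    # read fixed 1024-byte records, not "lines"
    while (my $chunk = <$in>) {
        $chunk =~ s/U\+2639/$u2639/g;
        print {$out} $chunk;
    }
    return;
}

# Demonstration on in-memory filehandles (an assumption for the sketch).
my $input  = "some binary stream with U+2639 in it\n";
my $output = "";
open my $in,  "<", \$input  or die "cannot open input: $!";
open my $out, ">", \$output or die "cannot open output: $!";
patch_stream($in, $out);
close $out or die "cannot close output: $!";
```

Since everything printed is already bytes, no "Wide character" warning is produced. One caveat with fixed-size records: a match that straddles a record boundary will be missed, so for small files reading the whole thing at once is safer.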
Imagine you were trying to perform this replacement the other way
s/\x{2639}/U+2639/;
would never match, since the :raw layer would return a UTF8-encoded
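
A small sketch of that reverse direction, assuming the bytes arrived through a :raw layer (simulated here with encode rather than a real filehandle). It shows the character pattern failing against undecoded bytes and succeeding once the data has been decoded explicitly.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Bytes as a :raw read would return them: U+2639 encoded as UTF-8.
my $bytes = encode("UTF-8", "a \x{2639} b");

# Against the undecoded bytes the pattern cannot match: the string
# holds the three bytes E2 98 B9, not the single character U+2639.
my $raw_match = ($bytes =~ /\x{2639}/) ? 1 : 0;

# After an explicit decode the string holds characters, so it matches.
my $chars   = decode("UTF-8", $bytes);
my $changed = ($chars =~ s/\x{2639}/U+2639/g);

# Encode again before writing back to a :raw stream.
my $patched = encode("UTF-8", $chars);
```

The same effect could presumably be had by reading through an :encoding(UTF-8) layer instead of calling decode by hand, but then the stream is no longer being treated as raw binary.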
> I've used the "while(<>) { s///g; print; }" construct to patch binary
> files in the past (rename functions in compiled programs, etc). I haven't
> yet needed to sub-in wide characters, but it doesn't seem unreasonable.

A binary file cannot contain 'wide characters' as such; instead it
contains some *encoding* of wide characters. Since Perl has no way to
guess which encoding you want, you need to be explicit, either by using
Encode directly or by calling it indirectly using PerlIO::encoding.

> I'm guessing that my binary stream situation is what "no warnings
> 'utf8';" is intended to fix.

No, not at all. If you review the (W utf8) warnings in perldiag, you
will see they all have to do with performing character operations on
Unicode codepoints which are not valid characters (UTF-16 surrogates,
codepoints which haven't been allocated yet, explicit non-characters
like U+FFFF). They have nothing to do with ordinary Unicode IO.

Ben