
JSON and Unicode, am I missing something?


Eli the Bearded

Jun 5, 2015, 7:38:19 PM
The JSON module claims to expect UTF-8, but it doesn't seem to like it.
I get the same "Wide character in subroutine entry" error for external
UTF-8 data files read { open(FH, '<:encoding(UTF-8)', $jsonfile) },
UTF-8 included in strings in "use utf8;" source, and \x{} escapes for
Unicode characters.

:r! cat /tmp/test-json
#!/usr/bin/perl -w
use JSON;
use strict;
use vars qw( $json $data );

# begin flailing for a fix to "Wide character in subroutine entry" {
use diagnostics;
use feature 'unicode_strings';
use utf8;
binmode STDIN, ':utf8';
binmode STDERR, ':utf8';
binmode STDOUT, ':utf8';
no warnings 'utf8';
# } end flailing

$json = qq![
{
"unicode": "U+2512",
"highbit": "\x{2512}"
}
]!;

$data = decode_json $json;
__END__

:r! env -i /usr/local/bin/perl5.14.1 /tmp/test-json
Wide character in subroutine entry at /tmp/test-json line 23 (#1)
(S utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the :utf8 layer to the
output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the
warning is to add no warnings 'utf8'; but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see open and "binmode" in perlfunc.

Uncaught exception from user code:
Wide character in subroutine entry at /tmp/test-json line 23.
at /tmp/test-json line 23

:r! env -i /usr/local/bin/perl5.20.2 /tmp/test-json
Use of uninitialized value $^WARNING_BITS in bitwise xor (^) at /usr/local/lib/perl5/site_perl/5.14.1/common/sense.pm line 237.
Use of uninitialized value $^WARNING_BITS in bitwise xor (^) at /usr/local/lib/perl5/site_perl/5.14.1/common/sense.pm line 237.
Wide character in subroutine entry at /tmp/test-json line 23 (#1)
(S utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the :utf8 layer to the
output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the
warning is to add no warnings 'utf8'; but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see open and "binmode" in perlfunc.

Uncaught exception from user code:
Wide character in subroutine entry at /tmp/test-json line 23.

:r! /usr/local/bin/perl5.20.2 -v

This is perl 5, version 20, subversion 2 (v5.20.2) built for i386-netbsd-thread-multi

Copyright 1987-2015, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.


That's running on the Panix hosts, where I have my personal webspace. Panix
keeps multiple versions of perl around, currently nine between 5.00403
and 5.20.2. I get the same results on Ubuntu (12.04.4) with the packaged
perl:

This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 57 registered patches, see perl -V for more detail)

$ /usr/bin/perl test-json
Wide character in subroutine entry at test-json line 21 (#1)
(S utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the :utf8 layer to the
output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the
warning is to add no warnings 'utf8'; but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see open and "binmode" in perlfunc.

Uncaught exception from user code:
Wide character in subroutine entry at test-json line 21.
at test-json line 21
$

(That test-json doesn't have the two "flailing" comments, so different
line numbers.)

What am I missing here?

Elijah
------
has other code using the JSON module that just seems to work

Rainer Weikusat

Jun 6, 2015, 10:50:38 AM
You aren't passing 'utf-8' into the function but a Perl string
containing wide characters. For this example, you'd either need to use
the interface which accepts unicode or 'encode' your data into UTF-8
which basically means turning off the 'utf8' flag. Example with
everything not serving any purpose removed (tested with 5.14.2)

--------
#!/usr/bin/perl -w
use JSON;
use strict;
use vars qw( $json $data );
use Encode;

# begin flailing for a fix to "Wide character in subroutine entry" {
use diagnostics;
use feature 'unicode_strings';
# } end flailing

$json = qq![
{
"unicode": "U+2512",
"highbit": "\x{2512}"
}
]!;

$data = from_json($json);
$data = decode_json(encode('utf-8', $json));
---------

There is no UTF-8 in your source code and none of the STD*-streams is
used for anything (in addition to being a weird idea, the virtual
top-secret internal Perl encoding also seems to be a tad bit too
complicated to be easily understood ...).
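
A minimal sketch of that distinction, for anyone following along. It uses JSON::PP (bundled with perl since 5.14) instead of JSON so that it runs without CPAN modules; that swap and the variable names are assumptions of this sketch, not part of the original script. With JSON itself, decode_json behaves like the utf8-enabled decoder below and from_json like the plain one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP;

# Character string: the box-drawing glyph is one code point above 255.
my $chars = qq![ { "unicode": "U+2512", "highbit": "\x{2512}" } ]!;

# The same text as UTF-8 octets.
my $bytes = encode('UTF-8', $chars);

# A utf8-enabled decoder expects octets and decodes them itself.
my $from_bytes = JSON::PP->new->utf8->decode($bytes);

# A decoder without utf8 expects an already-decoded character string.
my $from_chars = JSON::PP->new->decode($chars);

# Mixing the two up reproduces the error from the original post.
my $error = '';
eval { JSON::PP->new->utf8->decode($chars); 1 } or $error = $@;

print "both decoders agree\n"
    if $from_bytes->[0]{highbit} eq $from_chars->[0]{highbit};
print "utf8 decoder on characters: $error" if $error;
```

The point of the sketch is only that the input type has to match the decoder: octets for the utf8 variant, characters for the other.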


Eric Pozharski

Jun 6, 2015, 1:33:15 PM
with <eli$15060...@qz.little-neck.ny.us> Eli the Bearded wrote:

*SKIP*
> Uncaught exception from user code:
> Wide character in subroutine entry at /tmp/test-json line 23.
> at /tmp/test-json line 23
*SKIP*
> What am I missing here?

Evidence, of course. I don't understand why on site perl hides this:

Uncaught exception from user code:
Wide character in subroutine entry at /home/whynot/foo.hgT42R.pl line 23
at /usr/share/perl5/JSON/backportPP.pm line 654
JSON::PP::PP_decode_json('JSON::PP=HASH(0x88b493c)', '[
{
"unicode": "U+2512",
"highbit": "┒"
...', 0) called at /usr/share/perl5/JSON/backportPP.pm line 149
JSON::PP::decode('JSON::PP=HASH(0x88b493c)', '[
{
"unicode": "U+2512",
"highbit": "┒"
...') called at /usr/share/perl5/JSON/backportPP.pm line 111
JSON::PP::decode_json('[
{
"unicode": "U+2512",
"highbit": "┒"
...') called at /home/whynot/foo.hgT42R.pl line 23

For me /usr/share/perl5/JSON/backportPP.pm around line#654 looks like
this:

651: ($utf8, $relaxed, $loose, $allow_bigint, $allow_barekey, $singlequote)
652:     = @{$idx}[P_UTF8, P_RELAXED, P_LOOSE .. P_ALLOW_SINGLEQUOTE];
653
654: if ( $utf8 ) {
655:     utf8::downgrade( $text, 1 ) or Carp::croak("Wide character in subroutine entry");
656 }
657 else {
658:     utf8::upgrade( $text );
659 }
660
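
The croak on line 655 can be reproduced without JSON at all. A small sketch (the variable names are made up for illustration): utf8::downgrade with a true second argument returns false instead of dying when the string holds a code point above 255, and the module turns that false value into the "Wide character" exception.

```perl
use strict;
use warnings;

my $narrow = "caf\x{e9}";   # every code point fits in one byte
my $wide   = "\x{2512}";    # code point above 255

utf8::upgrade($narrow);     # store it internally as UTF-8; the content is unchanged

# Second argument true: report failure by return value rather than dying.
my $narrow_ok = utf8::downgrade($narrow, 1) ? 1 : 0;
my $wide_ok   = utf8::downgrade($wide,   1) ? 1 : 0;

print "narrow string downgraded: $narrow_ok\n";
print "wide string downgraded:   $wide_ok\n";
```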

HTH?

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Eli the Bearded

Jun 7, 2015, 2:47:53 AM
In comp.lang.perl.misc,
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
> You aren't passing 'utf-8' into the function but a Perl string
> containing wide characters. For this example, you'd either need to use
> the interface which accepts unicode or 'encode' your data into UTF-8
> which basically means turning off the 'utf8' flag. Example with
> everything not serving any purpose removed (tested with 5.14.2)

I've lost you at "accepts unicode". UTF-8 is Unicode. It is not the
only Unicode encoding, but it is one of the more common ones. \x{2512}
is a reference to a defined Unicode code point. It is Unicode.

:r! cat mktest-json
#!/usr/bin/perl -w
use strict;
use vars qw( $file $json );
my $file = '/tmp/json-data';
my $json = qq![
{
"unicode": "U+2512",
"highbit": "\x{2512}"
}
]!;

if(!open(JSON, '>:encoding(UTF-8)', $file)) {
die "$0: oops: $file $!\n";
}

print JSON $json;
close JSON;
__END__

:r! perl5.14.2 mktest-json; file /tmp/json-data
/tmp/json-data: UTF-8 Unicode text

:r! cat test-json
#!/usr/bin/perl -w
use strict;
use JSON;
use vars qw( $json $data $file );
$file = '/tmp/json-data';
$json = '';

if(!open(JSON, '<:encoding(UTF-8)', $file)) {
die "$0: oops: $file $!\n";
}
while (<JSON>) { $json .= $_; }
close JSON;

$data = decode_json $json;
__END__

:r! perl5.14.2 test-json
Wide character in subroutine entry at test-json line 14.


But rereading this sentence:
> You aren't passing 'utf-8' into the function but a Perl string
> containing wide characters.

I think you are trying to say that because I've informed Perl of the
file encoding, I'm running into issues since Perl is decoding from UTF-8
to its internal encoding, and then that internal encoding is breaking
the 'this is uninterpreted UTF-8' requirement of the decode_json()
function.

That's a subtlety that yes, I can see myself overlooking. And indeed,
that does seem to fix it:

:r! cat test-json-noencode
#!/usr/bin/perl -w
use strict;
use JSON;
use vars qw( $json $data $file );
$file = '/tmp/json-data';
$json = '';

if(!open(JSON, '<', $file)) {
die "$0: oops: $file $!\n";
}
while (<JSON>) { $json .= $_; }
close JSON;

$data = decode_json $json;
__END__

:r! perl5.14.2 test-json-noencode


(No output at all, of course, being the expected result of that test.)
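
For comparison, a sketch of the two combinations that should both work when reading from a file: raw octets fed to a utf8-enabled decoder, or an :encoding(UTF-8) layer feeding a decoder that expects characters. JSON::PP is used here in place of JSON so the sketch runs on a stock perl; the slurp helper and the temporary file are hypothetical stand-ins for the /tmp/json-data setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Write a small UTF-8 encoded JSON file to a temporary location.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} qq![ { "unicode": "U+2512", "highbit": "\x{2512}" } ]!;
close $tmp_fh or die "$0: close: $file $!\n";

# Read a whole file using the given open mode (helper for this sketch only).
sub slurp {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die "$0: oops: $path $!\n";
    local $/;                      # slurp mode: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Octets in, utf8 decoder (what decode_json does in the JSON module).
my $from_raw = JSON::PP->new->utf8->decode(slurp('<:raw', $file));

# Characters in, plain decoder (what from_json does by default).
my $from_layer = JSON::PP->new->decode(slurp('<:encoding(UTF-8)', $file));

print "same result either way\n"
    if $from_raw->[0]{highbit} eq $from_layer->[0]{highbit};
```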

Elijah
------
can now proceed to fix the actual code

Rainer Weikusat

Jun 7, 2015, 7:21:53 AM
Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.perl.misc,
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> You aren't passing 'utf-8' into the function but a Perl string
>> containing wide characters. For this example, you'd either need to use
>> the interface which accepts unicode or 'encode' your data into UTF-8
>> which basically means turning off the 'utf8' flag. Example with
>> everything not serving any purpose removed (tested with 5.14.2)
>
> I've lost you at "accepts unicode".

You've lost your understanding of the module documentation I was quoting
at that point: 'Accepts unicode' means 'accepts a string marked as
UTF-8' which is not the same as 'a UTF-8 string' because the latter is a
sequence of bytes without encoding information (NB: This is again an
almost verbatim quote from the documentation)

> UTF-8 is Unicode.

UTF-8 is a scheme for encoding numbers whose values can't be
represented with only 7 value bits.
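
To make the "string marked as UTF-8" versus "UTF-8 string" distinction concrete, here is a small sketch using only core modules. The variable names are illustrative. utf8::is_utf8 reports the internal flag and is shown purely for inspection; it is not something application code would normally branch on.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{2512}";                 # one character, code point U+2512
my $bytes = encode('UTF-8', $chars);    # its UTF-8 encoding as octets

printf "chars: length %d, flagged %s\n",
    length($chars), utf8::is_utf8($chars) ? 'yes' : 'no';
printf "bytes: length %d, flagged %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'yes' : 'no';

# Decoding the octets gives back the original character string.
my $round_trip = decode('UTF-8', $bytes);
print "round trip matches\n" if $round_trip eq $chars;
```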

Rainer Weikusat

Jun 7, 2015, 7:40:41 AM
Rainer Weikusat <rwei...@mobileactivedefense.com> writes:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> In comp.lang.perl.misc,
>> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>>> You aren't passing 'utf-8' into the function but a Perl string
>>> containing wide characters. For this example, you'd either need to use
>>> the interface which accepts unicode or 'encode' your data into UTF-8
>>> which basically means turning off the 'utf8' flag. Example with
>>> everything not serving any purpose removed (tested with 5.14.2)
>>
>> I've lost you at "accepts unicode".
>
> You've lost your understanding of the module documentation I was quoting
> at that point:

This is sort-of silly but "gebranntes Kind scheut den Ofen" (roughly: once burnt, twice shy). The
corresponding passage is

# option-acceptable interfaces (expect/generate UNICODE by default)

$json_text = to_json( $perl_scalar, { ascii => 1, pretty => 1 } );
$perl_scalar = from_json( $json_text, { utf8 => 1 });
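
A hedged sketch of what those option hashes amount to, written against JSON::PP's object interface so it runs on a stock perl. The assumption, based on my reading of the JSON documentation, is that each key in the option hash corresponds to the method of the same name; the data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

my $perl_scalar = { highbit => "\x{2512}" };

# Roughly to_json( $perl_scalar, { ascii => 1, pretty => 1 } ):
# no utf8 option, so the result is a character string, and ascii
# forces every non-ASCII character into a \uXXXX escape.
my $json_text = JSON::PP->new->ascii(1)->pretty(1)->encode($perl_scalar);
print $json_text;

# Decoding that character string back needs no utf8 option either.
my $back = JSON::PP->new->decode($json_text);
print "round trip ok\n" if $back->{highbit} eq $perl_scalar->{highbit};
```

Passing { utf8 => 1 } to from_json, as in the quoted passage, switches that interface over to expecting octets, which is what decode_json does.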

