Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

MooseX::Storage::Format::JSON and utf8 encoding backwards??\rr

3 views
Skip to first unread message

Bill Moseley

unread,
Oct 1, 2012, 5:14:13 PM10/1/12
to mo...@perl.org
This code looks wrong to me. I don't think that utf8 line should be there.

sub freeze {
my ( $self, @args ) = @_;
my $json = JSON::Any->objToJson( $self->pack(@args) );
utf8::decode($json) if !utf8::is_utf8($json) and
utf8::valid($json); # if it's valid utf8 mark it as such
return $json;
}


Normally, I would want to take a structure that might include character
data (i.e utf8 flag is on) and serialize it into (utf8) encoded bytes. So
JSON::Any->objToJson does that. In other words:

my %utf8_char = ( foo => 'hello ' . "\N{U+263A}" );
my $return_trip = JSON::Any->jsonToObj( JSON::Any->objToJson( \%utf8_char )
);
print Encode::is_utf8( $return_trip->{foo} ) ? "Yes utf8\n" : "Nope, not
ut8\n";


Works as expected:

Yes utf8


So, the "freeze" method quoted above first returns the utf8-encoded string
then does this:

utf8::decode($json) if !utf8::is_utf8($json) and
utf8::valid($json); # if it's valid utf8 mark it as such


turns the string back into utf8 characters.

As a result when I send that frozen serialized string some place I get Wide
Character errors.



--
Bill Moseley
mos...@hank.org

Tomas Doran

unread,
Oct 2, 2012, 11:37:16 AM10/2/12
to Bill Moseley, mo...@perl.org
I entirely agree - there should never be explicit code like this needed.

However, currently, if you remove it - then the tests fail.

This may be as the tests are utterly wrong, or it may be as there is another bug we're not seeing / aware of.

If you could suggest a way to resolve this issue (either by fixing something more than just removing these lines, or by deciding the tests are wrong and correcting them), then I'd be fully in support of fixing this.

Cheers
t0m

Bill Moseley

unread,
Oct 2, 2012, 1:31:20 PM10/2/12
to Tomas Doran, mo...@perl.org
On Tue, Oct 2, 2012 at 8:37 AM, Tomas Doran <bobt...@bobtfish.net> wrote:

> I entirely agree - there should never be explicit code like this needed.
>
> However, currently, if you remove it - then the tests fail.
>

Funny, my tests start passing. ;)



>
> This may be as the tests are utterly wrong, or it may be as there is
> another bug we're not seeing / aware of.
>
> If you could suggest a way to resolve this issue (either by fixing
> something more than just removing these lines, or by deciding the tests are
> wrong and correcting them), then I'd be fully in support of fixing this.
>

I guess I was posting to ask why the code was added in the first place.
Thought I might be missing something.

Since then I looked at RJBS JSONpm and it just lets the json encoding do
its thing:

http://cpansearch.perl.org/src/RJBS/MooseX-Storage-Format-JSONpm-0.093092/lib/MooseX/Storage/Format/JSONpm.pm

I'll look at the tests.

Thanks,


--
Bill Moseley
mos...@hank.org

Bill Moseley

unread,
Oct 3, 2012, 10:27:39 AM10/3/12
to Tomas Doran, mo...@perl.org
On Tue, Oct 2, 2012 at 8:37 AM, Tomas Doran <bobt...@bobtfish.net> wrote:

> I entirely agree - there should never be explicit code like this needed.
>
> However, currently, if you remove it - then the tests fail.
>

I just looked at the tests. It's easy to make it mostly pass, but there's
parts of that test that I don't understand.

Just so we agree, the serialized data should be utf8-encoded octets, and
deserialized back into characters. That's what JSON does automatically.
And it's expected that character data is correctly flagged -- e.g. utf8
octets brought in from, say a database, is correctly decoded
(pg_enable_utf8 for Postgresql as an example).

I also use JSON (or JSON::XS) so not clear if JSON::ANY behaves the same.


The test has inline utf8 *character* strings but fails to set "use utf8" at
the start of the test. So, this attribute:

has 'utf8_string' => (
is => 'rw',
isa => 'Str',
default => sub { "ネットスーパー (Internet Shopping)" }
);

means that the utf8_string is not flagged as character data. That sets in
motion the failure of the rest of the tests.

So, I added "use utf8;" at the top of the test.

When comparing the serialized data then need to encode the test character
string to utf8 octets (because we are comparing to serialized octets).

is($json,
*encode_utf8(* '{"__CLASS__":"Foo","utf8_string":"ネットスーパー (Internet
Shopping)"}*')*,
'... got the right JSON');


But, I'm confused by the last test set that starts like this:

my $test_string;
{
use utf8;
$test_string = "ネットスーパー (Internet Shopping)";
no utf8;
}

Ok, so now we have a character string.

But, then the tests forces the utf8 bit off:

Encode::_utf8_off($test_string);

So, I'm just not clear what these tests are trying to do. What's the point
of testing that a character string with its utf8 flag forced off works
correctly?


--
Bill Moseley
mos...@hank.org

Chris Prather

unread,
Oct 3, 2012, 10:31:59 AM10/3/12
to Bill Moseley, Tomas Doran, mo...@perl.org
(Sorry Bill for the duplicate, I missed the list the first time)

On Wed, Oct 3, 2012 at 10:27 AM, Bill Moseley <mos...@hank.org> wrote:

> I also use JSON (or JSON::XS) so not clear if JSON::ANY behaves the same.

JSON::Any should behave identically to JSON or JSON::XS ... it is only
an API shim. If it doesn't that is a bug in JSON::Any and I would love
to have a fix for that.

That said, it sounds like the right answer here may be to drop
JSON::Any and just use JSON.

-Chris

Bill Moseley

unread,
Oct 3, 2012, 12:56:18 PM10/3/12
to Chris Prather, Tomas Doran, mo...@perl.org

Bill Moseley

unread,
Feb 27, 2013, 10:15:01 PM2/27/13
to Tomas Doran, mo...@perl.org
I had forgot about this until today until I saw errors in some new code.

Is there anything below that needs clarification? IIRC, I got stuck
because I didn't really follow what the tests were attempting to test.
Does the test need to do anything more than make sure wide characters get
turned into octets when freezing and then when thawed it's character data
again (i.e. the utf8 flag survives round trip)?

Thanks,
--
Bill Moseley
mos...@hank.org

Chris Prather

unread,
Feb 27, 2013, 10:59:02 PM2/27/13
to Bill Moseley, Tomas Doran, mo...@perl.org
On Wed, Feb 27, 2013 at 10:15 PM, Bill Moseley <mos...@hank.org> wrote:
> I had forgot about this until today until I saw errors in some new code.
>
> Is there anything below that needs clarification? IIRC, I got stuck
> because I didn't really follow what the tests were attempting to test.
> Does the test need to do anything more than make sure wide characters get
> turned into octets when freezing and then when thawed it's character data
> again (i.e. the utf8 flag survives round trip)?
>
> Thanks,

Looking at the JSON::Any code this *may* be a bug with that. At some
point in time (someone) had to add the line:

$conf->{utf8} = !$conf->{utf8}

for both the JSON and JSON::XS 2.x backends (they're handled
differently in 1.x). Can you remove that line and make sure that you
still see the issue otherwise this is a JSON::Any bug and I'll need to
investigate why that flag was neccessary (again) and whether we need
better resolution on when to apply that flag.

-Chris

Bill Moseley

unread,
Feb 27, 2013, 11:38:00 PM2/27/13
to Chris Prather, Tomas Doran, mo...@perl.org
On Wed, Feb 27, 2013 at 7:59 PM, Chris Prather <ch...@prather.org> wrote:

> On Wed, Feb 27, 2013 at 10:15 PM, Bill Moseley <mos...@hank.org> wrote:
> > I had forgot about this until today until I saw errors in some new code.
> >
> > Is there anything below that needs clarification? IIRC, I got stuck
> > because I didn't really follow what the tests were attempting to test.
> > Does the test need to do anything more than make sure wide characters get
> > turned into octets when freezing and then when thawed it's character data
> > again (i.e. the utf8 flag survives round trip)?
> >
> > Thanks,
>
> Looking at the JSON::Any code this *may* be a bug with that. At some
> point in time (someone) had to add the line:
>
> $conf->{utf8} = !$conf->{utf8}
>

Maybe I'm not following, but the bug is in MooseX::Storage::Format::JSON.
this line should just be removed.

utf8::decode($json) if !utf8::is_utf8($json) and utf8::valid($json); # if
it's valid utf8 mark it as such

That's just wrong a number of ways.

$ perl -le 'use utf8; print "Yes, ASCII is valid utf8" if utf8::valid(
"just ASCII" )'
Yes, ASCII is valid utf8

plus the fact that JSON just encoded into octets -- so why decode back to
characters?

$ perl -MJSON::Any -MEncode -wle 'print "Happy octets" unless
Encode::is_utf8( JSON::Any->new->objToJson( { smile => "I am wide
\N{U+263A}" } ))'
Happy octets

All the specific "utf8" code doesn't need to be there. Even in thaw. The
character data (flagged correctly with the utf8 flag) survives round trip
w/o needing to use "utf8" anywhere.

$ perl -MJSON::Any -MEncode -wle 'print "Happy characters" if
Encode::is_utf8( JSON::Any->new->jsonToObj( JSON::Any->new->objToJson( {
smile => "I am wide \N{U+263A}" } ))->{smile} )'
Happy characters

Sorry, if the one-liners are a bit hard to read.


--
Bill Moseley
mos...@hank.org

Bill Moseley

unread,
Feb 28, 2013, 12:35:33 AM2/28/13
to Chris Prather, Tomas Doran, mo...@perl.org
Or to be a bit more concise -- I would think this is enough:

sub thaw {
my ( $class, $json, @args ) = @_;
return $class->unpack( JSON::Any->decode( $json ), @args );
}

sub freeze {
my ( $self, @args ) = @_;
return JSON::Any->encode( $self->pack( @args ) );
--
Bill Moseley
mos...@hank.org

Bill Moseley

unread,
Mar 11, 2013, 1:06:06 PM3/11/13
to mo...@perl.org
Just a ping here. Does anyone see why this isn't the correct approach?
Are there some possible existing uses where this might break someone's
code? Seems more likely that the existing implementation would break
people's code.
--
Bill Moseley
mos...@hank.org
0 new messages