json vs simplejson

json vs simplejson Luke Plant 6/11/12 2:51 PM
Hi all,

We've switched internally from json to simplejson. Our 1.5 release notes
say:

You can safely change any use of django.utils.simplejson to json

I just found a very big difference between json and simplejson

>>> simplejson.loads('{"x":"y"}')
{'x': 'y'}

>>> json.loads('{"x":"y"}')
{u'x': u'y'}

i.e. simplejson returns bytestrings if the string is ASCII (it returns
unicode objects otherwise), while json returns unicode objects always.

This was an unfortunate design decision on the part of simplejson (json
is definitely correct here), and it is a very big change in semantics.
It has already led to one very difficult-to-debug error for me.

So, this is a shout out to other people to watch out for this, and a
call for ideas on what we can do to mitigate the impact of this. It's
likely to crop up in all kinds of horrible places, deep in libraries
that you can't do much about. In my case I was loading config, including
passwords, from a config file in JSON, and the password was now
exploding inside smtplib because it was a unicode object.

Yuck. Ideas?
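
One possible mitigation for the smtplib case, as a minimal sketch: convert explicitly at the boundary instead of relying on whichever string type the decoder happens to return. The helper name `to_bytes`, the config keys, and the choice of UTF-8 are assumptions for illustration, not anything Django or smtplib provides.

```python
import json


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding text if necessary."""
    if isinstance(value, bytes):
        # Already a byte string (str on Python 2, bytes on Python 3).
        return value
    return value.encode(encoding)


# Hypothetical config document; the keys are made up for this example.
config = json.loads('{"user": "mailer", "password": "s3cret"}')

# Coerce right before handing the value to code that requires bytes.
password = to_bytes(config['password'])
```

This only helps in code you control; it does nothing for libraries that call the decoder themselves.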

Luke


--
OSBORN'S LAW
    Variables won't, constants aren't.

Luke Plant || http://lukeplant.me.uk/
Re: json vs simplejson Alex Ogier 6/11/12 10:14 PM
On Mon, Jun 11, 2012 at 5:51 PM, Luke Plant <L.Pla...@cantab.net> wrote:
>
> i.e. simplejson returns bytestrings if the string is ASCII (it returns
> unicode objects otherwise), while json returns unicode objects always.
>

This seemed strange to me because the standard library json shipping
with python 2.7.3 is in fact simplejson 2.0.9, so I did some digging.
It turns out that if the C extensions have been compiled and you pass
a str instance to loads(), then you get that behavior in both
versions. This isn't documented anywhere, but here are the offending
pieces:

http://hg.python.org/releasing/2.7.3/file/7bb96963d067/Modules/_json.c#l419
https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c#L527

If the C extensions aren't enabled, or you pass a unicode string to
loads(), then you get the "proper" behavior as documented. I'm not
sure how you are triggering this optimized, iffy behavior in
django.utils.simplejson though, without also triggering it in the
standard library. Did you ever install simplejson with 'pip install
simplejson' such that Django picked it up? Can you try running 'from
django.utils import simplejson; print simplejson.__version__'?
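
A rough way to answer the two questions above (are the C speedups loaded, and is a separately installed simplejson on the path) might look like the following sketch. The function name `describe_json_backends` is made up for this example; `json.decoder.c_scanstring` is the module-level name the stdlib sets to None when its C accelerator cannot be imported.

```python
import importlib
import json.decoder


def describe_json_backends():
    """Report which JSON implementations this interpreter can see."""
    info = {
        # None here means the stdlib json is running pure Python.
        'stdlib_c_speedups': json.decoder.c_scanstring is not None,
        'simplejson_version': None,
    }
    try:
        simplejson = importlib.import_module('simplejson')
    except ImportError:
        # No third-party simplejson installed.
        pass
    else:
        info['simplejson_version'] = getattr(
            simplejson, '__version__', 'unknown')
    return info


print(describe_json_backends())
```

If `simplejson_version` is not None, some dependency has installed simplejson and anything doing a plain 'import simplejson' will pick it up.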
Re: json vs simplejson Vinay Sajip 6/12/12 2:58 AM
On Jun 11, 10:51 pm, Luke Plant <L.Plant...@cantab.net> wrote:


> We've switched internally from json to simplejson. Our 1.5 release notes
> say:

Do you mean the other way around?

> You can safely change any use of django.utils.simplejson to json
>
> I just found a very big difference between json and simplejson
>
> >>> simplejson.loads('{"x":"y"}')
>
> {'x': 'y'}
>
> >>> json.loads('{"x":"y"}')
>
> {u'x': u'y'}
>
> i.e. simplejson returns bytestrings if the string is ASCII (it returns
> unicode objects otherwise), while json returns unicode objects always.
>
> This was, unfortunately, a very unfortunate design decision on the part
> of simplejson - json is definitely correct here - and a very big change
> in semantics. It led to one very difficult to debug error for me already.

Right. And on Python 3, the json module (correctly) doesn't accept
byte-strings at all.

> So, this is a shout out to other people to watch out for this, and a
> call for ideas on what we can do to mitigate the impact of this. It's
> likely to crop up in all kinds of horrible places, deep in libraries
> that you can't do much about. In my case I was loading config, including
> passwords, from a config file in JSON, and the password was now
> exploding inside smtplib because it was a unicode object.

This is one place where there are limitations in the 2.x stdlib -
other places include cStringIO and cookies. For example, if you pass a
Unicode object to a cStringIO.StringIO, it doesn't complain, but does
the wrong thing:

>>> from cStringIO import StringIO; StringIO(u'abc').getvalue()
'a\x00b\x00c\x00'
>>>

Fun and games ...

I'm not sure there's any easy way out, other than comprehensive
testing.
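
As one small piece of such testing, a helper along these lines could be run over decoded JSON in a test suite to flag any byte strings that slipped through. It is only a sketch; `find_byte_strings` and the path notation are invented for this example.

```python
import json

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def find_byte_strings(obj, path='$'):
    """Return the paths of all byte-string keys and values inside obj."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = '%s.%s' % (path, key)
            if isinstance(key, bytes) and not isinstance(key, TEXT_TYPE):
                found.append(child + ' (key)')
            found.extend(find_byte_strings(value, child))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            found.extend(find_byte_strings(item, '%s[%d]' % (path, index)))
    elif isinstance(obj, bytes) and not isinstance(obj, TEXT_TYPE):
        found.append(path)
    return found


decoded = json.loads('{"x": ["y", {"z": "w"}]}')
print(find_byte_strings(decoded))
```

An empty list means every string in the structure is a text object; a decoder that hands back byte strings for ASCII input would show up as a non-empty list.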

Regards,

Vinay Sajip
Re: json vs simplejson Luke Plant 6/12/12 3:53 AM
On 12/06/12 06:14, Alex Ogier wrote:

> This seemed strange to me because the standard library json shipping
> with python 2.7.3 is in fact simplejson 2.0.9, so I did some digging.
> It turns out that if the C extensions have been compiled and you pass
> a str instance to loads(), then you get that behavior in both
> versions. This isn't documented anywhere, but here's the offending
> pieces:
>
> http://hg.python.org/releasing/2.7.3/file/7bb96963d067/Modules/_json.c#l419
> https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c#L527
>
> If the C extensions aren't enabled, or you pass a unicode string to
> loads(), then you get the "proper" behavior as documented. I'm not
> sure how you are triggering this optimized, iffy behavior in
> django.utils.simplejson though, without also triggering it in the
> standard library. Did you ever install simplejson with 'pip install
> simplejson' such that Django picked it up? Can you try running 'from
> django.utils import simplejson; print simplejson.__version__'?

Thanks for digging into that.

(BTW, in reply to Vinay, yes I meant "from simplejson to json", not the
other way around).

I've found the same difference of behaviour on both a production machine
where I'm running my app (CentOS machine, using a virtualenv, Python
2.7.3), and locally on my dev machine which is currently running Debian,
using the Debian Python 2.7.2 packages.

In both cases, json is always returning unicode objects, which implies I
don't have the C extensions for the json module according to your
analysis. I don't know enough about how this is supposed to work to
understand why.

It also implies I'm probably not the only one affected by this, if it's
happened on two quite different machines. Looking at this discussion:

http://stackoverflow.com/questions/712791/json-and-simplejson-module-differences-in-python

it seems that lots of people don't have the C extension for json
(reporting json 10x slower than simplejson).
Re: json vs simplejson Luke Plant 6/12/12 4:19 AM
On 12/06/12 10:58, Vinay Sajip wrote:
>
> I'm not sure there's any easy way out, other than comprehensive
> testing.

There is another issue I found.

Django's DateTimeAwareJSONEncoder now subclasses json.JSONEncoder
instead of simplejson.JSONEncoder. The two are not perfectly compatible.
simplejson.dumps() passes the keyword argument 'namedtuple_as_object' to
the JSON encoder class that you pass in, but json.JSONEncoder doesn't
accept that argument, resulting in a TypeError.

So any library that uses Django's JSONEncoder subclasses, but uses
simplejson.dumps() (either via 'import simplejson' or 'import
django.utils.simplejson') will break. I found this already with
django-piston.

I think we at least need a bigger section in the release notes about this.
Re: json vs simplejson Alex Ogier 6/12/12 5:19 AM
On Jun 12, 2012 6:54 AM, "Luke Plant" <L.Pla...@cantab.net> wrote:
> I've found the same difference of behaviour on both a production machine
> where I'm running my app (CentOS machine, using a virtualenv, Python
> 2.7.3), and locally on my dev machine which is currently running Debian,
> using the Debian Python 2.7.2 packages.
>
> In both cases, json is always returning unicode objects, which implies I
> don't have the C extensions for the json module according to your
> analysis. I don't know enough about how this is supposed to work to
> understand why.
>

I'm not sure why no one is getting speedups from simplejson, but I can
tell you that on python 2.6+ django.utils.simplejson.loads should be
an alias for json.loads:

>>> import json
>>> json.loads('{"a":"b"}')
{u'a': u'b'}
>>> from django.utils import simplejson
>>> simplejson.loads('{"a":"b"}')
{u'a': u'b'}
>>> json.loads == simplejson.loads
True

Best,
Alex Ogier
Re: json vs simplejson Alex Ogier 6/12/12 5:28 AM
On Tue, Jun 12, 2012 at 7:19 AM, Luke Plant <L.Pla...@cantab.net> wrote:
>
> There is another issue I found.
>
> Django's DateTimeAwareJSONEncoder now subclasses json.JSONEncoder
> instead of simplejson.JSONEncoder. The two are not perfectly compatible.
> simplejson.dumps() passes the keyword argument 'namedtuple_as_object' to
> the JSON encoder class that you pass in, but json.JSONEncoder doesn't
> accept that argument, resulting in a TypeError.
>
> So any library that uses Django's JSONEncoder subclasses, but uses
> simplejson.dumps() (either via 'import simplejson' or 'import
> django.utils.simplejson') will break. I found this already with
> django-piston.
>

Wait, 'import simplejson' works? Then that explains your problems. You
are using a library you installed yourself that has C extensions,
instead of the system json. If you switch to a system without
simplejson installed, then you should see the "proper" behavior from
django.utils.simplejson.loads(). If your program depends on some
optimized behavior of the C parser such as returning str instances
when it finds ASCII, it is bugged already on systems without
simplejson. If Django depends on optimized behavior, then it is a bug,
and a ticket should be filed.

Best,
Alex Ogier
Re: json vs simplejson Luke Plant 6/12/12 5:49 AM
On 12/06/12 13:28, Alex Ogier wrote:

> Wait, 'import simplejson' works? Then that explains your problems. You
> are using a library you installed yourself that has C extensions,
> instead of the system json. If you switch to a system without
> simplejson installed, then you should see the "proper" behavior from
> django.utils.simplejson.loads(). If your program depends on some
> optimized behavior of the C parser such as returning str instances
> when it finds ASCII, it is bugged already on systems without
> simplejson. If Django depends on optimized behavior, then it is a bug,
> and a ticket should be filed.

I agree my existing program had a bug. I had simplejson installed
because a dependency pulled it in (which means it can be difficult to
get rid of).

The thing I was flagging up was that the release notes say "You can
safely change any use of django.utils.simplejson to json." I'm just
saying the two differences I've found probably warrant at least some
documentation.

The second issue is difficult to argue as a bug in my program or
dependencies. Django has moved from providing a JSONEncoder object
that supported a certain keyword argument to one that doesn't. We could
'fix' it to some extent:

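import json

class DjangoJSONEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        # Discard the simplejson-only argument; the default avoids a
        # KeyError when the caller never passed it.
        kwargs.pop('namedtuple_as_object', None)
        super(DjangoJSONEncoder, self).__init__(*args, **kwargs)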

But like that, it would create more problems if the json module ever
gained that keyword argument in the future.
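
One way around that future-compatibility worry, sketched here under assumptions, is to drop only the keyword arguments that the base class does not currently accept, so that an argument json later learns would be passed through. The class name `SignatureFilteringEncoder` is hypothetical and this is not what Django does. The sketch is written for Python 3, since `inspect.signature` does not exist on Python 2.

```python
import inspect
import json


class SignatureFilteringEncoder(json.JSONEncoder):
    """Encoder that ignores keyword arguments json.JSONEncoder rejects."""

    # Parameter names accepted by the base class on this Python version.
    _accepted = frozenset(
        inspect.signature(json.JSONEncoder.__init__).parameters)

    def __init__(self, **kwargs):
        # Keep only what the stdlib encoder understands today.
        kwargs = {k: v for k, v in kwargs.items() if k in self._accepted}
        super(SignatureFilteringEncoder, self).__init__(**kwargs)


# json.dumps() forwards unknown keywords to the encoder class, much as
# simplejson.dumps() forwards 'namedtuple_as_object'.
print(json.dumps({'a': 1}, cls=SignatureFilteringEncoder,
                 namedtuple_as_object=True))
```

The trade-off is that misspelled or unsupported options are dropped silently instead of raising a TypeError.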
Re: json vs simplejson Alex Ogier 6/12/12 6:14 AM
On Tue, Jun 12, 2012 at 8:49 AM, Luke Plant <L.Pla...@cantab.net> wrote:
>
> I agree my existing program had a bug. I had simplejson installed
> because a dependency pulled it in (which means it can be difficult to
> get rid of).
>
> The thing I was flagging up was that the release notes say "You can
> safely change any use of django.utils.simplejson to json." I'm just
> saying the two differences I've found probably warrant at least some
> documentation.
>
> The second issue is difficult to argue as a bug in my program or
> dependencies. Django has moved from a providing a JSONEncoder object
> that supported a certain keyword argument to one that doesn't. We could
> 'fix' it to some extent:
>
> class DjangoJSONEncoder(json.JSONEncoder):
>    def __init__(self, *args, **kwargs):
>        kwargs.pop('namedtuple_as_object')
>        super(DjangoJSONEncoder, self).__init__(*args, **kwargs)
>
> But like that, it would create more problems if the json module ever
> gained that keyword argument in the future.
>

Like loads(), json.JSONEncoder is just an alias for
simplejson.JSONEncoder, and we need to support versions of simplejson
down to 1.9 which is what python 2.6 ships with. This
'namedtuple_as_object' thing seems to only appear as of simplejson
2.2, which means that depending on it is a bug that appears on any
system without a recent version of simplejson (for example, the
version that was bundled with Django doesn't support it). Depending on
this kwarg is a bug in Django, and should be fixed.

https://github.com/simplejson/simplejson/blob/namedtuple-object-gh6/simplejson/encoder.py

It's clear that people have begun to depend on the quirky ways in
which simplejson diverged from its earlier codebase. I found the place
where that unicode "proper behavior" was fixed, so apparently in
Python's stdlib they undid the C optimizations at some point. So I was
incorrect earlier, and the C speedups work "properly" with Python
stdlib's patch.

http://bugs.python.org/issue11982

Basically, anyone who depended on features of simplejson added after
1.9, or its wonky optimizations, already had arguably broken code in
that it only worked when simplejson is installed. I'm torn as to
whether we should add a note about these subtle problems when
switching to json, recommend that people switch to simplejson instead,
or undeprecate django.utils.simplejson as a necessary wart (we can
still stop vendoring simplejson though).

Best,
Alex Ogier