charset encoding for json responses


smcoll

Sep 20, 2011, 10:55:03 AM
to django-rest-framework
Presently, special characters are escaped in json responses, for
example, `\u00f1` instead of `ñ`. How can i get a utf-8 charset
encoding instead of ascii on those responses?

Tom Christie

Sep 20, 2011, 1:01:29 PM
to django-res...@googlegroups.com
Looks like a similar question here: http://trac.turbogears.org/ticket/1480#comment:4

In which case I think the right approach would be to provide an alternative JSONRenderer, that doesn't force the output to ascii.
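
A minimal sketch of what such a renderer might look like, written for Python 3 with only the standard library. The `JSONRenderer` base class below is a hypothetical stand-in so the example runs on its own; it is not the framework's actual class, and the `render` signature is an assumption.

```python
import json


class JSONRenderer:
    """Hypothetical stand-in for the framework's JSON renderer."""

    media_type = "application/json"

    def render(self, obj=None, media_type=None):
        if obj is None:
            return ""
        # json.dumps defaults to ensure_ascii=True, so non-ASCII
        # characters come out as \uXXXX escapes.
        return json.dumps(obj)


class UnicodeJSONRenderer(JSONRenderer):
    """Same renderer, but leaves non-ASCII characters unescaped."""

    def render(self, obj=None, media_type=None):
        if obj is None:
            return ""
        return json.dumps(obj, ensure_ascii=False)


escaped = JSONRenderer().render({"name": "ñ"})
unescaped = UnicodeJSONRenderer().render({"name": "ñ"})
```

Both strings decode to the same data; only the textual representation differs.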


And plug that into your views using .renderers = (UnicodeJSONRenderer, ...others... )

(I've not tested this in the slightest!)

Let me know if that works out.  In either case it's a good case for some slight tweaking of the JSONRenderer class to allow for slightly easier overriding of its behavior.

Tom Christie

Sep 20, 2011, 1:02:32 PM
to django-res...@googlegroups.com
(Oops meant to link to comment #6 "In fact TurboJson creates only ascii because it uses simplejson with default parameters, which means ensure_ascii=True.")

smcoll

Sep 20, 2011, 5:56:31 PM
to django-rest-framework
That's helpful. One thing i found is that if i replace the default
JSONRenderer with the UnicodeJSONRenderer, the _get_content() method
of DocumentingTemplateRenderer chokes because the rendered content is
not deemed "printable": i get `[%d bytes of binary content]`. This is
because `content` is no longer a string in that case, and has no
printable() method. What is the best way to update the _get_content()
method? If i omit that check altogether, html/xhtml looks good but
txt looks like it could still use the escaping. Perhaps we could
implement a django.utils.encoding method on `content` and force
escaping in the case of DocumentingTemplateRenderer?

Tom Christie

Sep 21, 2011, 4:23:28 AM
to django-res...@googlegroups.com
> What is the best way to update the _get_content() method?

How about if we split lines 201-202 of renderers.py into a separate method to allow for easier overriding of the behavior? Eg: https://gist.github.com/1231497

It's pretty clear that the printable check isn't really the right thing to be doing in any case, but it'd make for a better internal API this way around.

It might be better if that check escaped string objects, but not unicode objects, and we ensure that all the existing renderers return unicodes, not strings.  Then return bytes('abc') and return u'abc' would work as expected.
If the object is of neither type then a check of getattr(obj, 'printable', False) could be the default.
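
One possible reading of that check, sketched in Python 3 terms, where the old string/unicode split corresponds to `bytes`/`str`. The function names and the fallback placeholder text are hypothetical; only the `[%d bytes of binary content]` message comes from this thread.

```python
import string

# Byte values that count as printable ASCII.
_PRINTABLE_BYTES = set(string.printable.encode("ascii"))


def is_printable_bytes(data):
    """True if every byte in `data` is printable ASCII."""
    return all(b in _PRINTABLE_BYTES for b in data)


def displayable_content(content):
    """Return text that is safe to show in the documenting view."""
    if isinstance(content, str):
        # Text (the old `unicode`): show it untouched.
        return content
    if isinstance(content, bytes):
        # Byte strings: only show them if they are printable ASCII.
        if is_printable_bytes(content):
            return content.decode("ascii")
        return "[%d bytes of binary content]" % len(content)
    # Neither type: fall back to an opt-in `printable` attribute.
    if getattr(content, "printable", False):
        return str(content)
    return "[binary content]"  # hypothetical fallback message
```

With this split, a renderer that returns text is always displayed, and the placeholder only appears for byte strings that are not printable.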

> If i omit that check altogether, html/xhtml looks good but txt looks like it could still use the escaping.

If you change the media_type on the DocumentingPlainTextRenderer to 'text/plain; charset="utf-8"', does that fix the text rendering?
I think the charset probably ought to be getting set throughout - the renderers probably need a 'charset' on them and some slightly improved behavior in the HttpResponse generation, to append that into the final Content-Type.
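
A rough sketch of the charset idea, assuming a renderer carries a `charset` attribute alongside its `media_type`. The helper name `content_type_for` is hypothetical and not part of the framework.

```python
def content_type_for(media_type, charset=None):
    """Build a Content-Type value, appending the charset if one is given."""
    # Leave the media type alone if no charset applies or one is
    # already present in the string.
    if charset is None or "charset=" in media_type.lower():
        return media_type
    return "%s; charset=%s" % (media_type, charset)


# The response body would then be encoded with the same charset.
body = "señor".encode("utf-8")
header = content_type_for("text/plain", "utf-8")
```

Keeping the charset next to the media type means the header and the encoded body cannot drift apart.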





smcoll

Sep 21, 2011, 10:58:22 AM
to django-rest-framework
The DocumentingPlainTextRenderer works well with 'text/plain;
charset="utf-8"' using a unicode renderer as the renderer for
_get_content(). And it makes sense to me that the renderers return
unicodes rather than strings by default.

BTW, since DocumentingTemplateRenderer is not in `__all__`, i can't
subclass it in another module.



On Sep 21, 4:23 am, Tom Christie <christie....@gmail.com> wrote:
> > What is the best way to update the _get_content() method?
>
> How about if we split lines 201-202 of renderers.py<https://github.com/tomchristie/django-rest-framework/blob/master/djan...> into
> a separate method to allow for more easier overriding of the  behavior?
> Eg:https://gist.github.com/1231497
>
> It's pretty clear that the printable check isn't quite really the right
> thing to be doing in any case, but it'd make for a better internal API this
> way around.
>
> It might be better if that check escaped string objects, but not unicode
> objects, and we ensure that all the existing renderers return unicodes, not
> strings.  Then return bytes('abc')<http://stackoverflow.com/questions/7320696/since-when-does-the-bytes-...> and

Tom Christie

Sep 23, 2011, 8:22:56 AM
to django-res...@googlegroups.com
> The DocumentingPlainTextRenderer works well with 'text/plain;
> charset="utf-8"' using a unicode renderer as the renderer for
> _get_content().  And it makes sense to me that the renderers return 
> unicodes rather than strings by default. 

Cool, thanks for the feedback.

> BTW, since DocumentingTemplateRenderer is not in `__all__`, i can't subclass it in another module. 

Shouldn't be the case.  As far as I understand it, __all__ only affects what gets exposed if you do from ... import *,
you should still be able to import it explicitly.
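
A small self-contained demonstration of that `__all__` behavior. The module name `demo_renderers` is made up and built in memory so the snippet runs without any files; the class names only mirror the ones discussed in this thread.

```python
import sys
import types

# Build a throwaway module whose __all__ leaves one class out.
source = '''
__all__ = ["JSONRenderer"]

class JSONRenderer:
    pass

class DocumentingTemplateRenderer:
    pass
'''
module = types.ModuleType("demo_renderers")
exec(source, module.__dict__)
sys.modules["demo_renderers"] = module

# A star import only picks up the names listed in __all__ ...
star_namespace = {}
exec("from demo_renderers import *", star_namespace)

# ... but an explicit import of an unlisted name still works.
explicit_namespace = {}
exec("from demo_renderers import DocumentingTemplateRenderer", explicit_namespace)
```

So a class missing from `__all__` can still be imported by name and subclassed in another module.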

smcoll

Sep 26, 2011, 6:08:41 PM
to django-rest-framework
> Shouldn't be the case.  As far as I understand it, __all__ only affects what
> gets exposed if you do from ... import *,
> you should still be able to import it explicitly.

Oh, i must have had a typo on the import or something.

blp

Mar 25, 2013, 8:21:31 PM
to django-res...@googlegroups.com
Hi,

Sorry to bump up such an old topic, but I have encountered this as well and wanted to ask something.
I have managed to solve it in a similar manner to what you have described here:
1. Created my own UnicodeJSONRenderer (the one you linked didn't work; either it is outdated or I just don't know how to use it properly). It's basically the same one as in renderers.py, just with ensure_ascii=False.
2. Created my own BrowsableAPIUnicodeRenderer, overrode the "get_content" function, and changed the content validation to a method described here.
    That question discusses removing the characters, but with a minor change (replacing "sub" with "match") I switched it to detect non-printable characters instead.
    I was then able to display unicode characters with both .api and .json formats.
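
A sketch of the detection idea in step 2, under stated assumptions. The linked pattern is not reproduced in this thread, so the character class below is a hypothetical choice (C0 control characters except tab, LF and CR, plus DEL). It also uses `re.search` rather than `match`, so that the whole string is scanned and not just its start.

```python
import re

# Hypothetical pattern: control characters that should not appear in text.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def has_non_printable(text):
    """True if `text` contains any character from CONTROL_CHARS."""
    return CONTROL_CHARS.search(text) is not None


def strip_non_printable(text):
    """The removal variant: drop those characters instead of detecting them."""
    return CONTROL_CHARS.sub("", text)
```

Non-ASCII letters such as `ñ` fall outside the pattern, so unicode text passes the check.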

My question is: Are there any plans on making this available in the main release? I understand that the majority of users are probably going to write a website
that is completely in English, but this seems like an easy fix that would add a lot of value.

Tom Christie

Mar 30, 2013, 3:03:30 AM
to django-res...@googlegroups.com
> My question is: Are there any plans on making this available in the main release?

I'd be more than happy to accept a pull request that dealt with this.

The current 'u'-escaped representation is valid JSON, and we wouldn't want to change the *default* style for backwards compat reasons, but if this could be turned on by flipping a flag on JSONRenderer that'd be great.
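
Roughly what such a flag could look like, as a standalone Python 3 sketch. The class below is a hypothetical stand-in rather than the framework's JSONRenderer, and the `ensure_ascii` attribute name is an assumption borrowed from the `json.dumps` parameter.

```python
import json


class JSONRenderer:
    """Hypothetical stand-in renderer with an overridable flag."""

    media_type = "application/json"
    ensure_ascii = True  # default stays backwards compatible

    def render(self, obj=None, media_type=None):
        if obj is None:
            return ""
        return json.dumps(obj, ensure_ascii=self.ensure_ascii)


class UnicodeJSONRenderer(JSONRenderer):
    # Flipping the flag is the only change a subclass needs.
    ensure_ascii = False
```

The default output is unchanged, and opting in to unescaped output is a one-line subclass.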

blp

Apr 5, 2013, 10:58:38 AM
to django-res...@googlegroups.com
I'm pretty new to the whole open-source community thing. I'd love to help, but I have no idea what "pull requests" are or how to make them :)
I liked that gist website you guys use, so I posted my unicode renderers here:
I hope you'll find them adequate :)
I have tried to find a way to allow this with just a switch, but I'm not familiar enough with the source to find where. Anyway, that's alright in my opinion: just add those Unicode renderers and let the user decide which one to add to his "renderer_classes".