charset encoding and unicode

smcoll

unread,

Feb 17, 2012, 11:04:22 AM2/17/12

to django-rest-framework

Returning to the issue at
http://groups.google.com/group/django-rest-framework/browse_thread/thread/f1aa2a7e7575f2de/a188fd1967498d2c
- some recap:

Given a value with special characters like 'someñame', the current
implementation of DocumentingTemplateRenderers will all display this
instead for the whole object:

[%d bytes of binary content]

This is because the check used to validate the object is a test
whether it is printable or not:
https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/renderers.py#L223

It was suggested to modify the DocumentingTemplateRenderer class (see
https://gist.github.com/1849267) and that has worked fine. i'd like
to see that adopted in the codebase, so i don't have to maintain my
own copy.

Also, the JSONRenderer will display that value as 'somei\u00f1ame'.
It was suggested to make a JSON rendered where ensure_ascii is false
(to make this easy to override, i offer the following: https://gist.github.com/1849103)
but even then, the value will render as 'someÃ±ame', which is still
incorrect. i'm not sure what to do there.

If you think these issues warrant a fork and pull request, let me
know.

shannon

Tom Christie

unread,

Feb 17, 2012, 12:29:32 PM2/17/12

to django-res...@googlegroups.com

> If you think these issues warrant a fork and pull request, let me know.

Yes. Absolutely they do.

I've looked into it, and it's slightly complicated, tho, there's several different things that ought to happen.

I think there's up to four seperate pull reqs needed here:

Firstly:

The escaping in DocumentingTemplateRenderer should definitely be refactored into an `escape_binary` method.

That'll let it be overridden as needed. I don't think it needs a separate boolean attribute, just refactoring out the method would be fine.

Secondly:

We need to differentiate between binary and string content in a more intelligent way.

Renderers that return string content should *always* return unicode.

Renderers that return binary content should return bytes. (Or str, which is the same thing in Python 2.x)

I think the template render will already do that. The json, jsonp, and yaml don't. I'm not sure about the xml, I don't think it does.

So...

Fix json: Add `ensure_ascii=False` in `dumps()`, wrap the entire result of dumps() up in 'unicode(dumps(...))'

Fix jsonp: Add 'u' prefix on the returned string, and fix json as above.

Fix yaml: Add encoding=None in `safe_dump()`.

Fix xml: Not totally sure.

Thirdly:

After that we can fix up the escape_binary test, so that it does not escape if `isinstance(obj, unicode)`, and it does otherwise.

Finally:

When we render the response, we ought to be setting the charset if it's a unicode object (and not otherwise).

https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/mixins.py#L257

Something like:

def get_content_type(content, mimetype):

if isinstance(content, unicode):

return "%s; %s" % (response.media_type, settings.DEFAULT_CHARSET)

else:

return response.media_type

And generating the final response with:

content_type = self.get_content_type(content, response.media_type)

HttpResponse(content, status=response.status, content_type=content_type)

I would love a set of pull requests addressing those! (Or a single pull req I guess)

smcoll

unread,

Feb 28, 2012, 11:55:20 AM2/28/12

to django-rest-framework

Finally getting to this. i'd like to start with some tests, and
wanted to ask your preference for how i might integrate them.

i figure i could add a test in djangorestframework/tests/renderers.py
that compared an object to the output, such as you have with
_flat_repr for the JSON tests. i'm not sure where to put it- should i
add one to the test for each renderer? i could do something like
change _flat_repr to '{"foo": ["bar", "baz", "niña"]}' but i'm not
sure if i like that because character encoding seems like a distinct
issue.

Aside from testing the one character (ñ) that gave me trouble, i
wonder if there's a more standard way to test unicode rendering.

> https://github.com/tomchristie/django-rest-framework/blob/master/djan...

Tom Christie

unread,

Feb 28, 2012, 1:40:18 PM2/28/12

to django-res...@googlegroups.com

Wouldn't sweat it too much. Testing against one character, just against the JSONRenderer would be a good start.

smcoll

unread,

Feb 28, 2012, 2:32:21 PM2/28/12

to django-rest-framework

OK.

> Thirdly:
>
> After that we can fix up the escape_binary test, so that it does not escape
> if `isinstance(obj, unicode)`, and it does otherwise.

Are you saying we could change the _escape_binary method to something
like this?

def _escape_binary(self, content):
if isinstance(content, unicode):
return content
else:
return '[%d bytes of binary content]' % len(content)

smcoll

unread,

Feb 28, 2012, 3:14:38 PM2/28/12

to django-rest-framework

Not familiar enough with unicode to manage any proper tests. But how
does this look to you?

https://github.com/smcoll/django-rest-framework/commit/fa8efcb15cd2993f817f9d68d6468e575ddafe48

Tom Christie

unread,

Mar 15, 2012, 9:28:29 AM3/15/12

to django-res...@googlegroups.com

Looks good to me.

poswald

unread,

Mar 17, 2012, 11:48:39 PM3/17/12

to django-rest-framework

Does this fix the issue with non-latin characters being formatted as
'somei\u00f1ame' in the browsable UI? Since most of our api users are
using non-latin chars, I would love to see this come out decoded for
them at least in the web UI and I'd be willing to test it out as well.

On Mar 15, 10:28 pm, Tom Christie <christie....@gmail.com> wrote:
> Looks good to me.
>
>
>
>
>
>
>
> On Tuesday, 28 February 2012 20:14:38 UTC, smcoll wrote:
>
> > Not familiar enough with unicode to manage any proper tests. But how
> > does this look to you?
>

> >https://github.com/smcoll/django-rest-framework/commit/fa8efcb15cd299...

Tom Christie

unread,

Mar 18, 2012, 1:09:16 PM3/18/12

to django-res...@googlegroups.com

Yup it's intended to deal with properly encoding outputs both with the browse-able HTML and with other representations.

Tom Christie

unread,

Mar 18, 2012, 1:10:32 PM3/18/12

to django-res...@googlegroups.com

Last thing really is that it needs one or two simple tests. (Eg check that a unicode snowman character gets properly rendered both in HTML and in JSON.)

Max Arnold

unread,

Mar 29, 2012, 12:54:45 PM3/29/12

to django-rest-framework, arnold...@gmail.com

On 29 фев, 03:14, smcoll <smc...@gmail.com> wrote:
> Not familiar enough with unicode to manage any proper tests. But how
> does this look to you?
>

> https://github.com/smcoll/django-rest-framework/commit/fa8efcb15cd299...

I believe these two lines must be removed:

if not all(char in string.printable for char in content):

return '[%d bytes of binary content]'

(see https://github.com/max-arnold/django-rest-framework/commit/a86c547c960c73daad401f0218f1e42f41349c5a)

Max Arnold

unread,

Mar 29, 2012, 1:46:21 PM3/29/12

to django-rest-framework

On Thu, Mar 29, 2012 at 09:54:45AM -0700, Max Arnold wrote:
> On 29 фев, 03:14, smcoll <smc...@gmail.com> wrote:
> > Not familiar enough with unicode to manage any proper tests. But how
> > does this look to you?

Also these two commits actually fix content-type charset:

https://github.com/max-arnold/django-rest-framework/commit/7171129eb58246b308247122c03a0d3b3cb65e7f
https://github.com/max-arnold/django-rest-framework/commit/54b5104f715e4ef44911c7e8f3c5b0dca8d98c6c

I'm not sure if "else" branch of get_content_type is actually required:

def get_content_type(self, content, mimetype):
if isinstance(content, unicode):
return "%s; charset=%s" % (mimetype, settings.DEFAULT_CHARSET)
else:
return mimetype

Reply all

Reply to author

Forward