Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

Mike Edmunds

Oct 21, 2016, 5:33:02 PM

to Django developers (Contributions to Django itself)

A user has reported an issue with a Django reusable app I maintain, where they're passing my app a ugettext_lazy object that I ultimately pass to the requests package. Because isinstance(lazystr, unicode) is False, requests and urllib.urlencode mishandle the text, leading to a UnicodeEncodeError or an incorrectly encoded query string.

I'm wondering what the right fix for this is:

1. Should the result of ugettext_lazy somehow inherit from unicode? If indeed the "result of a ugettext_lazy() call can be used wherever you would use a unicode string" (docs), then it would seem to be a problem that isinstance(lazystr, unicode) isn't true. Both requests and urllib.urlencode (also py3 urllib.parse.urlencode) use isinstance tests to detect text strings. (And unfortunately, that's probably a lot more common than duck-typing in this case.)
2. Or should my reusable app be calling force_text on everything it might receive from its callers before passing on to other packages? Essentially saying, lazy strings are really only valid while inside the Django world, and a (currently-undocumented) responsibility of reusable apps is to convert all lazy strings before handing them off to other (non-Django) Python code.
3. Or should I just be telling my app's users to call force_text themselves if they're using ugettext_lazy? (Not thrilled with this idea, as missing it can lead to very subtle errors. See the 'p4' example below. And this might warrant a clarification to "... can be used wherever you would use a unicode string..." in the docs.)
4. Or...?

Here's an (extremely pared-down) example demonstrating the specific problem:

# My reusable app passes several string params from the caller to requests:
import requests
def my_reusable_app(params):
    return requests.post('http://example.com', params=params)

# Code in the calling app:
from django.utils.translation import ugettext, ugettext_lazy
response = my_reusable_app({
    'p1': u"alpha\u0391", # works correctly
    'p2': ugettext(u"beta\u0392"), # works correctly
    'p3': ugettext_lazy(u"gamma\u0393"), # requests: UnicodeEncodeError "in position 0" (!)
    'p4': ugettext_lazy(u"ASCII"), # urlencode: generates "p4=A&p4=S&p4=C&p4=I&p4=I" rather than "p4=ASCII"
})

print(response.request.url)

The UnicodeEncodeError in p3 results from requests.models._encode_params not realizing the ugettext_lazy object is unicode, and failing to encode it to utf-8 before handing off to urlencode.

If you comment p3 out, the exception goes away, but urlencode fails to realize the p4 ugettext_lazy object is text, and incorrectly encodes it as a sequence of individual character params.

[Above is all Python 2.7, but also applies to Python 3; substitute "str" wherever I wrote "unicode". Django 1.8--1.10, and probably others.]
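
The p4 behavior can be reproduced in isolation with a minimal Python 3 sketch that needs neither Django nor requests. LazyText below is a hypothetical stand-in invented for illustration; it is not Django's lazy proxy, it only duck-types a few str methods without inheriting from str. Passing doseq=True is meant to mirror how requests is understood to call urlencode.

```python
from urllib.parse import urlencode


class LazyText:
    """Hypothetical stand-in for a lazy translation string (not Django's class).

    It duck-types a few str behaviours but does not inherit from str.
    """

    def __init__(self, func):
        self._func = func  # called each time the text is needed

    def __str__(self):
        return self._func()

    def __len__(self):
        return len(str(self))

    def __iter__(self):
        return iter(str(self))


lazy = LazyText(lambda: "ASCII")

# isinstance-based text detection does not recognise the proxy as text
is_text = isinstance(lazy, str)

# With doseq=True, urlencode treats any non-str value that has a length
# as a sequence and emits one parameter per element
broken = urlencode({"p4": lazy}, doseq=True)

# Forcing the value to a real str first restores the expected encoding
fixed = urlencode({"p4": str(lazy)}, doseq=True)

print(is_text, broken, fixed)
```

Here the only difference between the broken and the fixed call is the explicit str() conversion before the value leaves the caller's hands.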

Thanks for any advice. Happy to take a shot at proposing doc changes, if that's the right answer.

Mike

Moritz Sichert

Oct 22, 2016, 9:13:12 AM

to django-d...@googlegroups.com

> 1. Should the result of ugettext_lazy somehow inherit from unicode? If indeed
> the "result of a ugettext_lazy() call can be used wherever you would use a
> unicode string"

I don't think making isinstance(lazy_str, unicode) return True would really fix
things, as it will probably break somewhere deeper then. In essence I think this
boils down to the other libs not duck-typing "correctly". However it is
definitely worth mentioning those limitations in the docs.

> 2. Or should my reusable app be calling force_text on /everything/ it might
> receive from its callers before passing on to other packages? Essentially
> saying, lazy strings are really only valid while inside the Django world,
> and a (currently-undocumented) responsibility of reusable apps is to convert
> all lazy strings before handing them off to other (non-Django) python code.

You don't need force_text() for that; calling str(my_lazy_str) is enough (or
six.text_type(my_lazy_str) if you want to support Python 2.7).
I think this would actually be the best duck-typing approach.
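
A minimal sketch of that conversion step in a reusable app, written for Python 3. The helper name resolve_lazy_params and the FakeLazy class are hypothetical, made up for this example. In a Django project the lazy_types argument would presumably be django.utils.functional.Promise, the base class of Django's lazy proxies; a stand-in is used here so the snippet runs without Django.

```python
def resolve_lazy_params(params, lazy_types):
    """Return a copy of params with every lazy value forced to a real str.

    lazy_types is the class (or tuple of classes) to treat as lazy.
    Values of any other type are passed through unchanged.
    """
    return {
        key: str(value) if isinstance(value, lazy_types) else value
        for key, value in params.items()
    }


class FakeLazy:
    """Hypothetical stand-in for a lazy string, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


params = {"p1": "alpha\u0391", "p3": FakeLazy("gamma\u0393"), "count": 3}
clean = resolve_lazy_params(params, FakeLazy)
print(clean)
```

Converting only known lazy types, rather than calling str() on everything, leaves numbers, bytes and lists as the caller supplied them.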

> 3. Or should I just be telling my app's users to call force_text themselves if
> they're using ugettext_lazy? (Not thrilled with this idea, as missing it can
> lead to very subtle errors. See the 'p4' example below. And this might
> warrant a clarification to "... can be used wherever you would use a unicode
> string..." in the docs.)

I don't think this error is subtle; the function name tells you exactly
that it is lazy. I would say it is the responsibility of the programmer using
ugettext_lazy() to transform it to a string when using libraries that know
nothing about Django. Still, I'd say a reusable Django app should probably be
able to deal with lazy strings.

The best "fix" in my opinion is to add a note in the docs.

Moritz


gilberto dos santos alves

Oct 23, 2016, 10:00:02 AM

to django-d...@googlegroups.com

Hi. IMHO, what's needed here is a single line in your app's code like

# -*- coding: utf-8 -*-

Put it explicitly in the source code of the calling app.

Or you could use the codecs module (for Python 2.7) [1].

[1] https://docs.python.org/2/library/codecs.html#module-codecs

P.S. I used this with python-sphinx and it solved a lot of issues with pt-BR strings.

--
gilberto dos santos alves
+55(11)9-8646-5049
sao paulo - sp - brasil


Raphael Michel

Oct 23, 2016, 12:38:58 PM

to Mike Edmunds, django-d...@googlegroups.com

Hello,

On Fri, 21 Oct 2016 13:49:16 -0700 (PDT),
Mike Edmunds <medm...@gmail.com> wrote:
> 1. Should the result of ugettext_lazy somehow inherit from
> unicode?

I believe this would break huge amounts of code out there that use
"not isinstance(lazystr, unicode)" exactly to detect that it is a lazy
string and not a regular one.

Cheers
rami

Mike Edmunds

Oct 24, 2016, 8:13:31 PM

to Django developers (Contributions to Django itself), medm...@gmail.com

Thanks for the helpful responses.

A more-succinct statement of the underlying issue:

1. ugettext_lazy proxies len() and other methods of unicode, but not __class__. So isinstance(ugettext_lazy(), unicode) is False.
2. urlencode in the Python standard library liberally mixes duck typing and isinstance type-testing. As a result, it misinterprets ugettext_lazy() objects as sequences, rather than text strings. (And right or wrong, there are many cases of isinstance text-type detection in Python library code -- and in other popular packages.)
3. My reusable Django app is caught in the middle. (I don't control the calling code that's using ugettext_lazy, and I certainly don't control the Python standard library code.)
4. The calling code can be forgiven for assuming that this should work, because the ugettext_lazy docs state it can be used "wherever you would use a unicode string ... in Python code."
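
To illustrate the __class__ point, here is a minimal Python 3 sketch of a proxy that does expose __class__. ClassProxyingLazy is a hypothetical class written for this example and is not how Django implements lazy strings. It shows why isinstance would start returning True, and also why code that relies on "not isinstance(...)" to spot lazy strings would stop detecting them.

```python
class ClassProxyingLazy:
    """Hypothetical lazy string that also proxies __class__ (not Django code)."""

    def __init__(self, func):
        self._func = func

    @property
    def __class__(self):
        # isinstance() falls back to obj.__class__ when type(obj) does not match
        return str

    def __str__(self):
        return self._func()


lazy = ClassProxyingLazy(lambda: "gamma\u0393")

# What isinstance-based text detection in a library would now see
passes_isinstance = isinstance(lazy, str)

# The object is still not a real str underneath
real_type_is_str = type(lazy) is str

# The "is this a lazy string?" idiom described earlier in the thread
looks_lazy = not isinstance(lazy, str)

print(passes_isinstance, real_type_is_str, looks_lazy)
```

This stub proxies only __class__ and __str__, so any library code that goes on to call real str methods on it would still fail; a full proxy would need to forward those as well.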

Per Moritz's comments, it sounds like really the only practical resolution is adding a note to the docs. I'll open a ticket/PR to update the ugettext_lazy docs, clarifying that the result is not necessarily usable "wherever" you can use unicode.

Cheers,

Mike

P.S., FWIW, making isinstance(ugettext_lazy(), unicode) return True does, in fact, seem to solve this particular problem. Here's what it might look like, with tests. But as Raphael Michel points out, there's likely a lot of code out there that's depending on the opposite behavior. I found at least one example in Django itself (in that linked patch). The change also breaks pickling lazy objects in Python 2. And even if there's some way to fix that, given the age of ugettext_lazy, any changes to it are likely to cause all kinds of downstream problems.

