Python 3: should we apply unicode_literals everywhere?


Aymeric Augustin

Aug 21, 2012, 6:46:47 AM
to django-d...@googlegroups.com
Hello,

One of the first steps in porting Django to Python 3 was to switch on
unicode_literals, as explained here [1]. This change was discussed in
ticket #18269 [2] and committed in changeset 4a103086d5 [3].

This changeset added `from __future__ import unicode_literals` only
where necessary, i.e. in modules that contained explicit unicode
literals. This choice absolutely makes sense. Switching
unicode_literals on everywhere in Django would have resulted in tons
of b"" prefixes and/or in an incredible number of changes. Both
options were unrealistic.

However, it has an unfortunate side effect. In master, some modules
have unicode_literals and others don't. I find myself constantly
checking which mode is in effect.
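
To make the difference concrete, here is a minimal sketch of what the import changes. The variable names and the `is_text` helper are hypothetical and chosen only for illustration; they are not taken from Django.

```python
from __future__ import unicode_literals  # a no-op on Python 3

# With the import, a bare literal is text on both Python 2 and Python 3.
# Without it, the same literal would be a byte string on Python 2.
text_type = type("")

label = "django"     # text: unicode on Python 2, str on Python 3
payload = b"django"  # bytes on both versions, with or without the import


def is_text(value):
    """Return True if value is a text (unicode) string."""
    return isinstance(value, text_type)
```

In a module without the import, `is_text(label)` would be False on Python 2, which is exactly why the top of the file has to be checked.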

So we have two options at this point.

(1) The status quo

Pros:
- less work in the short term
- avoiding the cons of solution (2)

Cons:
- check-top-of-file syndrome
- different behavior in Python 2 and Python 3 in some modules
- different behavior between some modules and others (e.g.
moving code isn't safe)
- cognitive overhead

(2) Progressively turn unicode_literals on throughout the codebase. If
we do it in small steps, it becomes easier to ensure that the change
from str literals to unicode literals doesn't result in regressions on
Python 2. That's how we handled the entire Python 3 port and it worked
well — several regressions were quickly caught and fixed.

Pros:
- consistent codebase, easier to maintain in the long term
- avoiding the cons of solution (1)

Cons:
- "native strings" have to be expressed as str("...") in
modules that need them
- more changes, higher risk of regressions on Python 2
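
As an illustration of the "native strings" point, here is a small sketch assuming a module that has unicode_literals switched on. On Python 2, `type()` rejects a unicode class name, so the name is wrapped in `str(...)`. The class name and attribute below are hypothetical.

```python
from __future__ import unicode_literals

# str("...") yields the native string type on both versions:
# a byte string on Python 2, a text string on Python 3.
class_name = str("DynamicForm")  # hypothetical name, for illustration

# On Python 2, passing the bare literal "DynamicForm" here would raise
# TypeError, because the literal is unicode under unicode_literals.
DynamicForm = type(class_name, (object,), {"label": "example"})
```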

In my opinion, option (2) is a logical move at this point. However I
believe it deserves a public discussion (or at least an explanation).
What do you think?

Best regards,

--
Aymeric.


[1] https://docs.djangoproject.com/en/dev/topics/python3/#unicode-literals
[2] https://code.djangoproject.com/ticket/18269
[3] https://github.com/django/django/commit/4a103086d5c67fa4fcc53c106c9fdf644c742dd8

Anssi Kääriäinen

Aug 21, 2012, 7:02:37 AM
to Django developers
On 21 Aug, 13:46, Aymeric Augustin wrote:
I did some benchmark runs some time ago, and it seems that
unicode_literals caused a small performance regression in many
queryset-related benchmarks. The only one I have available is this:
http://users.tkk.fi/~akaariai/djbench/queryannotate.html

I remembered doing the benchmarks when reading this post, and thought
I would mention it. The regression is small and I don't have any ideas
on how to solve it. It might just be a testing artefact. So, this
is in no way a complaint against unicode_literals, just something I
thought I'd share.

BTW, if there happens to be some unused hardware available, I could
automate benchmarks like the one above. The hardware needs to be
dedicated; a virtual machine will not do, because benchmarking on a
shared or virtual machine leads to inaccurate results. However, the
performance of the hardware isn't important at all; an older machine
might actually be better for this purpose...

For the actual question: I vote we move every non-empty file to
unicode_literals. If we use smaller steps we can bisect breakages
more easily.

- Anssi

Daniel Sokolowski

Aug 21, 2012, 9:50:13 AM
to django-d...@googlegroups.com
Hi Aymeric, I prefer the eventual consistency of option 2 and the
fewer gotchas when coding; thanks for asking.


--
Daniel Sokolowski

Adrian Holovaty

Aug 21, 2012, 5:32:19 PM
to django-d...@googlegroups.com
On Tue, Aug 21, 2012 at 5:46 AM, Aymeric Augustin
<aymeric....@polytechnique.org> wrote:
> In my opinion, option (2) is a logical move at this point. However I
> believe it deserves a public discussion (or at least an explanation).
> What do you think?

I prefer option 2 as well, because it seems like the Right Thing To
Do. Of course, there's no rush to do everything -- we can just nibble
off bits here and there.

I'll have some free time soon and would be happy to help out migrating
code. (Relatively) mindless refactoring like this is one of my
favorite things to do. :-)

Adrian

Simon Meers

Aug 21, 2012, 6:03:57 PM
to django-d...@googlegroups.com
It's a shame we couldn't skip straight to Python 3.3 and take
advantage of PEP 414...

VernonCole

Aug 22, 2012, 11:06:37 AM
to django-d...@googlegroups.com
That seems to me (in my dark status as a lurker here) to be a brilliant idea.
It is already established practice to say something like: "version 1.n of Django requires 2.m or later of Python".
The practice would then change to: "version 1.n of Django requires 2.m of Python or 3.3 or later".
I see from reading the text of PEP 414 that there is an import hook available to make this feature also work in Python 3.2.
Would there be any advantage to requiring support for older versions of Python 3? I can't think of any.
Python 3.3 will be an established thing long before a Django version using it gains production status. We developers can use hooks and beta versions.
--
Vernon Cole

Mikhail Korobov

Aug 22, 2012, 11:26:26 AM
to django-d...@googlegroups.com
Python 3.2 is the default Python 3 in Ubuntu 12.04 LTS, so I think Python 3.2 support is pretty important.

And what are the gains of having "u" prefixes all over the codebase? This makes the codebase less Python 3-like. With PEP 414-based code there must be explicit "b" and explicit "u" prefixes all over the code; the sweet unprefixed variant will be reserved for "native strings", which are rarely useful. With unicode_literals, an explicit "b" prefix is needed for byte strings and an explicit "str(foo)" call is needed for "native strings"; unicode strings are implicit and the default in both Python 2.x and Python 3.x code. This is more porting work, but I think it is more rewarding because it leads to cleaner code.
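
The two spellings compared above can be put side by side in a short sketch. The variable names are hypothetical. The `u` prefix on the last line is accepted on Python 2 and on Python 3.3 or later, but is a syntax error on Python 3.2 unless an import hook is used.

```python
from __future__ import unicode_literals

# unicode_literals style: text is the unprefixed default.
text = "caf\xe9"        # text string on both Python 2 and Python 3
data = b"caf\xc3\xa9"   # byte string, explicit "b" prefix
native = str("cafe")    # native string, explicit str(...) call

# PEP 414 style spelling of the same text literal. In a module written
# in that style the import above would be absent, and an unprefixed
# literal would then be a native string.
text_pep414 = u"caf\xe9"
```

Both spellings produce the same text value; the difference is only in which kind of string gets the unprefixed form.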

On Wednesday, August 22, 2012, 21:06:37 UTC+6, VernonCole wrote:

Aymeric Augustin

Aug 22, 2012, 12:49:03 PM
to django-d...@googlegroups.com
> 2012/8/22 VernonCole <verno...@gmail.com>:
>
> On Tuesday, August 21, 2012 4:03:57 PM UTC-6, DrMeers wrote:
>>
>> It's a shame we couldn't skip straight to Python 3.3 and take
>> advantage of PEP414...
>
> That seems to me (in my dark status as a lurker here) to be a brilliant
> idea.


Well, this point is moot as far as Django is concerned: we already
went through the effort of removing the `u` prefixes!

However I'd like to explain why this PEP is at odds with the porting
philosophy I've applied to Django, and why I would have vetoed taking
advantage of it.

I believe that aiming for a Python 2 codebase with Python 3
compatibility hacks is a counter-productive way to port a project. You
end up with all the drawbacks of Python 2 (including the legacy `u`
prefixes) and none of the advantages of Python 3 (especially the sane
string handling).

Working to write Python 3 code, with legacy compatibility for Python
2, is much more rewarding. Of course it takes more effort, but the
results are much cleaner and much more maintainable. It's really about
looking towards the future or towards the past.

I understand the reasons why PEP 414 was proposed and why it was
accepted. It makes sense for legacy software that is minimally
maintained. I hope nobody puts Django in this category!

--
Aymeric.

Vinay Sajip

Aug 24, 2012, 10:53:12 AM
to django-d...@googlegroups.com
I would also prefer Option 2, as there are not all that many places where str('...') is needed.

Regards,

Vinay Sajip

Felipe Prenholato

Aug 30, 2012, 4:08:22 PM
to django-d...@googlegroups.com
Here at PDG (Brazil) we are migrating our software to Django 1.4 and are already using unicode_literals. I can count on my fingers the places where I needed to use 'b' for byte strings (mostly in settings.py).

In my experience, maintaining byte strings isn't that hard, so we should go with option 2.

Felipe 'chronos' Prenholato.
Linux User nº 405489
Home page: http://devwithpassion.com | http://chronosbox.org/blog
GitHub: http://github.com/chronossc/ | Twitter: http://twitter.com/chronossc

