os.listdir() not returning unicode


Marc

Nov 20, 2011, 12:57:13 PM
to Django users
Hello,

I have an application that uses os.listdir(u'some path') that should
return a list of unicode strings.

This works fine as long as I use the development server (manage.py runserver).

As soon as I deploy on Apache with mod_wsgi, the same piece of code
fails with "Caught UnicodeDecodeError while rendering: 'ascii' codec
can't decode byte 0xc3 in position 15: ordinal not in range(128)".
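The message itself points at the cause. 0xc3 is the lead byte of a two-byte UTF-8 sequence (é, for instance, is stored as C3 A9), so a UTF-8 file name is being decoded as ASCII. In Python 2 this decode happens implicitly when a byte string is combined with unicode, for example while a template is rendered. A minimal Python 3 sketch that reproduces the same failure explicitly; the file name and the helper `first_non_ascii_byte` are illustrative:

```python
def first_non_ascii_byte(raw):
    """Return (position, hex byte) of the first byte an ASCII decode rejects, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte in the input.
        return exc.start, hex(raw[exc.start])
    return None


# Hypothetical UTF-8 file name "café.txt": é is stored as the two bytes C3 A9.
name = u"caf\u00e9.txt".encode("utf-8")
print(first_non_ascii_byte(name))  # (3, '0xc3')
```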

I discovered that the problem is that os.listdir() returns byte
strings rather than unicode, even though the requested path is unicode.
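In Python 2, `os.listdir(u'...')` returns unicode only for names it can decode with `sys.getfilesystemencoding()`; names that fail to decode come back as byte strings, which matches what is described here. A defensive sketch that normalises the result, assuming (hypothetically) that undecodable names were written as UTF-8. `listdir_text` is an illustrative name, not a library function; it runs on Python 2.7 and 3, although on Python 3 `os.listdir()` with a str path always returns str, so the decode branch only fires for bytes paths.

```python
import os


def listdir_text(path, encoding="utf-8"):
    """List a directory, returning every entry as a text (unicode) string.

    Entries that come back as byte strings are decoded with `encoding`,
    which is an assumption about how the names were written; bytes that do
    not decode are replaced with U+FFFD, so the result is for display only.
    """
    names = []
    for name in os.listdir(path):
        if isinstance(name, bytes):  # bytes is an alias of str on Python 2
            name = name.decode(encoding, "replace")
        names.append(name)
    return names
```

Because of the replacement characters, a decoded name cannot always be used to reopen the file; keep the original entries for file access and use the decoded ones for rendering.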

What could be wrong?

Python 2.7.2
Linux 3.1.0-4-pae
Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/1.0.0e DAV/2 PHP/5.3.8 with Suhosin-Patch mod_wsgi/3.3 Python/2.7.2


Calvin Spealman

Nov 20, 2011, 3:22:27 PM
to django...@googlegroups.com
You may get more appropriate responses on the python-list mailing
list, as this is not really a Django question but a Python one.

http://mail.python.org/mailman/listinfo/python-list


Petr Přikryl

Nov 20, 2011, 5:14:43 PM
to django...@googlegroups.com

I disagree. I think this is a good place to ask. I am a beginner in Django,
but experienced in Python. The truth is that os.listdir() with a unicode
argument should return unicode strings (and it does when executed at the
console, which is also the case when running the local development server
through manage.py). This is likely a problem with Apache or something
related to it, and asking here is closer to the source than asking in Python groups.

> From: "Calvin Spealman"
> To: <django...@googlegroups.com>
> Date: 20.11.2011 21:23
> Subject: Re: os.listdir() not returning unicode

Marc

Nov 21, 2011, 2:05:29 AM
to Django users
OK, I found a clue regarding that issue:
print(sys.getfilesystemencoding()) at the console prompt produces
'UTF-8', whereas in the failing script it produces 'ANSI_X3.4-1968'.
I now understand why the conversion fails (in that case Python returns
a byte string, and this is not clearly documented in the Python docs).
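The console-versus-server comparison can be made from inside the server process itself with a throwaway WSGI script that reports the same values. A minimal sketch following the WSGI interface of PEP 3333; `encoding_report` is an illustrative name, and which environment variables to show is a choice, not something mod_wsgi or Django provides:

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide how file names are decoded."""
    return {
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


def application(environ, start_response):
    """Minimal WSGI callable that serves the report as plain text."""
    lines = ["%s = %r" % item for item in sorted(encoding_report().items())]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Loaded under mod_wsgi as a separate WSGI script, its output can be compared line by line with the same calls made at the console prompt.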

Is the problem in mod_wsgi?

Thanks

Tom Evans

Nov 21, 2011, 6:30:23 AM
to django...@googlegroups.com

No.

sys.getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into
system file names, or None if the system default encoding is used. The
result value depends on the operating system:

On Mac OS X, the encoding is 'utf-8'.
On Unix, the encoding is the user’s preference according to the result
of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.

It returns what you have configured it to return. If you want Apache
(and hence mod_wsgi and then your Python interpreter) to handle the
file system as UTF-8, configure it to do so by setting the LANG
environment variable, either in an Apache startup script (envvars
exists for this purpose) or with SetEnv in the main conf file.
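The point that the result depends only on the environment can be checked without Apache: start a child interpreter with a controlled LANG and ask the C library which codeset the resulting LC_CTYPE locale uses, which is what nl_langinfo(CODESET) reports. A Unix-only sketch; `codeset_under` is a hypothetical helper. Setting PYTHONCOERCECLOCALE=0 stops Python 3.7+ from coercing the C locale to UTF-8 (PEP 538), which would otherwise hide the effect. On glibc the plain C locale typically reports ANSI_X3.4-1968, the same name seen in the failing script.

```python
import os
import subprocess
import sys

# Child program: initialise LC_CTYPE from the environment, then ask the C
# library which codeset that locale uses (nl_langinfo is Unix only).
CHILD = (
    "import locale\n"
    "try:\n"
    "    locale.setlocale(locale.LC_CTYPE, '')\n"
    "except locale.Error:\n"
    "    print('unsupported locale')\n"
    "else:\n"
    "    print(locale.nl_langinfo(locale.CODESET))\n"
)


def codeset_under(lang):
    """Return the codeset a child process sees with only LANG set (or none)."""
    env = {k: v for k, v in os.environ.items()
           if k != "LANG" and not k.startswith("LC_")}
    env["PYTHONCOERCECLOCALE"] = "0"  # stop Python 3.7+ coercing "C" to UTF-8
    if lang is not None:
        env["LANG"] = lang
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode("ascii", "replace").strip()


for lang in (None, "C.UTF-8"):
    print("LANG=%s -> %s" % (lang, codeset_under(lang)))
```

A locale that is not installed on the machine makes setlocale() fail, which the child reports as "unsupported locale" instead of crashing.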

Cheers

Tom

Marc

Nov 22, 2011, 7:33:11 AM
to Django users
Thanks, Tom, for your post.

I had read an interesting thread on that subject at
groups.google.com/group/modwsgi/browse_thread/thread/ac729cc408ca516b/
but could not get it working by putting locale-related environment
variables in /usr/sbin/envvars. Actually, the only variable required to
return the proper encoding is export LC_LANG="fr_FR.UTF-8"
(LC_CTYPE="fr_FR.UTF-8" does not work).

The Python docs for sys.getfilesystemencoding() say the value comes from
nl_langinfo(CODESET), and man nl_langinfo says that this value is related
to LC_CTYPE.
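The LC_CTYPE category that nl_langinfo(CODESET) depends on is taken from the environment with a fixed POSIX precedence: LC_ALL first, then LC_CTYPE, then LANG, skipping any variable that is unset or empty; the final fallback is implementation-defined and is assumed here to be "C" (as on glibc). A pure-Python model of that lookup, not what Python or libc actually execute; `effective_ctype_locale` and `codeset_part` are illustrative names. The real setlocale() call additionally requires the named locale to be installed, which this model does not check.

```python
def effective_ctype_locale(env):
    """Return the locale name that governs LC_CTYPE for the mapping `env`."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # assumed implementation default


def codeset_part(locale_name):
    """Extract the codeset from a name like 'fr_FR.UTF-8@euro' (None if absent)."""
    if "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


env = {"LANG": "fr_FR.UTF-8", "LC_ALL": ""}
print(effective_ctype_locale(env))                # fr_FR.UTF-8
print(codeset_part(effective_ctype_locale(env)))  # UTF-8
```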

It seems to me that both the mod_wsgi and Python documentation (not to
mention the Django documentation) should be improved on this point.

This problem is now solved for me, but it took me a considerable amount
of time to solve it. I will post the problem to the mod_wsgi and Python
lists.

Thanks again

Tom Evans

Nov 22, 2011, 7:46:52 AM
to django...@googlegroups.com

Yep, I agree 100%. It's nonsensical that a file system these days
doesn't have an associated encoding, or that how files are presented
to a user depends upon one environment variable (or the absence of
one). It seems crazy to me that you can write a perfectly correct
Python app that generates unicode and have it fail because LANG is
undefined and Python assumes that you want ASCII. What year is this,
1985?
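Rather than failing deep inside template rendering, a deployment can refuse to start when file names would be decoded as ASCII. A hypothetical startup guard, for example called from the WSGI script before Django is loaded; both helper names are illustrative. `codecs.lookup()` resolves aliases, so glibc's name ANSI_X3.4-1968 maps to Python's `ascii` codec. Per the documentation quoted earlier in the thread, `sys.getfilesystemencoding()` may return None on Python 2, meaning the system default encoding is used, so the sketch falls back to `sys.getdefaultencoding()` in that case.

```python
import codecs
import sys


def is_ascii_encoding(name):
    """True if `name` is an alias of ASCII, such as glibc's 'ANSI_X3.4-1968'."""
    return codecs.lookup(name).name == "ascii"


def require_unicode_filesystem():
    """Hypothetical startup guard: fail fast if file names would be decoded as ASCII."""
    # Python 2 may report None, meaning "use the system default encoding".
    encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
    if is_ascii_encoding(encoding):
        raise RuntimeError(
            "file system encoding is %r; set LANG to a UTF-8 locale "
            "in the web server's environment" % encoding)
    return encoding
```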

Character encodings drive me batty!

Cheers

Tom
