Unicode filenames and the standard os library

Andreas

May 20, 2012, 12:36:12 PM
to mod...@googlegroups.com
Hello,

I have come across an issue and I think it's a mod_wsgi issue.
First my setup:
Apache 2.2.14, mod_wsgi 3.3, python 2.6.5, django 1.3.1 on
Ubuntu Lucid 64.

I have a view which gets, with POST, a directory name, in order to create it.
After all the necessary security checks, the code is something like this:

import os

PATH = '/home/andreas'
# name is the directory name received via POST
newDir = os.path.join(PATH, name)
if not os.path.exists(newDir):
    os.mkdir(newDir)

When name contains only Latin characters everything is fine,
but when it contains non-ASCII characters I get exceptions.

os.path.join works without problems, and newDir is confirmed to be unicode:

>>> type(newDir) is unicode
True

os.path.exists raises an "ascii codec can't decode ..." exception.
When I changed it to os.path.exists(u'%r'%newDir), it ran with no exceptions.
(Strangely, os.path.exists(unicode(newDir)) gives the same "ascii codec" exception!)

os.mkdir does not work at all:
without any conversion it raises "ascii codec ...";
with os.mkdir(u'%r'%newDir) it raises "[Error 2]: Cannot find directory /home/andreas/τεστ" (Greek letters).
(This is mkdir, it's not supposed to find it!)

Testing the above with the development server works flawlessly.
Also with cherrypy as a wsgi server everything runs ok.

That's why I have come to believe this is an apache+mod_wsgi problem.
But how can mod_wsgi change the behavior of a python function completely?
(Needless to say, when using the exact same commands at a python shell everything runs smoothly.)

Thank you for your time reading this.
If you need other tests, please let me know. I would be glad to help figure this out.

Best regards,
Andreas

PS
Django is started with Graham's script:
http://blog.dscpl.com.au/2010/03/improved-wsgi-script-for-use-with.html


Deron Meranda

May 20, 2012, 1:41:36 PM
to mod...@googlegroups.com
What does sys.getfilesystemencoding() return? Both from the python
interactive command line, and from within your web app.
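
One way to check this from inside the web app is a throwaway WSGI script. The following is only a sketch: the helper name encoding_report and the choice to also print LANG, LC_ALL and LC_CTYPE are assumptions made for illustration. The only name mod_wsgi itself cares about is the application callable.

```python
import os
import sys


def encoding_report():
    """Build a plain-text report of the encoding-related settings of this process."""
    lines = ['filesystem encoding: %s' % sys.getfilesystemencoding()]
    # The locale variables are reported as seen by the running process,
    # which under Apache may differ from an interactive shell.
    for var in ('LANG', 'LC_ALL', 'LC_CTYPE'):
        lines.append('%s=%s' % (var, os.environ.get(var, '(unset)')))
    return '\n'.join(lines) + '\n'


def application(environ, start_response):
    """Minimal WSGI entry point that returns the report as text/plain."""
    body = encoding_report().encode('UTF-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=UTF-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Calling encoding_report() at the interactive prompt and loading the same script through Apache should make any difference between the two environments visible.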

On most modern Linux systems it should return 'UTF-8'. However, if it
returns 'ascii' (which is most likely the fault of your Apache config,
not mod_wsgi), then you'll have to do the encoding explicitly, e.g.,

os.mkdir( newDir.encode('UTF-8') )
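
As a slightly fuller sketch of the same idea, the path can be encoded once and the resulting byte string passed to every os call. The helper names is_ascii_encoding and make_dir are hypothetical, and a temporary directory stands in for the real base path; this assumes a POSIX filesystem where file names are plain bytes.

```python
import codecs
import os
import sys
import tempfile


def is_ascii_encoding(name):
    """True if the encoding name is ASCII or one of its aliases (e.g. ANSI_X3.4-1968)."""
    return codecs.lookup(name or 'ascii').name == 'ascii'


def make_dir(base, name):
    """Create base/name, handing UTF-8 encoded bytes to the os functions."""
    path = os.path.join(base, name)
    if isinstance(path, type(u'')):
        # Encode explicitly so the result does not depend on
        # sys.getfilesystemencoding().
        path = path.encode('UTF-8')
    if not os.path.exists(path):
        os.mkdir(path)
    return path


# Usage: create a directory with a Greek name under a temporary base.
demo_base = tempfile.mkdtemp()
created = make_dir(demo_base, u'\u03c4\u03b5\u03c3\u03c4')
needs_workaround = is_ascii_encoding(sys.getfilesystemencoding())
```

Encoding unconditionally keeps the behaviour the same whether or not the server process has a usable locale; needs_workaround only tells you whether the workaround was strictly required.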


On my setup (Fedora Linux), mod_wsgi handles Unicode path names
just fine.
--
Deron Meranda
http://deron.meranda.us/

Andreas

May 20, 2012, 3:43:31 PM
to mod...@googlegroups.com
Hi Deron,
thank you for your answer.
You are correct.
While every other combination (interpreter, cherrypy, etc.)
returns UTF-8, apache+mod_wsgi returns: ANSI_X3.4-1968
(which, in fact, I have never seen before!)

So it must be an apache config problem?
And how can I change it?
(don't tell me I have to compile from sources :-(  )

Best regards,
Andreas

Deron Meranda

May 20, 2012, 4:06:27 PM
to mod...@googlegroups.com
> ... apache+mod_wsgi returns: ANSI_X3.4-1968
> (which, in fact, I have never seen before!)

That's just the "official" name for what is commonly called "ASCII".


> So it must be an apache config problem?
> And how can I change it?

Python determines its encoding defaults from environment variables,
and if they are unset it will usually fall back to the compiled-in
default of ascii. Since mod_wsgi is essentially Python embedded into
Apache, those variables must be set by Apache, or by the boot scripts
that start Apache.

So it appears as if your Linux distro's Apache configuration is
probably not setting your locale environment variables. I'm not
familiar with Ubuntu Lucid's Apache package. Generally you want the
$LANG and $LC_ALL variables set, such as:

LC_ALL='en_US.UTF-8'
LANG='en_US.UTF-8'

(Replace en_US with whatever your locale's language specifier is.)
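
To see which of these variables actually wins inside a given process, a small check like the one below may help. It is a sketch only: the function names are hypothetical, and it relies on the usual POSIX precedence in which LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```python
import os


def effective_ctype_locale(env=None):
    """Return the locale string that decides the character encoding, or None.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. An empty value
    is treated the same as an unset variable.
    """
    env = os.environ if env is None else env
    for var in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = env.get(var)
        if value:
            return value
    return None


def names_utf8(locale_name):
    """True if a locale string such as 'en_US.UTF-8' names a UTF-8 codeset."""
    if not locale_name or '.' not in locale_name:
        return False
    # Keep the part after the dot and drop any '@modifier' suffix.
    codeset = locale_name.split('.', 1)[1].split('@', 1)[0]
    return codeset.replace('-', '').lower() == 'utf8'


# Usage: inspect the environment of the current process.
current = effective_ctype_locale()
ok = names_utf8(current)
```

If ok is False when this runs under Apache but True in a login shell, the server's startup environment is the place to look.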


You may want to look at this old posting....

https://groups.google.com/forum/?fromgroups#!topic/modwsgi/MRsMc9yehBI

Andreas

May 20, 2012, 4:30:48 PM
to mod...@googlegroups.com
Thank you very much for your input, Deron.
I'll look into it right now.

Best regards,
Andreas