[Django] #17816: UnicodeEncodeError in Image- and FileFields


Django

Mar 2, 2012, 1:24:50 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
--------------------------------------+--------------------
Reporter: andi@… | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Forms | Version: 1.3
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+--------------------
Uploading files (and images) whose names contain non-ASCII characters (e.g.
German umlauts) through a form containing an ImageField or FileField causes a
UnicodeEncodeError on recent versions of Ubuntu servers. Curiously, this
does not happen on the development server or on older Debian 5 servers.

In order to avoid this error and non-ASCII characters in URLs I'd like to
suggest a built-in (optional) conversion of the filename in the Image-
and/or FileField class (or corresponding base class).

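{{{
import os

from django.forms import ImageField
from django.template.defaultfilters import slugify


class MyImageField(ImageField):

    def __init__(self, *args, **kwargs):
        super(MyImageField, self).__init__(*args, **kwargs)

    def clean(self, *args, **kwargs):
        data = super(MyImageField, self).clean(*args, **kwargs)
        if not data:
            # No file was uploaded for an optional field.
            return data
        # Split "name.ext" so both parts can be slugified separately.
        filename = os.path.splitext(data.name)
        data.name = unicode(slugify(filename[0]))
        if len(filename[1]):
            # slugify() strips the leading dot of the extension; re-add it.
            data.name += u'.' + slugify(filename[1])
        return data
}}}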
where slugify is e.g. the slugify function from
django.template.defaultfilters

--
Ticket URL: <https://code.djangoproject.com/ticket/17816>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 2, 2012, 2:41:50 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jezdez):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

What version of Ubuntu did you use? Which version of Python? Which of
Django? What kind of deployment did you use?

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:1>

Django

Mar 3, 2012, 6:42:08 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by andi@…):

The answers to your questions:
Ubuntu 10.04
Python 2.6.5
Django 1.3.1
Deployment with mod_wsgi

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:2>

Django

Mar 4, 2012, 7:44:59 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by andi@…):

Another elegant solution would be:



{{{
class MyImageField(ImageField):

    def __init__(self, *args, **kwargs):
        super(MyImageField, self).__init__(*args, **kwargs)

    def clean(self, *args, **kwargs):
        data = super(MyImageField, self).clean(*args, **kwargs)

        data.name = data.name.encode('ascii', 'ignore')
        return data
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:3>

Django

Mar 4, 2012, 8:01:17 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

If this is a bug in Django, it must be fixed, not worked-around with a
data-destroying operation.

It may also be a problem in your setup (environment, locale, etc.)

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:4>

Django

Mar 4, 2012, 8:06:55 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by anonymous):

I would not call it a "bug", as it was working absolutely fine on Debian
Lenny. I've tried to locate the problem with the help of the Django users
group, but I found no way of getting it to work on an Ubuntu server.

In my particular case I'm uploading images for a gallery app, and having
nice URL-friendly filenames would be a "feature".
Having non-ASCII characters in a URL (even if allowed) can get quite
messy, especially when server and browser speak different dialects, e.g.
de_DE.utf8 and de_DE.latin1.

So, please look at this "bug report" as a feature request.

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:5>

Django

Mar 4, 2012, 8:25:59 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by akaariai):

* cc: anssi.kaariainen@… (added)


Comment:

Isn't this the nice "you must set your locale correctly for your
web-server user" error? Something like this:
http://stackoverflow.com/questions/6171278/unicode-in-django-admin. If you
Google for this, there are a lot more similar errors to be found.

I, too, was hit by this issue some time ago. I have two suggestions:
- Issue a warning when Django is run in a non-UTF-8 environment. Granted,
this will be hidden in the log files, but it still gives developers a chance
to fix this before bug reports come in from production. This is one of those
bugs which are hard to spot in testing...
- When the Unicode error happens during file saving, convert it to a more
explanatory one. Link to documentation explaining this issue if possible.

You could also always issue a warning when a file save operation gets a
unicode string and the server is not in a UTF-8 locale, even if
there isn't any actual error.

Of course, adding more warnings about this in the documentation is one way
forward, too.
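
To make the first suggestion concrete, here is a minimal sketch of such a startup check. It is only an illustration: the helper names `is_utf8` and `warn_if_not_utf8` are hypothetical and not part of Django, and the code is written to run on Python 3 rather than the Python 2.x versions discussed in this ticket.

```python
import codecs
import sys
import warnings


def is_utf8(encoding_name):
    """Return True if the given codec name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except (LookupError, TypeError):
        # Unknown codec name, or None instead of a name.
        return False


def warn_if_not_utf8(encoding_name=None):
    """Warn when file names would not be encoded as UTF-8.

    Hypothetical helper: returns True if a warning was issued.
    """
    if encoding_name is None:
        encoding_name = sys.getfilesystemencoding()
    if is_utf8(encoding_name):
        return False
    warnings.warn(
        "File system encoding is %r, not UTF-8; saving files with "
        "non-ASCII names may raise UnicodeEncodeError." % (encoding_name,),
        RuntimeWarning,
    )
    return True
```

Calling `warn_if_not_utf8()` once at process startup would put the warning in the server log, which is where this comment expects it to end up.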

I am not too sure if the optional conversion is a good solution. The
problem here is that you will still get the error in production. You will
not remember to check that option _before_ you are hit with this, and
having it default to on is not going to work. If there are enough users
who want that option then why not, but it will not magically solve this
problem.

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:6>

Django

Mar 4, 2012, 8:31:33 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

(I wrote this before reading Anssi's comment.)

Many developers would be surprised if Django automatically altered the
names of uploaded files, and it would be backwards incompatible, so we
won't do that.

Should we offer this feature as an option? "Normalizing" file names is
easy -- the snippet you pasted above shows how to do it -- but it's also a
matter of taste. For instance, one could prefer:

{{{
data.name = unicodedata.normalize('NFKD', data.name).encode('ascii',
'ignore')
}}}

Therefore, I'm reluctant to provide this feature in Django.
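
To illustrate why this is a matter of taste, the following sketch compares the two approaches from this thread on sample file names. The function names are made up for the example, and the code targets Python 3, where `encode()` returns bytes that are decoded back to text here.

```python
import unicodedata


def strip_non_ascii(name):
    """Drop every non-ASCII character outright."""
    return name.encode("ascii", "ignore").decode("ascii")


def decompose_then_strip(name):
    """Split accented letters into base letter + combining mark (NFKD),
    then drop whatever is still non-ASCII (the combining marks)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii(u"Müller.jpg"))       # Mller.jpg
print(decompose_then_strip(u"Müller.jpg"))  # Muller.jpg
print(decompose_then_strip(u"Straße.jpg"))  # Strae.jpg
```

Both variants lose information: the "ß" has no decomposition and simply disappears, so neither is an obvious default.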

----

Still, I'm surprised that you got an `UnicodeEncodeError`. It may reveal a
bug in Django.

I suppose you already set
[https://docs.djangoproject.com/en/1.3/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror LANG and LC_ALL]
correctly?

Can you provide a link to the django-users discussion?

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:7>

Django

Mar 4, 2012, 8:39:10 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

One more idea: would it be possible to actually try to set the locale to
a UTF-8-based one when the server is started, if the locale isn't one
already? That would be a new setting: something like LOCALE='en_US.UTF8'.
The global_settings default would be None for "use whatever is configured",
and the settings template would need to have an OS-dependent LOCALE. There is
already a precedent: the TIME_ZONE setting alters os.environ...

The way to do this is to call locale.setlocale(locale.LC_ALL, wanted_locale)
in django/conf/__init__.py.

I haven't tried this idea, so I don't know if this really works, but maybe
worth a try in 1.5.
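
A rough sketch of what that call could look like, purely as an illustration of the idea. The LOCALE setting exists only as a proposal in this comment, `try_set_locale` is a hypothetical helper, and nothing here is actual Django code.

```python
import locale


def try_set_locale(wanted_locale):
    """Try to switch the process-wide locale; return True on success."""
    if wanted_locale is None:
        # Mirrors the proposed default: use whatever is configured.
        return False
    try:
        locale.setlocale(locale.LC_ALL, wanted_locale)
    except locale.Error:
        # The named locale is not installed on this machine.
        return False
    return True
```

Catching `locale.Error` matters because locale names differ between systems, which is also why the comment expects the settings template to need an OS-dependent value.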

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:8>

Django

Mar 4, 2012, 8:47:57 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by andi@…):

Checking the locale was '''the''' first thing I did. No matter what I
tried: UnicodeEncodeError.
The system, Apache and Python (os.environ) all report the locale as
de_DE.UTF-8.

As I said, it was working on Debian 5 (Python 2.5, Django 1.3.1). It just
fails on Ubuntu (Python 2.6, Django 1.3.1; haven't tried other server OS).

This is the link to the group discussion:
http://groups.google.com/group/django-users/browse_thread/thread/444e92fffbac31ae

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:9>

Django

Mar 4, 2012, 2:03:20 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by claudep):

Could you test with Django 1.4? I'm quite sure at least one bug related
to Unicode file names has been fixed, but I was unable to find the ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:10>

Django

Mar 6, 2012, 3:29:30 AM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

Seems like I was mistaken about this being a system locale mismatch.

I quickly checked the idea of altering the locale on process startup. In
short: it seems like a bad idea. However, improved error messages and
warnings could help other people solve locale mismatch issues as painlessly
as possible. Should I open another ticket for these improvements?

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:11>

Django

Mar 11, 2012, 3:17:04 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution: needsinfo
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* status: new => closed
* resolution: => needsinfo


Comment:

In order to rule out locale configuration issues, could you insert the
following in a view:

{{{
import locale
locales = "Current locale: %s %s -- Default locale: %s %s" % (
    locale.getlocale() + locale.getdefaultlocale())
}}}

and echo the contents of `locales` in a template? (Do that on a test page
or in an HTML comment so it doesn't show up for regular users.)

I expect:
{{{
Current locale: None None -- Default locale: de_DE UTF8
}}}

This should be done on the same server that exhibits the problem, of
course.

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:12>

Django

Mar 13, 2012, 3:46:36 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution: needsinfo
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by andi@…):

Locale settings are all ok.
As I suspected it was a bug in the apache package of Ubuntu. I have
reverted to the original code and uploaded (after Apache restart) a file
containing a German umlaut. Everything went fine.

Sorry for bothering you with this issue...

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:13>

Django

Mar 7, 2014, 8:02:00 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody

Type: | Status: closed
Cleanup/optimization | Version: 1.3
Component: Forms | Resolution: needsinfo
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by clime7@…):

I have encountered a similar problem and I'd like to add some information.
Two things can cause Unicode errors like this:

1) A non-UTF-8 encoding returned by sys.getdefaultencoding() causes Unicode
errors in cases like str(unicode_string_with_accents), i.e. whenever a
unicode string is converted to a byte string without explicitly specifying
an encoding, as in str(unicode_string_with_accents.encode('utf-8')).
However, ASCII is the default for Python 2 and it shouldn't be fiddled
with, so this is an expected problem.

2) A non-UTF-8 encoding returned by sys.getfilesystemencoding(). This one,
on the other hand, really should be UTF-8, because otherwise you get Unicode
errors in cases like os.stat(unicode_string_with_accents). The os module
uses the file system encoding when interpreting unicode strings, and on
Linux the file system encoding is inferred from the locale of the Python
interpreter. Specifically, there should be LANG=something.utf8 in the
environment.

I resolved my problem by adding "env = LANG=en_US.utf8" to my uwsgi.ini. I
believe other people with this problem might need to do something similar.
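
To check both encodings from inside the running server process, something like the following sketch could be called from a test view. `encoding_report` is a hypothetical helper name, and the code is written for Python 3, where sys.getdefaultencoding() always returns 'utf-8', so only the file system encoding and the environment variables remain informative there.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings seen by this process."""
    return {
        # Used for implicit unicode/byte string conversions.
        "default_encoding": sys.getdefaultencoding(),
        # Used by the os module to encode file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Environment variables as seen by the server process,
        # which may differ from those of an interactive shell.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "current_locale": locale.getlocale(),
    }


for key, value in sorted(encoding_report().items()):
    print("%s: %r" % (key, value))
```

Running it under the web server rather than in a shell is the point, since the two environments can differ.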

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:14>

Django

Nov 14, 2015, 12:02:21 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 1.3
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"25b912abbe31fa440e702b5273c18cf74e2d6e0b" 25b912ab]:
{{{
#!CommitTicketReference repository=""
revision="25b912abbe31fa440e702b5273c18cf74e2d6e0b"
Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:15>

Django

Nov 14, 2015, 12:03:23 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 1.3
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"da20004a61f73a104bbac41ccb08b9f94f008171" da20004a]:
{{{
#!CommitTicketReference repository=""
revision="da20004a61f73a104bbac41ccb08b9f94f008171"
[1.8.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode
topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:16>

Django

Nov 14, 2015, 12:03:24 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 1.3
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"84006fda55ffcaf272ca4fcd4addf7874302e884" 84006fd]:
{{{
#!CommitTicketReference repository=""
revision="84006fda55ffcaf272ca4fcd4addf7874302e884"
[1.9.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode
topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:17>

Django

Nov 14, 2015, 12:04:30 PM
to django-...@googlegroups.com
#17816: UnicodeEncodeError in Image- and FileFields
-------------------------------------+-------------------------------------
Reporter: andi@… | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 1.3
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* resolution: needsinfo => invalid


--
Ticket URL: <https://code.djangoproject.com/ticket/17816#comment:18>
