Using python manage.py shell might shed more light. I fear the tool here is assuming an incorrect bytestring encoding and getting in the way.
I cannot recreate anything like what you are seeing. I have a model Thing, with a CharField name, stored in a MySQL DB (using a utf-8 encoded table). There are two instances of this Thing in the DB that contain für in the name. From a python manage.py shell, using Django 1.1.1:
>>> from ttt.models import Thing
>>> import django
>>> django.get_version()
'1.1.1'
>>> ufur = u'f\u00fcr'
>>> print ufur
für
>>> ufur
u'f\xfcr'
>>> ufur.encode('utf-8')
'f\xc3\xbcr'
>>> ufur.encode('iso-8859-1')
'f\xfcr'
Small u with umlaut is U+00FC. Encoded in utf-8 it takes 2 bytes, C3 BC; encoded in iso-8859-1 it is the single byte FC.
Filtering with icontains, using either the Unicode object or the utf-8 encoded bytestring version, works properly:
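For anyone following along on Python 3, where str is already Unicode and encode() returns bytes, the same facts can be checked with a short sketch like the one below. The variable names are only illustrative.

```python
import unicodedata

u_umlaut = "\u00fc"  # small u with umlaut (diaeresis)

# Code point and official Unicode character name
code_point = "U+%04X" % ord(u_umlaut)
name = unicodedata.name(u_umlaut)

# Same character, two encodings, different byte sequences
utf8_bytes = u_umlaut.encode("utf-8")
latin1_bytes = u_umlaut.encode("iso-8859-1")

print(code_point, name)
print(utf8_bytes.hex(), len(utf8_bytes))
print(latin1_bytes.hex(), len(latin1_bytes))
```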
>>> Thing.objects.filter(name__icontains=ufur)
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
>>> Thing.objects.filter(name__icontains=ufur.encode('utf-8'))
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
Attempting to filter with an iso-8859-1 encoded bytestring raises an error:
>>> Thing.objects.filter(name__icontains=ufur.encode('iso-8859-1'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/django/db/models/manager.py", line 129, in filter
    return self.get_query_set().filter(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 498, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 516, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1675, in add_q
    can_reuse=used_aliases)
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1614, in add_filter
    connector)
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 56, in add
    obj, params = obj.process(lookup_type, value)
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 269, in process
    params = self.field.get_db_prep_lookup(lookup_type, value)
  File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 214, in get_db_prep_lookup
    return ["%%%s%%" % connection.ops.prep_for_like_query(value)]
  File "/usr/lib/python2.5/site-packages/django/db/backends/__init__.py", line 364, in prep_for_like_query
    return smart_unicode(x).replace("\\", "\\\\").replace("%", "\%").replace("_", "\_")
  File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 44, in smart_unicode
    return force_unicode(s, encoding, strings_only, errors)
  File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 92, in force_unicode
    raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected end of data. You passed in 'f\xfcr' (<type 'str'>)
This is because Django assumes the bytestring is utf-8 encoded, and fails when it tries to decode it to unicode as utf-8, since the bytes are not valid utf-8 data.
The only way I have been able to recreate anything like what you are describing is to incorrectly construct the original unicode object by decoding a utf-8 bytestring as if it were iso-8859-1:
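The same kind of failure can be reproduced without Django. The sketch below is Python 3 and uses only the standard library. Note that to_text is a hypothetical helper, not Django's force_unicode; it only mimics the assumption that any bytestring handed in is utf-8. The exact error message differs from the Python 2.5 one in the traceback.

```python
def to_text(value, encoding="utf-8"):
    """Return text, strictly decoding bytes with the assumed encoding.

    Hypothetical helper for illustration; not part of Django.
    """
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`
        return value.decode(encoding)
    return value


utf8_bytes = "f\u00fcr".encode("utf-8")         # b'f\xc3\xbcr'
latin1_bytes = "f\u00fcr".encode("iso-8859-1")  # b'f\xfcr'

# Valid utf-8 input decodes cleanly
decoded = to_text(utf8_bytes)

# iso-8859-1 input is not valid utf-8, so the strict decode fails
try:
    to_text(latin1_bytes)
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("not valid utf-8:", exc.reason)
```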
>>> badufur = ufur.encode('utf-8').decode('iso-8859-1')
>>> badufur
u'f\xc3\xbcr'
>>> print badufur
fÃ¼r
>>> print badufur.encode('utf-8')
fÃ¼r
>>> print badufur.encode('iso-8859-1')
für
Using that unicode object doesn't produce any hits in the DB:
>>> Thing.objects.filter(name__icontains=badufur)
[]
But encoding it to iso-8859-1 does, because that has the effect of restoring the original utf-8 bytestring:
>>> Thing.objects.filter(name__icontains=badufur.encode('iso-8859-1'))
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
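The mis-decode and its reversal can also be shown in isolation. This is a Python 3 sketch under the assumption that the damage is exactly "utf-8 bytes decoded as iso-8859-1"; repair_mojibake is a hypothetical name introduced here for illustration only.

```python
def repair_mojibake(text):
    """Undo a utf-8 bytestring that was wrongly decoded as iso-8859-1.

    Hypothetical helper: returns the input unchanged when the round trip
    fails, i.e. when the text does not look like that kind of damage.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


good = "f\u00fcr"
# Same mistake as badufur: utf-8 bytes interpreted as iso-8859-1
bad = good.encode("utf-8").decode("iso-8859-1")  # 'f\xc3\xbcr'

repaired = repair_mojibake(bad)
untouched = repair_mojibake(good)
print(repr(bad), repr(repaired), repr(untouched))
```

A helper like this is only a diagnostic aid. The real fix is to decode the incoming bytes with the correct encoding in the first place, since some strings survive the round trip without being damaged at all and would be altered by a blind repair.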
However, the debug info you show above doesn't show an incorrectly-built unicode object, so I'm very confused by it.
Karen