Unicode-branch: testers wanted


Malcolm Tredinnick

May 24, 2007, 9:06:12 AM
to django...@googlegroups.com
Hi folks,

The unicode branch, [1], is now at a point where it is essentially
feature-complete and could do with a bit of heavy testing from the wider
community.

So if you have some applications that work against Django's current
trunk and would like to try them out on the unicode branch, I'd
appreciate your efforts. The porting effort should be very minimal
(almost zero, in many cases).

For code that is only meant to work with ASCII data, there are probably
no changes required at all. For code that is meant to work with all
kinds of input (essentially, arbitrary strings), there are a few quick
porting steps required.

See [2] for the short list (5 steps, maximum!) of changes you might need
to make. For more detailed information, have a read through the
unicode.txt document in the docs/ directory of the branch.

Any bugs you find should be filed in Trac. Put "[unicode]" at the start
of the summary title so that I can search for them later. No need to put
any special keywords or anything like that in (the "version" field
should be set to "other branch", if you remember).

A couple of things to watch out for when you're testing:

(A) Strings that seem to mysteriously disappear, but when you
examine the source, you see something like
"<django.utils.functional.__proxy__ object at 0x2aaaaf87a750>".
These shouldn't be too common and will mostly be restricted to
places like the admin interface that do introspection.

(B) Translations that happen too early. If you have translations
available and use your app in a language that is different from
the LANGUAGE_CODE setting, watch out for any strings that are
translated into LANGUAGE_CODE, instead of your current locale.
This is a sign that ugettext() is being used somewhere that
ugettext_lazy() should be used.

(C) If you're using Python 2.3, look for strings that don't make
much sense when printed. That is a sign that a bytestring is
being used where a unicode string was needed (not your fault;
it's an oversight in Django). Python 2.3 has some
"interesting" (I could use nastier words) behaviour when it
tries to interpolate non-string objects into unicode strings (it
doesn't call the __unicode__ method!!) and we have to work
around them explicitly. I think I've got most of them, but I'll
bet I have overlooked some.

Most bugs that people are finding at the moment fit into one of these
categories and they are very easy to fix once we find them. I've tried
to nail most of them in advance, but you can probably imagine how
exciting it is to read every line of source code and try to find all the
strings that are in a precise form that need changing. My attention may
have drifted from time to time.
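
Point (B) above can be illustrated with a minimal, self-contained sketch. This is not Django's implementation: CATALOG, state, translate_now and LazyText are hypothetical names standing in for the message catalogue, the active locale, ugettext() and the object that ugettext_lazy() returns. It is written for Python 3 so that it runs on a current interpreter.

```python
# Toy message catalogue keyed by language code (illustrative data only).
CATALOG = {
    "en": {"Services": "Services"},
    "de": {"Services": "Dienste"},
}

# Stand-in for the active language; starts at the site-wide default.
state = {"language": "en"}


def translate_now(message):
    """Eager lookup: uses whatever language is active at call time."""
    return CATALOG[state["language"]].get(message, message)


class LazyText:
    """Defers the lookup until the text is actually rendered."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return translate_now(self.message)


# Both labels are created early, while the default language is active.
eager_label = translate_now("Services")
lazy_label = LazyText("Services")

# Later, a request arrives from a German-speaking user.
state["language"] = "de"

print(eager_label)      # fixed when it was created, in the default language
print(str(lazy_label))  # looked up now, in the current language
```

The eager label is frozen in the default language, which is the symptom described in (B); the lazy one follows the language that is active when it is rendered.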

Have realistic expectations about this branch, too. It is meant to be as
close to 100% backwards-compatible as we can make it. So, for example,
usernames still have to use normal ASCII alphabetic characters, etc.
Similarly, the slugify filter still behaves as it did before. At some
point it will be extended to handle a _few_ more non-ASCII characters,
but it's never going to be a full transliteration function. They are the
two big items I expect people would otherwise try to extend beyond what
is intended. There may be others and I'm sure we'll discover what they
are as the questions pop up.

[1] http://code.djangoproject.com/wiki/UnicodeBranch
[2]
http://code.djangoproject.com/wiki/UnicodeBranch#PortingApplicationsTheQuickChecklist

Regards,
Malcolm

sandro dentella

May 24, 2007, 12:13:53 PM
to Django users
Hi Malcolm,

I really welcome this branch and thank you all for the effort.

Before I report what follows as a bug, I'd like to ask whether the branch
should let me use non-ASCII letters in tests with test.client.

I tried something like self.client.get(url, dict(name=u'F\xf2')) and got
back an error from urllib:

  File "/misc/src/django/branches/unicode/django/test/client.py", line 196, in get
    r = {
  File "/usr/lib/python2.4/urllib.py", line 1162, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf2' in position 1: ordinal not in range(128)

sandro
*:-)

alex.v...@gmail.com

May 24, 2007, 3:26:43 PM
to Django users
Thanks for your work! I'll definitely test the new branch soon and
report the results.

Ivan Sagalaev

May 24, 2007, 4:24:43 PM
to django...@googlegroups.com
Malcolm Tredinnick wrote:
> The unicode branch, [1], is now at a point where it is essentially
> feature-complete and could do with a bit of heavy testing from the wider
> community.

Switched my site today to the branch. Works like a charm (translations,
admin, multilingual content).

Malcolm Tredinnick

May 24, 2007, 7:06:22 PM
to django...@googlegroups.com

Excellent stuff. That's a bug. Python's urllib.quote_plus doesn't handle
unicode characters (with some reasonably good reasons) and calling str()
on anything is not such a hot idea any longer. So Django has its own
django.utils.html.urlquote_plus() function that we can use as a
replacement. Not sure how I overlooked that one, but I'll fix it
shortly.
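
As a rough illustration of what such a replacement has to do, here is a minimal sketch that encodes text to UTF-8 bytes before percent-quoting it. It is not Django's code: urlquote_plus_sketch is a hypothetical name, UTF-8 is an assumed default, and the sketch uses Python 3's urllib.parse rather than the Python 2 urllib module discussed in this thread.

```python
from urllib.parse import quote_plus


def urlquote_plus_sketch(value, encoding="utf-8"):
    """Percent-encode a value for a query string, encoding to bytes first."""
    if not isinstance(value, bytes):
        # Convert to text explicitly, then to bytes in a known encoding,
        # instead of relying on an implicit ASCII conversion.
        value = str(value).encode(encoding)
    return quote_plus(value)


print(urlquote_plus_sketch("F\xf2"))
print(urlquote_plus_sketch("a b"))
```

The point of the design is that the encoding step is explicit, so non-ASCII characters never pass through a codec that cannot represent them.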

Regards,
Malcolm

Malcolm Tredinnick

May 25, 2007, 3:26:10 AM
to django...@googlegroups.com
On Thu, 2007-05-24 at 09:13 -0700, sandro dentella wrote:

This should be fixed in [5338].

Regards,
Malcolm


Sandro Dentella

May 25, 2007, 4:10:27 AM
to django...@googlegroups.com
> > File "/usr/lib/python2.4/urllib.py", line 1162, in urlencode
> > v = quote_plus(str(v))
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xf2' in
> > position 1: ordinal not in range(128)
>
> This should be fixed in [5338].


well... you already know it works like a charm!
grazie!

*:-)

David Larlet

May 25, 2007, 4:31:46 AM
to django...@googlegroups.com
2007/5/24, Malcolm Tredinnick <mal...@pointy-stick.com>:

>
> Hi folks,
>
> The unicode branch, [1], is now at a point where it is essentially
> feature-complete and could do with a bit of heavy testing from the wider
> community.

Thank you so much for this branch!

> Similarly, the slugify filter still behaves as it did before. At some
> point it will be extended to handle a _few_ more non-ASCII characters,
> but it's never going to be a full transliteration function. They are the
> two big items I expect people would otherwise try to extend beyond what
> is intended. There may be others and I'm sure we'll discover what they
> are as the questions pop up.

Why don't we use the slughifi function:
http://amisphere.com/contrib/python-django/ ?
I already use it to replace the Django one and it's very useful (at
least for French titles).

David

Malcolm Tredinnick

May 25, 2007, 4:44:31 AM
to django...@googlegroups.com

If the author wanted to contribute that under a new-BSD license, we
could use something like that, certainly (at least as the Python
replacement; the Javascript enhancement should probably be smaller so
that we don't have to ship so much data around, but that's a minor
issue). There's been a lot of work put into that table of mappings,
which is the bit we can really use.

If you are the author, or you know the author and think they want to
submit it for inclusion, please open a ticket in Trac. We really only
need the table of mappings.

Regards,
Malcolm


Sam

May 25, 2007, 5:20:57 AM
to Django users
Most of the mapping table is taken from a GPL project.

I've just emailed the authors to see if they would relicense the file
so that it can be included in Django.

I'll post an update as soon as I have their replies.

On 25 May, 10:44, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> On Fri, 2007-05-25 at 10:31 +0200, David Larlet wrote:

> > 2007/5/24, Malcolm Tredinnick <malc...@pointy-stick.com>:

Malcolm Tredinnick

May 25, 2007, 5:27:21 AM
to django...@googlegroups.com
On Fri, 2007-05-25 at 09:20 +0000, Sam wrote:
> Most of the table mapping is taken from a GPL project.
>
> I've just emailed the authors to see if they would relicense the file
> to include it inside django.
>
> I'll update as soon as i have their replies.

Thanks. We only need the mapping table. We'd want to rewrite most of the
function for stylistic, consistency and correctness reasons anyway.

Regards,
Malcolm


Michael Radziej

May 25, 2007, 11:17:09 AM
to django...@googlegroups.com
Hi Malcolm,

A short disclaimer: I'm currently trying the unicode branch with the autoescape patch and a
couple of other patches, so my problems might really be my own problems,
but I don't expect it.


First, I found that I have a problem with commit 5255 together with the test
client. It breaks loading the modules, probably due to recursive imports.

- management activates translation
- this loads all apps
- one of my apps loads the test Client (I use a different testing
  framework that uses the Django test client)
- the test client loads contrib.session
- the model metaclass starts translation in contribute_to_class
- this loads all apps --> doesn't work

I moved the import statement in my app into the function --> works.

I suggest changing the test client so that it imports other models
only inside a function and not at module import time.
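
The workaround of moving an import into a function looks roughly like the sketch below. It uses a standard-library module as a stand-in, because the real case (importing the Django test client) needs a configured project; make_parser is a hypothetical name. Written for Python 3.

```python
def make_parser():
    # The import runs on the first call, not when this module is loaded,
    # so loading this module never triggers the other module's import chain.
    from html.parser import HTMLParser  # stand-in for the test client import
    return HTMLParser()


parser = make_parser()
print(type(parser).__name__)
```

This only sidesteps the cycle at the call site; whatever makes the import chain recursive is still there.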

-*-

Second, I have a map of view tags, verbose names for these and how to build
the url (it was born before the regex reverser). This map uses gettext_lazy
for the verbose names, which is used later with the % operator. This fails
because

In [44]: "%s" % gettext_lazy("Dienste")
Out[44]: '<django.utils.functional.__proxy__ object at 0xb70dacac>'

With proper unicode objects, though, it works:

In [45]: u"%s" % ugettext_lazy("Dienste")
Out[45]: u'Services'

(It really requires both that the pattern is unicode and that ugettext_lazy is
used and not gettext_lazy)

I'm now working to work around this, but it's a lot of replacements from
"gettext_lazy" --> "ugettext_lazy" and also to promote all the patterns to
unicode.

I wonder, can this be changed so that it works the old way, too? This seems
to be related to commit 5239:


@@ -32,6 +32,8 @@ def lazy(func, *resultclasses):
                 self.__dispatch[resultclass] = {}
                 for (k, v) in resultclass.__dict__.items():
                     setattr(self, k, self.__promise__(resultclass, k, v))
+            if unicode in resultclasses:
+                setattr(self, '__unicode__', self.__unicode_cast)

         def __promise__(self, klass, funcname, func):
             # Builds a wrapper around some magic method and registers that magic
@@ -47,6 +49,9 @@ def lazy(func, *resultclasses):
             self.__dispatch[klass][funcname] = func
             return __wrapper__

+        def __unicode_cast(self):
+            return self.__func(*self.__args, **self.__kw)
+
     def __wrapper__(*args, **kw):
         # Creates the proxy object, instead of the actual value.
         return __proxy__(args, kw)


this makes unicode() work for the proxies, but not str(). I tried to add a
similar hook for str(), but I failed (and I really don't understand how all
the various parts play together here ...)
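
The difference between the two cases can be shown with a minimal sketch of a lazy proxy. It is written for Python 3, where str() is the only text conversion, so it illustrates the mechanism rather than the exact Python 2 behaviour of str() versus unicode(). ProxyWithoutStr, ProxyWithStr and lookup are hypothetical names, not Django's code.

```python
class ProxyWithoutStr:
    """Lazy wrapper that only knows how to produce its value on request."""

    def __init__(self, func, *args):
        self._func, self._args = func, args

    def resolve(self):
        return self._func(*self._args)


class ProxyWithStr(ProxyWithoutStr):
    """Same wrapper, plus a string-conversion hook that forces evaluation."""

    def __str__(self):
        return self.resolve()


def lookup(message):
    # Stand-in for a real translation lookup.
    return {"Dienste": "Services"}.get(message, message)


# "%s" formatting calls the string-conversion hook of its argument.
print("%s" % ProxyWithoutStr(lookup, "Dienste"))  # default object description
print("%s" % ProxyWithStr(lookup, "Dienste"))     # the resolved text
```

Without the hook, formatting falls back to the default object description, which is the "<... object at 0x...>" text shown in the session above.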

That's for now, I'm still trying to get over this before I can start more
serious testing.

So long,

Michael

--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100
http://www.noris.de - The IT-Outsourcing Company

Vorstand: Ingo Kraupa (Vorsitzender), Joachim Astel, Hansjochen Klenk -
Vorsitzender des Aufsichtsrats: Stefan Schnabel - AG Nürnberg HRB 17689

Malcolm Tredinnick

May 25, 2007, 7:57:23 PM
to django...@googlegroups.com
On Fri, 2007-05-25 at 17:17 +0200, Michael Radziej wrote:
> Hi Malcolm,
>
> A short disclaimer: I'm currently trying the unicode branch with the autoescape patch and a
> couple of other patches, so my problems might really be my own problems,
> but I don't expect it.
>
>
> First, I found that I have a problem with commit 5255 together with the test
> client. It breaks loading the modules, probably due to recursive imports.
>
> - management activates translation
> - this loads all apps
> - One of my apps loads the test Client (I'm use a different testing
> framework that uses the django test client)
> - test client loads contrib.session
> - the model meta class starts translation in contribute_to_class
> - this loads all apps --> doesn't work
>
> I moved the import statement in my app into the function --> works.
>
> I suggest to change the test client so that it imports other models
> only in a function and not at compile time.

I'd rather try and fix the root problem first, since having to order
your code in a particular way to avoid import problems is fragile.
Certainly needs to be looked at, though. Will do.


>
> -*-
>
> Second, I have a map of view tags, verbose names for these and how to build
> the url (it was born before the regex reverser). This map uses gettext_lazy
> for the verbose names, which is used later with the % operator. This fails
> because
>
> In [44]: "%s" % gettext_lazy("Dienste")
> Out[44]: '<django.utils.functional.__proxy__ object at 0xb70dacac>'

This is an issue in lazy() that is very hard to fix, because __str__ is
used for so many things in Python. I'm not going to call it a bug; it's
just unbelievably annoying.

>
> With proper unicode objects, though, it works:
>
> In [45]: u"%s" % ugettext_lazy("Dienste")
> Out[45]: u'Services'
>
> (It really requires both that the pattern is unicode and that ugettext_lazy is
> used and not gettext_lazy)
>
> I'm now working to work around this, but it's a lot of replacements from
> "gettext_lazy" --> "ugettext_lazy" and also to promote all the patterns to
> unicode.

I'll have a look at this. Commit 5239 is absolutely required. I think
we may need to make a Promise-variant for translations only so that we
can make __str__ work properly for them, too (we can't do it in general,
because there are non-translation-related places where lazy() is used
and I don't want to break __str__ for them).

I don't feel too bad about people having to move gettext_lazy to
ugettext_lazy (it's the 21st century, global search and replace has
existed for 30 years), but the promotion to unicode strings can take a
few minutes, agreed.

Regards,
Malcolm


Malcolm Tredinnick

May 26, 2007, 3:40:41 AM
to django...@googlegroups.com
On Fri, 2007-05-25 at 17:17 +0200, Michael Radziej wrote:
> Hi Malcolm,
>
> A short disclaimer: I'm currently trying the unicode branch with the autoescape patch and a
> couple of other patches, so my problems might really be my own problems,
> but I don't expect it.
>
>
> First, I found that I have a problem with commit 5255 together with the test
> client. It breaks loading the modules, probably due to recursive imports.
>
> - management activates translation
> - this loads all apps
> - One of my apps loads the test Client (I'm use a different testing
> framework that uses the django test client)
> - test client loads contrib.session
> - the model meta class starts translation in contribute_to_class
> - this loads all apps --> doesn't work
>
> I moved the import statement in my app into the function --> works.
>
> I suggest to change the test client so that it imports other models
> only in a function and not at compile time.

Fixed at the source of the problem (django.db.models.options) in [5345].
At least, I'm pretty sure that will fix it. Let me know if the problem
persists (and why, because then it's not as you describe).

>
> -*-
>
> Second, I have a map of view tags, verbose names for these and how to build
> the url (it was born before the regex reverser). This map uses gettext_lazy
> for the verbose names, which is used later with the % operator. This fails
> because
>
> In [44]: "%s" % gettext_lazy("Dienste")
> Out[44]: '<django.utils.functional.__proxy__ object at 0xb70dacac>'
>
> With proper unicode objects, though, it works:
>
> In [45]: u"%s" % ugettext_lazy("Dienste")
> Out[45]: u'Services'
>
> (It really requires both that the pattern is unicode and that ugettext_lazy is
> used and not gettext_lazy)
>
> I'm now working to work around this, but it's a lot of replacements from
> "gettext_lazy" --> "ugettext_lazy" and also to promote all the patterns to
> unicode.

Fixed in [5344]. '%s' % gettext_lazy('Dienste') will do what you expect
now.

Regards,
Malcolm


itsnotvalid

May 27, 2007, 4:34:52 PM
to Django users
I got an error while using the admin interface to submit some forms,
following the Django book (chapter 6).

The error occurs when the form in the admin interface is saved:
Traceback (most recent call last):
File "F:\python25\Lib\site-packages\django-svn\unicode\django\core\handlers\base.py" in get_response
  77. response = callback(request, *callback_args, **callback_kwargs)
File "F:\python25\Lib\site-packages\django-svn\unicode\django\contrib\admin\views\decorators.py" in _checklogin
  55. return view_func(request, *args, **kwargs)
File "F:\python25\Lib\site-packages\django-svn\unicode\django\views\decorators\cache.py" in _wrapped_view_func
  39. response = view_func(request, *args, **kwargs)
File "F:\python25\Lib\site-packages\django-svn\unicode\django\contrib\admin\views\main.py" in add_stage
  258. LogEntry.objects.log_action(request.user.id, ContentType.objects.get_for_model(model).id, pk_value, force_unicode(new_object), ADDITION)
File "F:\python25\Lib\site-packages\django-svn\unicode\django\utils\encoding.py" in force_unicode
  32. s = unicode(s)

UnicodeDecodeError at /admin/books/author/add/
'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

Not sure if this is only my n00b skillz kicking in, but it seems to be a
direct result of the string returned here:

from django.db import models
# ...
class Author(models.Model):
    salutation = models.CharField(maxlength=10)
    first_name = models.CharField(maxlength=30)
    last_name = models.CharField(maxlength=40)
    email = models.EmailField()
    headshot = models.ImageField(upload_to='/tmp')

    class Admin:
        pass

    def __unicode__(self):
        return self.first_name  # this string is what I am talking about

I am using PostgreSQL with UTF-8, so I thought some non-ASCII input
would pass through nicely in the admin interface. But it didn't.
The item is still saved into the database and is viewable in the
admin interface.
After some testing, the error only occurs when the field returned by
__unicode__ (self.first_name in this case) is filled with UTF-8
(non-ASCII) text. If all the other fields contain such characters but
the returned one does not, everything works normally.
Putting "return unicode(self.first_name)" there doesn't help either.

Please have a look at this. Should I repost it as a ticket?

Regards,

Alan

Malcolm Tredinnick

May 27, 2007, 7:52:19 PM
to django...@googlegroups.com

This suggests that self.first_name hasn't been converted to a unicode
string for some reason and is still a sequence of UTF-8 bytes. That
shouldn't be happening.
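
To make the failure mode concrete, here is a minimal sketch of the difference between an implicit ASCII conversion and an explicit decode. It is not Django's force_unicode(): force_text_sketch is a hypothetical helper, UTF-8 is an assumed default, and the code is written for Python 3, where bytes plays the role of the Python 2 bytestring.

```python
def force_text_sketch(value, encoding="utf-8"):
    """Return text: decode byte strings explicitly, str() everything else."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


raw = "\u4e2d".encode("utf-8")  # one CJK character as UTF-8; first byte is 0xe4

try:
    raw.decode("ascii")         # what an implicit ASCII conversion amounts to
except UnicodeDecodeError as exc:
    print("implicit conversion fails:", exc)

print(force_text_sketch(raw))
```

The data itself is fine; the error comes only from decoding it with a codec that stops at byte values above 127.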

>
> Not sure if that is only my n00b skillz kicking on, it should be a
> direct result of the return string as here:
>
> from django.db import models
> # ...
> class Author(models.Model):
> salutation = models.CharField(maxlength=10)
> first_name = models.CharField(maxlength=30)
> last_name = models.CharField(maxlength=40)
> email = models.EmailField()
> headshot = models.ImageField(upload_to= ('/tmp'))
> class Admin:
> pass
>
> def __unicode__(self):
> return self.first_name # this string is what I am talking
> about
>
> I am using postgresSQL in utf-8 so I thought some non-ascii input
> would pass through nicely in the admin interface. But it didn't.
> The item would still be saved into the database and viewable in the
> admin interface.

This is certainly a bit odd and it should be working. I've hammered on
the admin interface quite a bit, saving and loading all kinds of weird
data and so have some other testers. Your model looks like it should be
perfect, too.

I'll have a look at this today when I can make some time.

Regards,
Malcolm


Michal

May 28, 2007, 3:53:50 AM
to django...@googlegroups.com
Hello Malcolm,
I am trying to run the tests for my application, but after updating the
unicode branch (I am now synced to revision 5371) I am unable to do so
because of some errors:

michal@lentilka app $./manage.py test staticpages
Creating test database...
Creating table auth_message
Creating table auth_group
Creating table auth_user
Creating table auth_permission
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log
Creating table staticpages_staticpage
Creating table news_subscriber
Creating table news_new
Creating table news_tag
Creating table partners_partneruser
Creating table parameters_parameter
Creating table pressreleases_pressrelease


Traceback (most recent call last):
  File "./manage.py", line 11, in ?
    execute_manager(settings)
  File "/usr/local/lib/python2.4/site-packages/django/core/management.py", line 1678, in execute_manager
    execute_from_command_line(action_mapping, argv)
  File "/usr/local/lib/python2.4/site-packages/django/core/management.py", line 1592, in execute_from_command_line
    action_mapping[action](args[1:], int(options.verbosity))
  File "/usr/local/lib/python2.4/site-packages/django/core/management.py", line 1309, in test
    failures = test_runner(app_list, verbosity)
  File "/usr/local/lib/python2.4/site-packages/django/test/simple.py", line 84, in run_tests
    create_test_db(verbosity)
  File "/usr/local/lib/python2.4/site-packages/django/test/utils.py", line 118, in create_test_db
    management.syncdb(verbosity, interactive=False)
  File "/usr/local/lib/python2.4/site-packages/django/core/management.py", line 537, in syncdb
    _emit_post_sync_signal(created_models, verbosity, interactive)
  File "/usr/local/lib/python2.4/site-packages/django/core/management.py", line 464, in _emit_post_sync_signal
    verbosity=verbosity, interactive=interactive)
  File "/usr/local/lib/python2.4/site-packages/django/dispatch/dispatcher.py", line 358, in send
    sender=sender,
  File "/usr/local/lib/python2.4/site-packages/django/dispatch/robustapply.py", line 47, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.4/site-packages/django/contrib/auth/management.py", line 26, in create_permissions
    ctype = ContentType.objects.get_for_model(klass)
  File "/usr/local/lib/python2.4/site-packages/django/contrib/contenttypes/models.py", line 20, in get_for_model
    model=key[1], defaults={'name': smart_unicode(opts.verbose_name_raw)})
  File "/usr/local/lib/python2.4/site-packages/django/db/models/options.py", line 105, in verbose_name_raw
    raw = unicode(self.verbose_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)


Regards
Michal

itsnotvalid

May 28, 2007, 5:09:51 AM
to Django users
Michal, your error looks nearly identical to the one I encounter with
the admin interface, except that the offending byte is not the same.

I guess there is something between the model class and the database
adapters that adds those strange bytes.

Malcolm Tredinnick

May 28, 2007, 5:17:40 AM
to django...@googlegroups.com

No, there's nothing like that. These are just standard ASCII decoding
errors that Python reports. A byte with 'C' in the high nibble
(e.g. Michal's 0xc3) is the start of a two-byte UTF-8 sequence;
something like your 0xe4 is the first byte of a three-byte UTF-8 sequence
(or possibly some non-UTF-8 bytes altogether, in both cases).
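
The rule for reading a UTF-8 lead byte can be written down as a short sketch. It only inspects the bit pattern of the first byte and does not validate the rest of the sequence; utf8_sequence_length is a hypothetical helper name. Written for Python 3.

```python
def utf8_sequence_length(lead_byte):
    """Return how many bytes a UTF-8 sequence starting with lead_byte spans.

    Only the bit pattern is checked, so a few values that match a pattern
    but never occur in valid UTF-8 (such as 0xC0 and 0xC1) are not rejected.
    Returns None for bytes that cannot start a sequence.
    """
    if lead_byte < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: 0xC0-0xDF
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: 0xE0-0xEF
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: 0xF0-0xF7
        return 4
    return None                     # continuation byte or invalid


for byte in (0x41, 0xC3, 0xE4):
    print(hex(byte), utf8_sequence_length(byte))
```

Both bytes from the two tracebacks are ordinary UTF-8 lead bytes, which is consistent with UTF-8 data being decoded as ASCII.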

Regards,
Malcolm


Andreas Ahlenstorf

May 28, 2007, 5:27:30 AM
to django...@googlegroups.com
Hi,

I'm having issues with the unicode branch and mod_python (the
development server is working fine). This is what's coming from
mod_python:

Phase: 'PythonHandler'
Handler: 'django.core.handlers.modpython'

Traceback (most recent call last):

  File "/usr/lib/python2.4/site-packages/mod_python/importer.py", line 1537, in HandlerDispatch
    default=default_handler, arg=req, silent=hlist.silent)

  File "/usr/lib/python2.4/site-packages/mod_python/importer.py", line 1229, in _process_target
    result = _execute_target(config, req, object, arg)

  File "/usr/lib/python2.4/site-packages/mod_python/importer.py", line 1128, in _execute_target
    result = object(arg)

  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 177, in handler
    return ModPythonHandler()(req)

  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 163, in __call__
    req.headers_out[key] = value

TypeError: table values must be strings

The checkout of the Unicode Branch is from yesterday evening (cannot
remember the revision number) and I'm using Apache 2.0.59, Python
2.4, MySQL-python-1.2.2 and MySQL 5.0.38. Is this a bug or am I doing
something wrong? Unfortunately, I wasn't able to figure it out.

Regards,
A.

Michal

May 28, 2007, 5:27:36 AM
to django...@googlegroups.com

Hmm... Sorry, but I couldn't locate this problem. Before my last update
(to revision 5371) the tests failed too, but not so early (they failed
when the fixtures were loaded into the DB). So I think there is some
problem in the Django unicode branch.

My plan was to:
1) update the unicode branch to the latest revision
2) run each of my tests separately
3) locate the unicode problems

But now (after the update), I can't run any of the tests.

Malcolm Tredinnick

May 28, 2007, 6:37:54 AM
to django...@googlegroups.com
On Mon, 2007-05-28 at 11:27 +0200, Michal wrote:
> Malcolm Tredinnick wrote:
> > On Mon, 2007-05-28 at 02:09 -0700, itsnotvalid wrote:
> >> Michal, your error looks nearly identical to the one I encounter with
> >> admin interface. Except the byte offending is not the same.
> >>
> >> I guess there is something between the model class and database
> >> adapters which adds those strange bytes.
> >
> > No, there's nothing like that. These are just standard ASCII decoding
> > errors that Python reports. A byte starting with 'C' in the high nibble
> > (e.g. Michal's 0xc3) is the start of a two-byte UTF-8 sequence,
> > something like your 0xe4 is the first byte of three byte UTF-8 sequence
> > (or possibly some non-UTF-8 bytes altogether, in both cases).
>
> Hmm... Sorry, but I could't locate this problem (before my last update
> to revivision 5371)

When you say "this problem", which problem do you mean? The one you
reported in your first email? Because in the email fragment you quoted
from me, there is no "problem" reported.

I'm very happy to help fix these errors people are seeing; they are all
small things and easy to nail with a good explanation, but you have to
help me help you: what is failing? Remember that there are about four
different sub-threads going on under this topic, so replying to
the right email so that the right replies thread together is going to be
useful, too.

If this is the problem you reported in your first email, the simplest
thing you can do to help is work out which model's verbose name is
causing problems. From reading your email, I suspect you have a UTF-8
string being used as a verbose_name somewhere (which is perfectly fine)
and it uses some codepoints outside the ASCII range. I've just finished
eating dinner and am about to try and test that theory, because my gut
feeling is that will cause a traceback, just from reading the code.

Assuming it's what I think it is, this will be fixed in about 30
minutes.

Regards,
Malcolm

Malcolm Tredinnick

May 28, 2007, 6:39:23 AM
to django...@googlegroups.com

Aah.. I didn't think to check that (that's why other people are helping
with the tests .. thanks). That's easy enough to fix.

Regards,
Malcolm


Michal

May 28, 2007, 6:59:19 AM
to django...@googlegroups.com

Sorry for the confusion, Malcolm.

My note was about the latest error (i.e. I have a problem running the
tests because of the verbose_name error).

I am just after dinner too, so I will try to find what is wrong in my
application... :)


Once again, sorry for my unclear last report and for my English.

Michal

Malcolm Tredinnick

May 28, 2007, 7:05:20 AM
to django...@googlegroups.com
On Mon, 2007-05-28 at 12:59 +0200, Michal wrote:
[...]

>
> My note was in relation with latest error (ie. I have problem with
> execution of tests due to verbose_name error).
>
> I am just after dinner too, so I will try to find what is wrong in my
> application... :)
>
>
> Once again, sorry for my obscure latest report and english.

No worries. :-)

I think I've fixed this problem (non-ASCII bytestrings for verbose_name)
in [5372], which I've just committed.

Regards,
Malcolm

Michal

May 28, 2007, 7:16:58 AM
to django...@googlegroups.com

I have just rewritten all my strings from this:

verbose_name='něco'
fields = (
    (None, {'fields': ('title', 'slug', 'annotation', 'content',)}),
    ('Hiearchie', {'fields': ('parent', 'order')}),
    ('Pokročilé nastavení', {'fields': ('short_title', 'template_name',
        'person', 'info_box', 'show_menu')}),
    ('Omezení přístupu na stránku', {'fields':
        ('registration_required', 'groups')}),
)
order = models.IntegerField("Pořadí", help_text="Pořadí stránky v rámci sourozenců, tj. stránek které mají stejného rodiče.")

to this:

verbose_name=u'něco'
fields = (
    (None, {'fields': ('title', 'slug', 'annotation', 'content',)}),
    (u'Hiearchie', {'fields': ('parent', 'order')}),
    (u'Pokročilé nastavení', {'fields': ('short_title', 'template_name',
        'person', 'info_box', 'show_menu')}),
    (u'Omezení přístupu na stránku', {'fields':
        ('registration_required', 'groups')}),
)
order = models.IntegerField(u"Pořadí", help_text=u"Pořadí stránky v rámci sourozenců, tj. stránek které mají stejného rodiče.")
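
What the u prefix changes can be shown with a small sketch. It is written for Python 3, where a plain literal is already text and bytes stands in for the Python 2 bytestring literal; the variable names are only illustrative.

```python
text = "n\u011bco"           # "něco" as text: four characters
data = text.encode("utf-8")  # the same word as a UTF-8 byte string

print(len(text), len(data))  # the byte string is longer: ě takes two bytes

# Decoding with the right codec gives the text back unchanged.
round_tripped = data.decode("utf-8")
print(round_tripped == text)

# An ASCII decode cannot interpret the bytes of ě.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print(type(exc).__name__)
```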


I also updated my unicode branch to revision [5372] and now I get
a different error message:


michal@lentilka app $./manage.py test
Creating test database...

  File "/usr/local/lib/python2.4/site-packages/django/db/models/manager.py", line 76, in get_or_create
    return self.get_query_set().get_or_create(**kwargs)
  File "/usr/local/lib/python2.4/site-packages/django/db/models/query.py", line 280, in get_or_create
    obj.save()
  File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 246, in save
    ','.join(placeholders)), db_values)
  File "/usr/local/lib/python2.4/site-packages/django/db/backends/postgresql/base.py", line 54, in execute
    return self.cursor.execute(smart_str(sql, self.charset), self.format_params(params))
  File "/usr/local/lib/python2.4/site-packages/django/db/backends/postgresql/base.py", line 51, in format_params
    return tuple([smart_str(p, self.charset, True) for p in params])
  File "/usr/local/lib/python2.4/site-packages/django/utils/encoding.py", line 55, in smart_str
    return s.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 7: ordinal not in range(128)


I tried running all the tests (./manage.py test) and also each app by
specifying its name (e.g. ./manage.py test staticpages). Everything ends
with the error above.


Regards
Michal


Malcolm Tredinnick

May 28, 2007, 7:25:18 AM
to django...@googlegroups.com

That sounds like your database client encoding is set to ASCII for some
reason, which isn't something Django is going to be able to handle.

Have a look in django/db/backends/postgresql/base.py, line 97, where it
says

cursor.execute("SHOW client_encoding")
encoding = ENCODING_MAP[cursor.fetchone()[0]]

and print out the value of encoding (maybe even assign cursor.fetchone()
to a temporary variable and print that out, too). That will at least
confirm that the problem is where we think it is.
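
A minimal sketch of that debugging step, assuming nothing beyond a DB-API style cursor with execute() and fetchone(). FakeCursor is a hypothetical stand-in so the example runs without a database, and read_client_encoding is likewise an illustrative name; with a real connection you would pass its cursor instead. Written for Python 3.

```python
class FakeCursor:
    """Stand-in for a DB-API cursor so the sketch runs without a database."""

    def __init__(self, encoding):
        self._encoding = encoding
        self.last_sql = None

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return (self._encoding,)


def read_client_encoding(cursor):
    """Ask the server for the client encoding and return the raw value."""
    cursor.execute("SHOW client_encoding")
    row = cursor.fetchone()  # keep the row in a variable so it can be printed
    print("fetchone() returned:", row)
    return row[0]


print(read_client_encoding(FakeCursor("SQL_ASCII")))
```

Keeping the fetched row in a temporary variable matters because fetchone() consumes the result; calling it a second time just to print would not return the same row.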

If the client encoding is not set to something that can handle non-ASCII
characters, there is no hope of putting non-ASCII chars in there in the
first place. I'll have to look up how you configure that (maybe you can
do it in the meantime), but it's not something Django should be touching,
I suspect.

I'm working on itsnotvalid's problem right at the moment; I'll come back
to this once I've got that fixed (which will be soon).

Regards,
Malcolm


Michal

May 28, 2007, 7:55:53 AM
to django...@googlegroups.com

You are right, the problem is in the database.

It seems like the test database is created in SQL_ASCII encoding. I
looked into psql terminal and found:

        List of databases
      Name       |   Owner    | Encoding
-----------------+------------+-----------
 gr4unicode      | pgsql      | UNICODE
 test_gr4unicode | gr4unicode | SQL_ASCII

DB gr4unicode was created by me, manually:

CREATE DATABASE gr4unicode WITH ENCODING 'UNICODE';

Database test_gr4unicode was created dynamically by calling ./manage.py test

I don't know how to tell the test framework to create the database with
the UNICODE charset... :(

> I'm working on itsnotvalid's problem right at the moment; I'll come back
> to this once I've got that fixed (which will be soon).

Don't hurry on account of my problems! :) Primarily I would like to help
you with unicode branch testing...

I have attached the output ("dump") of the latest call of ./manage.py test
(I am printing the fetchone and encoding variables).

Regards
Michal

dump.txt

Malcolm Tredinnick

May 28, 2007, 7:56:26 AM
to django...@googlegroups.com
On Sun, 2007-05-27 at 13:34 -0700, itsnotvalid wrote:
> I got an error when using the admin interface to submit some forms while
> following the Django book (CH6).
>
> It happens when the form in the admin interface is being saved.
[... snip...]

>
> UnicodeDecodeError at /admin/books/author/add/
> 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in
> range(128)
>
> Not sure if that is only my n00b skillz kicking on, it should be a
> direct result of the return string as here:
>
> from django.db import models
> # ...
> class Author(models.Model):
>     salutation = models.CharField(maxlength=10)
>     first_name = models.CharField(maxlength=30)
>     last_name = models.CharField(maxlength=40)
>     email = models.EmailField()
>     headshot = models.ImageField(upload_to='/tmp')
>
>     class Admin:
>         pass
>
>     def __unicode__(self):
>         return self.first_name  # this string is what I am talking about

That was a little bit of a tricky example. The problem only occurs when
a file upload field is included in the form, which is why I'd never seen
it before.

Should be fixed in [5373].

Regards,
Malcolm


Malcolm Tredinnick

May 28, 2007, 8:02:23 AM
to django...@googlegroups.com
On Mon, 2007-05-28 at 13:55 +0200, Michal wrote:
[...]

>
> You are right, the problem is in the database.
>
> It seems like the test database is created in SQL_ASCII encoding. I
> looked into psql terminal and found:
>
> List of databases
> Name | Owner | Encoding
> -----------------+------------+-----------
> gr4unicode | pgsql | UNICODE
> test_gr4unicode | gr4unicode | SQL_ASCII
>
> DB gr4unicode was created by me, manually:
>
> CREATE DATABASE gr4unicode WITH ENCODING 'UNICODE';
>
> Database test_gr4unicode was created dynamically by calling ./manage.py test


Aaah! :-(

I've been fighting this problem a bit when testing with MySQL, too,
because my system creates the databases in LATIN1 if I don't tell it
anything special and so the test database can't hold the full unicode
range of characters. It creates PostgreSQL databases in UTF-8 on my end,
though, so I've never seen it with that database.

Okay... time to fix that problem then. Probably need to introduce a
test-only setting for the database encoding. I should have done that
when I first saw the problem instead of trying to dodge around it.

I hate it when being lazy doesn't work. :-(

I'll put this one on my list. Nice debugging job. Thanks.

Regards,
Malcolm


Ivan Sagalaev

May 28, 2007, 8:04:01 AM
to django...@googlegroups.com
Michal wrote:
> It seems like the test database is created in SQL_ASCII encoding. I
> looked into psql terminal and found:
>
> List of databases
> Name | Owner | Encoding
> -----------------+------------+-----------
> gr4unicode | pgsql | UNICODE
> test_gr4unicode | gr4unicode | SQL_ASCII
>
> DB gr4unicode was created by me, manually:
>
> CREATE DATABASE gr4unicode WITH ENCODING 'UNICODE';
>
> Database test_gr4unicode was created dynamically by calling ./manage.py test

Ah! Yes I've stepped on this one too (wrong collation in my case). I
don't know if it could be worked around currently... Looks like we need
a way to specify db creation parameters.

Michal

May 28, 2007, 8:11:12 AM
to django...@googlegroups.com

It was my pleasure :)

Regards,
Michal

Malcolm Tredinnick

May 28, 2007, 9:51:55 AM
to django...@googlegroups.com
On Mon, 2007-05-28 at 11:27 +0200, Andreas Ahlenstorf wrote:
> Hi,
>
> I'm having issues with the Unicode Branch and mod_python (the
> development server is working fine). This is what's coming from
> mod_python:
>
> Phase: 'PythonHandler'
> Handler: 'django.core.handlers.modpython'
>
> Traceback (most recent call last):
>
> File "/usr/lib/python2.4/site-packages/mod_python/importer.py",
> line 1537, in HandlerDispatch
> default=default_handler, arg=req, silent=hlist.silent)
>
> File "/usr/lib/python2.4/site-packages/mod_python/importer.py",
> line 1229, in _process_target
> result = _execute_target(config, req, object, arg)
>
> File "/usr/lib/python2.4/site-packages/mod_python/importer.py",
> line 1128, in _execute_target
> result = object(arg)
>
> File "/usr/lib/python2.4/site-packages/django/core/handlers/
> modpython.py", line 177, in handler
> return ModPythonHandler()(req)
>
> File "/usr/lib/python2.4/site-packages/django/core/handlers/
> modpython.py", line 163, in __call__
> req.headers_out[key] = value
>
> TypeError: table values must be strings

I can't replicate this problem, but I can take a guess at what is going
on. In [5377] I've checked in what is probably a fix for the problem.
Could you try it and see if it changes things for you?

If you still get the traceback, try modifying the source just before
that last line in the exception traceback
(django/core/handlers/modpython.py) and print out what "key" and "value"
are. I am guessing they have a type of unicode, but they should still be
ASCII characters, because you can't put anything else into HTTP headers.
So if for some reason there are non-ASCII characters in there, we need
to work out where they are coming from.

However, I suspect [5377] is going to fix the main problem by coercing
both "key" and "value" to string types.

Regards,
Malcolm


Andreas Ahlenstorf

May 28, 2007, 10:25:37 AM
to django...@googlegroups.com

On 28.05.2007 at 15:51, Malcolm Tredinnick wrote:

> I can't replicate this problem, but I can take a guess at what is
> going
> on. In [5377] I've checked in what is probably a fix for the problem.
> Could you try it and see if it changes things for you?

Looks good so far. I'll report if the error pops up again.

Thank you!

A.

itsnotvalid

May 28, 2007, 12:20:20 PM
to Django users
Thanks for fixing that.

I also find smart_str() really handy for cases where data is
leaving Python.
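
To illustrate the idea only: the helper below is a simplified stand-in, not Django's smart_str(), and its name to_bytes is made up. It is written for Python 3, where str is text and bytes is the bytestring type; on the Python 2 versions discussed in this thread the text type would be unicode instead.

```python
def to_bytes(value, encoding="utf-8", errors="strict"):
    """Return value as a bytestring suitable for handing to the outside world."""
    if isinstance(value, bytes):
        # Already bytes: pass through untouched.
        return value
    if not isinstance(value, str):
        # Non-string objects (numbers, model instances, ...) are stringified first.
        value = str(value)
    return value.encode(encoding, errors)


print(repr(to_bytes(u"Druh\xe1 polo\u017eka")))
print(repr(to_bytes(42)))
```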

Regards,

itsnotvalid

Michal

May 28, 2007, 3:30:18 PM
to django...@googlegroups.com
>> Aaah! :-(
>>
>> I've been fighting this problem a bit when testing with MySQL, too,
>> because my system creates the databases in LATIN1 if I don't tell it
>> anything special and so the test database can't hold the full unicode
>> range of characters. It creates PostgreSQL database in UTF-8 on my end,
>> though, so I've never seen it with that database.
>>
>> Okay... time to fix that problem then. Probably need to introduce a
>> settings for tests only for database encoding. I should have done that
>> when I first saw the problem instead of trying to dodge around it.
>>
>> I hate it when being lazy doesn't work. :-(
>>
>> I'll put this one on my list. Nice debugging job. Thanks.

Hello again,
I temporarily patched the Django source code (django/test/utils.py, lines 96
and 107) to read:

    cursor.execute("CREATE DATABASE %s WITH ENCODING 'UNICODE'" %
                   backend.quote_name(TEST_DATABASE_NAME))

So now I can run my tests. Here is some experience I gained while
debugging (my advice is aimed mainly at other testers; I am developing an
application in the Czech language, in utf-8 encoding):

* check *all* your strings (I have a lot of strings like 'něco' or
'%s-123' % var; most of them I had to rewrite as u'něco' and u'%s-123'
% var); check them in all possible places (models, views, tests,
settings, custom tags, ...)

* if you use Client in tests (django.test.client), make sure that you
decode the content with the smart_unicode function. For example:
    response = self.submitHelper('www.example.com')
    self.failUnlessEqual(response.status_code, 200)
    self.failUnless(smart_unicode(response.content).find(u'nějaký rětězec') != -1)

* make sure that the data you post via client.post is correctly
encoded. For example:
    post_data = {
        'item1': u"První položka",
        'item2': u"Druhá položka",
        'item3': u"Třetí položka"
    }
    response = self.client.post('/url/', post_data)

I hope this helps somebody.
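
The second tip above (decode the response body before searching it) can be sketched without Django. The helper name content_contains and the sample body are made up for illustration, and the code assumes a current Python in which raw response content is a bytes object.

```python
def content_contains(content, needle, encoding="utf-8"):
    """Decode raw response bytes to text, then look for a text needle."""
    if isinstance(content, bytes):
        content = content.decode(encoding)
    return needle in content


# Made-up response body containing the Czech word "nějaký", encoded as UTF-8.
body = u"<p>n\u011bjak\xfd text</p>".encode("utf-8")

print(content_contains(body, u"n\u011bjak\xfd"))
```

Comparing text with text avoids an implicit ASCII decode of the body, which is the step that tends to blow up on accented characters.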

Regards
Michal

Jeremy Dunck

May 28, 2007, 5:26:17 PM
to django...@googlegroups.com
On 5/28/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> Okay... time to fix that problem then. Probably need to introduce a
> settings for tests only for database encoding. I should have done that
> when I first saw the problem instead of trying to dodge around it.

FWIW, as a workaround, in Mysql's my.cnf, you can set:
character_set_database = 'utf8'

In postgres, new databases are created from the template1 system
database; new databases will have whatever encoding that database has.
(template0 is the pristine DB shipped with postgres and should never
be changed, but you should feel free to change template1 as is
useful).

Sandro Dentella

May 28, 2007, 6:30:25 PM
to django...@googlegroups.com
On Mon, May 28, 2007 at 04:26:17PM -0500, Jeremy Dunck wrote:
>
> On 5/28/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > Okay... time to fix that problem then. Probably need to introduce a
> > settings for tests only for database encoding. I should have done that
> > when I first saw the problem instead of trying to dodge around it.

Do we need such a setting, or do we really need to *copy* the database
encoding so that tests run against exactly the same setup as the
application database (if it's possible to use something other than utf8...)?

That would prevent people from running wonderful tests on a well-configured
db when the real db is still "SQL_ASCII" just because template1 was shipped
that way!

sandro
*:-)

Malcolm Tredinnick

May 28, 2007, 9:33:47 PM
to django...@googlegroups.com
On Mon, 2007-05-28 at 16:26 -0500, Jeremy Dunck wrote:
> On 5/28/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > Okay... time to fix that problem then. Probably need to introduce a
> > settings for tests only for database encoding. I should have done that
> > when I first saw the problem instead of trying to dodge around it.
>
> FWIW, as a workaround, in Mysql's my.cnf, you can set:
> character_set_database = 'utf8'

Yeah, I'm aware of this. It's only a workaround, as you say, though,
since it makes tests dependent on configuration outside of Django.

There are already some implicit assumptions like that in the tests (if
your database can't hold characters that are also in the latin1 character
set, encoded suitably for the encoding of your database, tests will fail
mysteriously). But don't tell anybody that. We'll keep it as just our
little secret. :-)

> In postgres, new databases are created from the template1 system
> database; new databases will have whatever encoding that database has.
> (template0 is the pristine DB shipped with postgres and should never
> be changed, but you should feel free to change template1 as is
> useful).

Agreed.

Since both servers allow you to specify the encoding at creation time,
I'll add support for TEST_DATABASE_CHARSET and TEST_DATABASE_COLLATION
settings today (to trunk, since this isn't Unicode specific). That
should make things more portable.

Regards,
Malcolm


Malcolm Tredinnick

May 28, 2007, 9:37:12 PM
to django...@googlegroups.com
On Tue, 2007-05-29 at 00:30 +0200, Sandro Dentella wrote:
> On Mon, May 28, 2007 at 04:26:17PM -0500, Jeremy Dunck wrote:
> >
> > On 5/28/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > > Okay... time to fix that problem then. Probably need to introduce a
> > > settings for tests only for database encoding. I should have done that
> > > when I first saw the problem instead of trying to dodge around it.
>
> Do we need such a settings or we really need to *copy* database encoding so
> that tests are done exactly as the application database. (if it's possible
> to use other than utf8...).

That assumes there is an application database to copy and/or that is
configured sensibly. It's a bit of a wart that DATABASE_NAME is required
at all in the test settings file -- it's just that it's woven pretty
deeply into various places in the code. For fun, somebody could look
into fixing that; it would only take an hour or so for somebody with
reasonable Python familiarity and not much Django internals
familiarity, I would guess.

Short version: tests should be as independent and run in as uniform an
environment as possible.

> That would prevent people from running wonderful tests on a well-configured
> db when the real db is still "SQL_ASCII" just because template1 was shipped
> that way!

The test framework can't realistically insulate people against mistakes
they make outside of Django. You create test databases much more often
than project databases (unless you don't run your tests). So I'm -1 on
trying to do anything like this.

Regards,
Malcolm


Sandro Dentella

May 29, 2007, 3:27:55 AM
to django...@googlegroups.com
> > Do we need such a settings or we really need to *copy* database encoding so
> > that tests are done exactly as the application database. (if it's possible
> > to use other than utf8...).
>
> That assumes there is an application database to copy and/or that is
> configured sensibly. It's a bit of a wart that DATABASE_NAME is required
> at all in the test settings file -- it's just that it's woven pretty
> deeply into various places in the code. For fun, somebody could look
> into fixing that; it would only take an hour or so for somebody with
> reasonable Python familiarity and not much Django internals
> familiarity, I would guess.
>
> Short version: tests should be as independent and run in as uniform an
> environment as possible.

Well, I may be thinking of a different use case... but when you run
'manage.py test' you *do* have a project configured, don't you?

> The test framework can't realistically insulate people against mistakes
> they make outside of Django. You create test databases much more often
> than project databases (unless you don't run your tests). So I'm -1 on
> trying to do anything like this.

I consider this a slippery situation. I foresee many cases in which the
application db and test db will be different, and a simple check in
run_test/create_test_db could prevent people from scratching their heads to
understand why tests pass and the application raises DecodeError... as
happened in this same thread around message 25.

sandro
*:-)


--
Sandro Dentella *:-)
e-mail: san...@e-den.it
http://www.tksql.org TkSQL Home page - My GPL work

Malcolm Tredinnick

May 29, 2007, 3:39:12 AM
to django...@googlegroups.com
On Tue, 2007-05-29 at 09:27 +0200, Sandro Dentella wrote:
> > > Do we need such a settings or we really need to *copy* database encoding so
> > > that tests are done exactly as the application database. (if it's possible
> > > to use other than utf8...).
> >
> > That assumes there is an application database to copy and/or that is
> > configured sensibly. It's a bit of a wart that DATABASE_NAME is required
> > at all in the test settings file -- it's just that it's woven pretty
> > deeply into various places in the code. For fun, somebody could look
> > into fixing that; it would only take an hour or so for somebody with
> > reasonable Python familiarity and not much Django internals
> > familiarity, I would guess.
> >
> > Short version: tests should be as independent and run in as uniform an
> > environment as possible.
>
> Well I may be thinking at a different use case... but when you run
> 'manage.py test' you *do* have a project configured, don't you?

Not necessarily for production setup and usually not with my production
settings file. I will typically be running the tests on my development
machines, far away from any production installation.

>
> > The test framework can't realistically insulate people against mistakes
> > they make outside of Django. You create test databases much more often
> > than project databases (unless you don't run your tests). So I'm -1 on
> > trying to do anything like this.
>
> I consider this a slippery situation. I foresee many cases in which the
> application db and test db will be different, and a simple check in
> run_test/create_test_db could prevent people from scratching their heads to
> understand why tests pass and the application raises DecodeError... as
> happened in this same thread around message 25.

Since the tests won't have access to production setups in realistic
situations (development doesn't happen on production machines), this
isn't possible to enforce.

Fortunately, the good news is that if somebody like you wants to enforce
this equivalence because you are working with access to your production
db all the time, you can easily extract the necessary settings for the
tests from your production settings. It's just Python code underneath,
after all. You can do whatever you like in settings.py.
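
A rough sketch of what that could look like, using the TEST_DATABASE_CHARSET and TEST_DATABASE_COLLATION names proposed earlier in this thread. The PRODUCTION_DB dictionary and its values are hypothetical stand-ins for whatever you would actually read from your production configuration.

```python
# Test settings file (sketch). PRODUCTION_DB is a made-up stand-in for values
# taken from the production setup; replace it with however you store them.
PRODUCTION_DB = {
    "charset": "utf8",
    "collation": "utf8_general_ci",
}

# Make the test database mirror the production encoding and collation.
TEST_DATABASE_CHARSET = PRODUCTION_DB["charset"]
TEST_DATABASE_COLLATION = PRODUCTION_DB["collation"]
```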

Regards,
Malcolm


Marc Fargas

May 29, 2007, 5:58:23 AM
to django...@googlegroups.com
Hi there,
When using i18n I'm getting a nice stack trace for every request. It
only happens once you set the language for a client, and the exception is
raised because Content-Language is a Unicode string for some reason.

Backtrace:


Traceback (most recent call last):
  File "/var/lib/python-support/python2.4/django/core/servers/basehttp.py", line 272, in run
    self.result = application(self.environ, self.start_response)
  File "/var/lib/python-support/python2.4/django/core/servers/basehttp.py", line 614, in __call__
    return self.application(environ, start_response)
  File "/var/lib/python-support/python2.4/django/core/handlers/wsgi.py", line 206, in __call__
    start_response(status, response_headers)
  File "/var/lib/python-support/python2.4/django/core/servers/basehttp.py", line 360, in start_response
    assert type(val) is StringType,"Header values must be strings"
AssertionError: Header values must be strings

If I place "import pdb; pdb.set_trace()" in basehttp.py before the
assert, it turns out the Content-Language header is the one that is not a string:

> /var/lib/python-support/python2.4/django/core/servers/basehttp.py(359)start_response()
-> for name,val in headers:
(Pdb) p headers
[('Vary', 'Accept-Language, Cookie'),
 ('Content-Type', 'text/html; charset=utf-8'),
 ('Content-Language', u'ca'),
 ('Set-Cookie', ' sessionid=9935d4ee987c1a0cc42f051fda2bbe09; expires=Tue, 12-Jun-2007 10:03:45 GMT; Max-Age=1209600; Path=/;')]

I'm not sure whether I did something that made Content-Language a unicode
string or whether it's a bug in the unicode branch.

Cheers,
Marc
On Thu, 24-05-2007 at 23:06 +1000, Malcolm Tredinnick wrote:
> Hi folks,
>
> The unicode branch, [1], is now at a point where it is essentially
> feature-complete and could do with a bit of heavy testing from the wider
> community.
>
> So if you have some applications that work against Django's current
> trunk and would like to try them out on the unicode branch, I'd
> appreciate your efforts. The porting effort should be very minimal
> (almost zero, in many cases).
>
> For code that is only meant to work with ASCII data, there are probably
> no changes required at all. For code that is meant to work with all
> kinds of input (essentially, arbitrary strings), there are a few quick
> porting steps required.
>
> See [2] for the short list (5 steps, maximum!) of changes you might need
> to make. For more detailed information, have a read through the
> unicode.txt document in the docs/ directory of the branch.
>
> Any bugs you find should be filed in Trac. Put "[unicode]" at the start
> of the summary title so that I can search for them later. No need to put
> any special keywords or anything like that in (the "version" field
> should be set to "other branch", if you remember).
>
> A couple of things to watch out for when you're testing:
>
> (A) Strings that seem to mysteriously disappear, but when you
> examine the source, you see something like
> "<django.utils.functional.__proxy__ object at 0x2aaaaf87a750>".
> These shouldn't be too common and will mostly be restricted to
> places like the admin interface that do introspection.
>
> (B) Translations that happen too early. If you have translations
> available and use your app in a language that is different from
> the LANGUAGE_CODE setting, watch out for any strings that are
> translated into LANGUAGE_CODE, instead of your current locale.
> This is a sign that ugettext() is being used somewhere that
> ugettext_lazy() should be used.
>
> (C) If you're using Python 2.3, look for strings that don't make
> much sense when printed. That is a sign that a bytestring is
> being used where a unicode string was needed (not your fault;
> it's an oversight in Django). Python 2.3 has some
> "interesting" (I could use nastier words) behaviour when it
> tries to interpolate non-string objects into unicode strings (it
> doesn't call the __unicode__ method!!) and we have to work
> around them explicitly. I think I've got most of them, but I'll
> bet I have overlooked some.
>
> Most bugs that people are finding at the moment fit into one of these
> categories and they are very easy to fix once we find them. I've tried
> to nail most of them in advance, but you can probably imagine how
> exciting it is to read every line of source code and try to find all the
> strings that are in a precise form that need changing. My attention may
> have drifted from time to time.
>
> Have realistic expectations about this branch, too. It is meant to be as
> close to 100% backwards-compatible as we can make it. So, for example,
> usernames still have to use normal ASCII alphabetic characters, etc.
> Similarly, the slugify filter still behaves as it did before. At some
> point it will be extended to handle a _few_ more non-ASCII characters,
> but it's never going to be a full transliteration function. They are the
> two big items I expect people would otherwise try to extend beyond what
> is intended. There may be others and I'm sure we'll discover what they
> are as the questions pop up.
>
> [1] http://code.djangoproject.com/wiki/UnicodeBranch
> [2]
> http://code.djangoproject.com/wiki/UnicodeBranch#PortingApplicationsTheQuickChecklist
>
> Regards,
> Malcolm
>
>
>

Malcolm Tredinnick

May 29, 2007, 6:03:12 AM
to django...@googlegroups.com

There are no bugs on the unicode branch; so it's clearly your fault. :-)

However, just this once, we can adjust the code so that this doesn't
happen.

If anybody doesn't believe software development is hard, this is a
beautiful example. I fixed a very similar problem for modpython last
night. At the same time I looked for analogous problems in the other
handlers and didn't see any. I even checked the tests and
http/__init__.py. Tonight, the same sort of problem pops up but in the
development *server*, despite the fact that I have been testing things
in different locales all along (which is really strange).

It's a trivial fix. Wait 30 minutes and try again.

Regards,
Malcolm

Marc Fargas

May 29, 2007, 6:08:55 AM
to django...@googlegroups.com
On Tue, 29-05-2007 at 20:03 +1000, Malcolm Tredinnick wrote:
> There are no bugs on the unicode branch; so it's clearly your fault. :-)
Ahh! I knew I knew... :)

> However, just this once, we can adjust the code so that this doesn't
> happen.

Oh, thanks!!! :)

> If anybody doesn't believe software development is hard, this is a
> beautiful example. I fixed a very similar problem for modpython last
> night. At the same time I looked for analogous problems in the other
> handlers and didn't see any. I even checked the tests and
> http/__init__.py. Tonight, the same sort of problem pops up but in the
> development *server*, despite the fact that I have been testing things
> in different locales all along (which is really strange).

Yes, I forgot to mention it was happening in the development server ;)

> It's a trivial fix. Wait 30 minutes and try again.

So in 30 minutes I'll svn up and continue migrating to the shiny bug
free unicode branch!

cheers,
Marc

Malcolm Tredinnick

May 29, 2007, 6:22:12 AM
to django...@googlegroups.com
On Tue, 2007-05-29 at 20:03 +1000, Malcolm Tredinnick wrote:
[...]

>
> If anybody doesn't believe software development is hard, this is a
> beautiful example. I fixed a very similar problem for modpython last
> night. At the same time I looked for analogous problems in the other
> handlers and didn't see any. I even checked the tests and
> http/__init__.py. Tonight, the same sort of problem pops up but in the
> development *server*, despite the fact that I have been testing things
> in different locales all along (which is really strange).
>
> It's a trivial fix. Wait 30 minutes and try again.

Fixed in [5378].

It's actually a good bug to find; we were violating the WSGI spec, so it
actually was an error in our wsgi handler. Fortunately, the dev server
is very aggressive about enforcing "MUST" requirements in that area, at
least.

Note that if you actually *try* to pass something that is not ASCII data
in a header, you'll get an error, but that's an error in the code,
rather than just a problem with passing the wrong type. Shouldn't be an
issue, though.
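
For readers wondering what such a coercion might look like, here is a rough sketch. It is not the change made in [5377] or [5378]; the function names are made up, and it is written for Python 3, where the native string type is str.

```python
def ascii_header(value):
    """Return value as a native str, raising if it is not pure ASCII."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not ASCII.
        return value.decode("ascii")
    text = value if isinstance(value, str) else str(value)
    # Raises UnicodeEncodeError if the text is not ASCII.
    text.encode("ascii")
    return text


def coerce_headers(headers):
    """Apply ascii_header to every (name, value) pair in a header list."""
    return [(ascii_header(name), ascii_header(val)) for name, val in headers]


print(coerce_headers([("Content-Language", u"ca")]))
```

The point of raising rather than silently replacing characters is that non-ASCII data in a header is a bug in the calling code, as noted above, so it is better to hear about it early.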

Regards,
Malcolm

Marc Fargas

May 29, 2007, 6:32:54 AM
to django...@googlegroups.com
On Tue, 29-05-2007 at 20:22 +1000, Malcolm Tredinnick wrote:
> Fixed in [5378].

Thanks, it works perfectly now! ;)

> It's actually a good bug to find; we were violating the WSGI spec, so it
> actually was an error in our wsgi handler. Fortunately, the dev server
> is very aggressive about enforcing "MUST" requirements in that area, at
> least.

Long live the dev server :)

Cheers,
Marc

itsnotvalid

May 29, 2007, 6:37:28 AM
to Django users
Oh man... looks like we are not going to file any tickets to
code.djangoproject.com... I should feel sorry for that because I was
one of them ;-)

Malcolm Tredinnick

May 29, 2007, 9:12:10 AM
to django...@googlegroups.com

There are now TEST_DATABASE_CHARSET and TEST_DATABASE_COLLATION (the
latter for MySQL only) settings. These are [5380] for trunk and [5381]
in the unicode branch.

I've tested it in as far as seeing that the test databases are created
with the right encodings and collations, where appropriate. Please
report any other bugs to Trac.
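
To give a feel for what the two settings translate to at the SQL level, here is a sketch. It is not the code committed in [5380]/[5381]; the function name and the engine labels are assumptions made for illustration, and real code should quote the database name the way the backend requires.

```python
def create_test_db_sql(name, engine, charset=None, collation=None):
    """Build a CREATE DATABASE statement with optional encoding clauses."""
    sql = "CREATE DATABASE %s" % name
    if engine == "postgresql":
        if charset:
            sql += " WITH ENCODING '%s'" % charset
    elif engine == "mysql":
        if charset:
            sql += " CHARACTER SET %s" % charset
        if collation:
            # The collation clause is only emitted for MySQL in this sketch.
            sql += " COLLATE %s" % collation
    return sql


print(create_test_db_sql("test_gr4unicode", "postgresql", "UNICODE"))
print(create_test_db_sql("test_db", "mysql", "utf8", "utf8_general_ci"))
```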

Regards,
Malcolm


Michal

May 29, 2007, 9:21:58 AM
to django...@googlegroups.com
Malcolm Tredinnick wrote:
> There are now TEST_DATABASE_CHARSET and TEST_DATABASE_COLLATION (the
> latter for MySQL only) settings. These are [5380] for trunk and [5381]
> in the unicode branch.
>
> I've tested it in as far as seeing that the test databases are created
> with the right encodings and collations, where appropriate. Please
> report any other bugs to Trac.

I have just tested it, and everything is working fine.

Thank you

Regards
Michal

Almad

May 29, 2007, 11:38:29 AM
to Django users
Hi,

I'm trying to migrate to unicode branch, but Syndication framework
won't work for me (usual UnicodeDecodeError). I made sure that all
strings are u'' ones.

Is this middleware ready for unicode?

Thank You,

Almad

Malcolm Tredinnick

May 29, 2007, 9:09:54 PM
to django...@googlegroups.com

Everything is ready (you can see all the pieces that were ported over in
the "TODO" list on the UnicodeBranch wiki page). Syndication has
definitely been tested with non-ASCII content, so it should also work.
Any failures are bugs (either in Django or your code).

Please open a ticket in Trac with a (simple) example of what is going
wrong.

Regards,
Malcolm


Almad

May 30, 2007, 8:03:22 AM
to Django users

On May 30, 3:09 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:


> Everything is ready (you can see all the pieces that were ported over in
> the "TODO" list on the UnicodeBranch wiki page). Syndication has
> definitely been tested with non-ASCII content, so it should also work.
> Any failures are bugs (either in Django or your code).
>
> Please open a ticket in Trac with a (simple) example of what is going
> wrong.

I played with it a bit and discovered that unicode description won't
work for me.

Filed as #4430

> Regards,
> Malcolm

Regards (and Thank You for branch),

Almad

Jason Davies

May 30, 2007, 9:20:07 AM
to Django users
On May 24, 2:06 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

> The unicode branch, [1], is now at a point where it is essentially
> feature-complete and could do with a bit of heavy testing from the wider
> community.
>
> So if you have some applications that work against Django's current
> trunk and would like to try them out on the unicode branch, I'd
> appreciate your efforts. The porting effort should be very minimal
> (almost zero, in many cases).

We've ported our code over and no problems so far!

Jason

Michael Radziej

May 30, 2007, 11:04:45 AM
to django...@googlegroups.com
Hi Malcolm!

On Sat, May 26, Malcolm Tredinnick wrote:
> On Fri, 2007-05-25 at 17:17 +0200, Michael Radziej wrote:
> > First, I found that I have a problem with commit 5255 together with the test
> > client. It breaks loading the modules, probably due to recursive imports.
> >
> > - management activates translation
> > - this loads all apps
> > - One of my apps loads the test Client (I use a different testing
> > framework that uses the django test client)
> > - test client loads contrib.session
> > - the model meta class starts translation in contribute_to_class
> > - this loads all apps --> doesn't work
> >
> > I moved the import statement in my app into the function --> works.
> >
> > I suggest to change the test client so that it imports other models
> > only in a function and not at compile time.
>
> Fixed at the source of the problem (django.db.models.options) in [5345].
> At least, I'm pretty sure that will fix it. Let me know if the problem
> persists (and why, because then it's not as you describe).

Very nice, I can confirm that it works.

> > Second, I have a map of view tags, verbose names for these and how to build
> > the url (it was born before the regex reverser). This map uses gettext_lazy
> > for the verbose names, which is used later with the % operator. This fails
> > because
> >
> > In [44]: "%s" % gettext_lazy("Dienste")
> > Out[44]: '<django.utils.functional.__proxy__ object at 0xb70dacac>'
> >
> > With proper unicode objects, though, it works:
> >
> > In [45]: u"%s" % ugettext_lazy("Dienste")
> > Out[45]: u'Services'
> >
> > (It really requires both that the pattern is unicode and that ugettext_lazy is
> > used and not gettext_lazy)
> >
> > I'm now working to work around this, but it's a lot of replacements from
> > "gettext_lazy" --> "ugettext_lazy" and also to promote all the patterns to
> > unicode.
>
> Fixed in [5344]. '%s' % gettext_lazy('Dienste') will do what you expect
> now.

That's terrific! I had already resigned myself to boring hours of changing
hundreds of strings to unicode! Should we ever meet in person, that's one
beer, at least ;-)

BTW, are you aware that unicode will finally retire mysql_old? I'll now
switch to mysql and then continue testing.

Michael

Eugene Morozov

Jun 2, 2007, 5:46:41 PM6/2/07
to Django users

On May 24, 17:06, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> Hi folks,


>
> The unicode branch, [1], is now at a point where it is essentially
> feature-complete and could do with a bit of heavy testing from the wider
> community.
>
> So if you have some applications that work against Django's current
> trunk and would like to try them out on the unicode branch, I'd
> appreciate your efforts. The porting effort should be very minimal
> (almost zero, in many cases).

Hello,
I've checked out unicode branch today and immediately found two bugs.
This code doesn't work:
def __unicode__(self):
    langs = dict(settings.LANGUAGES)
    return _("%s text of the page %s") % (langs[self.language],
                                          self.page.url)

(I get TypeError: unsupported operand type(s) for %: '__proxy__' and
'tuple')

The second bug is actually the unicode bug that was present in non-
unicode django and still persists in unicode branch. Unicode data
fetched from postgresql using psycopg2 is invalid under some
circumstances. I'll provide more details when I have time.
Eugene

> [2]http://code.djangoproject.com/wiki/UnicodeBranch#PortingApplicationsT...
>
> Regards,
> Malcolm

Malcolm Tredinnick

Jun 2, 2007, 11:39:57 PM6/2/07
to django...@googlegroups.com
Hi Eugene,

On Sat, 2007-06-02 at 21:46 +0000, Eugene Morozov wrote:
>
>
> On May 24, 17:06, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> > Hi folks,
> >
> > The unicode branch, [1], is now at a point where it is essentially
> > feature-complete and could do with a bit of heavy testing from the wider
> > community.
> >
> > So if you have some applications that work against Django's current
> > trunk and would like to try them out on the unicode branch, I'd
> > appreciate your efforts. The porting effort should be very minimal
> > (almost zero, in many cases).
>
> Hello,
> I've checked out unicode branch today and immediately found two bugs.
> This code doesn't work:
> def __unicode__(self):
>     langs = dict(settings.LANGUAGES)
>     return _("%s text of the page %s") % (langs[self.language],
>                                           self.page.url)
>
> (I get TypeError: unsupported operand type(s) for %: '__proxy__' and
> 'tuple')

Just a tip for next time: a bit more information would have been really
useful for working this out: which line gives the exception? What is _()
an alias for? What is the code surrounding this? You are using all sorts
of variables here that could come from anywhere.

After a bit of scratching my head and experimenting, I figured out that
you were saying that

ugettext_lazy("some string %s") % some_variable

is not doing what you expect.

It's probably not a great idea to use ugettext_lazy and then immediately
substitute in variables (using ugettext() would be faster), but we can
probably hack something up (and it will be a hack, because Python
doesn't supply enough information to be able to override the '%'
operator perfectly in these cases) in order to have this work for
ugettext_lazy and gettext_lazy and friends so that they behave more like
strings there.
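
To make the failure mode concrete: when the lazy object is on the left of `%`, Python looks for a `__mod__` method on it, and without one the expression raises the `TypeError` quoted above. The following is a minimal sketch in modern Python 3, using invented `PlainProxy` and `FormattingProxy` classes and a toy `fake_translate` function. It is not Django's implementation and makes no claim about how the eventual fix was written; it only shows the general idea of forcing evaluation before formatting.

```python
def fake_translate(message):
    # Assumed toy lookup standing in for a real gettext call.
    return message


class PlainProxy:
    """Hypothetical lazy wrapper that defers a call but defines no operators."""

    def __init__(self, func, *args):
        self._func = func
        self._args = args

    def __str__(self):
        return self._func(*self._args)


class FormattingProxy(PlainProxy):
    """Same wrapper, plus explicit support for `proxy % values`."""

    def __mod__(self, rhs):
        # Evaluate the deferred call first, then let the real string
        # handle the formatting.
        return str(self) % rhs


values = ("English", "/about/")

try:
    PlainProxy(fake_translate, "%s text of the page %s") % values
    plain_failed = False
except TypeError:
    plain_failed = True

formatted = FormattingProxy(fake_translate, "%s text of the page %s") % values

print("plain proxy raised TypeError:", plain_failed)
print(formatted)
```

Note that formatting a lazy string immediately also evaluates it immediately, which is why the non-lazy `ugettext()` is suggested above as the faster choice in that situation.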

> The second bug is actually the unicode bug that was present in non-
> unicode django and still persists in unicode branch. Unicode data
> fetched from postgresql using psycopg2 is invalid under some
> circumstances. I'll provide more details when I have time.

Please open a ticket when you have the necessary information (don't
report it here, it will just get lost in the traffic) -- nobody has ever
reported this before, that I'm aware of. It sounds very strange -- all
the conversion to unicode is done inside psycopg2 and once it's
converted to a unicode object I'm not sure what "invalid" would mean in
this context -- the data is either converted from whatever PostgreSQL
stores it as or it isn't. So we'll need an example of exactly what is
going wrong to be able to reproduce this. The example should include
what you expect to see and what actually happens.

Thanks for taking the time to test the code. It's appreciated.

Regards,
Malcolm


Malcolm Tredinnick

Jun 3, 2007, 1:38:11 AM6/3/07
to django...@googlegroups.com
On Sun, 2007-06-03 at 13:39 +1000, Malcolm Tredinnick wrote:
> Hi Eugene,
>
> On Sat, 2007-06-02 at 21:46 +0000, Eugene Morozov wrote:
[...]

> > I've checked out unicode branch today and immediately found two bugs.
> > This code doesn't work:
> > def __unicode__(self):
> >     langs = dict(settings.LANGUAGES)
> >     return _("%s text of the page %s") % (langs[self.language],
> >                                           self.page.url)
> >
> > (I get TypeError: unsupported operand type(s) for %: '__proxy__' and
> > 'tuple')

[...]


> ugettext_lazy("some string %s") % some_variable
>
> is not doing what you expect.
>
> It's probably not a great idea to use ugettext_lazy and then immediately
> substitute in variables (using ugettext() would be faster), but we can
> probably hack something up (and it will be a hack, because Python
> doesn't supply enough information to be able to override the '%'
> operator perfectly in these cases) in order to have this work for
> ugettext_lazy and gettext_lazy and friends so that they behave more like
> strings there.

This is fixed in [5420].

Regards,
Malcolm


ZebZiggle

Jun 13, 2007, 8:50:41 AM6/13/07
to Django users
Hi, I'm really stuck on a bug (decode exceptions on attempting to save
utf-8 to postgres text fields) and suspect I'm going to need the
unicode branch to solve it. But I have some questions:

1. When is it likely that this branch will be merged with trunk?

2. How would you rank the stability of this branch currently?

3. I've read http://code.djangoproject.com/wiki/UnicodeBranch but are
there any more docs/examples of the conversion process available?

I'm getting close to production for my project and fear adopting this
branch will be a major setback if it's not ready for prime time.

Any feedback would be great.

Thanks!
Sandy

Malcolm Tredinnick

Jun 13, 2007, 9:05:03 AM6/13/07
to django...@googlegroups.com
On Wed, 2007-06-13 at 05:50 -0700, ZebZiggle wrote:
> Hi, I'm really stuck on a bug (decode exceptions on attempting to save
> utf-8 to postgres text fields) and suspect I'm going to need the
> unicode branch to solve it. But I have some questions:
>
> 1. When is it likely that this branch will be merged with trunk?

A couple of weeks, maybe. The Oracle branch is going first and that's
probably blocking on me installing Oracle at home to test it during the
merge.

However, it's also a bit irrelevant on many levels, since the unicode
branch is synced with trunk about twice a week. So it's just like using
trunk.

>
> 2. How would you rank the stability of this branch currently?

Excellent.

>
> 3. I've read http://code.djangoproject.com/wiki/UnicodeBranch but are
> there any more docs/examples of the conversion process available?

Read the UnicodeBranch wiki page.

>
> I'm getting close to production for my project and fear adopting this
> branch will be a major setback if it's not ready for prime time.

Have a look at the rest of the posts in this thread. Notice that all
reported bugs are relatively trivial and have been fixed very quickly.
That should give you a feeling for how likely it is that you will hit
huge, insurmountable problems.

Regards,
Malcolm

Malcolm Tredinnick

Jun 13, 2007, 9:07:04 AM6/13/07
to django...@googlegroups.com
On Wed, 2007-06-13 at 23:05 +1000, Malcolm Tredinnick wrote:
> On Wed, 2007-06-13 at 05:50 -0700, ZebZiggle wrote:
[...]

> >
> > 3. I've read http://code.djangoproject.com/wiki/UnicodeBranch but are
> > there any more docs/examples of the conversion process available?
>
> Read the UnicodeBranch wiki page.

Sorry, brain failure on my part ... read the unicode.txt file in the
docs directory. That explains all the details of the new features. The
only conversion information is the short section on the wiki page
because that's really all there is to it.

Regards,
Malcolm

ZebZiggle

Jun 13, 2007, 10:00:10 AM6/13/07
to Django users
Super ... thanks Malcolm!

I'll start later today.

-Sandy

PS> Congratulations ... looks like a great addition.
