I was trying to "unicodify" TurboGears, starting with the wiki20 tutorial.
Here are the results.
(Please note that I follow the "unicode everywhere" principle here.)
Feel free to do what you want with my code (submit relevant tickets,
patches, etc.).
Wiki20 code itself
==================
1. content = content.encode('utf8') must be removed, because Kid handles
the unicode->utf8 encoding itself.
2. The WikiWords regular expression obviously doesn't work with non-Latin
alphabets, but I didn't have time to fix it. I created a test page with
a unicode name and data manually, but that is probably not enough to
catch all unicode-related bugs...
3. wiki20.tgz is not in sync with the latest TurboGears and tutorial!
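For item 2, one possible direction is sketched below. It is written for Python 3 (an assumption; the tutorial code is Python 2), and instead of an [A-Z] character class it relies on str.isupper() and str.islower(), which recognise cased letters in any alphabet. The names is_wiki_word and find_wiki_words are hypothetical, and the rule for what counts as a WikiWord is an assumption, not the tutorial's exact pattern.

```python
import re

# For str patterns in Python 3, \w matches letters of any alphabet.
_TOKEN = re.compile(r"\w+")


def is_wiki_word(token):
    """Return True for CamelCase-like tokens in any cased alphabet.

    Assumed rule: starts with an uppercase letter, and a later uppercase
    letter follows at least one lowercase letter.
    """
    if not token[0].isupper():
        return False
    seen_lower = False
    for ch in token[1:]:
        if ch.islower():
            seen_lower = True
        elif ch.isupper() and seen_lower:
            return True
    return False


def find_wiki_words(text):
    """Return all WikiWord-like tokens found in text, in order."""
    return [tok for tok in _TOKEN.findall(text) if is_wiki_word(tok)]


words = find_wiki_words("See FrontPage and ТестоваяСтраница here")
```

A filter function is used here rather than a single pattern because the standard re module has no "uppercase letter in any script" class.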
Kid
===
Kid was the easiest to deal with. It just accepted unicode and
automatically encoded html to utf8.
CherryPy
========
Decoding incoming arguments
---------------------------
I had to add the following lines to dev.cfg (so that CP would decode
incoming parameters to unicode):
decodingFilter.on = True
decodingFilter.encoding = "utf8"
(Note: encodingFilter should NOT be enabled, because of Kid.)
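Conceptually, the effect of these two settings can be pictured as below. This is only an illustration in Python 3 (an assumption; CP itself runs on Python 2), not CherryPy's actual implementation, and decode_params is a hypothetical name.

```python
def decode_params(params, encoding="utf8"):
    """Return a copy of params with every byte-string value decoded.

    List values (repeated form fields) are decoded item by item;
    values that are not byte strings are passed through untouched.
    """
    decoded = {}
    for key, value in params.items():
        if isinstance(value, bytes):
            value = value.decode(encoding)
        elif isinstance(value, list):
            value = [v.decode(encoding) if isinstance(v, bytes) else v
                     for v in value]
        decoded[key] = value
    return decoded


# Raw request parameters as they arrive from the browser (UTF-8 bytes).
raw = {"pagename": "Тест".encode("utf8"), "tags": [b"a", b"b"]}
params = decode_params(raw)
```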
Decoding and encoding urls
--------------------------
Percent-encoded urls (like
http://localhost:8080/%D0%A2%D0%B5%D1%81%D1%82) were unquoted and quoted
by CP automatically, but what the controller sees is a utf8 byte string.
The unicode<->utf8 conversion had to be done manually:
...
def default(self, pagename):
    pagename = unicode(pagename, 'utf8')
    return self.index(pagename)
...
and
...
raise cherrypy.HTTPRedirect("/%s" % pagename.encode('utf8'))
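
The two layers involved (percent-encoding of the url, and utf8 encoding of the text) can be seen separately in this small sketch. It uses Python 3's urllib.parse as a stand-in (an assumption; the code above is Python 2 and CP does the percent-encoding step itself).

```python
from urllib.parse import quote, unquote_to_bytes

url_path = "%D0%A2%D0%B5%D1%81%D1%82"

# Layer 1: percent-decoding gives raw bytes (the part CP handled).
raw = unquote_to_bytes(url_path)

# Layer 2: the bytes are utf8 and must be decoded to text by hand.
pagename = raw.decode("utf8")

# Going back: text -> utf8 bytes -> percent-encoded path for a redirect.
redirect_path = "/" + quote(pagename.encode("utf8"))
```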
Decoding and encoding cookies (turbogears.flash)
------------------------------------------------
I wanted to translate "Changes saved!" into Russian. To do this I had to
make turbogears.flash work with unicode parameters. I patched
TurboGears, because just encoding the message before calling flash
doesn't work:
http://groups.google.com/group/turbogears/browse_frm/thread/13539eb82b6fa60c
Also take a look at http://www.cherrypy.org/ticket/353.
--- controllers.py.original Thu Oct 20 04:12:42 2005
+++ controllers.py Thu Oct 20 04:16:14 2005
@@ -128,6 +128,7 @@
 def flash(message):
     """Set a message to be displayed in the browser on next page display"""
+    message = message.encode('utf8')
     cherrypy.response.simpleCookie['tg_flash'] = message
     cherrypy.response.simpleCookie['tg_flash']['path'] = '/'
@@ -136,6 +137,7 @@
     after retrieval."""
     try:
         message = cherrypy.request.simpleCookie["tg_flash"].value
+        message = unicode(message,'utf8')
         cherrypy.response.simpleCookie["tg_flash"] = ""
         cherrypy.response.simpleCookie["tg_flash"]['expires'] = 0
         cherrypy.response.simpleCookie['tg_flash']['path'] = '/'
SQLObject
=========
Changing StringCol to UnicodeCol doesn't help, because SQLObject won't
automatically encode queries (e.g. Page.byPagename wouldn't work
properly).
Instead I patched SQLObject as suggested by Stuart Bishop:
http://article.gmane.org/gmane.comp.python.sqlobject/2027
--- dbconnection.py.original Thu Oct 20 04:43:08 2005
+++ dbconnection.py Wed Oct 19 23:38:14 2005
@@ -292,6 +292,11 @@
     def _executeRetry(self, conn, cursor, query):
         if self.debug:
             self.printDebug(conn, query, 'QueryR')
+        if isinstance(query, unicode):
+            query = query.encode('utf8')
+        else:
+            # raise UnicodeError if it is not valid utf8 already
+            query.decode('utf8')
         return cursor.execute(query)
     def _query(self, conn, s):
--- col.py.original Thu Oct 20 04:45:30 2005
+++ col.py Thu Oct 20 04:46:50 2005
@@ -503,17 +485,15 @@
     def to_python(self, value, state):
         if value is None:
             return None
-        if isinstance(value, unicode):
-            return value.encode("ascii")
+        if isinstance(value, str):
+            return unicode(value,"utf8")
         return value
     def from_python(self, value, state):
         if value is None:
             return None
-        if isinstance(value, str):
-            return value
         if isinstance(value, unicode):
-            return value.encode("ascii")
+            return value.encode("utf8")
         return value
 class SOStringCol(SOStringLikeCol):
This is probably not a very good way to fix SQLObject, because _every_
query is encoded to utf8. There was a discussion on the mailing list,
but it didn't reach any conclusion:
http://thread.gmane.org/gmane.comp.python.sqlobject/2156
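The "encode or validate" rule that the dbconnection.py patch applies can be restated as a standalone function. The sketch below is Python 3 (an assumption; SQLObject here is Python 2, where the two types are unicode and str), and to_utf8_query is a hypothetical name, not part of SQLObject.

```python
def to_utf8_query(query):
    """Return query as utf8 bytes.

    Text is encoded; byte strings are only checked, so input that is
    not valid utf8 fails loudly instead of reaching the database.
    """
    if isinstance(query, str):
        return query.encode("utf8")
    # Raises UnicodeDecodeError (a UnicodeError) on invalid utf8.
    query.decode("utf8")
    return query


encoded = to_utf8_query("SELECT * FROM page WHERE pagename = 'Тест'")
```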
FormEncode
==========
validators.StringBoolean doesn't work when parameters coming from the
browser are decoded using CherryPy's decodingFilter. I patched it:
--- validators.py.original Thu Oct 20 04:29:48 2005
+++ validators.py Thu Oct 20 04:31:08 2005
@@ -1516,7 +1516,7 @@
     messages = { "string" : "Value should be %(true)r or %(false)r" }
     def _to_python(self, value, state):
-        if isinstance(value, str):
+        if isinstance(value, basestring):
             value = value.strip().lower()
             if value in self.true_values:
                 return True
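
The point of the one-line change is that both string types must take the same path through the validator. A rough standalone equivalent in Python 3 (an assumption; there the two types are bytes and str rather than str and unicode) could look like this. string_to_bool and the two value lists are hypothetical and only assumed to resemble FormEncode's defaults.

```python
# Assumed value lists, for illustration only.
TRUE_VALUES = ("true", "t", "yes", "y", "on", "1")
FALSE_VALUES = ("false", "f", "no", "n", "off", "0")


def string_to_bool(value):
    """Convert a form value to bool, accepting bytes and text alike."""
    if isinstance(value, bytes):
        value = value.decode("utf8")
    if isinstance(value, str):
        value = value.strip().lower()
        if value in TRUE_VALUES:
            return True
        if not value or value in FALSE_VALUES:
            return False
        raise ValueError("Value should be %r or %r" % ("true", "false"))
    # Non-string input falls back to ordinary truthiness.
    return bool(value)
```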
Alexey
First of all, I finished converting wiki20 to unicode. The last bit was
to change the regular expression to accept [[Wikipedia style links]]
instead of WikiWords:
wikiwords = re.compile(r"\[\[(.+)\]\]")
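One thing to watch with this pattern: ".+" is greedy, so two links on the same line are captured as a single match. The comparison below (Python 3, with a made-up sample string) shows the difference a non-greedy ".+?" makes; whether the tutorial should adopt it is a separate question.

```python
import re

greedy = re.compile(r"\[\[(.+)\]\]")
lazy = re.compile(r"\[\[(.+?)\]\]")

text = "See [[Front Page]] and [[Тест]]."

# Greedy: runs from the first "[[" to the last "]]" on the line.
greedy_links = greedy.findall(text)

# Non-greedy: stops at the nearest "]]", so each link is separate.
lazy_links = lazy.findall(text)
```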
> This is great! In fact, this would even make an excellent follow-on
> tutorial (after fixing up the problems you've found in the process, of
> course).
To my mind, it's better to patch TurboGears (and its "components") so
that the wiki tutorial works with unicode without any changes.
(Things that definitely must be done: removing "content =
content.encode('utf8')" and a proper regular expression for wiki links,
like the one above.)
> I'll have to take a better look and respond to the other bits below tomorrow...
I am looking forward to that!
Note that I have already posted to FormEncode regarding StringBoolean
(http://sourceforge.net/mailarchive/forum.php?thread_id=8704576&forum_id=37497)
and to CherryPy regarding cookies and the request line
(http://www.cherrypy.org/ticket/353). The only thing left is
SQLObject, because the patch I posted here is based on a patch that was
made about a year ago. It is still not committed...
> By the way, does anyone have a copy of the final wiki20 project? I did
> a quick, messy run through before releasing 0.8 but I didn't save it.
> I would like to update the tarball to match.
Has anyone replied to your request? If not, I can prepare a proper tarball
for you.
You probably want to keep the wiki20 code inside a repository...
Alexey