python api and unicode data


Didier Bretin

Feb 8, 2008, 7:23:22 AM
to remembert...@googlegroups.com
Hi,

I'm trying to use the Python API under Windows with Python 2.5.1. It works
and I can retrieve the tasks. But when I look at the task names,
I get things like this:
t\u00e9l\u00e9charger

In French, \u00e9 is an é.

How can I convert the string so that é is printed instead of \u00e9?

I tried to do:
task = "t\u00e9l\u00e9charger"
print task.encode('latin-1')

And I still get:
t\u00e9l\u00e9charger

and not:
télécharger

Can you help me with this issue?

Regards.
--
Didier BRETIN
http://www.bretin.net/

Mariano Draghi

Mar 23, 2008, 12:01:25 PM
to Remember The Milk API
On Feb 8, 09:23, Didier Bretin <did...@bretin.net> wrote:
> Hi,
>
> I'm trying to use the python api under windows with python 2.5.1. It works
> and I can have the tasks retreived. But if I look at the name of the tasks
> I got things like this:
> t\u00e9l\u00e9charger
>
> In french the \u00e9 is a é.
>
> How can I convert the string to have the é printed and not the \u00e9 ?

Hi,

I've just started to play around with the RTM API from Python, and
came across the exact same issue.
The problem is that the JSON strings are not being parsed at all, and
are interpreted "as is".
For example, when the JSON object contains the string
t\u00e9l\u00e9charger
it is used "as is", i.e., "\", "u", "0", "0", "e" and "9" are treated
as separate characters, and NOT as the Unicode escape sequence "\u00e9"
(which is a single Unicode character).
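
To make the difference concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter and the standard-library json module rather than the Python 2.5 setup discussed in this thread; the variable names are only for illustration.

```python
import json

# The text as it arrives inside the JSON response: a literal backslash
# followed by "u", "0", "0", "e", "9" (a raw string keeps it that way).
raw = r"t\u00e9l\u00e9charger"
print(len(raw))       # 21: each escape still counts as six characters

# A JSON parser turns each \uXXXX escape into one Unicode character.
parsed = json.loads('"' + raw + '"')
print(len(parsed))    # 11: each escape has become a single character

print(raw == parsed)  # False
```

The unparsed string is pure ASCII, which is why encoding it to latin-1 changes nothing.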

As I didn't want to reinvent the wheel, and there are a few Python
JSON parsers out there (which handle Unicode literals correctly), I
solved the issue using one of those parsers.

1) I installed "simplejson":
http://pypi.python.org/pypi/simplejson

2) I made a small change in rtm.py so it uses simplejson to parse the
response. First, I added a line to import simplejson, like this:
    import new
    import warnings
    import urllib
    from md5 import md5
    import simplejson # --> this is the line I added in rtm.py

And then, I changed the get() method replacing the line which creates
the "data" object from the JSON response, like this:
    def get(self, **params):
        [...]
        #data = dottedJSON(json) --> I commented out this line
        # I added the next line, which parses the response using simplejson
        data = dottedDict('ROOT', simplejson.loads(json))
        [...]

With that little change, the unicode sequences are correctly handled,
so you get unicode objects when needed.
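
Once the names are real unicode objects, printing them comes down to encoding them for the console. A rough sketch, assuming Python 3 and the stdlib json module (which exposes the same loads() function as simplejson); the payload below is a made-up fragment, not an actual RTM response.

```python
import json

# Hypothetical response fragment, only for illustration (not real RTM output).
response = r'{"name": "t\u00e9l\u00e9charger"}'

# After parsing, the value is text containing real é characters.
name = json.loads(response)["name"]

# Encoding now has something to do: é becomes one byte in latin-1
# and two bytes in UTF-8.
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b't\xe9l\xe9charger'
print(utf8_bytes)    # b't\xc3\xa9l\xc3\xa9charger'
```

Which encoding to use depends on what the console expects.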

I really don't know if this is the correct approach, as I've just
started to play with this. But I'm sure the RTM API handles *all* the data
in UTF-8, and that the current implementation of rtm.py is a
little naïve about encoding when it deserializes the JSON.
So I think rtm.py needs to do one of three things: use a complete
third-party JSON parser (like simplejson), implement its own custom (but
complete!!!) parser (I really don't see the point, as there are already
very good and maintained JSON parsers out there), or use the REST response
format instead of JSON (handling unicode responses properly, anyway...)


Regards,


--
Mariano

Mariano Draghi

Mar 25, 2008, 9:27:30 AM
to Remember The Milk API
On Feb 8, 9:23 am, Didier Bretin <did...@bretin.net> wrote:
> Hi,
>
> I'm trying to use the python api under windows with python 2.5.1. It works
> and I can have the tasks retreived. But if I look at the name of the tasks
> I got things like this:
> t\u00e9l\u00e9charger
>
> In french the \u00e9 is a é.
>
> How can I convert the string to have the é printed and not the \u00e9 ?

Hi,

I've made some improvements to pyrtm. Please see:
http://groups.google.com/group/rememberthemilk-api/browse_thread/thread/e6abf3872c074462

Regards,


--
Mariano Draghi