UnicodeError


chris

Oct 23, 2010, 6:08:01 AM
to Mappa - Topic Maps
Hi there,

I am getting the dreaded UnicodeEncodeError when writing a topic map
from Mappa. Is there any setting that I can use to make sure that the
output is actually written in UTF-8?

What I currently get is this:

>>> conn.write('http://www.example.org/schema/map',
...            out=out,
...            format='xtm', prettify=True)
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Mappa-0.1.6-py2.6.egg/mappa/_internal/enhancer.py", line 146, in _write
    writer.write(tm)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Mappa-0.1.6-py2.6.egg/mappa/writer/xtm/xtm2.py", line 100, in write
    write_topic(topic)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Mappa-0.1.6-py2.6.egg/mappa/writer/xtm/xtm2.py", line 139, in _write_topic
    write_name(name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Mappa-0.1.6-py2.6.egg/mappa/writer/xtm/xtm2.py", line 173, in _write_name
    self._writer.dataElement('value', name.value)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/tm-0.1.5-py2.6.egg/tm/mio/xmlutils.py", line 115, in dataElement
    self.characters(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/tm-0.1.5-py2.6.egg/tm/mio/xmlutils.py", line 132, in characters
    self._out.write(escape(content))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
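
The failure mode in the traceback can be reproduced outside Mappa. The sketch below is only an illustration, assuming nothing beyond the Python standard library; `try_write` and the sample name are hypothetical and not part of Mappa or tm. It writes the same Unicode string through a text stream that encodes as ASCII and through one that encodes as UTF-8.

```python
import io


def try_write(text, encoding):
    """Write text through a stream using `encoding`.

    Returns the encoded bytes on success, or the UnicodeEncodeError
    that was raised (hypothetical helper, for illustration only).
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


# Hypothetical topic name containing non-ASCII characters.
name = u"\u6f22\u5b57"

ascii_result = try_write(name, "ascii")   # the 'ascii' codec cannot encode this
utf8_result = try_write(name, "utf-8")    # UTF-8 can encode any Unicode string
```

With an ASCII stream the write fails in the same way as in the traceback; with a UTF-8 stream it succeeds.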

Lars Heuer

Oct 23, 2010, 10:28:16 AM
to ma...@googlegroups.com
Hi Chris,

[...]


> I am getting the dreaded UnicodeEncodeError when writing a topicmap from
> mappa.  Is there any setting that I can use to make sure that the
> output is actually written in UTF-8?
>
> What I currently get is this:
>
>>>> conn.write('http://www.example.org/schema/map',
>           out=out,
>           format='xtm',prettify=True)

[...]

Mappa should keep all strings as Unicode strings internally, but makes
no attempt to encode the strings during serialization.

What's "out" in your use case above?

Please try:

import codecs
out = codecs.open('mymap.xtm', encoding='utf-8', mode='w')


conn.write('http://www.example.org/schema/map',
          out=out,
          format='xtm',prettify=True)

Best regards,
Lars
--
Semagia
<http://www.semagia.com/>
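
The `codecs.open` workaround can be checked in isolation. This is a minimal sketch, assuming a temporary directory and a hand-written XML fragment in place of a real Mappa connection (the `content` string is hypothetical, not actual serializer output). It writes a Unicode string through a UTF-8 codec stream and reads the raw bytes back.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for what a serializer would write.
content = u"<value>\u6f22\u5b57</value>"

path = os.path.join(tempfile.mkdtemp(), "mymap.xtm")

# The stream returned by codecs.open encodes Unicode strings on write.
out = codecs.open(path, encoding="utf-8", mode="w")
try:
    out.write(content)
finally:
    out.close()

# Read the file back as bytes to confirm what ended up on disk.
with open(path, "rb") as f:
    raw_bytes = f.read()
```

The bytes on disk are the UTF-8 encoding of the string, which is what a file declaring UTF-8 in its XML declaration should contain.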

Christian Wittern

Oct 24, 2010, 9:40:37 PM
to ma...@googlegroups.com
Hi Lars,

On 2010-10-23 23:28, Lars Heuer wrote:
>> >>> conn.write('http://www.example.org/schema/map',
>>           out=out,
>>           format='xtm',prettify=True)
>>
> [...]
>
> Mappa should keep all strings as Unicode strings internally, but makes
> no attempt to encode the strings during serialization.
>

Since the XML files contain the explicit statement "encoding='utf-8'", it
might be worthwhile to consider making this the default.


> What's "out" in your use case above?
>

I just copied and tried the example from the Quickstart guide.

> Please try:
>
> import codecs
> out = codecs.open('mymap.xtm', encoding='utf-8', mode='w')
> conn.write('http://www.example.org/schema/map',
>           out=out,
>           format='xtm',prettify=True)
>

OK, now I see how to do this. Thanks a lot!

Chris


Lars Heuer

Oct 25, 2010, 3:02:08 AM
to ma...@googlegroups.com
Hi Christian,

[...]


> Since the XML files contain the explicit statement "encoding='utf-8'", it
> might be worthwhile to consider making this the default.

[...]

Well, the serializer writes whatever you specify as "encoding" into
the XML declaration. But you're right, the XML writer should also
write its output in that encoding.

I added an issue for it.
<https://code.google.com/p/mappa/issues/detail?id=59>

Thanks for the hint and best regards,

Lars Heuer

Oct 25, 2010, 12:08:54 PM
to Christian Wittern
Hi Christian,

[Unicode encoding error]


> Since the XML files contain the explicit statement "encoding='utf-8'", it
> might be worthwhile to consider making this the default.

I fixed the problem in rev. 378. This change does not justify a new
release, but you may copy
<https://code.google.com/p/mappa/source/browse/tm/trunk/src/tm/xmlutils.py>
into your site-packages/tm/mio/ directory, and the XTM serializer
should then work as expected without the codecs.open(...) workaround.
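
The actual change in rev. 378 is not shown in this thread. As a rough sketch of one common way an XML writer can handle encoding itself, the example below wraps the output stream with `codecs.getwriter`. The class `EncodingXMLWriter` and its method names are hypothetical and are not the tm.mio.xmlutils API.

```python
import codecs
import io
from xml.sax.saxutils import escape


class EncodingXMLWriter(object):
    """Hypothetical minimal writer; not the class in tm.mio.xmlutils."""

    def __init__(self, out, encoding="utf-8"):
        self._encoding = encoding
        # Wrap the byte stream so Unicode strings are encoded on write.
        self._out = codecs.getwriter(encoding)(out)

    def start_document(self):
        # Declare the same encoding that is used for the output bytes.
        self._out.write(
            u'<?xml version="1.0" encoding="%s"?>\n' % self._encoding)

    def data_element(self, name, data):
        # Escape &, < and > in the character data before writing.
        self._out.write(u"<%s>%s</%s>" % (name, escape(data), name))


buf = io.BytesIO()
writer = EncodingXMLWriter(buf)
writer.start_document()
writer.data_element(u"value", u"\u6f22\u5b57 & more")
result = buf.getvalue()
```

With this design the caller can pass any byte-oriented stream, and the declared encoding and the bytes actually written cannot drift apart.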

Christian Wittern

Oct 25, 2010, 6:29:45 PM
to ma...@googlegroups.com
Hi Lars,

Wow, that was fast! Thanks a lot,

Christian
