[...]
> Here is the relevant part of my test file:
> #!/usr/bin/env python -*- coding: utf-8 -*-
I think this is the source of the failure. You should put
# -*- coding: utf-8 -*-
into the 2nd line of your source file and not into the same line as
the interpreter. Python does not detect the correct encoding of the
source file.
This works for me:
>>> import mappa
>>> conn = mappa.connect()
>>> src = 'http://cxtm-tests.svn.sourceforge.net/viewvc/cxtm-tests/trunk/ltm/in/utf-8.ltm'
>>> conn.load(src, into='http://www.example.org/my.map', format='ltm')
>>>
And the referenced file also uses UTF-8.
Best regards,
Lars
--
Semagia
<http://www.semagia.com>
... and this too:
>>> import mappa
>>> from urllib import urlopen
>>> s = urlopen('http://cxtm-tests.svn.sourceforge.net/viewvc/cxtm-tests/trunk/ltm/in/utf-8.ltm').read()
>>> s
'@"utf-8"\n[hiragana = "\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa"\n @"http://psi.ontopia.net/iso/15924.xtm#hira"]\n'
>>> conn = mappa.connect()
>>> conn.loads(s, into='http://example.org/testmap', format='ltm')
>>>
I think it's an encoding issue of your Python file.
On 2010-10-28 06:39, Lars Heuer wrote:
>
>> #!/usr/bin/env python -*- coding: utf-8 -*-
>>
> I think this is the source of the failure.
Unfortunately not; in this case you are wrong. Since the file does
contain non-ASCII UTF-8 characters, the Python interpreter would
immediately complain about them if the encoding declaration were not
recognized.
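To illustrate the point about the declaration being recognized: the sketch below applies the pattern documented in PEP 263 to the first two lines of a file. It is a simplified approximation of what the interpreter does, not its actual implementation, and the helper name declared_encoding is made up for this example.

```python
import re

# Pattern documented in PEP 263 for a coding declaration inside a comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None.

    Hypothetical helper: a simplified approximation of the PEP 263 lookup.
    """
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Declaration on the same line as the interpreter, as in the test file.
same_line = ["#!/usr/bin/env python -*- coding: utf-8 -*-", "import mappa"]
# Declaration on the second line, as suggested.
second_line = ["#!/usr/bin/env python", "# -*- coding: utf-8 -*-"]
# Declaration on the third line is too late to be picked up.
third_line = ["#!/usr/bin/env python", "", "# -*- coding: utf-8 -*-"]

for lines in (same_line, second_line, third_line):
    print(declared_encoding(lines))
```

Under this pattern both of the first two layouts yield utf-8, which is consistent with moving the declaration to the second line making no difference.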
> You should put
>
> # -*- coding: utf-8 -*-
>
> into the 2nd line of your source file and not into the same line as
> the interpreter. Python does not detect the correct encoding of the
> source file.
>
In fact I tried this, just to be 100% sure -- it does not change anything.
> This works for me:
>
> >>> import mappa
> >>> conn = mappa.connect()
> >>> src = 'http://cxtm-tests.svn.sourceforge.net/viewvc/cxtm-tests/trunk/ltm/in/utf-8.ltm'
> >>> conn.load(src, into='http://www.example.org/my.map', format='ltm')
> >>>
>
>
Then it depends on what is meant by "works". Please also note that in
my example, I am using the string loading, which might have different
problems from the file loading. In fact, I have used the file loading
on my files all the time, so I can confirm that there is no problem there.
All the best,
Christian
--
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
On 2010-10-28 06:50, Lars Heuer wrote:
>
> ... and this too:
>
> >>> import mappa
> >>> from urllib import urlopen
> >>> s = urlopen('http://cxtm-tests.svn.sourceforge.net/viewvc/cxtm-tests/trunk/ltm/in/utf-8.ltm').read()
> >>> s
> '@"utf-8"\n[hiragana = "\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa"\n @"http://psi.ontopia.net/iso/15924.xtm#hira"]\n'
>
This in fact proves that the file is being read as 8-bit bytes and not
decoded into unicode. Try opening it using the codecs module:
>>> import codecs
>>> f=codecs.open('/tmp/utf-8.ltm', 'r', 'utf-8')
>>> s=f.read()
>>> s
u'@"utf-8"\n[hiragana = "\u3072\u3089\u304c\u306a"\n @"http://psi.ontopia.net/iso/15924.xtm#hira"]\n'
This is the correct representation of the file.
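The difference between the two representations can be checked in isolation. The sketch below takes only the bytes of the topic name from the output quoted above and decodes them; the variable names are chosen for this example.

```python
# The twelve UTF-8 bytes of the topic name, as returned by urlopen().read().
raw = b"\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa"

# Decoding turns the byte string into a unicode string of four characters.
text = raw.decode("utf-8")

print(len(raw))   # 12 bytes
print(len(text))  # 4 characters
print(text == u"\u3072\u3089\u304c\u306a")  # True
```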
If you now do this:
> >>> conn = mappa.connect()
> >>> conn.loads(s, into='http://example.org/testmap', format='ltm')
> >>>
>
>
You will get exactly the error I reported.
> I think it's an encoding issue of your Python file.
>
Which proves that this is an encoding issue within the mio reader somewhere.
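For what it is worth, one common way to make a string-loading entry point accept both byte strings and unicode strings is to normalise the input once at the boundary. The helper below is purely hypothetical: to_text is not part of mappa or tm, and this is not a claim about how the mio reader works or should be fixed.

```python
def to_text(data, encoding="utf-8"):
    """Return data as a unicode string.

    Hypothetical helper for illustration only, not part of mappa or tm.
    Byte strings are decoded with the given encoding; unicode strings
    are returned unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Both kinds of input end up as the same unicode string.
from_bytes = to_text(b"\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa")
from_unicode = to_text(u"\u3072\u3089\u304c\u306a")
print(from_bytes == from_unicode)  # True
```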
All the best,
Chris, who has had his share of encoding issues to deal with:-)
[...]
> u'@"utf-8"\n[hiragana = "\u3072\u3089\u304c\u306a"\n @"http://psi.ontopia.net/iso/15924.xtm#hira"]\n'
> This is the correct representation of the file.
You're right, of course.
> If you now do this:
>> >>> conn = mappa.connect()
>> >>> conn.loads(s, into='http://example.org/testmap', format='ltm')
>> >>>
>>
>>
> You will get exactly the error I reported.
Exactly :(
[...]
> who has had his share of encoding issues to deal with:-)
I'll fix it asap.
Cheers,
[...]
> Lars, I hope you did not misunderstand this. What I wanted to say was only
> that, from the experience of many years of programming with East-Asian
> characters I know an encoding issue when I see one.
No, I did not misunderstand it. Probably I forgot a smiley. ;) It's a
bug and I'll fix it. :)
Hacking topic maps in python is so much more fun than doing the same in Java:-)
All the best,
Christian
[...]
>> It's a bug and I'll fix it. :)
>>
> That's great, take your time.
It's fixed in rev. 385:
>>> import codecs, mappa
>>> from urllib import urlopen
>>> s = codecs.getreader('utf-8')(urlopen('http://cxtm-tests.svn.sourceforge.net/viewvc/cxtm-tests/trunk/ltm/in/utf-8.ltm')).read()
>>> s
u'@"utf-8"\n[hiragana = "\u3072\u3089\u304c\u306a"\n @"http://psi.ontopia.net/iso/15924.xtm#hira"]\n'
>>> conn = mappa.connect()
>>> conn.loads(s, into='http://www.semagia.com/map', format='ltm')
>>> tm = conn.get('http://www.semagia.com/map')
>>> for topic in tm.topics:
...     for name in topic.names:
...         print name.value
...
ひらがな
>>>
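The codecs.getreader('utf-8') call in the session above wraps a byte stream so that read() returns unicode. A minimal offline sketch of the same technique follows, assuming an in-memory io.BytesIO object as a stand-in for the urlopen() response and a shortened sample of the LTM content.

```python
import codecs
import io

# Shortened sample of the LTM file content, as raw UTF-8 bytes.
raw = b'[hiragana = "\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa"]\n'

# io.BytesIO stands in for the file-like object returned by urlopen().
byte_stream = io.BytesIO(raw)

# getreader() returns a StreamReader class; instantiating it with the
# byte stream gives a reader whose read() decodes while reading.
reader = codecs.getreader("utf-8")(byte_stream)
text = reader.read()

print(text == u'[hiragana = "\u3072\u3089\u304c\u306a"]\n')  # True
```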
I'll prepare a tm release soon. Meanwhile you may copy
<https://code.google.com/p/mappa/source/browse/tm/trunk/src/tm/mio/_source.py>
into your
site-packages/tm/mio/
folder
> Hacking topic maps in python is so much more fun than doing the same in Java:-)
Python is more fun anyway. :) Well, until it comes to Unicode issues.
;)
On 28 October 2010 18:53, Lars Heuer <he...@semagia.com> wrote:
>
> ひらがな
yes, that's right!
> >>>
>
> I'll prepare a tm release soon. Meanwhile you may copy
> <https://code.google.com/p/mappa/source/browse/tm/trunk/src/tm/mio/_source.py>
> into your
> site-packages/tm/mio/
> folder
Yep, this works for me now. Great.
>
>> Hacking topic maps in python is so much more fun than doing the same in Java:-)
>
> Python is more fun anyway. :) Well, until it comes to Unicode issues.
> ;)
But Java has its own set of Unicode issues, especially with so-called
wide characters. Python is more transparent and issues can be solved
as soon as they are understood.
Cheers,
Christian
--
Christian Wittern, Kyoto
[...]
>> I'll prepare a tm release soon. Meanwhile you may copy
>> <https://code.google.com/p/mappa/source/browse/tm/trunk/src/tm/mio/_source.py>
>> into your
>> site-packages/tm/mio/
>> folder
> Yep, this works for me now. Great.
Goodie. Thanks for reporting the issue and for not trusting my initial
explanations :)