Encoding error in automatically generated author initials

Xin Wang

Dec 10, 2010, 9:57:56 PM
to asciidoc
Hi,

When I specify an author name using non-ASCII characters, such as '作者',
and do not set the author initials explicitly, xmllint complains with the
following error:

a2x: executing: xmllint --nonet --noout --valid /home/dram/test.xml
/home/dram/test.xml:12: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x3C 0x2F 0x61
<authorinitials>▒</authorinitials>
^

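I think the error is caused by the code that generates the author
initials: it slices the encoded byte string, so it keeps only the first
byte (0xE4) of a multibyte UTF-8 character. We should decode the string
to Unicode before slicing and then encode it back.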
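
To illustrate the difference, here is a minimal Python 3 sketch. It is not
asciidoc's code: the helper names initials_from_bytes and initials_from_text
are hypothetical, and UTF-8 input is assumed. Slicing the raw bytes keeps
only the lead byte 0xE4 seen in the xmllint output, while slicing after
decoding keeps the whole character.

```python
# Illustrative sketch only; helper names are hypothetical, not asciidoc's API.
# Assumes the name parts arrive as UTF-8 encoded byte strings.

def initials_from_bytes(first, middle, last):
    """Slice the raw bytes: takes one *byte* from each name part."""
    return first[:1] + middle[:1] + last[:1]


def initials_from_text(first, middle, last, encoding="utf-8"):
    """Decode, slice one *character* from each part, then encode back."""
    chars = (first.decode(encoding)[:1]
             + middle.decode(encoding)[:1]
             + last.decode(encoding)[:1])
    # Uppercasing the decoded text is a choice made for this sketch.
    return chars.upper().encode(encoding)


name = "作者".encode("utf-8")

broken = initials_from_bytes(name, b"", b"")
fixed = initials_from_text(name, b"", b"")

print(broken)                 # b'\xe4' (a truncated UTF-8 sequence)
print(fixed.decode("utf-8"))  # 作
```

Note that the sketch uppercases the decoded text, whereas the patch calls
.upper() after encoding; for a name like '作者' the result is the same.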

The following patch may fix it.

--- /usr/bin/asciidoc.py 2010-12-10 20:52:07.000000000 +0800
+++ asciidoc.py 2010-12-11 10:53:36.869094991 +0800
@@ -1568,8 +1568,9 @@
author = author.strip()
author = re.sub(r'\s+',' ', author)
if not initials:
- initials = firstname[:1] + middlename[:1] + lastname[:1]
- initials = initials.upper()
+ initials = (char_decode(firstname)[:1]
+ + char_decode(middlename)[:1] + char_decode(lastname)[:1])
+ initials = char_encode(initials).upper()
names = [firstname,middlename,lastname,author,initials]
for i,v in enumerate(names):
v = config.subs_specialchars(v)


Thanks,
Xin Wang

Stuart Rackham

Jan 23, 2011, 11:03:23 PM
to asci...@googlegroups.com
Hi Xin

Thanks for the patch, I've applied it to the trunk:
http://code.google.com/p/asciidoc/source/detail?r=a6e786d091c32bee633479b76f005b404dc2fc94

I'm afraid Unicode is not one of my strong points.


Cheers, Stuart
