issue with TAG and unicode

carlo

Jan 15, 2011, 5:49:02 AM
to web2py-users
I have this code:

myrows = []
for r in range(riga + 1, nrows):
    myrow = []
    for c in range(ncols):
        cell = mysheet.cell_value(r, c)  # reading some text; cell is a unicode object
        myrow.append(cell)
    myrows.append(myrow)

idx = range(len(colnames))  # colnames = ['label1', 'label2', 'label3']
colnames = [item.replace('.', '_') for item in colnames]
records = []

for row in myrows:
    records.append(TAG['item'](*[TAG[colnames[i]](row[i]) for i in idx]))
response.headers['Content-Type'] = 'application/xml'
return str(TAG['root'](*records))


As commented above, cell is a unicode object, so myrows is a list of
lists of unicode objects.

If every cell contains only ASCII characters, everything is OK; if not, I get:

Traceback (most recent call last):
  File "C:\Python26\web2py\gluon\restricted.py", line 188, in restricted
    exec ccode in environment
  File "c:/Python26/web2py/applications/xcel2xml/controllers/default.py", line 334, in <module>
  File "C:\Python26\web2py\gluon\globals.py", line 95, in <lambda>
    self._caller = lambda f: f()
  File "c:/Python26/web2py/applications/xcel2xml/controllers/default.py", line 255, in step44
    return gen_xml(nomefile,riga,images)
  File "c:/Python26/web2py/applications/xcel2xml/controllers/default.py", line 296, in gen_xml
    return str(TAG['root'](*records))
  File "C:\Python26\web2py\gluon\html.py", line 797, in __str__
    return self.xml()
  File "C:\Python26\web2py\gluon\html.py", line 780, in xml
    (fa, co) = self._xml()
  File "C:\Python26\web2py\gluon\html.py", line 771, in _xml
    self.components])
  File "C:\Python26\web2py\gluon\html.py", line 110, in xmlescape
    return data.xml()
  File "C:\Python26\web2py\gluon\html.py", line 780, in xml
    (fa, co) = self._xml()
  File "C:\Python26\web2py\gluon\html.py", line 771, in _xml
    self.components])
  File "C:\Python26\web2py\gluon\html.py", line 110, in xmlescape
    return data.xml()
  File "C:\Python26\web2py\gluon\html.py", line 790, in xml
    return '<%s%s>%s</%s>' % (self.tag, fa, co, self.tag)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

From Inspect Attributes:
args= ('ascii', 'label\xc3\xa0', 5, 6, 'ordinal not in range(128)')

And from Variables:

fa        ''
self      <gluon.html.__tag__ object at 0x033D2490>
self.tag  u'label'
co        'label\xc3\xa0'
args      ('ascii', 'label\xc3\xa0', 5, 6, 'ordinal not in range(128)')

The string raising the error is "labelà", and I see from Variables that
it has already been encoded to UTF-8, by xml() I think.

The original string was "label\xe0" (as I said, a unicode string).

Any suggestions? This is driving me crazy. Thank you.

carlo

carlo

Jan 15, 2011, 12:33:56 PM
to web2py-users
Another example, same strange error:

  File "C:\Python25\web2py\gluon\html.py", line 790, in xml
    return '<%s%s>%s</%s>' % (self.tag, fa, co, self.tag)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

Variables

fa        ''
self      <gluon.html.__tag__ object at 0x018C2F10>
self.tag  u'titolo'
co        'pap\xc3\xa0'

carlo

Jan 15, 2011, 1:25:35 PM
to web2py-users
OK, I narrowed down the problem. This IS working:

def index():
    response.headers['Content-Type'] = 'application/xml'
    rows = [['6.0', u'pap\xe0', u'kloiuy', '1995.0']]
    return export_xml(rows)


def export_xml(rows):
    colnames = ['prima', 'seconda', 'terza', 'quarta']  # plain str tag names
    idx = range(len(colnames))
    records = []
    for row in rows:
        records.append(TAG['record'](*[TAG[colnames[i]](row[i]) for i in idx]))
    return str(TAG['records'](*records))

But this is NOT working; it fails with the same UnicodeDecodeError:

def index():
    response.headers['Content-Type'] = 'application/xml'
    rows = [['6.0', u'pap\xe0', u'kloiuy', '1995.0']]
    return export_xml(rows)


def export_xml(rows):
    colnames = [u'prima', u'seconda', u'terza', u'quarta']  # unicode tag names
    idx = range(len(colnames))
    records = []
    for row in rows:
        records.append(TAG['record'](*[TAG[colnames[i]](row[i]) for i in idx]))
    return str(TAG['records'](*records))


So the problem is with unicode tag names. I will try to have a look at
html.py, but your opinion is welcome.

carlo

carlo

Jan 16, 2011, 4:22:12 AM
to web2py-users
>>> r=TAG['pippo'](u'plutò')
>>> str(r)
'<pippo>plut\xc3\xb2</pippo>'
>>> r=TAG[u'pippo'](u'plutò')
>>> str(r)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Python26\web2py\gluon\html.py", line 797, in __str__
    return self.xml()
  File "C:\Python26\web2py\gluon\html.py", line 790, in xml
    return '<%s%s>%s</%s>' % (self.tag, fa, co, self.tag)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
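
A possible reading of this failure, sketched under the assumption that the element content has already been encoded to UTF-8 bytes (as the co values in the earlier tracebacks suggest) while the tag name is unicode: Python 2 promotes the result of the % formatting to unicode and decodes every byte-string argument with the default ASCII codec. The helper py2_style_join below is hypothetical and not part of web2py; it spells out that implicit decode explicitly so the snippet also runs on Python 3, where bytes and text are never mixed silently.

```python
def py2_style_join(tag, content_bytes):
    """Mimic '<%s>%s</%s>' % (tag, content, tag) on Python 2 when tag is unicode.

    Python 2 decodes byte-string arguments with the default 'ascii' codec
    once any other argument is unicode; here that decode is done explicitly.
    """
    content = content_bytes.decode('ascii')  # raises on any byte >= 0x80
    return u'<%s>%s</%s>' % (tag, content, tag)


# ASCII-only content survives the decode.
ok = py2_style_join(u'pippo', u'pluto'.encode('utf-8'))

# Non-ASCII content is UTF-8 encoded first, so the ASCII decode fails.
try:
    py2_style_join(u'pippo', u'plut\xf2'.encode('utf-8'))
    error = None
except UnicodeDecodeError as exc:
    error = exc  # error.start is the offset of the first non-ASCII byte
```

With a plain (byte string) tag name no promotion to unicode takes place, which would explain why the first call in the session above succeeds.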


Massimo Di Pierro

Jan 16, 2011, 11:08:03 AM
to web2py-users
Tag names should not be unicode.

Jonathan Lundell

Jan 16, 2011, 11:08:04 AM
to web...@googlegroups.com

The content of TAG is explicitly decoded, but the tag itself is not. I'm guessing that making the tag a unicode string is forcing a second decode of your content. Or something like that.

Leave the tag itself a plain string.
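
When the names come from a source that yields unicode (column headers read from a spreadsheet, for instance), one way to follow this advice is to encode each name to a UTF-8 byte string, which is a plain str on Python 2, before using it as a tag name. This is only a minimal sketch: as_byte_string is a hypothetical helper, and whether non-ASCII tag names then render as wanted in a given web2py version has not been checked here.

```python
def as_byte_string(name, encoding='utf-8'):
    """Return name as an encoded byte string (a plain str on Python 2)."""
    if isinstance(name, bytes):
        return name  # already a byte string, leave untouched
    return name.encode(encoding)


# Column names as they might arrive from the spreadsheet (unicode objects).
colnames = [u'prima', u'seconda', u'terza', u'quarta']
plain_colnames = [as_byte_string(c.replace(u'.', u'_')) for c in colnames]

# In the controller the plain names would then serve as tag names, e.g.
#   TAG[plain_colnames[i]](row[i])
```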

carlo

Jan 16, 2011, 5:00:43 PM
to web2py-users
> Leave the tag itself a plain string.

Don't you think this is a hard constraint?
Personally, I do.

TAG promised to be such a handy tool for generating XML from Excel data,
which is the purpose of my app, but unfortunately that will not be
the case.

Jonathan Lundell

Jan 16, 2011, 5:30:44 PM
to web...@googlegroups.com

I suspect that TAG (and pretty much all of the stuff in gluon.html) was intended for HTML rather than XML. I wouldn't think it'd be too hard to support unicode tags, maybe with an optional argument.
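
As a rough illustration of what unicode-aware rendering could look like (this is not web2py's code and not a patch for gluon.html): convert both the tag name and the content to text, escape the content, format, and encode exactly once at the very end. The names to_text and render_element are hypothetical; escape comes from the standard library module xml.sax.saxutils. The sketch does not check that the tag is a valid XML name.

```python
from xml.sax.saxutils import escape


def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def render_element(tag, content, encoding='utf-8'):
    """Render <tag>content</tag> as encoded bytes, accepting bytes or text."""
    tag_text = to_text(tag, encoding)
    body = escape(to_text(content, encoding))  # escapes &, < and >
    # Everything is text at this point, so no implicit decode can occur.
    return (u'<%s>%s</%s>' % (tag_text, body, tag_text)).encode(encoding)


xml_bytes = render_element(u'l\xe0', u'par\xe0')
```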

carlo

Jan 16, 2011, 6:19:08 PM
to web2py-users
I am with you, and I think it would be useful: TAG would be so handy
with XML. Actually, skimming the code I could not find where the quirk is.
Additionally, I see that unicode tag names are indeed supported in some
way:

>>> name=u'là'
>>> value='plain'
>>> TAG[name](value).xml()
u'<l\xe0>plain</l\xe0>'

Even the value can be unicode, but only with ASCII characters:

>>> name=u'là'
>>> value=u'plain'
>>> TAG[name](value).xml()
u'<l\xe0>plain</l\xe0>'

Problems arise when both the name and the value contain non-ASCII characters:

>>> name=u'là'
>>> value=u'parà'
>>> TAG[name](value).xml()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Python26\web2py\gluon\html.py", line 790, in xml
    return '<%s%s>%s</%s>' % (self.tag, fa, co, self.tag)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
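
The cases tried in this thread seem to fit a simple rule, assuming the content is UTF-8 encoded before the final string formatting: the error appears only when the tag name is unicode and the encoded content holds at least one byte above 0x7F. Whether the tag name itself is ASCII does not appear to matter, since u'pippo' failed earlier as well. The sketch below models that rule; implicit_decode_fails is a hypothetical name, and a bytes tag stands in for a Python 2 plain string.

```python
def implicit_decode_fails(tag, value):
    """Predict whether the UnicodeDecodeError from the thread would occur.

    tag:   bytes models a Python 2 plain str, text models a unicode object.
    value: element content, assumed to be UTF-8 encoded before formatting.
    """
    tag_is_unicode = not isinstance(tag, bytes)
    encoded = value if isinstance(value, bytes) else value.encode('utf-8')
    has_non_ascii = any(b >= 0x80 for b in bytearray(encoded))
    return tag_is_unicode and has_non_ascii


cases = [
    (b'pippo', u'plut\xf2'),  # plain tag, non-ASCII content: worked
    (u'pippo', u'plut\xf2'),  # unicode ASCII tag, non-ASCII content: failed
    (u'l\xe0', b'plain'),     # unicode tag, plain ASCII content: worked
    (u'l\xe0', u'plain'),     # unicode tag, unicode ASCII content: worked
    (u'l\xe0', u'par\xe0'),   # unicode tag, non-ASCII content: failed
]
results = [implicit_decode_fails(tag, value) for tag, value in cases]
```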

