I would like to have a function that can convert '&gt;' into '>',
'&amp;' into '&', etc. I could not find an easy way to do it (I have a
code snippet for the opposite).
Thanks,
Laszlo
import re
import htmlentitydefs

def convertentity(m):
    """Convert a HTML entity into normal string (ISO-8859-1)"""
    if m.group(1)=='#':
        try:
            return chr(int(m.group(2)))
        except ValueError:
            # not a number, or outside 0-255: leave the reference as is
            return '&#%s;' % m.group(2)
    try:
        return htmlentitydefs.entitydefs[m.group(2)]
    except KeyError:
        # unknown entity name: leave it as is
        return '&%s;' % m.group(2)

def unquotehtml(s):
    """Convert a HTML quoted string into normal string (ISO-8859-1).
    Works with &#XX; and with &gt; etc."""
    return re.sub(r'&(#?)(.+?);',convertentity,s)
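
For anyone reading this on Python 3, where htmlentitydefs was renamed to html.entities, the same regex idea can be sketched roughly as below. This is only an illustration, not the code posted above: the names convert_entity and unquote_html are made up for the example, hexadecimal references such as &#x3E; are deliberately left untouched, and the standard library's html.unescape (Python 3.4+) already does the whole job, as the last line shows.

```python
import html
import re
from html.entities import name2codepoint

def convert_entity(m):
    """Return the character for one matched entity, or the entity unchanged."""
    text = m.group(2)
    try:
        if m.group(1) == '#':
            return chr(int(text))          # decimal reference such as &#62;
        return chr(name2codepoint[text])   # named entity such as &gt;
    except (ValueError, KeyError, OverflowError):
        # not a decimal number, out of range, or unknown name: keep as is
        return m.group(0)

def unquote_html(s):
    """Replace &name; and &#NN; entities in s with the characters they stand for."""
    return re.sub(r'&(#?)(\w+);', convert_entity, s)

sample = '1 &lt; 2 &amp;&amp; 3 &#62; 2'
print(unquote_html(sample))
print(html.unescape(sample))   # the standard-library equivalent
```

In practice html.unescape is probably the better choice, since it also handles hexadecimal references and the full HTML5 entity list.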
You can use the htmlentitydefs module to help with this:

    import htmlentitydefs
    # use unichr() instead of chr() for code points above 255
    chr(htmlentitydefs.name2codepoint['gt'])

and (to go the other way)

    htmlentitydefs.codepoint2name[ord('>')]
-Larry
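
The two lookups above can be wrapped into a pair of small helpers. The sketch below assumes Python 3, where the same tables live in html.entities; entity_to_char and char_to_entity are hypothetical names chosen for the example, and both raise KeyError for anything missing from the tables.

```python
from html.entities import codepoint2name, name2codepoint

def entity_to_char(name):
    """Map an entity name such as 'gt' to its character."""
    return chr(name2codepoint[name])

def char_to_entity(ch):
    """Map a character such as '>' to its named entity."""
    return '&%s;' % codepoint2name[ord(ch)]

print(entity_to_char('gt'))
print(char_to_entity('>'))
```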