Hi.
I found a function for unescaping HTML entities that I think would be worth
adding to the framework itself:
import re
from htmlentitydefs import name2codepoint

def replace_entities(match):
    ent = match.group(1)
    try:
        if ent[0] == "#":
            # numeric reference: &#xHH; (hex) or &#NN; (decimal)
            if ent[1] in ('x', 'X'):
                return unichr(int(ent[2:], 16))
            return unichr(int(ent[1:], 10))
        # named reference such as &amp; or &eacute;
        return unichr(name2codepoint[ent])
    except (KeyError, ValueError, OverflowError):
        # unknown name or malformed/out-of-range number: leave it untouched
        return match.group()

entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')

def html_unescape(data):
    return entity_re.sub(replace_entities, data)
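The snippet above is Python 2 (`htmlentitydefs`, `unichr`). For anyone on Python 3, a rough port of the same regex-based approach might look like the following. It is my own sketch, not the original author's code; the only changes are swapping in `html.entities.name2codepoint` and `chr`.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#NNN; and &#xHHH; (non-greedy up to the first ';').
ENTITY_RE = re.compile(r'&(#?[A-Za-z0-9]+?);')

def _replace_entity(match):
    ent = match.group(1)
    try:
        if ent[0] == '#':
            # Numeric reference: hex if the second char is x/X, else decimal.
            if ent[1] in 'xX':
                return chr(int(ent[2:], 16))
            return chr(int(ent[1:], 10))
        # Named reference, e.g. "amp" -> 38.
        return chr(name2codepoint[ent])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or malformed/out-of-range number: keep the original text.
        return match.group(0)

def html_unescape(data):
    return ENTITY_RE.sub(_replace_entity, data)

print(html_unescape('caf&eacute; &amp; &#65;&#x42; &bogus;'))
```

Note that Python 3.4+ also ships `html.unescape` in the standard library. It covers the same ground, but it also accepts some legacy entity names without a trailing semicolon and substitutes U+FFFD for invalid numeric references, so its output can differ from this sketch on malformed input.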
Thanks to the original author:
http://blog.client9.com/2008/10/html-unescape-in-python.html