convert html entities into real chars

Laszlo Nagy

Apr 10, 2007, 11:00:16 AM
to pytho...@python.org

Hi,

I would like to have a function that can convert '&gt;' into '>',
'&amp;' into '&' etc. I could not find how to do it easily (I have a
code snippet for the opposite).
Thanks,

Laszlo

Laszlo Nagy

Apr 10, 2007, 11:24:11 AM
to pytho...@python.org

> I would like to have a function that can convert '&gt;' into '>',
> '&amp;' into '&' etc. I could not find how to do it easily (I have a
> code snippet for the opposite).
Found it, sorry

import re
import htmlentitydefs  # Python 2 module name

def convertentity(m):
    """Convert a HTML entity into normal string (ISO-8859-1)"""
    if m.group(1)=='#':
        # Numeric reference such as &#62;
        try:
            return chr(int(m.group(2)))
        except ValueError:
            return '&#%s;' % m.group(2)
    # Named entity such as &gt;
    try:
        return htmlentitydefs.entitydefs[m.group(2)]
    except KeyError:
        return '&%s;' % m.group(2)

def unquotehtml(s):
    """Convert a HTML quoted string into normal string (ISO-8859-1).

    Works with &#XX; and with &nbsp; &gt; etc."""
    return re.sub(r'&(#?)(.+?);',convertentity,s)
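For anyone reading this on Python 3: htmlentitydefs was renamed to html.entities, and the standard library now ships html.unescape(), which handles named, decimal and hexadecimal references in one call. A minimal sketch, assuming Python 3.4 or later; the sample string is made up for illustration and is not from the original post:

```python
import html

# Hypothetical sample input, chosen only to exercise a few entity kinds.
sample = "a &gt; b &amp;&amp; c &#60; d&nbsp;!"

# html.unescape() replaces named and numeric character references.
decoded = html.unescape(sample)
print(repr(decoded))
```

Like the function above, html.unescape() leaves names it does not recognise unchanged rather than raising.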

Larry Bates

Apr 10, 2007, 11:33:59 AM
to Laszlo Nagy, pytho...@python.org

You can use the htmlentitydefs module to help with this.

import htmlentitydefs

chr(htmlentitydefs.name2codepoint['gt'])

and (to go the other way)

htmlentitydefs.codepoint2name[ord('>')]

-Larry
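
The same two lookups can be written for Python 3, where the module is named html.entities. This is a sketch that assumes only that rename; the variable names are illustrative:

```python
from html.entities import name2codepoint, codepoint2name

# Entity name -> character
gt_char = chr(name2codepoint["gt"])

# Character -> entity name (the other direction)
gt_name = codepoint2name[ord(">")]

print(gt_char, gt_name)
```

Both dictionaries raise KeyError for names or code points they do not know, so real input may need a try/except or dict.get().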
