Newsgroups: comp.lang.python
From: j...@pobox.com (John J. Lee)
Date: Wed, 06 Jun 2007 22:07:36 GMT
Subject: Re: How do you htmlentities in Python
"Thomas Jollans" <tho...@jollans.NOSPAM.com> writes:

> "Adam Atlas" <a...@atlas.st> wrote in message
> news:1180965792.757685.132580@q75g2000hsh.googlegroups.com...
> > As far as I know, there isn't a standard idiom to do this, but it's
> > still a one-liner. Untested, but I think this should work:
> >
> > import re
> > [...]
>
> '&(%s);' won't quite work: HTML (and, I assume, SGML, but not XHTML being
> [...]
>
> Also, this completely ignores non-name entities as also found in XML. (eg
> [...]

Here's one that handles numeric character references, and chooses to
leave entity references that are not defined in standard library
module htmlentitydefs intact, rather than throwing an exception.

It ignores the missing semicolon issue (and note also that IE can cope
[...]

import htmlentitydefs

def unescape_charref(ref):
    [...]

def replace_entities(match):
    [...]
    repl = htmlentitydefs.name2codepoint.get(ent[1:-1])
    [...]

def unescape(data):
    [...]

class UnescapeTests(unittest.TestCase):

    def test_unescape_charref(self):
        [...]

    def test_unescape(self):
        [...]

unittest.main()


John
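The code in the post survives only in fragments, so here is a hedged sketch (not John's original code) of the behaviour he describes: numeric character references are decoded, named references missing from the standard entity table are left intact rather than raising, and a trailing semicolon is required. It targets Python 3, where `htmlentitydefs` became `html.entities` and `unichr` became `chr`. The regular expression `ENTITY_RE`, the helper name `replace_entity`, and the choice to leave out-of-range numeric references untouched are assumptions of this sketch.

```python
import html.entities
import re

# Hypothetical pattern for this sketch: matches "&name;", "&#8212;" (decimal)
# and "&#x2014;" (hex). Like the post, it requires the trailing semicolon.
ENTITY_RE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_charref(ref):
    """Decode a numeric character reference such as '&#8212;' or '&#x2014;'."""
    body = ref[2:-1]  # strip the leading "&#" and the trailing ";"
    if body[0] in "xX":
        return chr(int(body[1:], 16))
    return chr(int(body, 10))


def replace_entity(match):
    ref = match.group(0)
    if ref.startswith("&#"):
        try:
            return unescape_charref(ref)
        except ValueError:  # code point above U+10FFFF: leave it as written
            return ref
    codepoint = html.entities.name2codepoint.get(match.group(1))
    # Named references not in the table are left intact instead of raising.
    return chr(codepoint) if codepoint is not None else ref


def unescape(data):
    return ENTITY_RE.sub(replace_entity, data)


if __name__ == "__main__":
    print(unescape("&lt;b&gt; &amp; &nonexistent;"))  # prints: <b> & &nonexistent;
```

On Python 3.4 and later, `html.unescape()` in the standard library covers much of this ground and also accepts some references written without the trailing semicolon, so a hand-rolled version like this is mainly useful when you want its stricter rules.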