
html escape sequences


Will McGugan

18 Mar 2005, 06:06:52
Hi,

I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?

Thanks,

Will McGugan

Leif K-Brooks

18 Mar 2005, 06:46:20
Will McGugan wrote:
> I'd like to replace html escape sequences, like &nbsp and &#39 with
> single characters. Is there a dictionary defined somewhere I can use to
> replace these sequences?

How about this?

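import re
from htmlentitydefs import name2codepoint  # Python 2 module

# Matches decimal references such as &#39; and named entities such as &nbsp;
_entity_re = re.compile(r'&(?:(#)(\d+)|(\w+));')

def _repl_func(match):
    if match.group(1):  # Numeric character reference
        return unichr(int(match.group(2)))
    name = match.group(3)  # Named entity
    if name in name2codepoint:
        return unichr(name2codepoint[name])
    return match.group(0)  # Unknown name: leave the text unchanged

def handle_html_entities(string):
    return _entity_re.sub(_repl_func, string)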
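
For anyone on Python 3, here is a rough equivalent of the same approach. It is only a sketch under a few assumptions: `htmlentitydefs` is now `html.entities`, `unichr` is now `chr`, and the extra handling of hexadecimal references like `&#x27;` and the helper `_to_char` are additions that were not in the code above. The standard library's `html.unescape` (Python 3.4+) is probably the simpler choice when no custom behaviour is needed.

```python
import html
import re
from html.entities import name2codepoint

# Decimal (&#39;), hexadecimal (&#x27;) and named (&nbsp;) references.
_entity_re = re.compile(r'&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));')


def _to_char(codepoint, original):
    """Return the character for codepoint, or the original text if invalid."""
    try:
        return chr(codepoint)
    except (ValueError, OverflowError):
        return original


def _repl_func(match):
    decimal, hexadecimal, name = match.groups()
    if decimal:
        return _to_char(int(decimal), match.group(0))
    if hexadecimal:
        return _to_char(int(hexadecimal, 16), match.group(0))
    if name in name2codepoint:
        return chr(name2codepoint[name])
    return match.group(0)  # Unknown entity name: leave the text unchanged


def handle_html_entities(string):
    return _entity_re.sub(_repl_func, string)


if __name__ == "__main__":
    sample = "it&#39;s a&nbsp;test &amp; more"
    print(handle_html_entities(sample))
    # The standard library does the same job in one call.
    print(html.unescape(sample))
```

Unknown names and out-of-range code points are left as they were rather than raising an error, which seems the safer default for text scraped from real pages.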

Will McGugan

18 Mar 2005, 06:53:27

Many thanks!

Will McGugan
