
html escape sequences


Will McGugan

18 Mar 2005, 6:06:52 a.m.
Hi,

I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?

Thanks,

Will McGugan

Leif K-Brooks

18 Mar 2005, 6:46:20 a.m.
Will McGugan wrote:
> I'd like to replace html escape sequences, like &nbsp and &#39 with
> single characters. Is there a dictionary defined somewhere I can use to
> replace these sequences?

How about this?

import re
from htmlentitydefs import name2codepoint

_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')

def _repl_func(match):
    if match.group(1): # Numeric character reference
        return unichr(int(match.group(2)))
    else:
        return unichr(name2codepoint[match.group(3)])

def handle_html_entities(string):
    return _entity_re.sub(_repl_func, string)
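
For anyone reading this on Python 3: htmlentitydefs was renamed to html.entities and unichr became chr, so the snippet above would need adjusting. Below is a rough sketch of the same idea, not a definitive implementation. The hex-reference branch and the choice to leave unknown names untouched are my own assumptions, not part of the original reply. The standard library's html.unescape covers the common case without any regex.

```python
import re
from html import unescape
from html.entities import name2codepoint

# Decimal reference (&#39;), hex reference (&#x27;), or named entity (&nbsp;).
# The hex branch is an addition that the snippet above does not have.
_entity_re = re.compile(r'&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));')

def handle_html_entities(text):
    """Replace HTML character references in text with single characters."""
    def _repl(match):
        dec, hexa, name = match.groups()
        if dec:
            return chr(int(dec))
        if hexa:
            return chr(int(hexa, 16))
        if name in name2codepoint:
            return chr(name2codepoint[name])
        # Assumption: leave unknown names as-is instead of raising KeyError.
        return match.group(0)
    return _entity_re.sub(_repl, text)

if __name__ == "__main__":
    sample = "a&nbsp;b&#39;c"
    print(repr(handle_html_entities(sample)))
    print(repr(unescape(sample)))  # standard-library equivalent
```

Both versions only touch references that end in a semicolon; html.unescape is more lenient and also accepts some legacy forms without one.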

Will McGugan

18 Mar 2005, 6:53:27 a.m.

Many thanks!

Will McGugan
