Manuel Strehl
Mar 10, 2011, 8:33:37 AM
to beautifulsoup
Hi,
I'm using BeautifulSoup 3.2.0 with Python 2.5 to parse HTML stubs like this:
<p>Hello World</p>
<!-- A comment -->
<p>Another Element</p>
Assume this string is stored in a unicode object named "content". My
problem is this:
>>> from BeautifulSoup import BeautifulSoup, Comment
>>> soup = BeautifulSoup(content)
>>> print [x for x in soup.findAll(text=lambda s: isinstance(s, Comment))]
[' A comment ']
>>> # So far everything is fine
>>> print soup
<p>Hello World</p><!-- A comment --><p>Another Element</p>
>>> # still fine
>>> print unicode(soup)
<p>Hello World</p><!--<!-- A comment -->--><p>Another Element</p>
>>> # Huh? Where does the double comment declaration come from?
Does anyone have an idea where this weirdness in the output of
unicode(soup) could come from?
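In case it helps anyone reproduce this, here is a rough sketch of a check for the doubled delimiter in a serialized string. It uses only the standard library; `has_nested_comment` is a hypothetical helper name and not part of BeautifulSoup, and the two strings are simply copied from the session above.

```python
import re

# Hypothetical helper, not part of BeautifulSoup: matches a comment opener
# that is followed by a second opener before any closing "-->".
NESTED_OPEN = re.compile(r"<!--(?:(?!-->).)*?<!--", re.DOTALL)

def has_nested_comment(markup):
    """Return True if a comment is opened again before it is closed."""
    return NESTED_OPEN.search(markup) is not None

# The two serializations from the session above, copied verbatim.
printed = "<p>Hello World</p><!-- A comment --><p>Another Element</p>"
converted = "<p>Hello World</p><!--<!-- A comment -->--><p>Another Element</p>"

print(has_nested_comment(printed))    # False
print(has_nested_comment(converted))  # True
```

Running the same check on str(soup) and unicode(soup) should show which of the two code paths wraps the comment a second time.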
Cheers,
Manuel