Double comment declarations in unicode(soup)

Manuel Strehl

Mar 10, 2011, 8:33:37 AM
to beautifulsoup
Hi,

I'm using Beautiful Soup 3.2.0 with Python 2.5 to parse HTML fragments like this:

<p>Hello World</p>
<!-- A comment -->
<p>Another Element</p>

Assume this string is stored in a unicode object named "content". My
problem is this:

>>> from BeautifulSoup import BeautifulSoup, Comment
>>> soup = BeautifulSoup(content)
>>> print [x for x in soup.findAll(text = lambda s: isinstance(s, Comment))]
[' A comment ']
>>> # So far everything is fine
>>> print soup
<p>Hello World</p><!-- A comment --><p>Another Element</p>
>>> # still fine
>>> print unicode(soup)
<p>Hello World</p><!--<!-- A comment -->--><p>Another Element</p>
>>> # Huh? Where does the double comment declaration come from?

Does anyone have an idea where this odd output from unicode(soup)
could come from?

Cheers,
Manuel

Leonard Richardson

Mar 10, 2011, 8:36:29 AM
to beauti...@googlegroups.com
Manuel,

The problem is described here:
https://bugs.launchpad.net/beautifulsoup/+bug/686181

It'll be fixed in 4.0, and that bug has a patch for 3.2 (which I haven't tried).

Leonard
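
For anyone stuck on 3.2 in the meantime, one possible stopgap is to clean up the serialized string after the fact. This is only a minimal sketch, not the patch attached to the bug report: it assumes the doubled delimiters always appear exactly as `<!--<!--` ... `-->-->`, as in the output shown above, and `collapse_doubled_comments` is a hypothetical helper name, not part of Beautiful Soup.

```python
import re

# Matches a comment whose delimiters were emitted twice, e.g.
# "<!--<!-- A comment -->-->". DOTALL lets the comment body span lines.
_DOUBLED_COMMENT = re.compile(r"<!--<!--(.*?)-->-->", re.DOTALL)


def collapse_doubled_comments(markup):
    """Hypothetical helper: rewrite '<!--<!-- x -->-->' as '<!-- x -->'."""
    return _DOUBLED_COMMENT.sub(r"<!--\1-->", markup)


# The string reported in the original post as the result of unicode(soup).
broken = "<p>Hello World</p><!--<!-- A comment -->--><p>Another Element</p>"
fixed = collapse_doubled_comments(broken)
print(fixed)
```

Comments that are already well-formed are left untouched, so the helper should be safe to apply to output from a fixed version as well. It is a workaround for the symptom only; applying the patch or upgrading is the real fix.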

Manuel Strehl

Mar 11, 2011, 2:47:40 PM
to beauti...@googlegroups.com
Ah, I see. Actually, I found the corresponding bug report five minutes
after posting.

Thanks for the work!

Cheers,
Manuel
