Double comment declarations in unicode(soup)

Manuel Strehl

Mar 10, 2011, 8:33:37 AM

to beautifulsoup

Hi,

I'm using Beautiful Soup 3.2.0 with Python 2.5 to parse HTML stubs like this:

Hello World

Another Element

Assume this string is stored in a Unicode object, "content". My
problem is this:

>>> soup = BeautifulSoup(content)
>>> print [x for x in soup.findAll(text=lambda s: isinstance(s, Comment))]
[' A comment ']
>>> # So far everything is fine
>>> print soup
Hello WorldAnother Element
>>> # still fine
>>> print unicode(soup)
Hello World-->Another Element
>>> # Huh? Where does the double comment declaration come from?

Does anyone have an idea where this weirdness in printing unicode(soup)
could stem from?

Cheers,
Manuel

Leonard Richardson

Mar 10, 2011, 8:36:29 AM

to beauti...@googlegroups.com

Manuel,

The problem is described here:
https://bugs.launchpad.net/beautifulsoup/+bug/686181

It'll be fixed in 4.0, and that bug has a patch for 3.2 (which I haven't tried).

Leonard

Manuel Strehl

Mar 11, 2011, 2:47:40 PM

to beauti...@googlegroups.com

Ah, I see. Actually, I found the corresponding bug report five minutes
after posting.
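For reference, here is a minimal sketch of the same check against Beautiful Soup 4, where the bug is reported as fixed. It assumes Python 3, the `bs4` package, and the stdlib-backed `"html.parser"` parser. The markup is a hypothetical stand-in, because the original HTML stub did not survive in this copy of the thread. The `string=` filter is the current spelling; older 4.x releases call the same argument `text=`.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the HTML stub from the original post.
content = "<p>Hello World</p><!-- A comment --><p>Another Element</p>"

# html.parser does not wrap the fragment in <html>/<body> tags.
soup = BeautifulSoup(content, "html.parser")

# Comment is a NavigableString subclass, so a string filter can select it.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print(comments)

# Serialize the whole tree; the comment should be wrapped in <!-- --> only once.
serialized = str(soup)
print(serialized)
print("comment openers:", serialized.count("<!--"))
```

If the serialized output contains more than one `<!--` for a single comment, you are most likely still running a 3.x release without the patch from the bug report.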