Re: BeautifulStoneSoup does not handle nested tags with identical names correctly.


Leonard Richardson

Dec 11, 2007, 10:00:59 PM
to Nathan Wilcox, beauti...@googlegroups.com
Nathan,

> I'm not sure if this is a known issue. Do you know of a workaround?
>
> The parse result of BeautifulStoneSoup is incorrect, but for BeautifulSoup
> it is correct:
>
> >>> BeautifulSoup.BeautifulStoneSoup('<ul><li><ul><li>A1</li><li>A2</li></ul></li><li>B</li></ul>')
> <ul><li></li></ul><ul><li>A1</li><li>A2</li></ul><li>B</li>
>
> >>> BeautifulSoup.BeautifulSoup('<ul><li><ul><li>A1</li><li>A2</li></ul></li><li>B</li></ul>')
> <ul><li><ul><li>A1</li><li>A2</li></ul></li><li>B</li></ul>

BeautifulStoneSoup has no knowledge of any XML vocabulary, so it makes
a set of default assumptions, including "tags with the same name can't
be nested." If you have a tag that can be nested inside itself, you
need to subclass BeautifulStoneSoup and add that tag name to
NESTABLE_TAGS. This section of the documentation shows how to do that:

http://crummy.com/software/BeautifulSoup/documentation.html#Customizing%20the%20Parser
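
To make the default assumption concrete, here is a small stand-alone model of the rule described above. It is only a sketch and is not Beautiful Soup's actual code: the names Node, parse, and nestable_tags are made up for illustration, and the dictionary format (tag name mapped to the list of enclosing tags that make nesting legal again) is an assumption modeled on how I understand NESTABLE_TAGS to work. It handles only plain <tag> and </tag> markup with no attributes.

```python
import re

# Simplified stand-in for a parser's open-tag stack; illustration only.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Nodes and text strings

    def render(self):
        inner = "".join(
            c if isinstance(c, str) else c.render() for c in self.children
        )
        if self.name is None:  # synthetic root has no tag of its own
            return inner
        return "<%s>%s</%s>" % (self.name, inner, self.name)


def parse(markup, nestable_tags=None):
    """Build a tree, closing open tags according to `nestable_tags`.

    `nestable_tags` (hypothetical format) maps a tag name to the list of
    enclosing tag names that make nesting legal again; an empty list
    means the tag may always nest inside itself.
    """
    nestable_tags = nestable_tags or {}
    root = Node(None)
    stack = [root]

    def pop_to(name, inclusive):
        # Close open tags down to the most recent `name`, if there is one.
        for i in range(len(stack) - 1, 0, -1):
            if stack[i].name == name:
                del stack[i if inclusive else i + 1:]
                return

    for closing, name, text in TOKEN.findall(markup):
        if text:
            stack[-1].children.append(text)
        elif closing:
            pop_to(name, inclusive=True)
        else:
            triggers = nestable_tags.get(name)
            for open_tag in reversed(stack[1:]):
                if triggers is None and open_tag.name == name:
                    # Not nestable: implicitly close the earlier same-named tag.
                    pop_to(name, inclusive=True)
                    break
                if triggers is not None and open_tag.name in triggers:
                    # Reset trigger: keep it open and nest beneath it.
                    pop_to(open_tag.name, inclusive=False)
                    break
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
    return root.render()


markup = "<ul><li><ul><li>A1</li><li>A2</li></ul></li><li>B</li></ul>"

# No nestable tags declared: the inner <ul> closes the outer one.
print(parse(markup))
# <ul><li></li></ul><ul><li>A1</li><li>A2</li></ul><li>B</li>

# Declaring <ul> and <li> as nestable keeps the original structure.
print(parse(markup, {"ul": [], "li": ["ul"]}))
# <ul><li><ul><li>A1</li><li>A2</li></ul></li><li>B</li></ul>
```

With Beautiful Soup itself, the corresponding change would presumably be a BeautifulStoneSoup subclass whose NESTABLE_TAGS contains entries along the lines of 'ul': [] and 'li': ['ul']. Check the linked documentation section for the exact form before relying on it.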

Leonard
