behavior with p, dl and nesting tags

Tim Arnold
Apr 15, 2009, 10:21:54 AM
to beautifulsoup
Hi, I'm not positive, but I think there's a bug in BeautifulSoup when
it comes to paragraphs containing a definition list. In my case, I'm
getting structures like this:

<dl>
<dt>term</dt>
<dd>
<p>this is the definition:
<dl>
<dt>first case</dt>
<dd><p>first case definition</p></dd>
</dl>
</p>
</dd>
</dl>

According to the HTML 3.2 and HTML 4.0 specs, the <p> element can
contain only inline elements, not block elements, so the snippet above
is not valid.
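
To make the problem concrete, here is a minimal sketch (not part of
BeautifulSoup) that uses only Python's standard-library html.parser to
flag block-level start tags that appear while a <p> is still open. The
class name BlockInParagraphChecker is hypothetical, and BLOCK_TAGS is an
assumed, illustrative subset of the HTML 4.01 block-level elements, not
a complete list.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of HTML 4.01 block-level elements.
BLOCK_TAGS = {
    "address", "blockquote", "div", "dl", "fieldset", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "noscript", "ol", "p",
    "pre", "table", "ul",
}


class BlockInParagraphChecker(HTMLParser):
    """Hypothetical helper: record block-level start tags found inside an open <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of currently open elements, outermost first
        self.violations = []  # (line, tag) for each offending start tag

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and "p" in self.stack:
            line, _offset = self.getpos()
            self.violations.append((line, tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this name;
        # end tags with no matching open element are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


if __name__ == "__main__":
    # The snippet from the post, with the whitespace between tags removed.
    snippet = ("<dl><dt>term</dt><dd><p>this is the definition:"
               "<dl><dt>first case</dt><dd><p>first case definition</p></dd>"
               "</dl></p></dd></dl>")
    checker = BlockInParagraphChecker()
    checker.feed(snippet)
    checker.close()
    print(checker.violations)  # [(1, 'dl'), (1, 'p')]
```

Both hits come from the outer <p> still being open: the nested <dl>,
and then the inner <p> inside the nested <dd>.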

The first paragraph should be closed before the nested <dl> begins. I
can probably handle this myself by modifying the RESET_NESTING_TAGS and
NESTABLE_TAGS dictionaries, but I think BeautifulSoup should handle it
out of the box.
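
A minimal sketch of the behavior described above, again using the
standard-library html.parser rather than BeautifulSoup's
RESET_NESTING_TAGS/NESTABLE_TAGS machinery: when a block-level start tag
arrives while a <p> is open, the builder closes that <p> first.
ClosingParagraphBuilder, Node, to_html and BLOCK_TAGS are hypothetical
names. Simplifying assumptions: it searches the whole chain of open
elements for a <p> (real HTML parsers apply more careful scoping rules),
it ignores end tags with no matching open element (such as the stray
</p> in the snippet), and it drops attributes and does not handle void
elements like <br>.

```python
import html
from html.parser import HTMLParser

# Assumption: an illustrative subset of HTML 4.01 block-level elements
# (void elements such as <hr> are deliberately left out).
BLOCK_TAGS = {
    "address", "blockquote", "div", "dl", "fieldset", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "noscript", "ol", "p",
    "pre", "table", "ul",
}


class Node:
    """A bare element: tag name plus children (Nodes or text strings)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class ClosingParagraphBuilder(HTMLParser):
    """Hypothetical tree builder that closes an open <p> before a block element."""

    def __init__(self):
        super().__init__()
        self.root = Node(None)
        self.current = self.root

    def _find_open(self, tag):
        # Walk up from the current node looking for an open element named `tag`.
        node = self.current
        while node is not self.root:
            if node.tag == tag:
                return node
            node = node.parent
        return None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            open_p = self._find_open("p")
            if open_p is not None:
                # Implicitly close the <p> (and anything still open inside it).
                self.current = open_p.parent
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        node = self._find_open(tag)
        if node is not None:  # stray end tags (e.g. an extra </p>) are ignored
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def to_html(node):
    """Serialize a Node's children back to HTML (attributes are not kept)."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(html.escape(child, quote=False))
        else:
            parts.append(f"<{child.tag}>{to_html(child)}</{child.tag}>")
    return "".join(parts)


if __name__ == "__main__":
    snippet = ("<dl><dt>term</dt><dd><p>this is the definition:"
               "<dl><dt>first case</dt><dd><p>first case definition</p></dd>"
               "</dl></p></dd></dl>")
    builder = ClosingParagraphBuilder()
    builder.feed(snippet)
    builder.close()
    print(to_html(builder.root))
    # <dl><dt>term</dt><dd><p>this is the definition:</p><dl><dt>first case</dt><dd><p>first case definition</p></dd></dl></dd></dl>
```

In the output, the first paragraph ends before the nested <dl> begins,
and the nested <dl> becomes a sibling of that paragraph inside the outer
<dd>.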

Please let me know if I'm wrong on this, and especially if you have a
solution that modifies the nesting definition dictionaries.
Thanks,
--Tim