Tim Arnold
Apr 15, 2009, 10:21:54 AM
to beautifulsoup
Hi, I'm not positive, but I think there's a bug in how BeautifulSoup
handles a paragraph that contains a definition list (<dl>). In my case,
I'm getting structures like this:
<dl>
<dt>term</dt>
<dd>
<p>this is the definition:
<dl>
<dt>first case</dt>
<dd><p>first case definition</p></dd>
</dl>
</p>
</dd>
</dl>
According to the HTML 3.2 and HTML 4.0 specs, the <p> element can only
contain inline elements, not block-level elements, so the above snippet
is not valid.
The first para should be closed before the nested <dl> begins. I can
probably handle this myself by modifying the RESET_NESTING_TAGS and
NESTABLE_TAGS dictionaries, but I think BeautifulSoup should handle it
out of the box.
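To make the expected behavior concrete, here is a minimal sketch that uses only the standard library's html.parser (Python 3), not BeautifulSoup's own nesting logic. The names ParagraphCloser, close_paragraphs, BLOCK_TAGS and VOID_TAGS are made up for this example, and the tag lists are deliberately partial. It closes an open <p> as soon as a block-level start tag appears and drops the stray </p> that is left over.

```python
from html.parser import HTMLParser

# Assumption: a partial, illustrative list of block-level tags, not the full spec.
BLOCK_TAGS = {"p", "dl", "ul", "ol", "div", "table", "pre", "blockquote", "form"}
# Assumption: a partial list of empty elements that never get an end tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class ParagraphCloser(HTMLParser):
    """Re-emit HTML, closing an open <p> before any block-level start tag.

    Simplification: only handles the case where <p> is the innermost open
    element; comments and declarations are not re-emitted.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # output fragments
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # A block-level element cannot live inside <p>: close the paragraph first.
        if tag in BLOCK_TAGS and self.stack and self.stack[-1] == "p":
            self.stack.pop()
            self.out.append("</p>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag == "p":
            # Stray </p> whose paragraph was already closed: drop it.
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def close_paragraphs(html):
    """Return html with paragraphs closed before nested block elements."""
    parser = ParagraphCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


snippet = (
    "<dl><dt>term</dt><dd><p>this is the definition:"
    "<dl><dt>first case</dt><dd><p>first case definition</p></dd></dl>"
    "</p></dd></dl>"
)
print(close_paragraphs(snippet))
```

On the snippet above, the first paragraph ends right after "this is the definition:" and the nested <dl> becomes a sibling of that paragraph inside the outer <dd>, which is the structure I would expect BeautifulSoup to produce.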
Please let me know if I'm wrong on this, and especially if you have a
solution that works by modifying the nesting definition dictionaries.
thanks,
--Tim