On Oct 6, 8:50 am, Andres Riancho <
andres.rian...@gmail.com> wrote:
> In other words: The input tags are being left out of the form! Is this
> really a bug?
From what I can tell, the problem is that Beautiful Soup is having
trouble dealing with where the <form> tag is placed.
This is what is happening:
<table>
<form>
<tr><td></td></tr>
</form>
</table>
Comes out as:
<table>
<form></form>
<tr><td></td></tr>
</table>
The following is what the author should have written:
<form>
<table>
<tr><td></td></tr>
</table>
</form>
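A quick way to spot this kind of markup before handing it to a parser is to scan the tags with Python's standard html.parser. This is only a rough sketch: MisplacedFormFinder, find_misplaced_forms, and the TABLE_CONTEXT and VOID sets are names made up for this example, and the tag sets are deliberately incomplete.

```python
from html.parser import HTMLParser

# Elements that should hold table structure only, never a <form> directly.
TABLE_CONTEXT = {"table", "thead", "tbody", "tfoot", "tr"}
# A few common tags that never get an end tag (not an exhaustive list).
VOID = {"input", "br", "img", "hr", "meta", "link"}


class MisplacedFormFinder(HTMLParser):
    """Hypothetical helper: records each <form> opened directly inside
    a table-structure element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tag names
        self.misplaced = []   # parser positions of offending <form> tags

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.open_tags and self.open_tags[-1] in TABLE_CONTEXT:
            self.misplaced.append(self.getpos())
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy nesting.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_misplaced_forms(markup):
    finder = MisplacedFormFinder()
    finder.feed(markup)
    finder.close()
    return finder.misplaced


bad = "<table>\n<form>\n<tr><td></td></tr>\n</form>\n</table>"
good = "<form>\n<table>\n<tr><td></td></tr>\n</table>\n</form>"
print(len(find_misplaced_forms(bad)), len(find_misplaced_forms(good)))
```

A <form> inside a <td> is not flagged, since the innermost open tag is then the cell rather than a table-structure element.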
In valid HTML the <form> tag surrounds the <table> tag, not the other
way around. No one seems to have figured out exactly how to deal with
tags that improperly surround table children (<tr>, <tbody>, <thead>).
According to Mozilla's DOM Inspector, Mozilla's DOM matches
BeautifulSoup's handling. Mozilla seems to have some way to track
forms that is independent of its DOM tree. html5lib bumps the form tag
out of the table completely, leaving it as an empty sibling of
<table>.
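The idea of tracking forms independently of the tree can be sketched with Python's standard html.parser. This is only a guess at the general approach, not Mozilla's actual implementation, and FormTracker and its attributes are hypothetical names. Each <input> is attached to whichever <form> start tag was seen most recently and has not yet been closed, so where a tree builder would place the elements does not matter.

```python
from html.parser import HTMLParser


class FormTracker(HTMLParser):
    """Hypothetical helper: groups <input> names by the <form> that was
    open in the tag stream when the input appeared."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of input names per <form> seen
        self.current = None   # inputs of the form that is open right now

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.current = []
            self.forms.append(self.current)
        elif tag == "input" and self.current is not None:
            self.current.append(dict(attrs).get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


markup = """
<table>
<form>
<tr><td><input name="user"></td></tr>
</form>
</table>
"""
tracker = FormTracker()
tracker.feed(markup)
tracker.close()
print(tracker.forms)  # [['user']]
```

Working from the tag stream rather than the finished tree is what lets the input stay associated with the form even though the form ends up empty in the tree.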
MinimalSoup has an entirely different issue. The <form> tag stays
inside of the outer table, just like in the HTML. The inner table then
jumps out of the outer table and becomes the next sibling of the outer
table.
<table id="outer-table">
<form><tr><td><!-- former location of inner-table --></td></tr></form>
</table>
<table id="inner-table">
<tr>...input tags...</tr>
</table>
-Aaron DeVore