I don’t think the problem is find but the way the document is being constructed. Beautiful Soup seems not to be linking the elements properly on append. If they aren’t linked properly, find can’t traverse them. I suspect there may be a bug in Beautiful Soup in this regard. Maybe it isn’t recursively inserting the children of the new tag properly, so their next_element linkage is all wrong.
If you try to walk the chain when appending to cell before appending the cell, you get None:
>>> print(doc.find('table').next_element.next_element.next_element.next_element)
None
but if you append to cell after you append cell, you get the element:
>>> print(doc.find('table').next_element.next_element.next_element.next_element)
<tr id="row_1"></tr>
I’m surprised I never noticed it was appending to contents. I guess that is the answer then: use Tag.append() to modify an element’s content, not contents.append() directly.
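To make the difference concrete, here is a toy model of the linkage. It is only a sketch: the Node class, its append() method, and the walk() helper are made up for illustration and are not Beautiful Soup's actual implementation. It assumes each node keeps a contents list plus a next_element pointer that append() is responsible for maintaining.

```python
class Node:
    """Toy tree node for illustration; NOT Beautiful Soup's real code."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []        # direct children, in order
        self.next_element = None  # next node in document order

    def _last_descendant(self):
        # Follow the last child repeatedly to reach the end of this subtree.
        node = self
        while node.contents:
            node = node.contents[-1]
        return node

    def append(self, child):
        # Splice the child's subtree into the document-order chain.
        tail = self._last_descendant()
        after = tail.next_element
        child.parent = self
        self.contents.append(child)
        tail.next_element = child
        child._last_descendant().next_element = after


def walk(node):
    """Collect node names by following next_element from the given node."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.next_element
    return names


# Linked properly: every insertion goes through append().
table = Node("table")
row = Node("tr")
table.append(row)
row.append(Node("td"))
linked = walk(table)

# Linkage bypassed: the cell is pushed straight onto the contents list.
table2 = Node("table")
row2 = Node("tr")
table2.append(row2)
cell2 = Node("td")
row2.contents.append(cell2)
broken = walk(table2)

print(linked)  # ['table', 'tr', 'td']
print(broken)  # ['table', 'tr']
```

In the second case the cell sits in contents, but nothing ever set its parent or pointed a next_element at it, so any traversal that follows the chain never reaches it.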
Because there are many different ways someone could modify the internals, it would be a pain to create a special contents object that warns a user about every possible way they may break the linkage.
Honestly, making it immutable…or maybe not immutable, but at least making direct access private is probably the most reasonable thing. At the very least, I’d document that it shouldn’t be modified directly.
If you ever did a major release (Beautiful Soup 5), aside from removing all the deprecated duplicate methods, this may be something worth considering. I’d consider moving contents to _contents and exposing a method that returns a generator of the contents, not unlike what etree does with getchildren().
Honestly, a safer get_contents() method could be exposed now that just does a yield of the items in contents, and it could be encouraged now instead of using contents directly. I imagine you could deprecate contents by making it a deprecated property that returns whatever is in _contents.
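Roughly what I have in mind, as a minimal sketch. The Tag class here is a hypothetical stand-in, not Beautiful Soup's real class, and get_contents() and _contents are only the names proposed above, not an existing API.

```python
import warnings


class Tag:
    """Hypothetical stand-in for a tag class; not Beautiful Soup's code."""

    def __init__(self, name):
        self.name = name
        self._contents = []  # private storage for child nodes

    def append(self, child):
        # The supported way to add a child; a real implementation would
        # also update parent and next_element bookkeeping here.
        self._contents.append(child)

    def get_contents(self):
        """Yield children one at a time without exposing the list itself."""
        yield from self._contents

    @property
    def contents(self):
        # Deprecated access path: still works, but warns the caller.
        warnings.warn(
            "contents is deprecated; use get_contents() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._contents


tag = Tag("tr")
tag.append("cell")
children = list(tag.get_contents())
```

Existing code that reads contents keeps working and gets a DeprecationWarning, while new code can iterate get_contents() without ever holding a reference to the mutable list.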