I don’t think the problem is `find`, but rather the way the document is being constructed. Beautiful Soup seems not to be linking the elements properly on append, and if they aren’t linked properly, `find` can’t traverse them. I suspect there may be a bug in Beautiful Soup in this regard. Maybe it isn’t recursively inserting the children of the new tag properly, so their `next_element` linkage is all wrong.
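One way to test the linkage hypothesis is to compare the two views of the tree: the parent/child structure in `.contents` and the `next_element` chain that, per the paragraph above, `find` depends on. The sketch below assumes only that nodes expose `.contents` (text nodes may lack it) and `.next_element`. The helper names `preorder` and `linkage_mismatches` are hypothetical and are not part of Beautiful Soup.

```python
def preorder(node):
    """Yield `node` and its descendants in document order, using only `.contents`."""
    yield node
    # Text nodes may have no `.contents`; treat them as leaves.
    for child in getattr(node, "contents", None) or []:
        yield from preorder(child)


def linkage_mismatches(root):
    """Return (index, expected, actual) for every node whose `.next_element`
    is not the pre-order successor implied by `.contents`.

    The last node of the subtree is not checked, because its `.next_element`
    legitimately points at whatever follows the subtree in the document.
    """
    expected = list(preorder(root))
    mismatches = []
    for i, (node, successor) in enumerate(zip(expected, expected[1:])):
        # Identity rather than ==, since distinct tags can compare equal.
        if node.next_element is not successor:
            mismatches.append((i, successor, node.next_element))
    return mismatches
```

Against the document from the repro, something like `linkage_mismatches(doc.find('table'))` would list every element whose `next_element` skips or misorders part of the subtree; an empty list means the chain agrees with `.contents`. Identity (`is`) is used rather than `==` because two distinct Beautiful Soup tags with the same markup can compare equal.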
If you try and walk the chain when appending to `cell` before appending the `cell`, you get:

```pycon
>>> print(doc.find('table').next_element.next_element.next_element.next_element)
None
```

but if you append to `cell` after you append `cell`, you get the element:

```pycon
>>> print(doc.find('table').next_element.next_element.next_element.next_element)
<tr id="row_1"></tr>
```
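The order dependence above is consistent with shallow linking. The toy model below (hypothetical `Node`, `splice`, `walk`, and `build`; this is not Beautiful Soup’s code, and the `table`/`tr`/`td` names are illustrative) splices an appended child into the `next_element` chain either with its whole subtree re-threaded or with only the child itself linked. With shallow linking, filling a detached row and then attaching it drops the cell from the chain, while attaching the row first and filling it afterwards does not.

```python
class Node:
    """Hypothetical toy element for illustration only (not Beautiful Soup's code)."""

    def __init__(self, name):
        self.name = name
        self.contents = []
        self.next_element = None

    def last_descendant(self):
        # Deepest, right-most node of this subtree (the node itself if childless).
        node = self
        while node.contents:
            node = node.contents[-1]
        return node

    def splice(self, child, recursive=True):
        """Append `child` and link it into the document-order chain."""
        prev = self.last_descendant()   # node that will precede `child`
        after = prev.next_element       # node that should follow `child`'s subtree
        self.contents.append(child)
        prev.next_element = child
        if recursive:
            # Re-thread the whole subtree: child -> descendants -> after.
            nodes = list(preorder(child))
            for node, successor in zip(nodes, nodes[1:]):
                node.next_element = successor
            nodes[-1].next_element = after
        else:
            # Only `child` itself is linked; descendants it already has are skipped.
            child.next_element = after


def preorder(node):
    yield node
    for child in node.contents:
        yield from preorder(child)


def walk(start):
    """Follow next_element from `start`, like the chained calls in the repro."""
    names = []
    node = start
    while node is not None:
        names.append(node.name)
        node = node.next_element
    return names


def build(recursive, attach_row_first):
    table, row, cell = Node("table"), Node("tr"), Node("td")
    if attach_row_first:
        table.splice(row, recursive)   # attach the empty row, then fill it
        row.splice(cell, recursive)
    else:
        row.splice(cell, recursive)    # fill the detached row, then attach it
        table.splice(row, recursive)
    return walk(table)


print(build(recursive=False, attach_row_first=False))  # ['table', 'tr']
print(build(recursive=False, attach_row_first=True))   # ['table', 'tr', 'td']
print(build(recursive=True, attach_row_first=False))   # ['table', 'tr', 'td']
```

Re-threading the subtree on every insert costs time proportional to the subtree’s size, which is the price of not trusting whatever linkage the detached subtree already has.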
I’m surprised I never noticed it was appending to `contents`. I guess that is the answer, then: use `Tag.append()` to modify an element’s content, not
As there are many different ways someone could modify the internals, it would be a pain to create a special contents object that warns the user about every possible way they might break the linkage.
Honestly, making it immutable (or, if not immutable, at least making direct access private) is probably the most reasonable thing to do. At the very least, I’d document that it shouldn’t be modified directly.
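A minimal sketch of the "private, not necessarily immutable" option, assuming a hypothetical `Element` class rather than Beautiful Soup’s `Tag`: the real list lives in `_contents`, `append()` is the only way to change it, and `contents` returns a tuple, so a stray `contents.append(...)` fails loudly instead of silently breaking the linkage.

```python
class Element:
    """Hypothetical element type: children live in a private list, and the
    public `contents` property only hands out a read-only snapshot."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self._contents = []

    @property
    def contents(self):
        # A tuple has no append/insert/remove, so callers cannot reach the
        # real list and skip the bookkeeping done in append().
        return tuple(self._contents)

    def append(self, child):
        # The single mutation path; any linkage updates would go here.
        child.parent = self
        self._contents.append(child)


table = Element("table")
table.append(Element("tr"))
try:
    table.contents.append(Element("td"))   # the direct-mutation mistake
except AttributeError as exc:
    print("blocked:", exc)
print(len(table.contents))  # 1
```

Returning a tuple copies the list on every access; a generator accessor avoids that cost but gives up indexing and `len()`.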
If you ever did a major release (Beautiful Soup 5), then aside from removing all the deprecated duplicate methods, this may be something worth considering. I’d consider moving to `_contents` and exposing a method that returns a generator of the contents, not unlike what `etree` does with
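For comparison, the standard library’s `xml.etree.ElementTree` has no public list attribute for children: an `Element` is itself a sequence of its children, `iter()` walks the subtree lazily in document order, and changes go through functions and methods such as `SubElement()` or `append()`. The markup below is made up for illustration.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring('<table><tr id="row_1"><td>a</td><td>b</td></tr></table>')

# Direct children: iterating an Element yields its children.
print([child.tag for child in table])        # ['tr']

# Whole subtree in document (depth-first) order, produced by an iterator.
print([el.tag for el in table.iter()])       # ['table', 'tr', 'td', 'td']

# Filtered traversal uses the same iterator.
print([td.text for td in table.iter("td")])  # ['a', 'b']

# Mutation goes through the API, not through a list attribute.
new_row = ET.SubElement(table, "tr", id="row_2")
print(len(table), new_row.get("id"))         # 2 row_2
```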
Honestly, a safer `get_contents()` method that just does a `yield` of the items in `contents` could be exposed now and encouraged instead of using `contents` directly. I imagine you could then deprecate `contents` by making it a deprecated property that returns whatever is in
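A sketch of that transitional path, assuming a hypothetical `Element` class: `get_contents()` yields the children one at a time, and `contents` survives as a property that emits a `DeprecationWarning`. Having the deprecated property return the underlying list itself is an assumption made here; it keeps existing code working, but it still permits the unsafe mutation that returning a copy would block.

```python
import warnings


class Element:
    """Hypothetical element: children are stored in `_contents`, read through
    get_contents(), and the old `contents` name is kept only as a deprecated alias."""

    def __init__(self, name):
        self.name = name
        self._contents = []

    def get_contents(self):
        """Yield the children one at a time without exposing the list itself."""
        yield from self._contents

    @property
    def contents(self):
        # Deprecated alias; stacklevel=2 points the warning at the caller's line.
        warnings.warn(
            "Element.contents is deprecated; use get_contents() to read "
            "and append() to modify",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._contents  # assumption: return the live list for compatibility

    def append(self, child):
        self._contents.append(child)


row = Element("tr")
row.append(Element("td"))
print([child.name for child in row.get_contents()])  # ['td']
```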