.replace_with() modifies the DOM tree, but if you .find() it's still there

Lawrence Stewart

Mar 11, 2021, 10:48:59 PM
to beautifulsoup
It seems like .replace_with() only modifies the DOM tree and doesn't remove the node from the soup object...

Sorry, I don't know how to post code here, but to reproduce:
1 - Loop through .descendants
2 - Call random_node.replace_with(SOME_OTHER_NODE)
3 - After the loop is done, call .find('random_node_class'); it still finds the node
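
A minimal sketch of these three steps, assuming the page contains two nodes with the class. The poster's real markup and code weren't shared, so the markup and the replacement <span> below are hypothetical. The sketch uses find(class_=...), because a bare string passed to .find() matches tag names rather than classes. In this sketch the first match is replaced, but the loop never reaches the second one, so .find() still returns it.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: two nodes share the class from the steps above.
MARKUP = (
    '<div class="wrapper">'
    '<p class="random_node_class">first</p>'
    '<p class="random_node_class">second</p>'
    '</div>'
)

soup = BeautifulSoup(MARKUP, 'html.parser')

# Step 1: loop through .descendants (a live generator over the tree).
for node in soup.descendants:
    # Step 2: replace each matching tag with some other (hypothetical) node.
    if isinstance(node, Tag) and 'random_node_class' in node.get('class', []):
        new_tag = soup.new_tag('span')
        new_tag.string = 'replaced'
        node.replace_with(new_tag)

# Step 3: after the loop, search for the class again.
leftover = soup.find(class_='random_node_class')
print(leftover)  # the second <p> was never visited by the loop
```
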

facelessuser

Mar 13, 2021, 4:45:04 PM
to beautifulsoup

Without an example, it is difficult to know what you are doing, but here are two examples, and both seem to work fine:

from bs4 import BeautifulSoup, Tag
import soupsieve  # installed as a Beautiful Soup dependency

MARKUP = """
<div class="a">
    <div class="b">
        <div class="c">
            <div class="d">
            </div>
        </div>
    </div>
</div>
"""

print('===== Test 1 =====')

soup1 = BeautifulSoup(MARKUP, 'html5lib')
soup1.select_one('div.c').replace_with('test')
print('--- Is it still there? ---')
print(soup1.select_one('div.c') is not None)
print('--- Result ---')
print(soup1)

print('\n\n===== Test 2 =====')

soup2 = BeautifulSoup(MARKUP, 'html5lib')
is_c = soupsieve.compile('div.c')

for el in soup2.descendants:
    if isinstance(el, Tag) and is_c.match(el):
        el.replace_with('test')

print('--- Is it still there? ---')
print(soup2.select_one('div.c') is not None)
print('--- Result ---')
print(soup2)

Output

===== Test 1 =====
--- Is it still there? ---
False
--- Result ---
<html><head></head><body><div class="a">
    <div class="b">
        test
    </div>
</div>
</body></html>

===== Test 2 =====
--- Is it still there? ---
False
--- Result ---
<html><head></head><body><div class="a">
    <div class="b">
        test
    </div>
</div>
</body></html>

Now, should you be modifying a document tree while you are in the middle of looping over it? Probably not, as you are changing the very items you are iterating through. If I had to guess, this is why you are running into issues.
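
One way to act on this, sketched under the assumption that the live .descendants generator is the culprit: .descendants follows each node's next_element link, and once a node is replaced its subtree is detached from the document, so the loop may wander into that detached subtree and stop there without reaching later matches. Collecting the targets first, with a list comprehension or with select() (which returns a list), finishes walking the tree before anything is changed. The markup with two div.c blocks and the helper name replace_all_c are hypothetical.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup with two div.c blocks (the examples above have only one).
MARKUP = '<div class="a"><div class="c"><p>x</p></div><div class="c"><p>y</p></div></div>'


def replace_all_c(markup, use_select=False):
    """Replace every <div class="c"> with the string 'test' (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    if use_select:
        # select() returns a list, so the matches are fixed before editing.
        targets = soup.select('div.c')
    else:
        # The comprehension walks the whole tree before any replacement happens.
        targets = [el for el in soup.descendants
                   if isinstance(el, Tag) and el.name == 'div'
                   and 'c' in el.get('class', [])]
    for el in targets:
        el.replace_with('test')
    return soup


for flag in (False, True):
    result = replace_all_c(MARKUP, use_select=flag)
    print(result.select_one('div.c') is None, result)
```

Both variants should leave no div.c behind; select() is the shorter option when a CSS selector already describes the nodes.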

Lawrence Stewart

Mar 13, 2021, 8:39:20 PM
to beauti...@googlegroups.com
Thanks for your detailed reply. I'm going to have to do some digging to find out what's causing the side effect. Cheers.