Copy HTML from <body> into another tree?


Heck Lennon

Apr 29, 2025, 2:57:25 AM
to beautifulsoup
Hello,

I can't figure out how to copy the HTML inside the <body> of one tree into another tree.

Here's a sample:
==============
import copy
from bs4 import BeautifulSoup

html_in = """
<body>
<i>good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""
soup_in = BeautifulSoup(html_in,"html.parser")
soup_out = BeautifulSoup(html_out,"html.parser")
body_copy = copy.copy(soup_in.body)
#print(body_copy)

#ValueError: Cannot replace an element with its contents when that element is not part of a tree.
#body_copy.unwrap()
#stuff = body_copy.unwrap()

#stuff = body_copy.string      -> None
#stuff = body_copy.text        -> plain text only, tags lost
#stuff = body_copy.get_text()  -> plain text only, tags lost
stuff = str(body_copy)         # markup ends up escaped in soup_out: &lt;body&gt;...
print(stuff)

#soup_out.find("div", id="body").string.replace_with(body_copy)
soup_out.find("div", id="body").string.replace_with(stuff)
print(soup_out)
==============

Thank you.
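Why the attempts above fail: `.string` is `None` because `<body>` has more than one child (whitespace plus `<i>`), `.text` and `.get_text()` drop the tags, and `str(body_copy)` is inserted as a text node, which is why the markup comes out escaped. A minimal sketch of one way to keep the string-based idea, assuming re-parsing the serialized markup is acceptable: serialize only the inside of `<body>` with `decode_contents()`, parse it again, and move the resulting nodes into the target. The names `inner_html`, `fragment` and `target` are illustrative.

```python
from bs4 import BeautifulSoup

html_in = """
<body>
<i>good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in, "html.parser")
soup_out = BeautifulSoup(html_out, "html.parser")

# Serialize only what is *inside* <body>, not the <body> tag itself.
inner_html = soup_in.body.decode_contents()

# Parse that markup again so it becomes real tags rather than a text
# node that would be escaped as &lt;i&gt;... on output.
fragment = BeautifulSoup(inner_html, "html.parser")

target = soup_out.find("div", id="body")
target.clear()  # remove the FILL_ME placeholder
# list() takes a snapshot, because each append moves a node out of fragment.
target.extend(list(fragment.contents))

print(soup_out)
```

Because the fragment is a fresh parse, soup_in is left untouched.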


leonardr

Apr 29, 2025, 11:57:45 AM
to beautifulsoup
I wouldn't copy the entire <body> tag, just the part you want to insert into the second tree:

from bs4 import BeautifulSoup
import copy


html_in = """
<body>
<i>good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in,"html.parser")
soup_out = BeautifulSoup(html_out,"html.parser")

i_copy = copy.copy(soup_in.i)
soup_out.find("div", id="body").string.replace_with(i_copy)

print(soup_out)
# <body>
# <div id="body"><i>good stuff</i></div>
# </body>

But you can also do it after copying, as long as you don't try to call manipulation methods such as unwrap() on the copy itself.

soup_copy = copy.copy(soup_in)
soup_out = BeautifulSoup(html_out,"html.parser")
soup_out.find("div", id="body").string.replace_with(soup_copy.i)
print(soup_out)
# <body>
# <div id="body"><i>good stuff</i></div>
# </body>
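The ValueError in the original post comes from this detail. A small sketch, using a one-line version of the sample markup: a copy made with `copy.copy()` has no parent, so `unwrap()` has nowhere to put the children, while the elements inside the copy are still attached to it and can be moved into another tree.

```python
import copy
from bs4 import BeautifulSoup

soup_in = BeautifulSoup("<body><i>good stuff</i></body>", "html.parser")
soup_out = BeautifulSoup('<body><div id="body">FILL_ME</div></body>', "html.parser")

body_copy = copy.copy(soup_in.body)

# The copy is detached from any tree: it has no parent.
print(body_copy.parent)  # None

# unwrap() needs a parent to receive the children, so it raises ValueError.
try:
    body_copy.unwrap()
    unwrap_failed = False
except ValueError as exc:
    unwrap_failed = True
    print("unwrap() failed:", exc)

# The <i> inside the copy *does* have a parent (body_copy), so it can be
# moved into the other tree.
soup_out.find("div", id="body").string.replace_with(body_copy.i)
print(soup_out)  # <body><div id="body"><i>good stuff</i></div></body>
```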

Chris Papademetrious

Apr 29, 2025, 1:55:36 PM
to beautifulsoup
Hi frdt,

Do you need to leave the original soup_in soup undisturbed? If not, then you could pull the <body> out instead of copying it.
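If soup_in may be modified, here is a sketch of that idea: `extract()` detaches `<body>` from soup_in, and once it has been placed into soup_out it is part of a tree again, so `unwrap()` no longer raises the ValueError seen earlier. Unwrapping is an assumption about the desired result: it leaves the children directly inside `<div id="body">` instead of nesting a second `<body>` there.

```python
from bs4 import BeautifulSoup

html_in = """
<body>
<i>good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in, "html.parser")
soup_out = BeautifulSoup(html_out, "html.parser")

# Detach <body> from soup_in. This modifies soup_in: it no longer has a <body>.
body = soup_in.body.extract()

# Put the whole <body> where the FILL_ME text was...
soup_out.find("div", id="body").string.replace_with(body)

# ...then dissolve the <body> tag, keeping its children in place. This works
# now because body has a parent (the div) again.
body.unwrap()

print(soup_out)
```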


Hi Leonard,

Let's say the source <body> contains many children, and the operation must be nondestructive to soup_in. Would copying the entire <body> then moving its children be a reasonable approach?

body_copy = copy.copy(soup_in.body)
soup_out.find("div", id="body").string.replace_with(*body_copy.children)

 - Chris
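A runnable sketch of the approach in the question, with a few extra children in the source `<body>` (hypothetical sample markup) and a check that soup_in is unchanged. It assumes a Beautiful Soup version whose `replace_with()` accepts several arguments, which recent 4.x releases do.

```python
import copy
from bs4 import BeautifulSoup

# Hypothetical source with several children inside <body>.
html_in = """
<body>
<i>good stuff</i>
<b>more stuff</b>
<p>even more stuff</p>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in, "html.parser")
soup_out = BeautifulSoup(html_out, "html.parser")
before = str(soup_in)

# copy.copy() on a Tag copies the tag and all of its descendants, so moving
# nodes out of body_copy cannot affect soup_in.
body_copy = copy.copy(soup_in.body)

# The * unpacking collects every child before replace_with() runs, so the
# fact that each child is moved out of body_copy doesn't disturb iteration.
placeholder = soup_out.find("div", id="body").string
placeholder.replace_with(*body_copy.children)

print(soup_out)
```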

Heck Lennon

May 1, 2025, 4:40:33 AM
to beautifulsoup
Thanks for the tip.

The example was deliberately basic, just as an illustration, but I actually need to copy a whole web page, not just a single element.

I don't mind editing the source tree, since I can always save it to a new file.

leonardr

May 1, 2025, 8:53:50 AM
to beautifulsoup
I see. That's more complicated because the original <body> probably contains multiple tags, but here's a way to do it without explicitly making any copies:

from bs4 import BeautifulSoup


html_in = """
<body>
<i>good stuff</i>
<i>more good stuff</i>

</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in,"html.parser")
soup_out = BeautifulSoup(html_out,"html.parser")

new_body = soup_out.find('div', id='body')
new_body.clear()  # remove the FILL_ME placeholder
new_body.extend(soup_in.body.contents)  # moves the nodes out of soup_in's <body>

print(soup_out)
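Note that `extend()` here moves the nodes, so soup_in's `<body>` ends up empty, which is fine given that editing the source tree is acceptable. If the source ever needs to stay intact, a minimal variant, assuming per-node copies are acceptable, is to pass copies to `extend()`; `copy.copy()` works for both tags and the whitespace strings between them.

```python
import copy
from bs4 import BeautifulSoup

html_in = """
<body>
<i>good stuff</i>
<i>more good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in, "html.parser")
soup_out = BeautifulSoup(html_out, "html.parser")

new_body = soup_out.find("div", id="body")
new_body.clear()  # remove the FILL_ME placeholder
# Copy each child (tags and the whitespace strings between them) instead of
# moving it, so soup_in.body keeps all of its contents.
new_body.extend([copy.copy(node) for node in soup_in.body.contents])

print(soup_out)
```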


Heck Lennon

May 1, 2025, 11:57:16 AM
to beautifulsoup
Perfect!

Meanwhile, I came up with a workaround: renaming the <body> tag to <div>, so that I don't end up with two <body> elements:

body = soup_in.find('body')
body.name = 'div'
print("Body renamed:\n",body)
soup_out.find("div", id="body").string.replace_with(body)
print("After:\n",soup_out)

Thank you.
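The workaround above as a self-contained script, for reference. Renaming changes only the tag's name, so the result is a plain `<div>` nested inside `<div id="body">`; and since `replace_with()` moves the element, soup_in loses it.

```python
from bs4 import BeautifulSoup

html_in = """
<body>
<i>good stuff</i>
</body>
"""

html_out = """
<body>
<div id="body">FILL_ME</div>
</body>
"""

soup_in = BeautifulSoup(html_in, "html.parser")
soup_out = BeautifulSoup(html_out, "html.parser")

body = soup_in.find("body")
body.name = "div"  # now serialized as <div>...</div> instead of <body>...</body>

# Moves the renamed element out of soup_in and into soup_out.
soup_out.find("div", id="body").string.replace_with(body)

print(soup_out)
```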
