Re: Change contents of tag without encoding HTML entities

Leonard Richardson

Sep 8, 2012, 9:08:19 AM
to beauti...@googlegroups.com
> I'd like to replace the contents of a tag with some HTML content, but when I
> assign the new contents via .string the contents are encoded with HTML
> entities:
>
>>>> c = bs.new_tag("div")
>>>> c.string = "<h1>test</h1>"
>>>> c
> <div>&lt;h1&gt;test&lt;/h1&gt;</div>
>
> Having HTML entities instead of renderable HTML doesn't work out so well
> when it's time to write the HTML back out. I know I can unescape the text
> after the fact, but is there a way to assign it without it getting escaped
> in the first place?

You've set the content of the div tag to the string "<h1>test</h1>".
Since that string contains angle brackets, it needs to be escaped. If
you want the <div> tag to contain an actual <h1> tag, you need to
create that <h1> tag just as you did the <div>:

c = bs.new_tag("div")
h1 = bs.new_tag("h1")
h1.string = "test"
c.append(h1)
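A minimal, self-contained sketch of the difference, assuming bs4 with the built-in "html.parser" builder (the variable names `soup`, `escaped`, `nested`, `parsed` and `fragment` are illustrative). The third variant is an option not discussed in this thread: when the markup only exists as a string, you can parse it into its own soup and move the parsed nodes across.

```python
from bs4 import BeautifulSoup

# Assumption: an empty soup using the stdlib-backed "html.parser" builder.
soup = BeautifulSoup("", "html.parser")

# Markup assigned to .string is stored as text, so it is escaped on output.
escaped = soup.new_tag("div")
escaped.string = "<h1>test</h1>"

# Building the <h1> as a real tag and appending it nests actual markup.
nested = soup.new_tag("div")
h1 = soup.new_tag("h1")
h1.string = "test"
nested.append(h1)

# Alternative: parse the markup string, then move each parsed node over.
parsed = soup.new_tag("div")
fragment = BeautifulSoup("<h1>test</h1>", "html.parser")
for node in list(fragment.contents):  # copy the list: append() moves each node
    parsed.append(node)

print(escaped)  # <div>&lt;h1&gt;test&lt;/h1&gt;</div>
print(nested)   # <div><h1>test</h1></div>
print(parsed)   # <div><h1>test</h1></div>
```

When you already hold Tag objects, appending them directly avoids the round trip through a string; parsing is mainly useful when the markup arrives as text.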

Leonard

spiffytech

Sep 8, 2012, 10:50:12 AM
to beauti...@googlegroups.com, leon...@segfault.org
On Saturday, September 8, 2012 9:08:20 AM UTC-4, Leonard Richardson wrote:
> You've set the content of the div tag to the string "<h1>test</h1>".
> Since that string contains angle brackets, it needs to be escaped. If
> you want the <div> tag to contain an actual <h1> tag, you need to
> create that <h1> tag just as you did the <div>:
>
> c = bs.new_tag("div")
> h1 = bs.new_tag("h1")
> h1.string = "test"
> c.append(h1)

I provided what I thought was a minimal example of what I'm trying to accomplish, but I guess it's not close enough to what I'm actually doing :)

My ultimate goal is this: I have two XML documents. I want to replace the contents of a certain tag in document 1 with the XML tree inside a certain tag in document 2. I've tried this:

doc1.find(locale=lang).string = doc2.find(locale=lang)

but I run into the entity escaping problem. I suppose I could loop over the whole XML tree I want from doc2, creating new tags, copying the attributes across for each, and inserting them in place in doc1, but that seems like a lot of work to get right for something that seems so conceptually simple. 

Is there an easier way to accomplish this with BeautifulSoup? 

Leonard Richardson

Sep 8, 2012, 12:01:17 PM
to beauti...@googlegroups.com
> I provided what I thought was a minimal example of what I'm trying to
> accomplish, but I guess it's not close enough to what I'm actually doing :)
>
> My ultimate goal is this: I have two XML documents. I want to replace the
> contents of a certain tag in document 1 with the XML tree inside a certain
> tag in document 2. I've tried this:
>
> doc1.find(locale=lang).string = doc2.find(locale=lang)
>
> but I run into the entity escaping problem. I suppose I could loop over the
> whole XML tree I want from doc2, creating new tags, copying the attributes
> across for each, and inserting them in place in doc1, but that seems like a
> lot of work to get right for something that seems so conceptually simple.
>
> Is there an easier way to accomplish this with BeautifulSoup?

Whenever you set .string to a value, the value is treated as a string,
and that means entity escaping. To replace the contents of a tag with
another tag, you need to use the tree manipulation methods. The
closest equivalent to the code you wrote is probably this:

locale = doc1.find(locale=lang)
locale.clear()
locale.append(doc2.find(locale=lang))

This will give you a tree that looks like this:

<foo locale="en">
<bar locale="en">content</bar>
</foo>
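A self-contained sketch to check that result. The sample documents, the <root>/<foo>/<bar> names and the value of `lang` are made up, and "html.parser" is used so it runs without lxml (for real XML you would more likely use the "xml" parser, which requires lxml):

```python
from bs4 import BeautifulSoup

# Hypothetical sample documents; html.parser keeps the sketch free of lxml.
doc1 = BeautifulSoup('<root><foo locale="en">old text</foo></root>', "html.parser")
doc2 = BeautifulSoup('<root><bar locale="en">content</bar></root>', "html.parser")
lang = "en"

locale = doc1.find(locale=lang)         # keyword args to find() filter on attributes
locale.clear()                          # drop the current children of <foo>
locale.append(doc2.find(locale=lang))   # move <bar> and its subtree into <foo>

print(doc1)  # <root><foo locale="en"><bar locale="en">content</bar></foo></root>
print(doc2)  # <root></root>
```

The second line of output shows that append() moves the tag rather than copying it, so doc2 no longer contains <bar> afterwards.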

But it sounds like you want something more like this:

<foo locale="en">
content
</foo>

For that I recommend unwrap(), which replaces a tag with its contents.
So your complete code might look like this:

locale1 = doc1.find(locale=lang)
locale2 = doc2.find(locale=lang)

locale1.clear()
locale1.append(locale2)
locale2.unwrap()
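The same kind of sketch for the complete recipe, again with made-up sample documents and "html.parser" as stated assumptions. The <bar> content includes a nested <b> tag to show that unwrap() keeps child tags, not just text:

```python
from bs4 import BeautifulSoup

# Hypothetical sample documents; html.parser keeps the sketch free of lxml.
doc1 = BeautifulSoup('<root><foo locale="en">old text</foo></root>', "html.parser")
doc2 = BeautifulSoup('<root><bar locale="en">content <b>here</b></bar></root>', "html.parser")
lang = "en"

locale1 = doc1.find(locale=lang)
locale2 = doc2.find(locale=lang)

locale1.clear()          # empty <foo>
locale1.append(locale2)  # move <bar> inside <foo>
locale2.unwrap()         # replace <bar> with its own children

print(doc1)  # <root><foo locale="en">content <b>here</b></foo></root>
```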

References:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#clear
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#append
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap

Leonard

spiffytech

Sep 8, 2012, 12:39:41 PM
to beauti...@googlegroups.com, leon...@segfault.org
On Saturday, September 8, 2012 12:01:18 PM UTC-4, Leonard Richardson wrote:
> locale1 = doc1.find(locale=lang)
> locale2 = doc2.find(locale=lang)
>
> locale1.clear()
> locale1.append(locale2)
> locale2.unwrap()

Excellent! That did just what I needed. Thanks!