Change contents of tag without encoding HTML entities
From: spiffytech <spiffyt...@gmail.com>
Date: Fri, 7 Sep 2012 21:08:21 -0700 (PDT)
Local: Sat, Sep 8 2012 12:08 am
Subject: Change contents of tag without encoding HTML entities

I'd like to replace the contents of a tag with some HTML content, but when
I assign the new contents via .string the contents are encoded with HTML
entities:

>>> c = bs.new_tag("div")
>>> c.string = "<h1>test</h1>"
>>> c
<div>&lt;h1&gt;test&lt;/h1&gt;</div>

Having HTML entities instead of renderable HTML doesn't work out so well
when it's time to write the HTML back out. I know I can unescape the text
after the fact, but is there a way to assign it without it getting escaped
in the first place?


 
From: Leonard Richardson <leona...@segfault.org>
Date: Sat, 8 Sep 2012 09:08:19 -0400
Local: Sat, Sep 8 2012 9:08 am
Subject: Re: Change contents of tag without encoding HTML entities

> I'd like to replace the contents of a tag with some HTML content, but when I
> assign the new contents via .string the contents are encoded with HTML
> entities:

> >>> c = bs.new_tag("div")
> >>> c.string = "<h1>test</h1>"
> >>> c
> <div>&lt;h1&gt;test&lt;/h1&gt;</div>

> Having HTML entities instead of renderable HTML doesn't work out so well
> when it's time to write the HTML back out. I know I can unescape the text
> after the fact, but is there a way to assign it without it getting escaped
> in the first place?

You've set the content of the div tag to the string "<h1>test</h1>".
Since that string contains angle brackets, it needs to be escaped. If
you want the <div> tag to contain an actual <h1> tag, you need to
create that <h1> tag just as you did the <div>:

c = bs.new_tag("div")
h1 = bs.new_tag("h1")
h1.string = "test"
c.append(h1)
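
For anyone reading this later, here is a self-contained sketch of the difference between the two approaches. It assumes `bs` is an empty document parsed with the standard-library "html.parser" backend (the thread does not say how `bs` was created), and the last part, which parses an existing markup string and appends the resulting tag, is an extra illustration rather than something suggested in the thread.

```python
from bs4 import BeautifulSoup

# Assumption: an empty document parsed with "html.parser" stands in for `bs`.
bs = BeautifulSoup("", "html.parser")

# Assigning markup to .string stores it as plain text.
# The text itself is unchanged; it is only escaped when the tag is written out.
escaped = bs.new_tag("div")
escaped.string = "<h1>test</h1>"
print(escaped)         # <div>&lt;h1&gt;test&lt;/h1&gt;</div>
print(escaped.string)  # <h1>test</h1>

# Building the child with new_tag() yields a real <h1> element.
c = bs.new_tag("div")
h1 = bs.new_tag("h1")
h1.string = "test"
c.append(h1)
print(c)               # <div><h1>test</h1></div>

# If the markup already exists as a string, one option is to parse it
# and append the resulting tag instead of assigning it to .string.
parsed = bs.new_tag("div")
fragment = BeautifulSoup("<h1>test</h1>", "html.parser")
parsed.append(fragment.h1)
print(parsed)          # <div><h1>test</h1></div>
```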

Leonard


 
From: spiffytech <spiffyt...@gmail.com>
Date: Sat, 8 Sep 2012 07:50:12 -0700 (PDT)
Local: Sat, Sep 8 2012 10:50 am
Subject: Re: Change contents of tag without encoding HTML entities

On Saturday, September 8, 2012 9:08:20 AM UTC-4, Leonard Richardson wrote:

> You've set the content of the div tag to the string "<h1>test</h1>".
> Since that string contains angle brackets, it needs to be escaped. If
> you want the <div> tag to contain an actual <h1> tag, you need to
> create that <h1> tag just as you did the <div>:

> c = bs.new_tag("div")
> h1 = bs.new_tag("h1")
> h1.string = "test"
> c.append(h1)

I provided what I thought was a minimal example of what I'm trying to
accomplish, but I guess it's not close enough to what I'm actually doing :)

My ultimate goal is this: I have two XML documents. I want to replace the
contents of a certain tag in document 1 with the XML tree inside a certain
tag in document 2. I've tried this:

doc1.find(locale=lang).string = doc2.find(locale=lang)

but I run into the entity escaping problem. I suppose I could loop over the
whole XML tree I want from doc2, creating new tags, copying the attributes
across for each, and inserting them in place in doc1, but that seems like a
lot of work to get right for something that seems so conceptually simple.

Is there an easier way to accomplish this with BeautifulSoup?


 
From: Leonard Richardson <leona...@segfault.org>
Date: Sat, 8 Sep 2012 12:01:17 -0400
Local: Sat, Sep 8 2012 12:01 pm
Subject: Re: Change contents of tag without encoding HTML entities

> I provided what I thought was a minimal example of what I'm trying to
> accomplish, but I guess it's not close enough to what I'm actually doing :)

> My ultimate goal is this: I have two XML documents. I want to replace the
> contents of a certain tag in document 1 with the XML tree inside a certain
> tag in document 2. I've tried this:

> doc1.find(locale=lang).string = doc2.find(locale=lang)

> but I run into the entity escaping problem. I suppose I could loop over the
> whole XML tree I want from doc2, creating new tags, copying the attributes
> across for each, and inserting them in place in doc1, but that seems like a
> lot of work to get right for something that seems so conceptually simple.

> Is there an easier way to accomplish this with BeautifulSoup?

Whenever you set .string to a value, the value is treated as a string,
and that means entity escaping. To replace the contents of a tag with
another tag, you need to use the tree manipulation methods. The
closest equivalent to the code you wrote is probably this:

locale = doc1.find(locale=lang)
locale.clear()
locale.append(doc2.find(locale=lang))

This will give you a tree that looks like this:

<foo locale="en">
 <bar locale="en">content</bar>
</foo>

But it sounds like you want something more like this:

<foo locale="en">
 content
</foo>

For that I recommend unwrap(), which replaces a tag with its contents.
So your complete code might look like this:

locale1 = doc1.find(locale=lang)
locale2 = doc2.find(locale=lang)

locale1.clear()
locale1.append(locale2)
locale2.unwrap()
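
A runnable sketch of the same three steps, for reference. The two sample documents are hypothetical (the foo/bar names follow the trees shown above), and "html.parser" is assumed as the parser only so that the example has no extra dependencies; the real documents in question are XML and may be parsed differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample documents; the real ones are not shown in the thread.
lang = "en"
doc1 = BeautifulSoup('<foo locale="en">old</foo>', "html.parser")
doc2 = BeautifulSoup('<bar locale="en"><b>new</b> content</bar>', "html.parser")

locale1 = doc1.find(locale=lang)
locale2 = doc2.find(locale=lang)

locale1.clear()          # drop the old children of <foo>
locale1.append(locale2)  # moves <bar> out of doc2 and into doc1
print(doc1)  # <foo locale="en"><bar locale="en"><b>new</b> content</bar></foo>

locale2.unwrap()         # replace <bar> with its own children
print(doc1)  # <foo locale="en"><b>new</b> content</foo>

# The tag was moved, not copied, so doc2 no longer contains it.
print(doc2.find(locale=lang))  # None
```

Note that append() moves the tag, so doc2 is modified as a side effect. If doc2 has to stay intact, copying the tag first (newer Beautiful Soup releases document copy.copy() for tags) should avoid that, though this is not something tested in this thread.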

References:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#clear
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#append
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap

Leonard


 
From: spiffytech <spiffyt...@gmail.com>
Date: Sat, 8 Sep 2012 09:39:41 -0700 (PDT)
Local: Sat, Sep 8 2012 12:39 pm
Subject: Re: Change contents of tag without encoding HTML entities

On Saturday, September 8, 2012 12:01:18 PM UTC-4, Leonard Richardson wrote:

> locale1 = doc1.find(locale=lang)
> locale2 = doc2.find(locale=lang)

> locale1.clear()
> locale1.append(locale2)
> locale2.unwrap()

Excellent! That did just what I needed. Thanks!

 