Removing attributes from tags

Bruce Eckel

Jan 30, 2012, 11:04:11 AM
to beautifulsoup
Here's how I'm removing attributes from my h1 tags:

""" Change h1 with attributes to plain h1 """
for header in soup.findAll('h1'):
    h1 = re.sub("<h1.*?>", "<h1>", str(header))
    header.replaceWith(BeautifulSoup(h1))

# Must re-render soup:
soup = BeautifulSoup(soup.renderContents())
for header in soup.findAll('h1'):
    print header

Basically I manipulate the string directly and then turn it back into
a node by using the BeautifulSoup parser. This works, but only if I
re-render the soup, which seems ungainly and is definitely slow.

What's the right way to do this?

Thanks!

Jim Tittsler

Jan 30, 2012, 5:40:15 PM
to beauti...@googlegroups.com
On Tue, Jan 31, 2012 at 05:04, Bruce Eckel <bruce...@gmail.com> wrote:
> Here's how I'm removing attributes from my h1 tags:
>
> """ Change h1 with attributes to plain h1 """
> for header in soup.findAll('h1'):
>    h1 = re.sub("<h1.*?>", "<h1>", str(header))
>    header.replaceWith(BeautifulSoup(h1))

You can use the dictionary interface to discover and delete the attributes:

for header in soup.findAll('h1'):
    for attr, val in reversed(header.attrs):
        del header[attr]
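
[Editor's sketch, not part of the original reply.] The snippet above relies on `attrs` being a list of (name, value) pairs, as it was in Beautiful Soup 3. Assuming the later `bs4` package instead, where `attrs` is a dict, the same idea might look like the following. The sample HTML and the choice of the built-in `html.parser` are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
html = '<h1 class="title" id="top">Hello</h1><p class="x">text</p>'
soup = BeautifulSoup(html, "html.parser")

for header in soup.find_all("h1"):
    # Copy the attribute names first so the dict is not
    # modified while it is being iterated.
    for name in list(header.attrs):
        del header[name]

# The tree is edited in place, so no re-parse is needed.
result = str(soup)
print(result)
```

Only the `h1` tags lose their attributes; the rest of the tree is left alone.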


--
Jim Tittsler http://www.OnJapan.net/ GPG: 0x01159DB6
Python Starship http://Starship.Python.net/crew/jwt/
Mailman IRC irc://irc.freenode.net/#mailman

Bruce Eckel

Jan 31, 2012, 10:27:40 AM
to beauti...@googlegroups.com
Thanks. Ironically, I *just* discovered that very thing this morning.

Question, though: why did you use "reversed()" in this case?

I'm finding that processing HTML seems to be a combination of doing simple replacements and regular expressions on the plain text, and other things using BeautifulSoup. Or I just haven't figured out enough yet about how BeautifulSoup works.

-- Bruce Eckel
www.Reinventing-Business.com
www.MindviewInc.com


Jim Tittsler

Feb 1, 2012, 5:04:00 AM
to beauti...@googlegroups.com
On Wed, Feb 1, 2012 at 04:27, Bruce Eckel <bruce...@gmail.com> wrote:
> Question, though: why did you use "reversed()" in this case?

I didn't test it, but I was leery of mutating the sequence as I
iterated over it.
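
[Editor's sketch, not part of the original reply.] The concern can be reproduced without Beautiful Soup. Here a plain Python list of (name, value) pairs stands in for a tag's attribute list; the values are made up for illustration.

```python
# Stand-in for a tag's attribute list; not real Beautiful Soup data.
attrs = [("class", "title"), ("id", "top"), ("style", "color:red")]

# Forward iteration while removing: the list shifts left under the
# iterator, so some elements are skipped and survive.
forward = list(attrs)
for pair in forward:
    forward.remove(pair)

# Reverse iteration while removing: each removal happens at or after
# the iterator's position, so nothing is skipped.
backward = list(attrs)
for pair in reversed(backward):
    backward.remove(pair)

# Iterating over a copy is the most explicit way to stay safe.
copied = list(attrs)
for pair in list(copied):
    copied.remove(pair)

print(forward)   # one pair is left behind
print(backward)  # empty
print(copied)    # empty
```

Iterating in reverse or over a copy both avoid the skipped element; the copy makes the intent more obvious to a reader.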

> I'm finding that processing HTML seems to be a combination of doing simple
> replacements and regular expressions on the plain text, and other things
> using BeautifulSoup. Or I just haven't figured out enough yet about how
> BeautifulSoup works.

I rarely get out the regular expressions until I've isolated the text
nodes I'm interested in... unless I have to "fix" some broken HTML
before making soup.
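
[Editor's sketch, not part of the original reply.] One way to read "isolate the text nodes first, then apply the regular expression", assuming the `bs4` package and its `string=True` filter. The sample HTML and the whitespace-collapsing pattern are illustrative assumptions.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only.
soup = BeautifulSoup("<p>Call  me   maybe</p><h1>A   B</h1>", "html.parser")

# find_all(string=True) returns only the text nodes, so the regex
# never sees tag names or attributes.
for node in soup.find_all(string=True):
    node.replace_with(re.sub(r"\s+", " ", node))

result = str(soup)
print(result)
```

Because the pattern runs only on text nodes, it cannot accidentally rewrite markup the way a regex over the whole document string might.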
