BS4: Inserting a tag before an element, moving a sub-element

Bruce Eckel

Feb 6, 2012, 9:40:12 AM
to beauti...@googlegroups.com
After thrashing around for a while trying to do this entirely within BS4, I finally threw up my hands: I converted the soup to a string, manipulated the string, then converted it back to soup and did the remaining soup manipulations:

from bs4 import BeautifulSoup

# Move the 'a' tag in each <h1> into a chapterBreak span placed just before that <h1>:
# <span class="chapterBreak"><a name="CHAPTER_ANCHOR"></a></span>
text = soup.decode(formatter="html")  # serialize to a str, using named HTML entities
text = text.replace("<h1", '<span class="chapterBreak"></span><h1')
soup = BeautifulSoup(text, "lxml")
for tag in soup.find_all("h1"):
    if tag.a is not None:  # skip headings that have no anchor
        tag.previous_element.append(tag.a)  # append() actually MOVES tag.a

It seemed like I should have been able to accomplish this completely within soup but I kept getting stymied, so the above code turned out to be the easiest and most straightforward way to do it (and the lxml parser makes it fast).
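
For comparison, here is a minimal sketch of the same transformation done entirely inside the tree, assuming a current BS4 release: new_tag() builds the empty span, insert_before() places it in front of each <h1>, and append() moves the heading's <a> into it. The sample markup and the add_chapter_breaks() name are hypothetical, and html.parser is used only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document with two chapter headings.
html = """
<html><body>
<h1><a name="CHAPTER_ONE"></a>Chapter One</h1>
<p>Text.</p>
<h1><a name="CHAPTER_TWO"></a>Chapter Two</h1>
</body></html>
"""

def add_chapter_breaks(soup):
    """Hypothetical helper: put a chapterBreak span before each <h1>
    and move the heading's first <a> into that span."""
    for h1 in soup.find_all("h1"):  # find_all() returns a list, so editing the tree here is safe
        span = soup.new_tag("span")
        span["class"] = "chapterBreak"
        h1.insert_before(span)  # span becomes h1's immediate previous sibling
        if h1.a is not None:
            span.append(h1.a)  # append() removes the <a> from the <h1> first, so this is a move
    return soup

soup = add_chapter_breaks(BeautifulSoup(html, "html.parser"))
print(soup.body)
```

No string round-trip is needed here because the span is created and positioned as a node rather than spliced in as text.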

Problems I ran into trying to do it within BS:

1) I don't know how to make a Tag that has nothing in it. If you say
soup.new_tag("")
it creates <></>, which is fine because you can then build up the tag by saying things like t.name = "span". But it would also be nice to have a tag with nothing at all in it, into which you could append just what you want without having to cope with the extra surrounding <whatever></whatever>.

2) I don't know how to make a deep copy. If you move a tag around, it drags all its connections with it, so it's easy to end up in an infinite loop when you call replace_with().
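
Current Beautiful Soup 4 releases support taking a detached copy with the standard copy module: copy.copy() on a Tag copies the tag together with all its descendants, and the copy is not connected to the original tree, so appending it elsewhere leaves the original in place. A minimal sketch with made-up markup:

```python
import copy
from bs4 import BeautifulSoup

# Hypothetical markup: a source div with content and an empty destination div.
soup = BeautifulSoup('<div id="src"><p>One <b>two</b></p></div><div id="dst"></div>', "html.parser")
src = soup.find("div", id="src")
dst = soup.find("div", id="dst")

# copy.copy() on a Tag copies it and its whole subtree; the copy has no parent yet.
p_copy = copy.copy(src.p)
dst.append(p_copy)  # only the copy moves; the original <p> stays inside #src
print(soup)
```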

For all I know, taking the ham-fisted approach (converting to a string, performing string operations, then converting back to soup) might be a perfectly appropriate technique. But I didn't see it suggested in the documentation, and the documentation seems to imply that you should be able to do everything within BS, so I try to do it that way and feel like I'm missing something if I have to fall back on the string approach.

-- Bruce Eckel
www.Reinventing-Business.com
www.MindviewInc.com

Leonard Richardson

Feb 6, 2012, 12:42:13 PM
to beauti...@googlegroups.com
Bruce,

I can't visualize what you're trying to do. Can you present a
simplified version of the markup and explain what you want to
accomplish?

> 1) I don't know how to make a Tag that has nothing in it. If you say
> soup.new_tag("")
> it creates <></>, which is fine because then you can build the tag by saying
> things like t.name = "span". But it would also be nice to have a tag with
> nothing at all in it that you could append just what you want into without
> having to cope with the extra surrounding <whatever></whatever>

What's the difference between this "tag that has nothing in it", and a list?

Would it help if there was a method Tag.set_contents(), where you can
pass in a list of tags and strings, and have them become the new
contents of a tag?
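
Tag.set_contents() is only a proposal at this point, not part of the BS4 API. A minimal sketch of the same idea as a hypothetical standalone helper, built only from clear() and append(), which do exist on Tag; the "empty container" is just an ordinary Python list of tags and strings:

```python
from bs4 import BeautifulSoup

def set_contents(tag, items):
    """Hypothetical helper (not a BS4 method): make `items`, a list of
    tags and/or strings, the new children of `tag`."""
    items = list(items)  # snapshot first, in case items aliases tag.contents
    tag.clear()          # detach the old children
    for item in items:
        tag.append(item)  # plain strings are wrapped as NavigableStrings; tags are moved in
    return tag

soup = BeautifulSoup('<p id="target">old text</p>', "html.parser")
bold = soup.new_tag("b")
bold.string = "new"
set_contents(soup.p, ["Some ", bold, " text"])
print(soup.p)
```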

> 2) I don't know how to make a deep copy. If you move a tag around it drags
> all its connections with it, so you end up easily getting into infinite loop
> situations when you do replace_with().

We talked about the infinite loop off-list. Is that how you're moving
the tags around? I recommend using extract() and
insert()/replace_with() to move tags around, rather than just using
replace_with().
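
A minimal sketch of that advice on made-up markup: detach the node with extract(), then re-attach it either at an explicit position with insert() or in place of an existing node with replace_with().

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an anchor inside a heading, plus a placeholder paragraph.
soup = BeautifulSoup(
    '<div><h1><a name="intro"></a>Intro</h1><p class="old">placeholder</p></div>',
    "html.parser",
)

# 1. Detach the node first; after extract() it has no parent.
anchor = soup.h1.a.extract()
assert anchor.parent is None

# 2a. Re-attach it at an explicit position with insert() ...
soup.div.insert(0, anchor)  # position 0 in the div, i.e. before the <h1>

# 2b. ... or swap a new node in for an existing one with replace_with().
note = soup.new_tag("em")
note.string = "moved"
soup.p.replace_with(note)  # the old <p> is removed from the tree
print(soup)
```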

Leonard

Bruce Eckel

Feb 6, 2012, 1:56:41 PM
to beauti...@googlegroups.com
Would it make sense to have a section in the docs called "Moving tags around"?


> What's the difference between this "tag that has nothing in it", and a list?

All this is intuitive to you -- I didn't *know* I could just use an empty list as a Tag. It would be lovely to see an example of that in the docs.

--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.

