Re: insert_after on nested paragraphs


Aaron DeVore

Mar 9, 2013, 5:30:04 AM
to beauti...@googlegroups.com
I think your problem is with p.descendants. Tag.descendants is a generator that lazily yields descendant tags and strings. That makes it sensitive to changes in the tree, because modifying the tree mid-loop can invalidate the generator's current state.
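
To make the difference concrete, here is a minimal sketch. The markup and variable names are made up for illustration, and I'm assuming the "html.parser" backend, which keeps nested <p> tags as written (other parsers may restructure them).

```python
from bs4 import BeautifulSoup

# Assumption: html.parser leaves the nested <p> tags in place.
soup = BeautifulSoup("<p><p>one</p><p>two</p>parent text</p>", "html.parser")
outer = soup.p

lazy = outer.descendants        # generator: walks the live tree while you iterate
snapshot = outer.find_all("p")  # list subclass: fully built before any loop starts

print(isinstance(lazy, list))      # False
print(isinstance(snapshot, list))  # True
print(len(snapshot))               # 2
```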

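Instead, replace "for desc in p.descendants" with "for desc in p.find_all('p')". That builds a list before the loop starts, so moving tags during the loop can't disturb the iteration. It also makes the isinstance(desc, Tag) and desc.name == "p" checks unnecessary, because find_all('p') returns only <p> tags.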
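
Something along these lines should work on your sample input. Treat it as a sketch rather than a tested drop-in: I'm assuming "html.parser" (so the nested <p> tags survive parsing), the names outer and nested are mine, and iterating in reverse is my own addition so that the moved paragraphs keep their original order.

```python
from bs4 import BeautifulSoup

html = (
    "<body><p>"
    "<p>first nested paragraph</p>"
    "<p><img src='img.png'></p>"
    "<p><br></p>"
    "Text inside parent paragraph."
    "</p></body>"
)
# Assumption: html.parser keeps the nested <p> structure as written.
soup = BeautifulSoup(html, "html.parser")

outer = soup.body.p           # the outermost <p>
nested = outer.find_all("p")  # list of nested <p> tags, built before any move

# insert_after() places each tag directly after `outer`, so going through
# the list in reverse leaves the moved paragraphs in document order.
for desc in reversed(nested):
    outer.insert_after(desc)

print(soup.body.prettify())
```

Without the reversed() call the paragraphs still all get moved, but they end up in the opposite order, since each one is inserted immediately after the parent.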

-Aaron DeVore


On Fri, Mar 8, 2013 at 2:57 AM, TNPetr7 <tnp...@gmail.com> wrote:
Hello everyone,
I'm trying to move <p> tags that are nested inside another <p> so that they come after their parent. I'm using BS4 and Python 2.7.3 on Gentoo. Here's the code:

for p in soup.find_all("p"):
  for desc in p.descendants:
    if isinstance(desc, Tag):
      if desc.name == "p":
        p.insert_after(desc)


My input may look like this:
<html>
<head>
  <title>title</title>
</head>
<body>
  <p>
    <p>first nested paragraph</p>
    <p><img src="img.png"></p>
    <p><br></p>
  Text inside parent paragraph.
  </p>
</body>
</html>


However, when p.insert_after(desc) is called, the loop processes only the first nested paragraph. My guess is that the loop tries to continue with the descendants/siblings of the first nested paragraph, and since that paragraph has been moved, the iteration breaks. I also tried desc.extract(), and after the call the loop stops with AttributeError: 'NoneType' object has no attribute 'next_element'.

Can anyone comment on this and shed some light on how the loop over descendants works? I'm open to any alternative.

Thanks!
P.
