About bug reporting, and a bug report: extract() extracts the wrong element


Mikael Lepistö

May 20, 2010, 9:37:05 AM
to beautifulsoup
I was looking for a place to send a bug report, but I didn't find a
Bugzilla or similar tracker for the library. Anyway, I found a case where
extract() removes the wrong element if there are multiple siblings with
the same string.

Here is a small test case that reproduces the problem:

>>> from BeautifulSoup import BeautifulSoup as BS
>>> doc = '<div>A<div>B</div>A</div>'
>>> d = BS(doc)
>>> d.first().next.next.next.next.extract()
u'A'
>>> d
<div><div>B</div>A</div>

In this case the first A was removed, not the last one.

Cheers, Mikael

--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To post to this group, send email to beauti...@googlegroups.com.
To unsubscribe from this group, send email to beautifulsou...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beautifulsoup?hl=en.

Mikael Lepistö

May 20, 2010, 9:53:11 AM
to beautifulsoup
And after a cup of coffee I found the Launchpad bug tracker and
https://bugs.launchpad.net/beautifulsoup/+bug/397997, which sounds like
my problem.

Aaron DeVore

May 20, 2010, 1:05:25 PM
to beauti...@googlegroups.com
Mikael,
This should be fixed in versions 3.0.8 and 3.0.8.1. Previously, extract()
found the element's index among its parent's children by equality. That
worked fine unless there were two equal nodes. Starting in 3.0.8,
Beautiful Soup compares by identity instead (the 'is' keyword). The 3.1
series came before 3.0.8, so it doesn't have the fix.
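The difference between equality and identity lookup can be sketched in plain Python without Beautiful Soup. This is a hypothetical model, not Beautiful Soup's code: `Text`, `index_by_equality`, `index_by_identity`, `extract` and `run` are illustrative names, and a plain list stands in for a tag's children. Text nodes subclass `str` (Beautiful Soup 3's NavigableString subclasses `unicode`), so the two "A" nodes compare equal while remaining distinct objects, and `list.index` returns the first equal one.

```python
class Text(str):
    """Hypothetical text node: equal to any other Text with the same value."""


def index_by_equality(children, element):
    # Pre-3.0.8 behaviour (modelled): list.index stops at the first child == element.
    return children.index(element)


def index_by_identity(children, element):
    # 3.0.8 behaviour (modelled): only the exact same object matches.
    for i, child in enumerate(children):
        if child is element:
            return i
    raise ValueError("element not in children")


def extract(children, element, index_fn):
    """Remove `element` from `children`, locating it with `index_fn`."""
    del children[index_fn(children, element)]
    return element


def run(index_fn):
    # Model of '<div>A<div>B</div>A</div>'; the inner <div> is just a list here.
    first_a, inner, last_a = Text("A"), ["B"], Text("A")
    children = [first_a, inner, last_a]
    extract(children, last_a, index_fn)  # try to remove the *last* "A"
    return children, first_a, last_a


print(run(index_by_equality)[0])  # [['B'], 'A']  (first "A" removed: the bug)
print(run(index_by_identity)[0])  # ['A', ['B']]  (last "A" removed: the fix)
```

With equality, the lookup cannot tell the two "A" nodes apart, which matches the `<div><div>B</div>A</div>` result in the original report.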

-Aaron DeVore

2010/5/20 Mikael Lepistö <elh...@gmail.com>: