Re: Unexpected result from find_parent

Leonard Richardson
Jul 28, 2012, 7:34:12 AM
to beauti...@googlegroups.com
> context = soup.find_all(text=re.compile("Thorning"))
> This returns (I had only expected it to return the string I was looking
> for: "Thorning"):
> Output
> [u'30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?']

When you search by regular expression, you get back each entire string
in which the regular expression finds a match, not just the part of the
string that matched.
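A minimal sketch of the behavior just described. It assumes Beautiful Soup 4 with Python's built-in "html.parser", and it uses a hypothetical snippet rebuilt from the prettify() output later in this thread. It also uses `string=`, the newer name (bs4 4.4+) for the `text=` argument in the question. If you only want the matched part, one approach is to run the same pattern over each returned string yourself:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, reconstructed from the prettify() output in this thread.
html = (
    '<a class="txt" href="http://ekstrabladet.dk/minsag/article1798780.ece">'
    "30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?<em>28. jul</em></a>"
)
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("Thorning")

# find_all() returns every whole string in which the pattern matches somewhere.
strings = soup.find_all(string=pattern)

# To keep only the matched text, apply the pattern to each string yourself.
matches = [pattern.search(s).group(0) for s in strings]

print(strings)  # the full string(s) containing "Thorning"
print(matches)  # ['Thorning']
```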

> When I look into context:
>
> sample copy/paste code
>
> for c in context:
>     for i in c.find_parent():
>         print "RESULT: %s" % i
> ...
>
>
> Output
> RESULT: 30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?
> RESULT: <em>28. jul</em>
>
> The first result is exactly what I would expect - but I don't understand the
> second result?

The prettify() method is useful when trying to understand these things:

print c.find_parent().prettify()
# <a class="txt" href="http://ekstrabladet.dk/minsag/article1798780.ece">
# 30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?
# <em>
# 28. jul
# </em>
# </a>

The result of c.find_parent() is an <A> tag that contains a string and
an <EM> tag. When you iterate over a tag, you iterate over its direct
children (its .contents). Here, the first result is the string and the second result
is the <EM> tag.
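A minimal sketch of why the loop prints two results. It assumes Beautiful Soup 4 with the built-in "html.parser" and the same hypothetical markup rebuilt from the prettify() output. Iterating over the <A> tag yields its direct children: one NavigableString and one Tag. Filtering by type is one way to skip the nested <EM>:

```python
import re

from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup, reconstructed from the prettify() output above.
html = (
    '<a class="txt" href="http://ekstrabladet.dk/minsag/article1798780.ece">'
    "30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?<em>28. jul</em></a>"
)
soup = BeautifulSoup(html, "html.parser")

c = soup.find(string=re.compile("Thorning"))
a_tag = c.find_parent()  # nearest enclosing tag: the <a class="txt">

# Iterating over a Tag walks its direct children, in document order.
children = list(a_tag)
for child in children:
    if isinstance(child, Tag):
        print("TAG:", child.name, child.get_text())
    else:
        print("STRING:", child)

# Keep only the text children if the nested <em> is not wanted.
only_strings = [child for child in a_tag if isinstance(child, NavigableString)]
```

If you only need the matching string itself, `c` already is that string; `find_parent()` is only needed to get at the enclosing tag and its attributes, such as `a_tag["href"]`.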

Leonard

Andreas Christoffersen
Jul 29, 2012, 10:32:24 AM
to beauti...@googlegroups.com
Thanks for getting me up to speed, Leonard. Everything now works as expected! I love BS4: whatever other reasons there are, I find it much easier than lxml (for my needs, anyway). Also really good documentation. Thanks again.