> context = soup.find_all(text=re.compile("Thorning"))
> This returns (I had only expected it to return the string I was looking
> for: "Thorning"):
> Output
> [u'30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?']
When you search by regular expression, find_all() returns every whole
string that the regular expression matches somewhere, not just the
part of the string that matched.
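If you only want the matched part, one option is to run the same
regular expression over each string you got back. This is a minimal
sketch using only the re module. The context list below is a stand-in
for your real result, and the names pattern and matches are mine:

```python
import re

pattern = re.compile("Thorning")

# Stand-in for the list that soup.find_all(text=pattern) returned.
context = [u'30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?']

# search() locates the first match inside each string;
# group(0) is the text that actually matched.
matches = [pattern.search(s).group(0) for s in context]
print(matches)
```

The strings Beautiful Soup hands back are subclasses of the ordinary
string type, so the same loop should work on the real result.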
> When I look into context:
> sample copy/paste code
> for c in context:
>     for i in c.find_parent():
>         print "RESULT: %s" % i
> ...
> Output
> RESULT: 30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?
> RESULT: <em>28. jul</em>
> The first result is exactly what I would expect - but I don't understand the
> second result?
The prettify() method is useful when trying to understand these things:
print c.find_parent().prettify()
# <a class="txt" href="http://ekstrabladet.dk/minsag/article1798780.ece">
#  30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?
#  <em>
#   28. jul
#  </em>
# </a>
The result of c.find_parent() is an <A> tag that contains a string and
an <EM> tag. When you iterate over a tag, you iterate over its
children. Here, the first result is the string and the second result
is the <EM> tag.
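
Here is a small sketch of the same thing that you can run on its own.
It assumes the bs4 package is installed, and the markup is my
reconstruction from the prettify() output above, not your actual page.
The names parent, children and strings_only are mine. It also shows
one way to keep only the strings and skip the tags:

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Assumed markup, rebuilt from the prettify() output.
html = ('<a class="txt" '
        'href="http://ekstrabladet.dk/minsag/article1798780.ece">'
        '30/7-10 EB=Hold kft hvor er hun Dum=(Thorning) ?'
        '<em>28. jul</em></a>')
soup = BeautifulSoup(html, "html.parser")

# string= is the newer spelling of the text= argument.
c = soup.find(string=re.compile("Thorning"))
parent = c.find_parent()

# Iterating over a tag yields its direct children, strings and tags alike.
children = list(parent)
for child in children:
    print("RESULT: %s" % child)

# Keep only the children that are strings, dropping the <em> tag.
strings_only = [ch for ch in children if isinstance(ch, NavigableString)]
print(strings_only)
```

If all you need is the string you searched for, you can also skip
find_parent() entirely and use c itself.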
Leonard