Re: Retrieve tag which contains a text searched by regular expression


grattachecca

Sep 17, 2012, 9:27:28 AM
to beauti...@googlegroups.com
I saw that this is a common problem. A partial solution is:

  print(html.find_all(text=re.compile(r"\s*U\.R\.P\.\s*"))[0].find_parent("a"))

This uses the find_parent() function. It does not search for an <a> tag that contains the expression; it finds the text matching the regex and then checks whether that text has an ancestor tag named "a" (at any level above it).

The good part is that find_parent() returns a bs4.element.Tag instead of the bs4.element.NavigableString returned by find_all() with a regular expression.
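
To make the difference between the two return types concrete, here is a minimal sketch. The markup in SAMPLE and the helper name find_enclosing_link are made up for illustration, not taken from the original page, and it uses the standard-library "html.parser" so it runs without lxml. It also uses string=, which newer Beautiful Soup releases accept as the replacement for the text= argument shown above.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup standing in for the page discussed in this thread.
SAMPLE = '<p><a href="/urp"><font color="red"> U.R.P. </font></a></p>'


def find_enclosing_link(soup, pattern):
    """Return the nearest <a> ancestor of the first string matching pattern, or None."""
    # A text-only search returns NavigableString objects, not tags.
    matches = soup.find_all(string=re.compile(pattern))
    if not matches:
        return None
    # find_parent() climbs the tree and returns a Tag (or None if no <a> encloses it).
    return matches[0].find_parent("a")


soup = BeautifulSoup(SAMPLE, "html.parser")
first_match = soup.find(string=re.compile(r"U\.R\.P\."))
link = find_enclosing_link(soup, r"U\.R\.P\.")

print(type(first_match).__name__)
print(type(link).__name__, link["href"])
```

Checking for an empty result before indexing avoids the IndexError that [0] would raise when nothing matches.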

Xavier Combelle

Sep 14, 2012, 1:34:54 PM
to beauti...@googlegroups.com
You should not use prettify.

  from bs4 import BeautifulSoup

  html = BeautifulSoup(open("index.html"), "lxml")

  # Walk up from the first matching <font> tag, looking for an enclosing <a>.
  for parent in html.find_all("font", text="U.R.P.")[0].parents:
      if parent.name == "a":
          print('found!')
          break

There is no need for any regular expression.
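
The same idea can be tried without an index.html file. This is only a sketch: the SAMPLE markup and the helper name enclosing_link are invented for illustration, the parser is the built-in "html.parser" rather than lxml, and string= is used as the newer spelling of text=.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the <font> text has no surrounding whitespace.
SAMPLE = '<div><a href="/urp"><b><font>U.R.P.</font></b></a></div>'


def enclosing_link(soup, tag_name, text):
    """Return the closest <a> ancestor of the first tag_name tag whose string is text."""
    tag = soup.find(tag_name, string=text)
    if tag is None:
        # Nothing matched, so there is no ancestor chain to walk.
        return None
    # .parents yields each ancestor in turn, from the nearest outward.
    for parent in tag.parents:
        if parent.name == "a":
            return parent
    return None


soup = BeautifulSoup(SAMPLE, "html.parser")
link = enclosing_link(soup, "font", "U.R.P.")
print("found!" if link is not None else "not found")
```

Returning the tag instead of only printing lets the caller read attributes such as link["href"] afterwards.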