find tag containing specific text

Tarlika Elisabeth Schmitz

Jul 13, 2010, 8:11:11 AM
to beauti...@googlegroups.com
I need to find the tag that contains a certain text. The HTML file
contains at most one such tag. The HTML is prettified (it is previous
BeautifulSoup output), and therefore the search text will have leading
and trailing newlines and spaces.

I am looking for an exact match. In the example below, if
searching for a span with text "Chocolate", I only want ONE result.

Example:

<span class="x">
   Chocolate
</span>
<span class="x">
   Chocolate Bar
</span>

I need the span Tag as a return value because ultimately I am
interested in this tag's siblings, not the text itself.

I played around with

x = soup('span', text=re.compile("Chocolate"))
x[0].parent ...

but this gives me multiple hits (tags containing "Chocolate" +
"Chocolate Bar").


--

Best Regards,
Tarlika Elisabeth Schmitz

Tal Einat

Jul 13, 2010, 12:25:39 PM
to beauti...@googlegroups.com

You should read a bit about regular expressions. Specifically, the
documentation of the Python re module isn't too bad:
http://docs.python.org/library/re.html#regular-expression-syntax

In your case, try:
re.compile(r"^Chocolate$")

- Tal Einat

Tarlika Elisabeth Schmitz

Jul 14, 2010, 3:32:00 PM
to beauti...@googlegroups.com
On Tue, 13 Jul 2010 19:25:39 +0300
Tal Einat <tale...@gmail.com> wrote:

>Tarlika Elisabeth Schmitz wrote:
>> I am looking for an exact match. In the example below, if
>> searching for a span with text "Chocolate", I only want ONE result.
>>
>> Example:
>>
>> <span class="x">
>>    Chocolate
>> </span>
>> <span class="x">
>>    Chocolate Bar
>> </span>
>>
>> [...]
>
>In your case, try:
>re.compile(r"^Chocolate$")

re.compile(r"^ *Chocolate *$", re.MULTILINE)
does the job
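
For anyone comparing the three patterns from this thread, here is a minimal sketch using only the re module. The sample strings are an assumption about what the prettified text nodes look like (text on its own line, with newlines and indentation around it), the helper name matching_texts is made up for illustration, and the sketch assumes the compiled pattern gets applied with search(), as re patterns usually are.

```python
import re

# Assumed sample data: text nodes as they might look after prettifying,
# i.e. surrounded by newlines and indentation.
texts = ["\n   Chocolate\n", "\n   Chocolate Bar\n"]

patterns = {
    # Original attempt: matches any text that contains "Chocolate".
    "unanchored": re.compile(r"Chocolate"),
    # Without re.MULTILINE, ^ only matches at the very start of the string,
    # which here is a newline, so nothing matches.
    "anchored": re.compile(r"^Chocolate$"),
    # With re.MULTILINE, ^ and $ match at line boundaries; " *" absorbs
    # the indentation and any trailing spaces on that line.
    "multiline": re.compile(r"^ *Chocolate *$", re.MULTILINE),
}


def matching_texts(pattern, candidates):
    """Return the candidates in which pattern.search() finds a match."""
    return [t for t in candidates if pattern.search(t)]


results = {name: matching_texts(p, texts) for name, p in patterns.items()}
for name, hits in results.items():
    print(name, len(hits))
```

Only the multiline pattern singles out the "Chocolate" text without also picking up "Chocolate Bar". If the indentation could contain tabs, [ \t]* in place of " *" would be a reasonable variation.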
