find tag containing specific text

Tarlika Elisabeth Schmitz

Jul 13, 2010, 8:11:11 AM

to beauti...@googlegroups.com

I need to find the tag that contains a certain text. The HTML file
contains at most one such tag. The HTML is prettified (previous
BeautifulSoup output), so the search text will have leading
and trailing newlines and spaces.

I am looking for an exact match. In the example below, if
searching for a span with text "Chocolate", I only want ONE result.

Example:
<span>
 Chocolate
</span>
<span>
 Chocolate Bar
</span>
I need the span Tag as a return value because ultimately I am
interested in this tag's siblings, not the text itself.

I played around with

x = soup('span', text=re.compile("Chocolate"))
x[0].parent ...

but this gives me multiple hits (the tags containing "Chocolate" and
"Chocolate Bar").

--

Best Regards,
Tarlika Elisabeth Schmitz
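Why the unanchored pattern hits both tags: the compiled pattern is applied as a search, so "Chocolate" matches anywhere inside a text node, including inside "Chocolate Bar". A minimal stdlib-only sketch; the two strings are hypothetical stand-ins for the prettified text nodes, and their exact whitespace is an assumption.

```python
import re

# Hypothetical text nodes as they might appear in prettified output
texts = ["\n Chocolate\n", "\n Chocolate Bar\n"]

pattern = re.compile("Chocolate")
# search() succeeds if "Chocolate" occurs anywhere in the string
hits = [t for t in texts if pattern.search(t)]
print(len(hits))  # 2
```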

Tal Einat

Jul 13, 2010, 12:25:39 PM

to beauti...@googlegroups.com

You should read a bit about regular expressions. Specifically, the
documentation of the Python re module isn't too bad:
http://docs.python.org/library/re.html#regular-expression-syntax

In your case, try:
re.compile(r"^Chocolate$")

- Tal Einat
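Anchoring separates "Chocolate" from "Chocolate Bar" on bare strings, but there is a catch for prettified input: without flags, `^` matches only at the very start of the string and `$` only at the end (or just before a final newline), so the leading newline and space defeat the pattern. A stdlib-only check, again assuming the same hypothetical whitespace:

```python
import re

anchored = re.compile(r"^Chocolate$")

print(bool(anchored.search("Chocolate")))       # True
print(bool(anchored.search("Chocolate Bar")))   # False
# The leading "\n " in prettified text prevents ^ from matching
print(bool(anchored.search("\n Chocolate\n")))  # False
```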

Tarlika Elisabeth Schmitz

Jul 14, 2010, 3:32:00 PM

to beauti...@googlegroups.com

On Tue, 13 Jul 2010 19:25:39 +0300
Tal Einat <tale...@gmail.com> wrote:

>Tarlika Elisabeth Schmitz wrote:
>> I am looking for an exact match. In the example below, if
>> searching for a span with text "Chocolate", I only want ONE result.
>>
>> Example:
>>
>> <span>
>>  Chocolate
>> </span>
>> <span>
>>  Chocolate Bar
>> </span>
>>

>> [...]

>
>In your case, try:
>re.compile(r"^Chocolate$")

Because the prettified text is surrounded by newlines and spaces,
re.compile(r"^ *Chocolate *$", re.MULTILINE)
does the job.
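Why this works: `re.MULTILINE` lets `^` and `$` match at every line boundary, and ` *` absorbs the indentation that prettify adds. A stdlib-only check over the same hypothetical text nodes, alongside a non-regex alternative (a swapped-in technique, not from the thread) that strips whitespace before comparing:

```python
import re

# Hypothetical prettified text nodes
texts = ["\n Chocolate\n", "\n Chocolate Bar\n"]

exact = re.compile(r"^ *Chocolate *$", re.MULTILINE)
regex_hits = [t for t in texts if exact.search(t)]

# Alternative: compare the whitespace-stripped text directly
strip_hits = [t for t in texts if t.strip() == "Chocolate"]

print(len(regex_hits), len(strip_hits))  # 1 1
```

The `strip()` comparison also tolerates tabs or other whitespace that ` *` would miss. BeautifulSoup also accepts a callable as the text filter, so the same test could be passed that way; check the documentation for your version before relying on it.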
