How to extract a specific "body" from among many in the HTML code


Roman D.

Apr 14, 2023, 8:22:16 AM
to beautifulsoup
Hello,
I am new to web scraping. Could you please help me? Using BeautifulSoup4 in Python, I am trying to extract the content of elements with class="body". However, I need only specific bodies, not all of them. In particular, the only body I need is the one that comes after class="runner">Letters to<

How can I tell BeautifulSoup to extract specific bodies rather than all of them (given that they all have the same class="body")?

Please look at the attached snapshots. In the second snapshot, I marked with a curly brace the information I want to extract for each person's name.

Best, Roman
Attachments: snapshot1.png, snapshot2.png

Isaac Muse

Apr 14, 2023, 10:54:35 PM
to beautifulsoup
You can certainly navigate the HTML tree manually, but one easy way is to use CSS selectors.

We want to find the "div" with the class "body" (`div.body`) but only if it comes immediately after "font" with class "runner" (`font.runner`).

The `+` symbol (the adjacent sibling combinator) lets us match a tag only when it comes immediately after another tag at the same level.

````
from bs4 import BeautifulSoup

HTML = """
<font class="name"></font>
<div class="body">Not me</div>
<font class="runner"></font>
<div class="body">Find me!</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')
print(soup.select('font.runner + div.body'))
````

This gives us the results:

```
[<div class="body">Find me!</div>]
```
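If only the runner whose text is "Letters to" matters, as in the original question, the selector result can be filtered further. The following is a minimal sketch, assuming the runner text is exactly "Letters to" and that the markup resembles the hypothetical sample below (the real page was only shared as images, so the structure is a guess):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two runners, only one of them labelled "Letters to".
HTML = """
<font class="runner">Letters from</font>
<div class="body">Not me</div>
<font class="runner">Letters to</font>
<div class="body">Find me!</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')

wanted = []
# Every div.body that directly follows a font.runner ...
for body in soup.select('font.runner + div.body'):
    # ... is kept only if that runner's text is "Letters to".
    runner = body.find_previous_sibling('font', class_='runner')
    if runner.get_text(strip=True) == 'Letters to':
        wanted.append(body.get_text(strip=True))

print(wanted)
```

Comparing with `get_text(strip=True)` tolerates stray whitespace around the label; if the real label varies, the comparison would need to be loosened accordingly.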

Hope that helps.

Roman D.

Apr 15, 2023, 11:00:30 AM
to beautifulsoup
Hello,

Thank you very much for your help! I've made progress but still have trouble. I get None results when I try to do what is shown in the snapshot (please see the attachment). What am I doing wrong?
Attachment: snapshot3.png

On Saturday, April 15, 2023 at 04:54:35 UTC+2, faceless...@gmail.com wrote:

Roman D.

Apr 15, 2023, 11:55:26 AM
to beautifulsoup
Hello,

One more addition. If I do this, I still struggle to obtain what I need. Can you please tell me what the likely cause is?
Attachment: snapshot4.png

On Saturday, April 15, 2023 at 17:00:30 UTC+2, Roman D. wrote:

Isaac Muse

Apr 15, 2023, 2:24:35 PM
to beautifulsoup
Unfortunately, without a minimal reproducible example, I can only guess what your issue is.

Also, pictures are fine, but copy/pasteable examples are a must as I don't have time to transcribe examples from images.

Moe Meyer

Apr 15, 2023, 6:06:08 PM
to beauti...@googlegroups.com
  1. If he is still having trouble, you could use the find_next method in BeautifulSoup to find the next "div" tag with class "body" after the "font" tag with class "runner":


    from bs4 import BeautifulSoup

    html = """
    <div>
      <font class="name">John</font>
      <div class="body">Not me</div>
      <font class="runner">Letters to</font>
      <div class="body">Find me!</div>
    </div>
    """

    soup = BeautifulSoup(html, 'html.parser')

    # Find the font tag with class "runner"
    runner = soup.find('font', class_='runner')

    # Find the next div tag with class "body" after the font tag with class "runner"
    body = runner.find_next('div', class_='body')

    print(body.text)


    This should output:


    Find me!
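Since the goal is one result per person, and since `find` and its relatives return None when nothing matches (one possible, unconfirmed source of the None results mentioned earlier), the same idea can be applied in a loop with guards. This is a rough sketch, assuming each person sits in their own wrapper "div" as in the hypothetical markup below; "Mary" and the second wrapper are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one wrapper div per person; Mary has no runner.
html = """
<div>
  <font class="name">John</font>
  <div class="body">Not me</div>
  <font class="runner">Letters to</font>
  <div class="body">Find me!</div>
</div>
<div>
  <font class="name">Mary</font>
  <div class="body">Not me either</div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

results = {}
for name in soup.find_all('font', class_='name'):
    # Siblings share a parent, so the search stays inside this person's wrapper.
    runner = name.find_next_sibling('font', class_='runner')
    # Guard against None before chaining another lookup.
    body = runner.find_next_sibling('div', class_='body') if runner else None
    results[name.get_text(strip=True)] = body.get_text(strip=True) if body else None

print(results)
```

Unlike `find_next`, which walks forward through the whole document, `find_next_sibling` only looks at later elements under the same parent, so a person without a runner ends up with None instead of picking up someone else's body.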

Pest Pro Rid All, LLC
Moe Meyer / Co Founder
Acquisitions / Market Development
Code enthusiast