How to get text that is nested very deep in HTML?


Raimundo Baravaglio

Oct 31, 2022, 12:47:16 PM
to beautifulsoup
Here is an example I can't resolve:

I need to retrieve text that is several levels below the main class.
The problem is that all the class names are identical, and each piece of text is nested three levels deep.
In this example, I need to get the word "Discontinued" from the following code:

<!-- START CODE -->
<html>
<head>
<title>Problem with BeautifulSoup</title>
</head>
<body>
<h1>Book</h1>
<div class="main">
    <div class=" product_name">
        <div class=" a">
            <strong>Level1</strong>
            <div class=" b">
                <strong>Level1</strong>
                <div class=" c">
                   " A label of level 1 "
                   <strong>Armed</strong>
                 </div>
             </div>
          </div>
        <div class=" a">
            <strong>Level2</strong>
            <div class=" b">
                <div class=" c">
                   " A label of level 2 "
                   <strong>Cleaned</strong>
                 </div>
                <strong>Level2</strong>
             </div>
         </div>
        <div class=" a">
            <strong>Level3</strong>
            <div class=" b">
                <strong>Level3</strong>
                <div class=" c">
                   " A label of level 3 "
                   <strong>Suspended</strong>
                 </div>
             </div>
         </div>
        <div class=" a">
            <strong>Level4</strong>
            <div class=" b">
                <strong>Level4</strong>
                <div class=" c">
                   " A label of level 4 "
                   <strong>Discontinued</strong>
                 </div>
             </div>
         </div>
        <div class=" a">
            <strong>Level5</strong>
            <div class=" b">
                <strong>Level5</strong>
                <div class=" c">
                   " A label of level 5 "
                   <strong>Justify</strong>
                 </div>
             </div>
         </div>
        <div class=" a">
            <strong>Level6</strong>
            <div class=" b">
                <strong>Level6</strong>
                <div class=" c">
                   " A label of level 6 "
                   <strong>Free charge</strong>
                 </div>
             </div>
         </div>
    </div>
</div>
</body>
</html>
<!-- END CODE -->

How could I do it with BeautifulSoup?
Thanks for your help!

Isaac Muse

Nov 2, 2022, 8:33:50 AM
to beautifulsoup

You can accomplish this with CSS selectors. The key is the :nth-child pseudo-class. Counting starts at 1, so div.a:nth-child(4) matches the div with class a that is the 4th child of its parent, and the rest of the selector then targets the strong element inside that block's div.c.

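from bs4 import BeautifulSoup

# HTML is a string holding the markup from the question.
soup = BeautifulSoup(HTML, 'html.parser')
# Select the <strong> inside div.c of the 4th div.a under div.product_name.
print(soup.select_one('div.product_name div.a:nth-child(4) div.c strong').text)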

Results:

$ python3.10 example.py
Discontinued
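
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The HTML string is a reduced stand-in for the markup in the question (same class names and nesting, built in a loop), and the names STATUSES, blocks, and result are only illustrative. It also guards against select_one returning None when nothing matches.

```python
from bs4 import BeautifulSoup

# Assumption: a reduced stand-in for the question's markup,
# keeping the same class names (with leading spaces) and nesting.
STATUSES = ["Armed", "Cleaned", "Suspended", "Discontinued", "Justify", "Free charge"]

blocks = "".join(
    f'<div class=" a"><strong>Level{i}</strong>'
    f'<div class=" b"><strong>Level{i}</strong>'
    f'<div class=" c"> "A label of level {i}" <strong>{status}</strong></div>'
    f"</div></div>"
    for i, status in enumerate(STATUSES, start=1)
)
HTML = f'<div class="main"><div class=" product_name">{blocks}</div></div>'

soup = BeautifulSoup(HTML, "html.parser")

# :nth-child() is 1-based, so 4 picks the fourth child of div.product_name.
match = soup.select_one("div.product_name div.a:nth-child(4) div.c strong")

# select_one returns None when nothing matches, so check before reading .text.
result = match.text if match is not None else None
print(result)
```

Changing the 4 to another number picks a different block. This only works while the position of the block you want is known and stable.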

There are certainly other ways to approach this, and the best one depends on the context of the problem and on what you know ahead of time. For the example provided, this is sufficient.
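
As one sketch of a different approach, suppose you know the heading text ("Level4") rather than the position of the block. You could then find the strong element with that text, walk up to its enclosing div with class a, and select the status from there. The helper name status_for_label and the shortened HTML are assumptions made for illustration, not part of the original example.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened stand-in for the question's markup (two blocks only).
HTML = """
<div class=" product_name">
  <div class=" a"><strong>Level3</strong>
    <div class=" b"><strong>Level3</strong>
      <div class=" c"> "A label of level 3" <strong>Suspended</strong></div>
    </div>
  </div>
  <div class=" a"><strong>Level4</strong>
    <div class=" b"><strong>Level4</strong>
      <div class=" c"> "A label of level 4" <strong>Discontinued</strong></div>
    </div>
  </div>
</div>
"""


def status_for_label(soup, label):
    """Hypothetical helper: return the status text of the block headed by `label`."""
    # First <strong> whose text is exactly `label`.
    heading = soup.find("strong", string=label)
    if heading is None:
        return None
    # Walk up to the enclosing block with class "a".
    block = heading.find_parent("div", class_="a")
    if block is None:
        return None
    # The status is the <strong> inside that block's div.c.
    status = block.select_one("div.c strong")
    return status.get_text(strip=True) if status is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(status_for_label(soup, "Level4"))
```

This does not depend on the order of the blocks, but it does assume the heading text is unique and matches exactly.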

Raimundo Baravaglio

Nov 4, 2022, 8:07:58 PM
to beautifulsoup
Thank you !!!