Returning text from deep within HTML

Aaron Moore

Nov 7, 2022, 5:15:45 AM

to beautifulsoup

Hi all,

I am looking to return "Bookmaker Real-Time" in the HTML below. This HTML is the container I plan to iterate over on the webpage. I am having trouble with the title attribute. I greatly appreciate your support.

Isaac Muse

Nov 7, 2022, 12:22:02 PM

to beautifulsoup

When asking questions, it is always best to post what code you’ve already tried so people can help you to understand where you are going wrong as opposed to providing solutions for you.

With that said, titles can be accessed as shown below. All of this and more is mentioned in the docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes

from bs4 import BeautifulSoup

HTML = b"""
<html>
<head>
<title>Problem with BeautifulSoup</title>
</head>
<body>
<div title="A title">Content</div>
</body>
</html>
"""

soup = BeautifulSoup(HTML, 'html.parser')
print(soup.select_one('div')["title"])
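As a small supplement, here is a hedged sketch of the two common ways to read an attribute from a tag. Subscript access raises `KeyError` when the attribute is absent, while `.get()` returns `None`. The markup and variable names below are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: one div has a title, the other does not.
html = """
<div title="A title">Content</div>
<div>No title here</div>
"""

soup = BeautifulSoup(html, "html.parser")
with_title, without_title = soup.find_all("div")

# Subscript access works when the attribute exists.
title_by_subscript = with_title["title"]

# .get() returns None instead of raising KeyError when the attribute is missing.
title_by_get = without_title.get("title")

print(title_by_subscript)
print(title_by_get)
```

Using `.get()` can be the safer choice when iterating over many containers, since one element without a title will not stop the loop.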

Aaron Moore

Nov 7, 2022, 8:47:11 PM

to beautifulsoup

Understood. Thank you for your helpful insight.

It seems I was reading the HTML incorrectly and used the wrong container. This is my first HTML project. This is a great library.

Aaron Moore

Nov 10, 2022, 8:50:44 PM

to beautifulsoup

Perhaps you can help me to the finish line here.

I need to collect a list of titles for the columns of a webpage. I am having trouble reaching each of the title attributes in the code. The green lines indicate the targeted text. The red and blue lines are examples of the class formats which hold the targets. I have had some success with the following:

divs = soup.find_all("div", class_="text-success")

books_list = []

for div in divs:
    books_list.append(div.select_one('div')['title'])

The output is not accurate every time, and it misses the title cards held under different classes. Similar code does not work when applied to the blue-line class.

I'm sure I am missing something. Your support is greatly appreciated.

Isaac Muse

Nov 11, 2022, 1:59:54 AM

to beautifulsoup

I’m not sure I 100% understand what you are asking, but maybe this helps. My usual approach is to use CSS selectors. I have an obvious bias toward CSS selectors, as I am the author of the CSS selector library that Beautiful Soup uses. You can learn more about all the supported CSS pseudo-classes, etc., by checking out the documentation here.

from bs4 import BeautifulSoup
HTML = """
<div class="text-success">
    <div title="Pick me 1"></div>
    <div title="not me 1"></div>
</div>
<div class="text-warning">
    <div title="Pick me 2"></div>
    <div title="not me 2"></div>
</div>
"""
soup = BeautifulSoup(HTML, 'html.parser')
books_list = [el['title'] for el in soup.select("div:is(.text-success, .text-warning) > div:first-child")]
print(books_list)

Output:

['Pick me 1', 'Pick me 2']
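If some of the first children might not carry a title attribute at all, one possible variation is to add the attribute selector `[title]`, so those elements are skipped rather than raising `KeyError`. This is only a sketch under that assumption. The markup below is invented for illustration, with the second container's first child deliberately left without a title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first child of the second container has no title.
html = """
<div class="text-success">
    <div title="Pick me 1"></div>
    <div title="not me 1"></div>
</div>
<div class="text-warning">
    <div></div>
    <div title="not me 2"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [title] restricts the match to first children that actually have a title.
selector = "div:is(.text-success, .text-warning) > div:first-child[title]"
titles = [el["title"] for el in soup.select(selector)]
print(titles)
```

With this markup only the first container contributes a title, because the second container's first child is filtered out by `[title]`.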
