Parsing a specific html page


Виктор Астахов

Jun 10, 2021, 10:35:32 PM
to beautifulsoup
Hello, I'm new to Beautiful Soup. I'm trying to parse a site, and I even get approximately the desired output.
The task is simple: get the name and link from the following markup, which is buried somewhere in the page:
<a class="rubricator-list-item-link-12kOm" data-category-id="32" data-category-mc-id="285" data-marker="category[1000285]/link" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">
                 MP3 players
                </a>
find_all("a") returns these elements, but I can't work out how to extract href and title from them.
PS. Is it possible to somehow get the output by class name if the class name is not fully known?

facelessuser

Sep 2, 2021, 1:30:02 PM
to beautifulsoup
from bs4 import BeautifulSoup

# Sample markup copied from the question
test = '''
<a class="rubricator-list-item-link-12kOm" data-category-id="32" data-category-mc-id="285" data-marker="category[1000285]/link" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">
                 MP3 players
                </a>
'''
soup = BeautifulSoup(test, 'html.parser')
a = soup.find('a')  # first <a> element in the document
print(a['href'])    # tag attributes are read like dictionary keys
print(a['title'])
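
Since the question mentions find_all, here is a rough sketch of the same idea applied to every link. It assumes the difficulty was indexing the list returned by find_all rather than the individual tags in it, which is only a guess. The helper name extract_links and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one link like the one in the question,
# plus an anchor that has neither href nor title.
html = '''
<ul>
  <li><a class="rubricator-list-item-link-12kOm" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">MP3 players</a></li>
  <li><a name="top">anchor without href or title</a></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')


def extract_links(soup):
    """Return (title, href) pairs for every <a> that has both attributes."""
    links = []
    # find_all returns a list-like ResultSet, so index each tag, not the list
    for a in soup.find_all('a'):
        href = a.get('href')    # .get() returns None instead of raising KeyError
        title = a.get('title')
        if href is not None and title is not None:
            links.append((title, href))
    return links


links = extract_links(soup)
for title, href in links:
    print(title, href)
```

Using a.get('href') instead of a['href'] avoids a KeyError on anchors that lack the attribute.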

PS. Is it possible to somehow get the output by class name if the class name is not fully known?

I’m not sure what you mean by this.
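
If the question is about matching a class when only part of its name is known (for example the "rubricator-list-item-link" prefix, with the "-12kOm" suffix unknown), that is an assumption about what was meant, but a minimal sketch could look like this. The sample markup and the second link are invented for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one link with the partially known class, one without.
html = '''
<a class="rubricator-list-item-link-12kOm" href="/a" title="MP3 players">MP3 players</a>
<a class="other-link-9xYz" href="/b" title="Other">Other</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# Option 1: CSS attribute selector. [class*="..."] matches when the class
# attribute contains the given substring anywhere.
by_css = soup.select('a[class*="rubricator-list-item-link"]')

# Option 2: a compiled regular expression passed as class_. Beautiful Soup
# tests it against each of the tag's CSS classes.
by_regex = soup.find_all('a', class_=re.compile(r'^rubricator-list-item-link'))

for a in by_css:
    print(a['href'], a['title'])
```

Both options select only the first link here. The substring selector is the looser of the two, so it may also pick up unrelated classes that happen to contain the same text.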
