Parsing a specific HTML page

Виктор Астахов

Jun 10, 2021, 10:35:32 PM

to beautifulsoup

Hello, I'm new to Beautiful Soup. I'm trying to parse a site, and I am already getting approximately the output I want.

The task is simple: get the name and the link from the following markup, which is buried somewhere on the page:

<a class="rubricator-list-item-link-12kOm" data-category-id="32" data-category-mc-id="285" data-marker="category[1000285]/link" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">

MP3 players

</a>

find_all("a") returns these elements, but I cannot work out how to extract href and title from them.

PS. Is it possible to select the elements by class name if the class name is not fully known?

facelessuser

Sep 2, 2021, 1:30:02 PM

to beautifulsoup

from bs4 import BeautifulSoup
# Sample markup taken from the question
test = '''
<a class="rubricator-list-item-link-12kOm" data-category-id="32" data-category-mc-id="285" data-marker="category[1000285]/link" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">
    MP3 players
</a>
'''
soup = BeautifulSoup(test, 'html.parser')
a = soup.find('a')  # first <a> tag in the document
# Attributes of a tag are read like dictionary keys
print(a['href'])
print(a['title'])
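
Since the question mentions find_all, here is a minimal sketch of the same idea applied to every link on a page. The second <a> tag and the variable names are made up for illustration. It uses .get(), which returns None when an attribute is missing, so tags without an href do not raise a KeyError.

```python
from bs4 import BeautifulSoup

# The first tag comes from the question; the second is a hypothetical
# anchor without an href, added only to show the missing-attribute case.
html = '''
<a class="rubricator-list-item-link-12kOm" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">MP3 players</a>
<a name="no-href">anchor without href</a>
'''

soup = BeautifulSoup(html, 'html.parser')

links = []
for a in soup.find_all('a'):
    href = a.get('href')  # None if the tag has no href attribute
    if href is None:
        continue  # skip anchors that are not links
    links.append((a.get('title'), href))

print(links)
```

Each item returned by find_all is a Tag object rather than a string, which is why the attributes are read from the items and not from the list itself.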

PS. Is it possible to select the elements by class name if the class name is not fully known?
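
One possible approach to the partial class name, sketched under the assumption that the stable part of the class is the prefix "rubricator-list-item-link" and only the suffix ("-12kOm") varies. The second <a> tag is hypothetical and is there only to show that unrelated links are not matched. Two options are shown: a CSS attribute selector passed to select(), and a compiled regular expression passed as class_ to find_all().

```python
import re
from bs4 import BeautifulSoup

# The first tag comes from the question; the second is a hypothetical
# unrelated link used to check that it is filtered out.
html = '''
<a class="rubricator-list-item-link-12kOm" href="/volgograd/audio_i_video/mp3-pleery-ASgBAgICAUSIArIJ?cd=1" title="MP3 players">MP3 players</a>
<a class="footer-link" href="/about" title="About">About</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# Option 1: CSS attribute selector, [class*="..."] means "class contains".
by_selector = soup.select('a[class*="rubricator-list-item-link"]')

# Option 2: a regular expression tested against each class of the tag.
by_regex = soup.find_all('a', class_=re.compile(r'^rubricator-list-item-link'))

for a in by_selector:
    print(a['title'], a['href'])
```

Both options select the same tag here. The selector form is shorter, while the regular expression gives finer control if the variable part of the class name needs to follow a particular pattern.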