letras.mus.br Website Scraping


Rodrigo Kalil

Oct 3, 2022, 10:16:35 PM
to beautifulsoup
While exploring the pages of this site, I found some HTML documents that contain three declarations like <!DOCTYPE html>.
When I fetch these documents with bs4, only the tags under the first declaration are returned. But I need information from a tag in another part of the document.
I need the href of this tag: <a class="ytp-watermark yt-uix-sessionlink ytp-watermark-small" target="_blank" aria-label="Assista em www.youtube.com" data-sessionlink="feature=player-watermark" href="https://www.youtube.com/watch?v=wdoIAL8H8wE" data-layer="8"><use class="ytp-svg-shadow" xlink:href="#ytp-id-49">.

Sorry, you may not understand the content of the website, since it is in Portuguese. I'm Brazilian. I hope that doesn't get in the way.

Thanks

Emmanuel Osuolale

Oct 4, 2022, 8:44:30 AM
to beauti...@googlegroups.com
Your question is not very clear. How can we help?
Which part of the document are you referring to?
Can you take a screenshot of that part and send it to us?


czrpxr

Oct 4, 2022, 9:05:03 AM
to beautifulsoup
I don't know if BeautifulSoup is the best approach here. I took a look at the website, and what you are describing comes from two iframes that are loaded dynamically. I don't know whether BS is able to deal with dynamic content like this.
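
A likely reading of the "three declarations": each iframe holds its own HTML document with its own <!DOCTYPE html>, so a browser inspector shows several, while bs4 only ever parses the outer page it is given. The sketch below is not taken from the site. SAMPLE_HTML is a made-up stand-in for the outer page, and the assumption that the iframe src follows YouTube's /embed/<id> pattern is mine. It shows that the ytp-watermark link is not visible from the outer document, but that a watch URL can sometimes be rebuilt from the iframe's src when that tag is present in the HTML.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer page; the real markup on the site may differ.
SAMPLE_HTML = """<!DOCTYPE html>
<html><body>
  <div id="player">
    <iframe src="https://www.youtube.com/embed/wdoIAL8H8wE?autoplay=0"></iframe>
  </div>
</body></html>"""

# Hosts assumed to serve YouTube embeds.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "www.youtube-nocookie.com"}


def iframe_sources(html):
    """Return the src attribute of every <iframe> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["src"] for tag in soup.find_all("iframe", src=True)]


def embed_to_watch_url(src):
    """Turn a YouTube /embed/<id> URL into a watch URL, or return None."""
    parts = urlparse(src)
    prefix = "/embed/"
    if parts.netloc not in YOUTUBE_HOSTS or not parts.path.startswith(prefix):
        return None
    video_id = parts.path[len(prefix):]
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The watermark link belongs to the iframe's own document, so nothing matches here.
watermark_links = soup.select("a.ytp-watermark")
watch_urls = [embed_to_watch_url(src) for src in iframe_sources(SAMPLE_HTML)]
print(watermark_links, watch_urls)
```

If the iframe tag itself is only created by JavaScript, it will not be in the HTML returned by a plain HTTP request, and this approach finds nothing. In that case a browser-driving tool is needed.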

Regards.

Rodrigo Kalil

Oct 4, 2022, 9:20:47 AM
to beauti...@googlegroups.com
czrpxr, yeah, I realized bs4 can't deal with this. I'll try using Selenium. As far as I can see, the part I'm interested in only appears after clicking on the player.
Emmanuel, that page embeds a video from YouTube. I want my code to open the related YouTube page, because some of the information there is useful. However, as far as I could tell, the link present in the HTML cannot be reached with bs4. I'll send you a screenshot, but I guess bs4 can't help me now. If any of you know of a library that can interact with the player, as I think Selenium will, please let me know.
Besides that, thanks for answering me. It's my first time in this group.
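
For reference, a minimal sketch of the Selenium route mentioned above, under several assumptions that were not checked against the site: that the player is an iframe whose src contains "youtube", that clicking it starts the player, and that the a.ytp-watermark link from the first message then appears inside that iframe. The function name youtube_link_from_player and the CSS selectors are illustrative only.

```python
def youtube_link_from_player(driver, page_url, timeout=15):
    """Open page_url, click the embedded player and return the watermark href.

    `driver` is an already-created Selenium WebDriver. The selectors below
    are assumptions about the page, not something confirmed in this thread.
    """
    # Selenium is imported here so the helper can be defined without it installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    driver.get(page_url)

    # Wait until the (assumed) YouTube iframe exists in the outer document.
    frame = wait.until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "iframe[src*='youtube']"))
    )
    frame.click()  # click the player so its controls get rendered

    # The watermark link lives in the iframe's own document, so switch into it.
    driver.switch_to.frame(frame)
    try:
        # Poll until the link exists and has a non-empty href, then return it.
        return wait.until(
            lambda d: d.find_element(By.CSS_SELECTOR, "a.ytp-watermark").get_attribute("href")
        )
    finally:
        # Always go back to the outer page, even if the wait timed out.
        driver.switch_to.default_content()
```

It could be called as youtube_link_from_player(webdriver.Chrome(), url) after "from selenium import webdriver", where url is the song page. If the wait times out, Selenium raises TimeoutException, which usually means one of the assumed selectors does not match the real page.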

[Attachments: image.png, image.png]
