There are multiple ways to get the URLs from only the body. Here are two ways.
1. Find the body and then find all elements with href under it.
2. Use a CSS selector to select all elements with href under the body.

from bs4 import BeautifulSoup
HTML = """
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="mystyle.css">
</head>
<body>
<h1>Some <a href="https://example.com/some-page-1/">link 1</a></h1>
<h1>Some <a href="https://example.com/some-page-2/">link 2</a></h1>
</body>
</html>
"""
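soup = BeautifulSoup(HTML, 'html.parser')
# Way 1: find <body>, then every element under it that has an href attribute.
print([el['href'] for el in soup.find('body').find_all(attrs={'href': True})])
# Way 2: a CSS selector matching any element with href that is a descendant of <body>.
print([el['href'] for el in soup.select('body [href]')])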
Output:
['https://example.com/some-page-1/', 'https://example.com/some-page-2/']
['https://example.com/some-page-1/', 'https://example.com/some-page-2/']
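Both ways above match any element with an href, and the first one raises an AttributeError if the document has no body, because find('body') returns None. If you only want anchor links, a rough sketch along these lines might help. The helper name body_links and the SAMPLE markup are made up for illustration and are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def body_links(html):
    """Return href values of <a> elements inside <body> (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('body')
    if body is None:
        # html.parser does not insert a missing <body>, so there is nothing to search.
        return []
    # Restrict the search to <a> tags that actually carry an href attribute.
    return [a['href'] for a in body.find_all('a', href=True)]


# Assumed sample markup: one <link> in the head, one real link and one
# href-less anchor in the body.
SAMPLE = """
<html>
<head><link rel="stylesheet" href="mystyle.css"></head>
<body>
<a href="https://example.com/some-page-1/">link 1</a>
<a name="anchor-only">no href</a>
</body>
</html>
"""

print(body_links(SAMPLE))
print(body_links('<p>no body tag here</p>'))
```

The first call should print only the one body link, and the second an empty list. The CSS selector equivalent would be soup.select('body a[href]'), which also returns an empty list when there is no body.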
--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beautifulsou...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beautifulsoup/f790d42b-6094-4f87-b6ad-7bbdc1de5b35n%40googlegroups.com.