There are multiple ways to get the URLs from only the body. Here are two ways.
1. Find the body and then find all elements with href under it.
2. Use a CSS selector to select all elements with href under the body.

from bs4 import BeautifulSoup
HTML = """
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="mystyle.css">
</head>
<body>
<h1>Some <a href="https://example.com/some-page-1/">link 1</a></h1>
<h1>Some <a href="https://example.com/some-page-2/">link 2</a></h1>
</body>
</html>
"""
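soup = BeautifulSoup(HTML, 'html.parser')
# Locate <body> first, then collect every descendant that has an href attribute.
print([el['href'] for el in soup.find('body').find_all(attrs={'href': True})])
# Same result with a CSS selector: any element with href that is a descendant of <body>.
print([el['href'] for el in soup.select('body [href]')])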
Output
['https://example.com/some-page-1/', 'https://example.com/some-page-2/']
['https://example.com/some-page-1/', 'https://example.com/some-page-2/']
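One caveat with the find('body') approach: with 'html.parser', soup.find('body') returns None when the markup has no <body> tag, so chaining .find_all() onto it fails with an AttributeError. Here is a minimal sketch of a guard for that case. The helper name body_hrefs is made up for illustration and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def body_hrefs(html):
    """Return the href values of elements inside <body>, or [] if there is no <body>."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('body')
    if body is None:
        # html.parser does not insert a missing <body>, so there is nothing to search.
        return []
    # href=True matches any tag that carries an href attribute, whatever its value.
    return [el['href'] for el in body.find_all(href=True)]
```

The select('body [href]') version does not need this guard, since select() simply returns an empty list when nothing matches.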
--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beautifulsou...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beautifulsoup/f790d42b-6094-4f87-b6ad-7bbdc1de5b35n%40googlegroups.com.