How to extract a NavigableString, if html code contains line breaks

Evgeny

Jun 1, 2021, 8:11:15 AM
to beautifulsoup

Hi,

I am trying to extract a NavigableString object from HTML.

The idea is to be able to manipulate it further with string.replace_with("new string").

In the following example, everything works fine:

from bs4 import BeautifulSoup
html = """ <p><b>my string</b></p> """
soup = BeautifulSoup(html, "html.parser")
soup.p.string.replace_with("New string")
print(soup)
# Output: <p><b>New string</b></p>
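To see why the first example works, here is a minimal sketch (assuming bs4 4.x with the built-in html.parser) that inspects the tree. When a tag's only child is another tag, .string descends into that child, so soup.p.string is the very string object stored inside <b>. The variable name s is only for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html = """ <p><b>my string</b></p> """
soup = BeautifulSoup(html, "html.parser")

# <p> has exactly one child (<b>), and <b> has exactly one child (a string),
# so .string on <p> descends into <b> and returns that same string object.
s = soup.p.string
print(type(s).__name__)      # NavigableString
print(s is soup.p.b.string)  # True
print(len(soup.p.contents))  # 1
```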

However, if the HTML itself contains line breaks, the NavigableString is not extracted; .string returns None instead:

from bs4 import BeautifulSoup
html = """
<p>
   <b>my string</b>
</p>
"""
soup = BeautifulSoup(html, "html.parser")
soup.p.string.replace_with("New string")  # AttributeError: 'NoneType' object has no attribute 'replace_with'
print(soup)  # never reached
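A quick way to diagnose the second case is to look at the children of <p>. This sketch (same assumptions as above; the HTML is the example without trailing spaces) shows that the line breaks and indentation become extra whitespace-only strings, so <p> no longer has a single child and .string returns None.

```python
from bs4 import BeautifulSoup

html = """
<p>
   <b>my string</b>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The newline/indentation before <b> and the newline after it are kept
# as whitespace-only NavigableStrings, so <p> has three children.
print([type(c).__name__ for c in soup.p.contents])
# ['NavigableString', 'Tag', 'NavigableString']

# .string only returns a value when the tag has exactly one child,
# so here it is None, and None has no replace_with().
print(soup.p.string)    # None

# The <b> tag itself still has a single string child.
print(soup.p.b.string)  # my string
```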


So the question is: how can I extract a NavigableString when the HTML contains line breaks? I know that get_text(strip=True) works, but it returns plain text rather than a NavigableString, so I cannot use it to modify the tree.
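One possible workaround, sketched under the same assumptions and not the only way to do it: search the <p> subtree with find(string=...) for the first string that is not just whitespace. That call returns the NavigableString object itself, so replace_with() modifies the tree. A second option removes the whitespace-only strings first so that .string works again, at the cost of changing the document's formatting. The names target, soup2 and ws are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<p>
   <b>my string</b>
</p>
"""

# Option 1: find the first non-whitespace string under <p>.
# find(string=...) returns the NavigableString itself, not a copy.
soup = BeautifulSoup(html, "html.parser")
target = soup.p.find(string=lambda s: s.strip() != "")
target.replace_with("New string")
print(soup.p.b)  # <b>New string</b>

# Option 2: drop whitespace-only strings inside <p>; afterwards <p> has a
# single child (<b>) again, so .string behaves as in the first example.
soup2 = BeautifulSoup(html, "html.parser")
for ws in soup2.p.find_all(string=lambda s: s.strip() == ""):
    ws.extract()  # find_all returned a list, so extracting while looping is safe
soup2.p.string.replace_with("New string")
print(soup2.p)  # <p><b>New string</b></p>
```

If the position of the text is known in advance, soup.p.b.string also returns the NavigableString directly, because <b> has exactly one child.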
