soup.get_text() does not preserve newlines for HTML

peng...@gmail.com

Dec 20, 2017, 1:37:19 AM
to beautifulsoup
In the following example, "xyz" and "abc" are concatenated in the output. This differs from how a browser renders this HTML, where the heading and the paragraph appear on separate lines. What is the best way to convert HTML to text so that this kind of layout is preserved? Thanks.

$ cat main1.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1:

html_doc = "<h2>xyz</h2><p>abc</p>"

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

# Parenthesized form runs on both Python 2 and Python 3.
print(soup.get_text())
$ ./main1.py
xyzabc
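
One possible direction, offered as a sketch rather than a definitive answer: get_text() accepts a separator argument that is placed between every text node, which is enough for the example above but also splits inline markup such as <b>. A slightly more careful variant appends a newline to each block-level element before extracting the text. The helper name html_to_text and the BLOCK_TAGS list are assumptions made for illustration; the list is not exhaustive and this does not reproduce real browser layout.

```python
from bs4 import BeautifulSoup

# Hypothetical, non-exhaustive list of tags treated as block-level.
BLOCK_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "li"]


def html_to_text(html):
    """Return the text of `html` with a newline after each block-level tag."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(BLOCK_TAGS):
        # Add a trailing newline inside the tag so get_text() emits it.
        tag.append("\n")
    # Drop leading and trailing whitespace, including the final newline.
    return soup.get_text().strip()


html_doc = "<h2>xyz</h2><p>abc</p>"

# Simple option: put a newline between every text node.
simple = BeautifulSoup(html_doc, "html.parser").get_text(separator="\n")

# Block-aware option: newline only after the tags listed above.
block_aware = html_to_text(html_doc)

print(simple)
print(block_aware)
```

Both calls print "xyz" and "abc" on separate lines for this input. They differ on inline markup: for "<p>a<b>b</b>c</p>" the separator version breaks the text into three lines, while the block-aware helper keeps "abc" together.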
