Convert HTML to plain text

Adrián Ribao

Jun 10, 2009, 11:39:22 AM
to beautifulsoup
Hello everybody,

I need to convert HTML to plain text. Can this be done with
BeautifulSoup?

Thank you!

Pratik Dam

Jun 10, 2009, 4:26:47 PM
to beauti...@googlegroups.com
See if this works:


from BeautifulSoup import BeautifulSoup
import urllib
soup = BeautifulSoup(urllib.urlopen("http://www.crummy.com/software/BeautifulSoup/documentation.html").read())
print "".join(soup.findAll(text=True))


IMO this does not handle namespaces etc., but it is a quick solution.
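
For anyone without BeautifulSoup at hand, the same idea (collect every text node and join them) can be sketched with Python 3's standard-library html.parser instead. This is a different parser from the one used above, and the names TextExtractor and html_to_text are made up for illustration. Unlike findAll(text=True), this sketch deliberately skips the contents of <script> and <style> and drops comments, which is an assumption about what "plain text" should mean here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the concatenated text nodes of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = "<html><body><h1>Title</h1><p>Hello &amp; <b>welcome</b>!</p></body></html>"
print(html_to_text(sample))
```

As with the snippet above, text nodes are joined with no separator, so adjacent block elements run together; joining with a space or newline instead is a reasonable variation if readability matters more than fidelity.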