Convert HTML to plain text

Adrián Ribao

Jun 10, 2009, 11:39:22 AM

to beautifulsoup

Hello everybody,

I need to convert HTML to plaintext, can this be done with
beautifulsoup?

Thank you!

Pratik Dam

Jun 10, 2009, 4:26:47 PM

to beauti...@googlegroups.com

See if this works:


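# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib

# Download the page and parse it
url = "http://www.crummy.com/software/BeautifulSoup/documentation.html"
soup = BeautifulSoup(urllib.urlopen(url).read())

# Join every text node in the document into one string
print "".join(soup.findAll(text=True))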

IMO this does not handle namespaces etc., but it is a quick solution.
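
As far as I know, joining every text node like this also pulls in the contents of <script> and <style> tags, which is rarely what you want in plain text. Here is a rough sketch of the same idea that skips them. Note that it uses Python 3's standard-library html.parser instead of BeautifulSoup, so it is a swapped-in technique and not the snippet above; the names TextExtractor and html_to_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # tags whose text should not appear in the output

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><script>var x = 1;</script>"
    print(html_to_text(sample))
```

Collapsing whitespace at the end is a design choice; drop that step if you need to keep the original line breaks.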
