Beautiful Soup 4.0.0 beta 3

Leonard Richardson

Feb 8, 2012, 10:53:26 AM
to beauti...@googlegroups.com
Beautiful Soup 4 beta 3 is out! You can install it with "easy_install
beautifulsoup4" or "pip install beautifulsoup4". You can also download
the tarball:

http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/beautifulsoup4-4.0.0b4.tar.gz

or check out the Bazaar repository:

https://code.launchpad.net/beautifulsoup/

Changes:

* If you're using Python 3.2, the built-in HTMLParser is now reliable
enough to use on its own. You don't need to install lxml or html5lib
just to parse bad HTML (but lxml is still a lot faster). The
forthcoming Python 2.7.3 should also work this way.

* There's now a new_string() method to go along with new_tag().

* new_tag() will follow the rules of whatever tree builder was used to
create the original soup. Specifically, a new <p> tag will look like
"<p />" if you're dealing with XML, but it'll look like "<p></p>" if
you're dealing with HTML.

* There are two new methods for manipulating the tree:
PageElement.insert_before() and PageElement.insert_after().
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#insert-before-and-insert-after

* I replaced the "substitute_html_entities" argument with the more
general "formatter" argument:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters

The default formatter converts bare ampersands and angle brackets to
XML entities, but doesn't touch HTML entities.
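
The changes above can be sketched together in one short session. This is a minimal illustration, not code from the release itself: it assumes the spelling used in the current Beautiful Soup 4 documentation (passing "html.parser" to select the built-in parser, and the "formatter" argument to decode()), which may differ slightly in this beta. The sample markup and the example.com URL are made up for the example.

```python
from bs4 import BeautifulSoup

# Malformed input: neither <p> nor <b> is ever closed.
# "html.parser" selects Python's built-in HTMLParser, so neither
# lxml nor html5lib needs to be installed.
soup = BeautifulSoup("<p>caf&eacute; &amp; <b>tea", "html.parser")

# new_tag() and new_string() create nodes that belong to this soup.
# (The href value is an illustrative placeholder.)
link = soup.new_tag("a", href="http://example.com/")
link.append(soup.new_string("menu"))

# insert_before() / insert_after() place a node next to an existing one.
soup.b.insert_before(link)
soup.b.insert_after(soup.new_string(" & cake"))

# Default formatter: bare ampersands and angle brackets are escaped,
# but the parsed "é" is written out as-is.
default_output = soup.decode()

# formatter="html" also converts "é" back into an HTML entity.
html_output = soup.decode(formatter="html")

print(default_output)
print(html_output)
```

Comparing the two printed lines shows the difference between the formatters: both escape the bare ampersands, but only the "html" formatter writes the accented character as an entity.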

Leonard
