Not one but two releases today. The first is the first real 3.x
release in almost two years.
http://www.crummy.com/software/BeautifulSoup/bs3/download/3.x/Beautif...
This fixes a bug that can allow cross-site scripting attacks if
Beautiful Soup is used to sanitize HTML:
https://bugs.launchpad.net/beautifulsoup/+bug/868921
On output, angle brackets and bare ampersands are now escaped to XML
entities in strings. Previously they were only escaped in attribute
values. Beautiful Soup 4 escapes XML entities by default, so the
problem does not exist there unless you deliberately cause it (e.g. by
setting formatter=None).
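To make the difference concrete, here's a minimal sketch against
Beautiful Soup 4. It assumes a current bs4 install and the stdlib
"html.parser" builder, not the 3.x release above; the payload and the
variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the payload arrives already escaped, so the
# parser sees it as text, not as a real <script> tag.
markup = "<p>&lt;script&gt;alert(1)&lt;/script&gt; &amp; more</p>"
soup = BeautifulSoup(markup, "html.parser")

# Inside the tree, the string holds the raw, unescaped characters.
raw = soup.p.string

# Default output escapes angle brackets and ampersands again.
safe = soup.p.decode()

# formatter=None writes strings out untouched, so the payload turns
# into live markup. Don't do this with untrusted input.
unsafe = soup.p.decode(formatter=None)

print(safe)
print(unsafe)
```

If the text comes back out with its angle brackets intact, as in the
second print, anything that was "just a string" in the tree becomes
markup in the browser. That is the hole the 3.x fix closes.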
-----
Now, on to the BS4 beta.
http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/beautif...
It's almost done at this point. All the reported bugs are fixed except
the lack of namespace support. I'd like to add that before the
release, but I don't know how much work it'll be.
Changelog:
* Multi-valued attributes like "class" now always have a list as their
  value, even if the list contains only one item.
* Added a number of multi-valued attributes defined in HTML5.
* Stopped generating a space before the slash that closes an
  empty-element tag. This may come back if I add a special XHTML mode
  (http://www.w3.org/TR/xhtml1/#C_2), but right now the space is
  pretty useless.
* Passing text along with tag-specific arguments to a find* method:
    find("a", text="Click here")
  will now find tags that match those arguments and have the given
  text as their .string. Previously, the tag-specific arguments were
  ignored and only strings were searched.
* Fixed a bug that caused the html5lib tree builder to build a
  partially disconnected tree. Generally cleaned up the html5lib tree
  builder.
* If you restrict a multi-valued attribute like "class" to a string
  that contains spaces, Beautiful Soup will only consider it a match
  if the attribute's values, joined with spaces, are exactly that
  string.
That last one is implemented as a big hack, but I can remove the hack
later without changing the API.
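Here's roughly how a few of the changelog items look from the outside.
This is a sketch, assuming a current Beautiful Soup 4 install and the
stdlib "html.parser" builder rather than this exact beta; the markup
and the variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
html = (
    '<p class="body">one</p>'
    '<p class="body strikeout">two</p>'
    '<p class="strikeout body">three</p>'
    '<a href="/next">Click here</a>'
    '<b>Click here</b>'
)
soup = BeautifulSoup(html, "html.parser")

# Multi-valued attributes are lists, even with a single value.
single = soup.p["class"]
double = soup.find_all("p")[1]["class"]

# Empty-element tags are written without a space before the slash.
void = str(BeautifulSoup("<br>", "html.parser"))

# Tag name plus text: the result is the <a> tag whose .string matches,
# not the bare string and not the <b> tag with the same text.
link = soup.find("a", text="Click here")

# A class value containing spaces must match the whole attribute
# string, in order; a single class name matches any tag that has it.
exact = soup.find_all("p", attrs={"class": "body strikeout"})
loose = soup.find_all("p", attrs={"class": "body"})

print(single, double, void)
print(link)
print([p.string for p in exact], [p.string for p in loose])
```

Note that "strikeout body" does not match a search for "body
strikeout", which is the order-sensitive behavior the last changelog
item describes. As far as I know, later 4.x releases also accept
string= as the preferred spelling of the text= argument.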
Leonard