Not one but two releases today. To start, the first real 3.x release in
almost two years.
This fixes a bug that can allow cross-site scripting attacks if
Beautiful Soup is used to sanitize HTML:
On output, angle brackets and bare ampersands are now escaped to XML
entities in strings. Previously they were only escaped in attribute
values. Beautiful Soup 4 escapes XML entities by default, so the
problem does not exist there unless you deliberately cause it (e.g. by
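
A minimal sketch of the default escaping in Beautiful Soup 4. The choice
of the standard-library "html.parser" builder and the sample text are
assumptions made for illustration, not part of the release itself:

```python
from bs4 import BeautifulSoup

# Build an empty tag, then assign untrusted text to it as a plain string.
soup = BeautifulSoup("<p></p>", "html.parser")
soup.p.string = "<script>alert('x')</script> & more"

# Default output escapes angle brackets and ampersands inside the string,
# so the text cannot be read back as markup.
escaped = str(soup.p)
print(escaped)
```

The printed markup contains &lt;script&gt; rather than a live script tag.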
Now, on to the BS4 beta.
It's almost done at this point. All the reported bugs are fixed except
the lack of namespace support. I'd like to add that before the
release, but I don't know how much work it'll be.
* Multi-valued attributes like "class" always have a list of values,
even if there's only one value in the list.
* Added a number of multi-valued attributes defined in HTML5.
* Stopped generating a space before the slash that closes an
empty-element tag. This may come back if I add a special XHTML mode
(http://www.w3.org/TR/xhtml1/#C_2), but right now it's pretty
* Passing text along with tag-specific arguments to a find* method:

    find("a", text="Click here")

will find tags that contain the given text as their
.string. Previously, the tag-specific arguments were ignored and
only strings were searched.
* Fixed a bug that caused the html5lib tree builder to build a
partially disconnected tree. Generally cleaned up the html5lib tree
* If you restrict a multi-valued attribute like "class" to a string
that contains spaces, Beautiful Soup will only consider it a match
if the tag's values, taken in order, correspond to that exact string.
That last one is implemented as a big hack, but I can remove the hack
later without changing the API.
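
Several of the changes listed above can be seen in one short session.
This is a sketch, not a test suite: the markup, the variable names and
the use of the standard-library "html.parser" builder are assumptions
made for illustration. Later Beautiful Soup 4 releases call the text
argument "string"; the name used here follows this announcement.

```python
from bs4 import BeautifulSoup

html = (
    '<p class="body" id="first">One</p>'
    '<p class="body strikeout">Two</p>'
    '<p class="strikeout body">Three</p>'
    '<a href="/go">Click here</a>'
    '<a href="/other">Elsewhere</a>'
    '<br>'
)
soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued: a list even for a single value. "id" is not.
first = soup.find("p", id="first")
print(first["class"], first["id"])

# Empty-element tags are written without a space before the slash.
print(soup.br)

# Text plus a tag name: the result is the matching tag, not the bare string.
link = soup.find("a", text="Click here")
print(link["href"])

# A class string containing spaces only matches that exact sequence of values.
exact = soup.find_all("p", attrs={"class": "body strikeout"})
print([p.string for p in exact])

# A single class value still matches every tag that carries it.
loose = soup.find_all("p", attrs={"class": "body"})
print(len(loose))
```

Searching on a single value such as "body" stays order-independent, so
the exact-string behavior only applies when the search string itself
contains spaces.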