Here's some markup:
<p class="body strikeout">
Up to this point, the value of the "class" attribute in that tag has
been the string "body strikeout". There was some limited support for
searching by CSS class, but it was a hack and not implemented
consistently.
In beta 5, the value of that class attribute is the list ["body",
"strikeout"]. The 'class' attribute, and a few others that are very
obscure, can have more than one value. This is documented here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#multivalue
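
A minimal sketch of the behavior just described, written in plain Python rather than taken from Beautiful Soup's own code. The names `attribute_value` and `MULTI_VALUED` are invented for illustration. The attribute names are the ones listed in the changelog below, and the "single value stays a string" branch follows the beta 5 examples given later in this message.

```python
# Hypothetical model of beta 5's handling of multi-valued attributes.
# This is NOT Beautiful Soup's implementation, only an illustration.
MULTI_VALUED = {"class", "rel", "rev", "archive", "accept-charset", "headers"}


def attribute_value(name, raw):
    """Return the value a tag would expose for attribute `name`."""
    if name not in MULTI_VALUED:
        # Ordinary attributes are never split.
        return raw
    values = raw.split()  # split on any run of whitespace
    if len(values) > 1:
        return values     # several values: present them as a list
    return raw            # one value (or none): keep the original string


print(attribute_value("class", "body strikeout"))
print(attribute_value("class", "body"))
print(attribute_value("id", "body strikeout"))
```

Under these assumptions only the first call yields a list; the other two return the string unchanged.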
A semi-related feature is that you can also search by CSS class in a
consistent way. Any kind of search against CSS class will be run
separately against all of a tag's CSS classes. Documentation:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
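
To make "run separately against all of a tag's CSS classes" concrete, here is a rough standalone model in Python. It is a sketch, not the library's code: the helper names `as_class_list`, `matches_one`, and `matches_css_class` are made up, and the set of supported criteria (string, compiled regular expression, callable) is an assumption chosen for the example.

```python
import re


def as_class_list(value):
    """Normalize a class value (None, string, or list) to a list of classes."""
    if value is None:
        return []
    if isinstance(value, str):
        return value.split()
    return list(value)


def matches_one(criterion, css_class):
    """Test one CSS class against a string, compiled regex, or callable."""
    if isinstance(criterion, re.Pattern):
        return criterion.search(css_class) is not None
    if callable(criterion):
        return bool(criterion(css_class))
    return criterion == css_class


def matches_css_class(criterion, class_value):
    """True if the criterion matches *any* one of the tag's CSS classes."""
    return any(matches_one(criterion, c) for c in as_class_list(class_value))


classes = ["body", "strikeout"]
print(matches_css_class("strikeout", classes))
print(matches_css_class(re.compile("^strike"), classes))
print(matches_css_class(lambda c: len(c) == 4, classes))
print(matches_css_class("bod", classes))
```

Each criterion is applied to one class at a time, so a tag matches as soon as any single class satisfies it. In this model the first three calls succeed and the last does not.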
The only thing I'm not sure about is whether it's good to have
p['class'] be sometimes a list and sometimes a string:
In <p class="body strikeout">, p['class'] == ['body', 'strikeout']
In <p class="body">, p['class'] == 'body'
This may confuse users. I certainly don't want to make this attribute
*always* be a list. I could present p['class'] as "body strikeout"
when it is accessed, but treat it as ["body", "strikeout"] when
searching. Would that itself be confusing? If you have a strong
opinion, let me know.
Leonard
---
= 4.0.0b5 (20120209) =

* Rationalized Beautiful Soup's treatment of CSS class. A tag
  belonging to multiple CSS classes is treated as having a list of
  values for the 'class' attribute. Searching for a CSS class will
  match *any* of the CSS classes.

  This actually affects all attributes that the HTML standard defines
  as taking multiple values (class, rel, rev, archive, accept-charset,
  and headers), but 'class' is by far the most common. [bug=41034]

* If you pass anything other than a dictionary as the second argument
  to one of the find* methods, it'll assume you want to use that
  object to search against a tag's CSS classes. Previously this only
  worked if you passed in a string.

* Fixed a bug that caused a crash when you passed a dictionary as an
  attribute value (possibly because you mistyped "attrs"). [bug=842419]

* Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags
  like <meta charset="utf-8" />. [bug=837268]

* If Unicode, Dammit can't figure out a consistent encoding for a
  page, it will try each of its guesses again, with errors="replace"
  instead of errors="strict". This may mean that some data gets
  replaced with REPLACEMENT CHARACTER, but at least most of it will
  get turned into Unicode. [bug=754903]

* Patched over a bug in html5lib (?) that was crashing Beautiful Soup
  on certain kinds of markup. [bug=838800]

* Fixed a bug that wrecked the tree if you replaced an element with an
  empty string. [bug=728697]

* Improved Unicode, Dammit's behavior when you give it Unicode to
  begin with.
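
The errors="replace" fallback described in the Unicode, Dammit entry above can be sketched with Python's built-in codecs. This is only an illustration of the two-pass idea, not Unicode, Dammit's actual code: the function `decode_with_fallback` and its return shape are hypothetical, and the list of guesses is supplied by the caller rather than detected.

```python
def decode_with_fallback(data, guesses):
    """Try every guessed encoding strictly, then again with errors="replace".

    Returns (text, encoding, errors), or (None, None, None) if nothing worked.
    """
    for errors in ("strict", "replace"):
        for encoding in guesses:
            try:
                return data.decode(encoding, errors), encoding, errors
            except (UnicodeDecodeError, LookupError):
                # Bad bytes for this codec, or an unknown codec name:
                # move on to the next guess.
                continue
    return None, None, None


# Valid UTF-8 for "café " followed by a byte that is invalid in UTF-8.
data = b"caf\xc3\xa9 \xff"
text, encoding, errors = decode_with_fallback(data, ["utf-8", "ascii"])
print(repr(text), encoding, errors)
```

Here both strict attempts fail, so the second pass decodes the data as UTF-8 and substitutes U+FFFD (REPLACEMENT CHARACTER) for the one bad byte, which keeps the rest of the text intact.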