Let me try to summarize your problem:
1. Your document has a <tbody> tag that contains a number of <tr> tags.
2. Some of the <tr> tags have a CSS class "even".
3. Some of the <tr> tags have no CSS class (because the markup is invalid)
4. Some of the <tr> tags have other CSS classes such as "committed".
You want to get the <tr> tags in #2 and #3, but not #4.
The simplest way to do this is to get all the <tr> tags and filter
out the ones you don't want:
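    for tr in soup.tbody.find_all('tr'):
        # tr.get('class') returns None for a tag with no class,
        # so fall back to an empty list to avoid a TypeError.
        if 'committed' not in tr.get('class', []):
            ...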
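Here is a minimal, self-contained sketch of that loop. The markup is made up for illustration (I don't have your real document), and I'm assuming the built-in "html.parser"; the variable name "wanted" is just a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for your table.
html = """
<table><tbody>
<tr class="even"><td>a</td></tr>
<tr><td>b</td></tr>
<tr class="committed"><td>c</td></tr>
<tr class="even"><td>d</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

wanted = []
for tr in soup.tbody.find_all("tr"):
    # A missing class attribute yields the default [], not None.
    if "committed" not in tr.get("class", []):
        wanted.append(tr.td.get_text())

print(wanted)
```

The rows with class "even" and the row with no class are kept; the "committed" row is skipped.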
You can filter out those tags ahead of time, by defining a function
that excludes the CSS classes you don't want:
    def exclude_css_classes(cls):
        return cls is None or "committed" not in cls

    for tr in soup.tbody.find_all("tr", exclude_css_classes):
        ...
Or you can go the other way, and define a function that only includes
the CSS classes you want:
    def include_css_classes(cls):
        return cls is None or "even" in cls

    for tr in soup.tbody.find_all("tr", include_css_classes):
        ...
Documentation on passing a function into a find() method:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-function
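The function described in that part of the documentation takes the whole tag rather than a single attribute value. A rough sketch of that variant follows; the sample markup and the name uncommitted_row are my own inventions for illustration. It may be the safer choice if a row can carry several classes at once (say "even committed"), since, as far as I know, a class filter is tried against each class separately and such a row could slip through.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; note the row with two classes.
html = """<table><tbody>
<tr class="even"><td>a</td></tr>
<tr><td>b</td></tr>
<tr class="even committed"><td>c</td></tr>
</tbody></table>"""

def uncommitted_row(tag):
    # Receives the whole Tag, so all of its classes can be checked at once.
    return tag.name == "tr" and "committed" not in tag.get("class", [])

soup = BeautifulSoup(html, "html.parser")
rows = soup.tbody.find_all(uncommitted_row)
kept = [tr.td.get_text() for tr in rows]

print(kept)
```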
Hope this helps,
Leonard
http://www.crummy.com/2012/07/31/0