Beautiful Soup - Lambdas and multiple CSS classes


Travis Jansen

Aug 31, 2017, 10:12:46 AM
to beautifulsoup
I'm new to Python and I've been learning to scrape pages with BS4.  I'm loving it, but I'm having trouble filtering rows by CSS class.  Here's my sample HTML:

<html>
<head>
<title>Test</title>
</head>

<body>

<table>
<tr class='test hidden'><td>foo</td></tr>
<tr class='full'><td>bar</td></tr>
<tr class='test hidden'><td>foo</td></tr>
<tr class='full'><td>bar</td></tr>
</table>

</body>

</html>



And my Python:

# import libraries
import urllib2
import re
from bs4 import BeautifulSoup
from bs4 import Comment
import operator

with open('sample.html', 'r') as myfile:
  html=myfile.read()

soup = BeautifulSoup(html, 'html.parser')
rows = soup.find('table').find_all('tr', class_=lambda x: 'hidden' not in x)

for row in rows:
  print(row)



Running this prints:

<tr class="test hidden"><td>foo</td></tr>
<tr class="full"><td>bar</td></tr>
<tr class="test hidden"><td>foo</td></tr>
<tr class="full"><td>bar</td></tr>

What I want is:

<tr class="full"><td>bar</td></tr>
<tr class="full"><td>bar</td></tr>


I've tried different combinations of "not in" and != but I can't get it to work when the class attribute lists more than one class.  Can anyone point me in the right direction?

Travis Jansen

Aug 31, 2017, 10:34:37 AM
to beautifulsoup
I realize I could just say class_='full' here, but my example was misleading.  The real classes could be fullabc or fullxyz or anything else. I need a way to get the rows that don't have 'hidden' on them, not the rows that have one particular class.

David Pyke

Sep 12, 2017, 9:02:15 AM
to beautifulsoup
I struggled with this same problem recently. I finally gave up on using a lambda in this situation and moved the condition out of the find_all() call.

Here's how I got it to work:

rows = soup.find('table').find_all('tr')

for row in rows:
    if 'hidden' not in row.get('class', ''):
        print(row)

Or to put it more succinctly:

for row in soup.table.find_all('tr'):
    # The [] default keeps rows with no class attribute from raising a TypeError.
    if 'hidden' not in row.get('class', []):
        print(row)

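I think a clue to why the lambda doesn't work is that a function passed as class_ isn't called once with the list of classes. It seems to be called with each class value on its own and then with the whole attribute string, and the row matches as soon as any one of those calls returns true. So for class='test hidden', the check 'hidden' not in 'test' is already true and the row is kept. For example, try: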

rows = soup.table.find('tr', class_=lambda x: print(x))

What you get is this:

test
hidden
test hidden
full
full
test
hidden
test hidden
full
full
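
If you'd rather collect those values than print them, here is a minimal sketch of the same experiment. The inline HTML is a trimmed stand-in for sample.html, and the names seen and record are made up for illustration. The exact number of calls may differ between Beautiful Soup versions, so treat the recorded list as a hint rather than a guarantee.

```python
from bs4 import BeautifulSoup

# Inline stand-in for sample.html (an assumption, trimmed to two rows).
html = """
<table>
<tr class='test hidden'><td>foo</td></tr>
<tr class='full'><td>bar</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

seen = []  # every value Beautiful Soup hands to the filter function


def record(value):
    """Remember the value and report 'no match' so every candidate is tried."""
    seen.append(value)
    return False


soup.table.find_all('tr', class_=record)
print(seen)

# The lambda from the original question, applied by hand to three candidates.
original = lambda x: 'hidden' not in x
print(original('test'), original('hidden'), original('test hidden'))
# True False False
```

Since the lone value 'test' already passes the original lambda, the 'test hidden' row is treated as a match, which would explain why every row came back.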

I hope this helps.

Travis Jansen

Sep 12, 2017, 9:14:30 AM
to beautifulsoup
Thanks, David.  I posted this on Stack Overflow and got an answer very similar to yours.  Mine still uses a lambda, but it does almost exactly the same thing:


rows = soup.find('table').find_all(
    lambda tag: tag.name == 'tr' and 'hidden' not in tag.get('class', '')
)
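
For anyone landing here later, this is a self-contained sketch of that tag-function filter run against the sample table. The HTML is inlined instead of read from sample.html, the row with no class at all is an extra I added for illustration, and the helper name visible_row is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample table from the first post, plus one class-less row (an added
# assumption) to show that a missing class attribute is handled too.
html = """
<table>
<tr class='test hidden'><td>foo</td></tr>
<tr class='full'><td>bar</td></tr>
<tr class='test hidden'><td>foo</td></tr>
<tr class='full'><td>bar</td></tr>
<tr><td>baz</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')


def visible_row(tag):
    """True for <tr> tags whose list of classes does not contain 'hidden'."""
    # tag.get('class') is a list of class names; [] covers tags without one.
    return tag.name == 'tr' and 'hidden' not in tag.get('class', [])


rows = soup.find('table').find_all(visible_row)

for row in rows:
    print(row)
```

Because the function receives the whole tag, the test runs once per row against the complete list of classes. The membership check is an exact match on a class name, so a class such as hidden-xs would not be excluded by it.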

