Can't find table with soup.findAll('table')

Fidel N

Oct 29, 2013, 1:55:00 PM
to beauti...@googlegroups.com
Hi:

I'm using soup.findAll('table') to try to find the table in an HTML file, but it does not appear.


The table does exist in the file (I checked by opening the file, pressing F12, and finding the table code):

And the code I'm using to load the file with BeautifulSoup is:
import sys
from bs4 import BeautifulSoup
with open(r'c:\blabla\filepath.html', 'r') as f:
    webpage = f.read()
print BeautifulSoup(webpage).findAll('table')

If I try other tags, it works, for example:
print BeautifulSoup(webpage).findAll('html')


Shouldn't this return the table instead of an empty list?

CHUX

Oct 29, 2013, 3:46:27 PM
to beauti...@googlegroups.com
Well, I'd say you should show the contents of the HTML, at least to help people try to decipher what the problem might be. What I know is that when it comes to BeautifulSoup, there are many ways to skin a cat. I've been trying to hack something out of a file for the better part of today. It's all part of the learning process!

Fidel N

Oct 29, 2013, 5:52:00 PM
to beauti...@googlegroups.com
Yes, sorry, I thought the image alone was enough. Here is one example containing a table tag that it won't find; I just copied it from inside the file:

<BR>
</P>
<TABLE WIDTH=533 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=0 CELLSPACING=0>
<COL WIDTH=51>
<COL WIDTH=172>
<COL WIDTH=43>
<COL WIDTH=265>
<TR VALIGN=TOP>


I tried uppercase, lowercase, and every combination, with no luck. Thanks!

Fidel N

Oct 30, 2013, 6:16:24 PM
to beauti...@googlegroups.com
Hey, just as another confirmation, I am able to find "TABLE" within the given file by using regex.

So in the following code the re.findall part works, but the soup part does not:

import sys
import urllib2
from bs4 import BeautifulSoup
import re
webpage = open(r'd:\samplefile.html', 'r').read()
soup = BeautifulSoup(webpage)
print re.findall("TABLE",webpage)   #works, prints ['TABLE','TABLE']
print soup.findAll("TABLE")   # prints an empty list []


Any clue what could be wrong here? Thanks!
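
One part of this can be checked in isolation: the uppercase query. With Beautiful Soup 4's HTML parsers, tag names are lowercased while the document is parsed, so a search for "TABLE" matches nothing even when the table itself was parsed fine. Below is a minimal sketch, assuming Python 3 and the built-in "html.parser"; the fragment is a hypothetical one modelled on the snippet posted earlier (the cell and the closing tags are made up so it is complete). It does not explain why the lowercase 'table' search in the first post came back empty.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the posted snippet; the <TD> cell and
# the closing tags are invented here so the markup is complete.
html = """<P><BR></P>
<TABLE WIDTH=533 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=0 CELLSPACING=0>
<COL WIDTH=51>
<TR VALIGN=TOP><TD>cell</TD></TR>
</TABLE>"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

regex_hits = re.findall("TABLE", html)   # plain text search, case as written
upper = soup.find_all("TABLE")           # tag names were lowercased on parsing
lower = soup.find_all("table")           # matches the parsed <table> tag

print(len(regex_hits), len(upper), len(lower))
```

The regex sees the raw text, so it finds the opening and closing tag; the soup only knows the lowercased tag name.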

Fidel N

Nov 2, 2013, 9:17:29 PM
to beauti...@googlegroups.com
The file is here, in case someone wants to have a look at it:


Still no success finding the table, although as you can see it's in the document and easily found through regex.
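
The thread never settles the cause, so this is only something to try, not a confirmed fix. BeautifulSoup(webpage) with no second argument uses whichever parser happens to be installed, and the Beautiful Soup documentation notes that different parsers can build different trees from invalid markup. A rough sketch, assuming Python 3, that counts the tables each parser sees; count_tables is a hypothetical helper name, and "lxml" and "html5lib" are optional third-party parsers that are skipped when they are not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tables(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <table> tags found, or None if unavailable}."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # The requested parser is not installed in this environment.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("table"))
    return counts


# Tiny hypothetical input; in practice pass the contents of the HTML file.
sample = "<TABLE><TR><TD>x</TD></TR></TABLE>"
print(count_tables(sample))
```

If the counts differ between parsers for the real file, the markup around the table is probably what trips up one of them.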

Paracha

Jan 23, 2014, 11:14:06 AM
to beauti...@googlegroups.com
I am using bs4 (version 4.3.0) and having the exact same problem.

Arturo

Jun 7, 2014, 4:36:53 PM
to beauti...@googlegroups.com
And why do you write findAll instead of find_all?
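
On the naming question: findAll is the Beautiful Soup 3 spelling, which bs4 keeps as an alias for find_all, so the choice of name should not change the result. A small sketch, assuming Python 3 and the built-in "html.parser", with a made-up one-cell table; newer bs4 releases may emit a deprecation warning for the old name, which is silenced here.

```python
import warnings

from bs4 import BeautifulSoup

# Made-up one-cell table, just to have something to search for.
soup = BeautifulSoup("<table><tr><td>x</td></tr></table>", "html.parser")

with warnings.catch_warnings():
    # The old BS3 name may trigger a DeprecationWarning in newer releases.
    warnings.simplefilter("ignore", DeprecationWarning)
    old_style = soup.findAll("table")

new_style = soup.find_all("table")

# Both spellings return the same matches.
print(old_style == new_style)
```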
