find/findAll returns None when trying to find text

Shashwat alok

Apr 20, 2011, 11:45:18 PM
to beautifulsoup
I am trying to extract the summary compensation table from the SEC
filing linked below. I have saved the page source on my drive as a
text file (table.txt).

When I use the following code to search for the string "Summary
Compensation", s is always None. From what I understand, s should be
assigned the string value "Summary Compensation".

I am new to Python and Beautiful Soup. Any help is greatly
appreciated.

import glob
import codecs
import csv
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(open("table.txt").read())
print soup.prettify()
s=soup.findAll(text="Summary Compensation")
print s

http://www.sec.gov/Archives/edgar/data/1314102/000119312508210683/ddef14a.htm

pbuckner

Apr 21, 2011, 10:34:43 AM
to beautifulsoup
Check your source text VERY carefully, especially when you're trying
to do an exact (non-regular-expression) match.

You'll note that in the SEC document the actual text ends with a space:

<B>Summary Compensation </B>

So, text="Summary Compensation "
(with the trailing space) will work, or something like
text=re.compile("Summary Compensation")
(after an "import re") will allow for more generous matches.
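
For later readers, here is a minimal sketch of the difference between the exact match and the regular-expression match. It assumes the newer bs4 package (find_all with the string argument) rather than the BeautifulSoup 3 import used in the question, and the one-cell table is made up for illustration; only the <B> text is taken from the SEC page.

```python
import re
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module from the question

# Hypothetical one-cell table; only the <B> text mirrors the SEC document.
html = "<table><tr><td><B>Summary Compensation </B></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# Exact match without the trailing space: nothing is found.
exact = soup.find_all(string="Summary Compensation")
first = soup.find(string="Summary Compensation")

# Exact match including the trailing space: one text node is found.
with_space = soup.find_all(string="Summary Compensation ")

# A regular expression is searched inside each text node, so the
# trailing space (or other surrounding whitespace) does not matter.
loose = soup.find_all(string=re.compile(r"Summary\s+Compensation"))

# From the matching text node, climb to the enclosing table.
table = loose[0].find_parent("table")

print(exact, first, with_space, loose)
```

When an exact match misses, find_all gives back an empty list and find gives back None, which is probably the "None" reported in the question. Climbing with find_parent("table") is only one way to get from the heading text to its table; on the real filing the heading may sit outside the table it describes.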

Shashwat alok

May 3, 2011, 4:11:01 PM
to beauti...@googlegroups.com
Thanks a lot for your help. 

