How to get the data from a website like this? I am trying to get data from this website and use pandas to manipulate it.


Yu Wang

Mar 4, 2017, 8:28:28 PM
to beautifulsoup
https://cdaw.gsfc.nasa.gov/CME_list/radio/waves_type2.html

The table is on this page, and it is really tricky to identify each row and split it into separate columns. Any ideas?

Jim Tittsler

Mar 4, 2017, 8:36:58 PM
to beautifulsoup
Beautiful Soup is not the right tool for the job, because that is a
preformatted text table, not an HTML table. Simple text processing in
Python (or even cutting out the <pre> part and using awk or cut) would
seem more appropriate.
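A minimal sketch of that plain-text approach, using only the Python 3 standard library: collect the text inside the first `<pre>` element (nested tags such as `<a>` links are flattened to their text) and split each non-blank line on whitespace. The class and function names, the assumption that header lines start with `#`, and the sample rows are all made up for illustration; they are not the real layout of the CDAW page, so check its actual header lines first.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text of every <pre> element, ignoring tags nested inside it."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while the parser is inside a <pre> element
        self.blocks = []   # one string per completed <pre> element
        self._buf = []     # text chunks of the <pre> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        # Text inside <a> or other tags within <pre> still lands here.
        if self.depth:
            self._buf.append(data)


def parse_pre_table(html, skip_prefixes=("#",)):
    """Split each line of the first <pre> block into whitespace-separated fields.

    Blank lines and lines starting with any of skip_prefixes (assumed to be
    header/comment lines) are dropped. Returns a list of rows of strings.
    """
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return []
    rows = []
    for line in parser.blocks[0].splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(skip_prefixes):
            continue
        rows.append(stripped.split())
    return rows


if __name__ == "__main__":
    # Hypothetical sample, not real data from the CDAW table.
    sample = (
        "<html><body><pre>\n"
        "# Start      Time   Freq\n"
        "1997/04/01  14:00  <a href='#'>8000</a>\n"
        "1997/04/07  14:30  11000\n"
        "</pre></body></html>"
    )
    print(parse_pre_table(sample))
    # → [['1997/04/01', '14:00', '8000'], ['1997/04/07', '14:30', '11000']]
```

Whitespace splitting only keeps columns aligned when every row has the same number of non-blank fields. If some cells are empty, slicing each line at fixed character positions, or passing the `<pre>` text to `pandas.read_fwf(io.StringIO(text))` for fixed-width parsing, is safer; for evenly split rows, `pandas.DataFrame(rows)` gives a frame to manipulate.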

Brian L Cartwright

Apr 8, 2017, 4:35:44 PM
to beauti...@googlegroups.com
from BeautifulSoup import *
import urllib2

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Basic.aspx?playerId=60456"

html = urllib2.urlopen(url, 'html.parser').read()
soup = BeautifulSoup(html)
print soup

item = soup.findAll('td')
print len(item)
print item


I've been using a script to read the site above for years, but now it
doesn't work. I've tried various combinations and always get soup, but I
cannot find any tags in it. I've tried find and findAll and looked for div,
td, tr, etc., but found nothing.

I'd appreciate it if some of you could run the script above and work out
why find won't work.

Thanks

Brian Cartwright
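One likely culprit in the script above: in `urllib2.urlopen(url, 'html.parser')` the string `'html.parser'` is urlopen's second positional argument, `data`, so the request goes out as a POST with that string as its body rather than a plain GET. An ASP.NET page may answer such a POST with something other than the normal stats page, which would explain soup without any `td` tags; that is a guess, and a change in the site's markup is also possible. The parser name belongs in the BeautifulSoup constructor. Below is a sketch in Python 3 using `urllib.request` and the `bs4` package (the successor to the old `BeautifulSoup` 3 module); the helper names `build_request`, `fetch_html` and `find_cells` are hypothetical.

```python
import urllib.request

URL = "http://www.koreabaseball.com/Record/Player/HitterDetail/Basic.aspx?playerId=60456"


def build_request(url):
    """Build a plain GET request; supplying data= would turn it into a POST."""
    return urllib.request.Request(url)


def fetch_html(url, timeout=30):
    """Download the page and decode it with the charset the server reports."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def find_cells(html):
    """Return every <td> element; the parser name goes here, not to urlopen."""
    # Third-party dependency (pip install beautifulsoup4), imported lazily so
    # the request helpers above work without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("td")
```

Usage would be `cells = find_cells(fetch_html(URL))` followed by `print(len(cells))`. If the GET version still finds no cells, printing the first part of the downloaded HTML shows whether the table is in the page at all.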
