Hello again...
I have some code that scrapes a site and prints out the strings/text.
However, the output is not structured the way I want it to be, i.e. one
line per record. Instead, its structure resembles the HTML that was
parsed. For example:
1:
Hoover HS
:
Birmingham,
AL
:
1
:
2
:
4
:
3
I am okay at data handling in Python, so I know the strip, split, and
append methods, but none of them seem to mold the data into the form I want:
1: Hoover HS: Birmingham, AL: 1: 2: 4: 3
and so on. Am I missing something in bs4, or is there something else
programming-wise that I do not know about? (I'm a self-taught novice.)
Here is my working code:
import urllib2
from bs4 import BeautifulSoup

# Page numbers appended to the URL, one request per page
urlloop = [str(n) for n in range(1, 19)]

def main():
    for page_num in urlloop:
        url = "http://www.usatodayhss.com/news/rankings/super-25-boys-football?state..." + page_num
        request = urllib2.Request(url)
        page = urllib2.urlopen(request)
        soup = BeautifulSoup(page)
        # Each ranking row is a <tr> with class "hss-data"
        HS = soup.find_all('tr', 'hss-data')
        #print (soup.prettify())
        #print (soup.get_text())
        for row in HS:
            tdlist = row.find_all('div')
            data1 = row.find('td').string
            if tdlist[0].span is not None:
                # First div holds a span instead of a link
                data3 = 'none'
                data5 = tdlist[1].a.string
                data6 = tdlist[7].string
                data7 = tdlist[11].string
                data8 = tdlist[15].text
                data9 = tdlist[20].text
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data5,
                    data3, data6, data7, data8, data9)
            else:
                data3 = tdlist[0].a.string
                data5 = tdlist[1].string
                data6 = tdlist[6].string
                data7 = tdlist[10].string
                data8 = tdlist[15].string
                data9 = tdlist[19].string
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data3,
                    data5, data6, data7, data8, data9)

main()
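My best guess so far is that the values coming back from .string still
carry the newlines and indentation from the HTML source, which would explain
why every field lands on its own line. Below is a minimal sketch of
collapsing that whitespace before printing. It assumes each field is a plain
string (or None); the clean and format_row helpers and the sample values are
made up for illustration and are not part of bs4.

```python
def clean(text):
    """Collapse newlines and runs of spaces into single spaces."""
    if text is None:  # .string can be None when a tag has several children
        return 'none'
    # split() with no arguments splits on any whitespace, including newlines
    return ' '.join(text.split())


def format_row(fields):
    """Join the cleaned fields into one 'a: b: c' line."""
    return ': '.join(clean(f) for f in fields)


# Made-up sample values shaped like the output shown earlier (an assumption,
# not real scraped data).
sample = ['1', '\nHoover HS\n', '\nBirmingham,\n   AL\n',
          '\n1\n', '\n2\n', '\n4\n', '\n3\n']
print(format_row(sample))
```

If that guess is right, passing the data1 ... data9 values through something
like format_row might give the one-line layout. bs4 also appears to offer
get_text(strip=True) and .stripped_strings for trimming text, though I have
not tried them on this page.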
Thanks,
Tom