Hello again...
I have code that scrapes a site and prints out the strings/text. However, the output is not structured the way I want it, i.e. one record per line. Instead, its layout resembles the HTML that was parsed. For example:
1:
Hoover HS
:
Birmingham,
AL
:
1
:
2
:
4
:
3
I am okay at data handling in Python, so I know the strip, split, and append methods, but none of them seem to mold the data into the format I want:
1: Hoover HS: Birmingham, AL: 1: 2: 4: 3 etc.....
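To make the goal concrete, here is a minimal sketch of the clean-up I am after. It assumes (I have not confirmed this) that the stray line breaks come from whitespace and newlines embedded in the strings pulled out of the HTML. The `clean` helper and the `raw_fields` values are made up for illustration and are not real scraped data.

```python
def clean(field):
    """Collapse every run of whitespace (newlines included) into one space."""
    if field is None:
        # .string can be None when a tag has no single string child
        return 'none'
    return ' '.join(field.split())

# Hypothetical raw values, shaped like the broken output shown above
raw_fields = ['1', '\nHoover HS\n', '\nBirmingham,\nAL\n',
              '\n1\n', '\n2\n', '\n4\n', '\n3\n']

line = ': '.join(clean(f) for f in raw_fields)
print(line)  # 1: Hoover HS: Birmingham, AL: 1: 2: 4: 3
```

Calling `split()` with no arguments splits on any whitespace and drops empty pieces, so joining the pieces with a single space also repairs fields such as "Birmingham,\nAL" where the newline sits in the middle rather than at the ends.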
Am I missing something in bs4, or is there something else programming-wise that I do not know about? (I'm a self-taught novice.)
Here is my working code:
import urllib2
import urllib
import string
from bs4 import BeautifulSoup

urlloop = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18']

def main():
    for i in urlloop:
        url = "http://www.usatodayhss.com/news/rankings/super-25-boys-football?state=AL&p="+i
        request = urllib2.Request(url)
        page = urllib2.urlopen(request)
        soup = BeautifulSoup(page)
        HS = soup.find_all('tr', 'hss-data')
        #print (soup.prettify())
        #print (soup.get_text())
        for row in HS:
            tdlist = row.find_all('div')
            data1 = row.find('td').string
            if tdlist[0].span is not None:
                data3 = 'none'
                data5 = tdlist[1].a.string
                data6 = tdlist[7].string
                data7 = tdlist[11].string
                data8 = tdlist[15].text
                data9 = tdlist[20].text
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data5, data3, data6, data7, data8, data9)
            else:
                data3 = tdlist[0].a.string
                data5 = tdlist[1].string
                data6 = tdlist[6].string
                data7 = tdlist[10].string
                data8 = tdlist[15].string
                data9 = tdlist[19].string
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data3, data5, data6, data7, data8, data9)

main()
Thanks,
Tom