From: Tom <boot...@gmail.com>
Date: Thu, 26 Jul 2012 06:51:53 -0700 (PDT)
Subject: bs4 output is structured like the html

Hello again,

I have a script that scrapes a site and prints out the strings/text.
However, the output is not formatted the way I want (a flat list, one row
per line); instead, its layout mirrors the whitespace of the HTML that was
parsed. For example:
1:
                                                Hoover HS
                                            :

                                                Birmingham,
                                                AL
                                            :

                                                    1
                                                :
                                                    2
                                                :
                                                    4
                                                :
                                                    3

I am okay at managing data in Python, so I know the strip, split, and append
methods, but none of them seem to mold the data into what I want:
1: Hoover HS: Birmingham, AL: 1: 2: 4: 3  etc.

Am I missing something in bs4, or is there something else programming-wise
that I do not know about? (I'm a self-taught novice.)
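
For reference, the desired one-line layout can be sketched with plain string handling, independent of bs4. This is only a minimal sketch: the helper name collapse and the exact whitespace in raw_fields are assumptions made for illustration, modeled on the sample output above rather than taken from the real page.

```python
def collapse(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining restores single spaces.
    return " ".join(text.split())


# Hypothetical raw values, shaped like the strings in the sample output.
raw_fields = [
    "1",
    "\n        Hoover HS\n    ",
    "\n        Birmingham,\n        AL\n    ",
    "\n    1\n",
    "\n    2\n",
    "\n    4\n",
    "\n    3\n",
]

line = ": ".join(collapse(field) for field in raw_fields)
print(line)  # 1: Hoover HS: Birmingham, AL: 1: 2: 4: 3
```

Calling strip() alone would not be enough for a value such as the city/state field, because the newline and indentation sit in the middle of the string rather than at its ends.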

Here is my working code:
# Python 2 script (urllib2, print statement).
import urllib2
from bs4 import BeautifulSoup

# Page numbers appended to the end of the rankings URL.
urlloop = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18']

def main():
    for page_num in urlloop:
        url = "http://www.usatodayhss.com/news/rankings/super-25-boys-football?state..." + page_num
        request = urllib2.Request(url)
        page = urllib2.urlopen(request)
        soup = BeautifulSoup(page)
        # Each ranked school is a <tr class="hss-data"> row.
        HS = soup.find_all('tr', 'hss-data')
        #print (soup.prettify())
        #print (soup.get_text())
        for row in HS:
            tdlist = row.find_all('div')
            data1 = row.find('td').string
            # Rows whose first <div> holds a <span> have no school link,
            # so the div indexes shift by one.
            if tdlist[0].span is not None:
                data3 = 'none'
                data5 = tdlist[1].a.string
                data6 = tdlist[7].string
                data7 = tdlist[11].string
                data8 = tdlist[15].text
                data9 = tdlist[20].text
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data5, data3, data6, data7, data8, data9)
            else:
                data3 = tdlist[0].a.string
                data5 = tdlist[1].string
                data6 = tdlist[6].string
                data7 = tdlist[10].string
                data8 = tdlist[15].string
                data9 = tdlist[19].string
                print '%s: %s: %s: %s: %s: %s: %s' % (data1, data3, data5, data6, data7, data8, data9)

main()
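
One possible way to get a row's text without the HTML's indentation is bs4's stripped_strings generator, which yields each text node with leading and trailing whitespace removed and skips whitespace-only nodes. The sketch below is hedged: the markup is a hypothetical stand-in (the real page's structure is not shown in this post), and it is written so it also runs on Python 3. Whitespace inside a single text node still has to be collapsed separately.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the 'hss-data' class name comes from the post.
html = """
<table>
<tr class="hss-data">
  <td>1</td>
  <td><div><a href="#">
        Hoover HS
  </a></div></td>
  <td><div>
        Birmingham,
        AL
  </div></td>
  <td><div>
        1
  </div></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for row in soup.find_all("tr", "hss-data"):
    # stripped_strings trims the ends of each text node; the split/join
    # collapses newlines and indentation left inside a node.
    fields = [" ".join(s.split()) for s in row.stripped_strings]
    lines.append(": ".join(fields))

for line in lines:
    print(line)
```

This avoids indexing into the list of div elements by position, at the cost of treating every non-empty text node in the row as a field.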

Thanks,
Tom


 