<li><form method="post" action="recruit-search-results"><button type="submit" name="start" value="100">Next »<span class="value">100</span></button><input type="hidden" name="sport" value="football">
<input type="hidden" name="year" value="2011">
<input type="hidden" name="committed" value="1">
<input type="hidden" name="uncommitted" value="1">
<input type="hidden" name="loc" value="City, State or Zip Code">
<input type="hidden" name="hsprospects" value="1">
<input type="hidden" name="prepprospects" value="1">
<input type="hidden" name="jucoprospects" value="1">
<input type="hidden" name="start" value="0"></form></li>
On Wednesday, July 18, 2012 12:06:31 PM UTC-4, LunkRat wrote:
> Ah, I see. This works:
> from bs4 import BeautifulSoup
> import urllib2
> page = urllib2.urlopen('http://rivals.yahoo.com/ncaa/football/recruiting/recruit-search-resul...')
> soup = BeautifulSoup(page)
> evens = soup.find_all('tr', 'even')
> odds = soup.find_all('tr', 'odd')
> rows = evens + odds
> for row in rows:
>     tdlist = row.find_all('td')
>     data1 = tdlist[0].string
>     data2 = row.find('th').a.string
>     data3 = tdlist[1].contents[0].string
>     data4 = tdlist[1].contents[1].string
>     data5 = tdlist[2].string
>     data6 = tdlist[3].string
>     data7 = tdlist[4].string
>     if tdlist[5].span is not None:
>         data8 = tdlist[5].span.string
>     else:
>         data8 = ""
>     data9 = tdlist[6].string
>     data10 = tdlist[7].string
>     data11 = tdlist[8].a.string
>     print '%s %s %s %s %s %s %s %s %s %s %s' % (data1, data2, data3,
>         data4, data5, data6, data7, data8, data9, data10, data11)
> Link
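
Editor's sketch, not part of the original messages: concatenating `evens + odds` lists every even row before every odd row, so the output no longer follows the order of the page. One possible way to keep page order is to walk all `tr` tags once and keep only those whose class is `odd` or `even`. The sample HTML below is a shortened, hypothetical stand-in for the real page, the variable names are illustrative, and the code is written for Python 3 (the thread itself uses Python 2).

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample modeled on the markup quoted in this thread.
# The real page is assumed to use the same 'odd'/'even' row classes.
html = """
<table>
<tr><td>unrelated row that should be skipped</td></tr>
<tr class="odd"><td>QB</td><th scope="row"><a href="#">Tyrone Swoopes</a></th>
<td>Whitewright, Texas<em>Whitewright</em></td><td>6'5"</td></tr>
<tr class="even"><td>LB</td><th scope="row"><a href="#">Reuben Foster</a></th>
<td>Auburn, Alabama<em>Auburn</em></td><td>6'2"</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Walk every <tr> in page order and keep those with class 'odd' or 'even'.
wanted = {"odd", "even"}
rows = [tr for tr in soup.find_all("tr") if wanted & set(tr.get("class", []))]

records = []
for row in rows:
    fields = []
    # Direct children only, so <td> and <th> cells stay in column order.
    for cell in row.find_all(["td", "th"], recursive=False):
        # stripped_strings yields each text fragment inside the cell, so
        # "Whitewright, Texas" and "Whitewright" become separate fields.
        fields.extend(cell.stripped_strings)
    records.append(fields)

for fields in records:
    print(" ".join(fields))
```

Collecting text with `stripped_strings` also avoids indexing into `.contents` by position, which depends on the exact whitespace and nesting of each cell.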
> On Wed, Jul 18, 2012 at 9:18 AM, Tom <boot...@gmail.com> wrote:
>> Thanks Link,
>> That did store each td tag as an individual list... It did not
>> loop through the snippet of code but I can work that out later... Anyways..
>> I can implement this in what I want to accomplish but the reason I was
>> trying to "target" the classes 'odd' and 'even' is that there are plenty of
>> other <td> tags throughout the webpage that I am trying to bypass... and
>> the only way I can target the data that I want to extract is by focusing in
>> on the odd/even classes...
>> for your reference ###here is the website I want to parse
>> http://rivals.yahoo.com/ncaa/football/recruiting/recruit-search-resul...
>> Your input will help though.. My goal is to just correctly extract the
>> data into lists then I will go back and deal with writing the user agents
>> and all the other stuff...
>> On Wednesday, July 18, 2012 9:39:35 AM UTC-4, LunkRat wrote:
>>> I'm not sure why you need to care about odd or even for the output you
>>> are looking for. Here is pseudocode (I did not test this) to just give a
>>> general idea of how I would do it (but I am still very novice with bs4 and
>>> python, so this is probably not the best way, but should get you started).
>>> rows = soup.find_all('tr')
>>> for row in rows:
>>>     tdlist = row.find_all('td')
>>>     data1 = tdlist[0].string
>>>     data2 = row.find('th').a.string
>>>     data3 = tdlist[1].string
>>>     data4 = tdlist[1].em.string
>>>     data5 = tdlist[2].string
>>>     data6 = tdlist[3].string
>>>     data7 = tdlist[4].string
>>>     data8 = tdlist[5].span.string
>>>     data9 = tdlist[6].string
>>>     data10 = tdlist[7].string
>>>     data11 = tdlist[8].a.string
>>>     # Cram them into a list:
>>>     wordlist = [data1, data2, data3, data4, data5, data6,
>>>                 data7, data8, data9, data10, data11]
>>>     # Or just print them as straight strings:
>>>     print '%s %s %s %s %s %s %s %s %s %s %s' % (
>>>         data1, data2, data3, data4, data5, data6,
>>>         data7, data8, data9, data10, data11)
>>> Hope that helps!
>>> On Wed, Jul 18, 2012 at 8:05 AM, Tom <boot...@gmail.com> wrote:
>>>> Sorry I somehow posted my thread before I was done....
>>>> But to finish off my question... I ultimately want the data cleanly
>>>> written like this....
>>>> QB Tyrone Swoopes Whitewright, Texas Whitewright 6'5" 229 4.8 5 stars 6.1 1 Texas #then I want to start a new line '\n'
>>>> LB Reuben Foster Auburn, Alabama Auburn 6'2" 228 N/A 5 stars 6.1 1 Auburn
>>>> OL Khaliel Rodgers Elkton, Maryland Eastern Christian Academy 6'3" 300 N/A 4 stars 6.0 1 USC
>>>> etc... any tips or pointers will be appreciated!
>>>> Thanks,
>>>> Tom
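
Editor's sketch, not part of the original messages: once each recruit's fields are in a list, the one-line-per-recruit layout described above can be produced by joining the fields with spaces and the rows with `'\n'`. The `rows` data is copied from the example output in this message, `format_row` is a hypothetical helper name, and the code is written for Python 3.

```python
# Each inner list holds one recruit's fields, already extracted as strings.
rows = [
    ["QB", "Tyrone Swoopes", "Whitewright, Texas", "Whitewright",
     "6'5\"", "229", "4.8", "5 stars", "6.1", "1", "Texas"],
    ["LB", "Reuben Foster", "Auburn, Alabama", "Auburn",
     "6'2\"", "228", "N/A", "5 stars", "6.1", "1", "Auburn"],
]


def format_row(fields):
    """Join one recruit's fields into a single space-separated line."""
    # A missing value (None) becomes an empty string instead of "None".
    return " ".join(field or "" for field in fields)


# One line per recruit, each terminated by a newline.
text = "\n".join(format_row(fields) for fields in rows) + "\n"
print(text, end="")
```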
>>>> On Wednesday, July 18, 2012 8:58:18 AM UTC-4, Tom wrote:
>>>>> I'm new to bs4 and I am having issues with extracting the text into
>>>>> lists...
>>>>> Here is a snippet of my code..
>>>>> from bs4 import BeautifulSoup
>>>>> code = """<tr class="odd"><td>QB</td><th scope=row><a
>>>>> href="http://rivals.yahoo.com/ncaa/football/recruiting/player-Tyrone-Swoopes-124071;_ylt=Ap1EPVj2dmRRkPO4OqcVVshIPZB4"
>>>>> >Tyrone Swoopes</a></th><td>Whitewright, Texas<em>Whitewright</em></td>
>>>>> <td>6'5"</td><td>229</td><td>4.8</td><td><span
>>>>> class="stars ysr-results-5-star">5 stars</span></td><td>6.1</td>
>>>>> <td>1</td><td><div class="wrapper"><a
>>>>> href="http://rivals.yahoo.com/ncaa/football/recruiting/commitments/2013/texas-83;_ylt=AmYuZOKVFsFgtKnD1LRq84JIPZB4?&sport=1"
>>>>> class="committed">Texas</a></div></td></tr><tr class="even"><td>LB</td>
>>>>> <th scope=row><a
>>>>> href="http://rivals.yahoo.com/ncaa/football/recruiting/player-Reuben-Foster-108287;_ylt=Ak6Ntf2pP47bHwK70Ea3buRIPZB4"
>>>>> >Reuben Foster</a></th><td>Auburn, Alabama<em>Auburn</em></td>
>>>>> <td>6'2"</td><td>228</td><td>N/A</td><td><span
>>>>> class="stars ysr-results-5-star">5 stars</span></td><td>6.1</td>
>>>>> <td>1</td><td><div class="wrapper"><a
>>>>> href="http://rivals.yahoo.com/ncaa/football/recruiting/commitments/2013/auburn-75;_ylt=AiA6iZ4bW_IhMI4ZjLBQHARIPZB4?&sport=1"
>>>>> class="committed">Auburn</a></div></td></tr>"""
>>>>> ##### I highlighted the tags to help guide you through my intentions..
>>>>> I want to target the <tr class="odd"> and <tr class="even"> . I want
>>>>> to extract the text between the <td> tags and put them into an
>>>>> individual list... (I will also need to extract text from the <th
>>>>> scope=row> tag into a list too.. but I can always figure that out
>>>>> later)####
>>>>> soup = BeautifulSoup(code)  # read in my code
>>>>> # target odd classes with <td> tags
>>>>> odds = soup.find_all('tr', attrs={'class': 'odd'})
>>>>> # target even classes with <td> tags
>>>>> evens = soup.find_all('tr', attrs={'class': 'even'})
>>>>> for i in odds:
>>>>>     word = []
>>>>>     for tmp in odds.findall(text=True):
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "beautifulsoup" group.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msg/beautifulsoup/-/zjELVdvNv30J.
>>>> To post to this group, send email to beautifulsoup@googlegroups.com.
>>>> To unsubscribe from this group, send email to
>>>> beautifulsoup+unsubscribe@googlegroups.com.
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/beautifulsoup?hl=en.
>>> --
>>> Link Swanson
>>> Must Build Digital