How do I scrape <td> data?

Mark Kapono

Dec 22, 2016, 11:43:03 AM
to beautifulsoup
Hello, I am trying to scrape some weather data from the NWS (National Weather Service) and am having trouble extracting table data. I couldn't find anything specific to my needs in my research, and since I am inexperienced with BeautifulSoup, I may not have recognized a solution when I saw one. I want to extract the data and export it to a Google Sheet via the GDOCS API. All I need is the data itself, without the row headers (the bold labels in the cell to the left of each value). Below is the HTML of interest (also attached), followed by a piece of code that works but does not yet extract the "current_conditions_detail" table, which is the part I want.

HTML of interest:

<div id="current_conditions-summary" class="pull-left">
<img src="newimages/large/nsct.png" alt="" class="pull-left">
<p class="myforecast-current">Partly Cloudy</p>
<p class="myforecast-current-lrg">76°F</p>
<p class="myforecast-current-sm">24°C</p>
</div>
<div id="current_conditions_detail" class="pull-left">
   <table>
            <tbody><tr>
            <td class="text-right"><b>Humidity</b></td>
            <td>69%</td>
            </tr>
            <tr>
            <td class="text-right"><b>Wind Speed</b></td>
            <td>NE 13 mph</td>
            </tr>
            <tr>
            <td class="text-right"><b>Barometer</b></td>
            <td>30.10 in (1019.2 mb)</td>
            </tr>
            <tr>
            <td class="text-right"><b>Dewpoint</b></td>
            <td>65°F (18°C)</td>
            </tr>
            <tr>
            <td class="text-right"><b>Visibility</b></td>
            <td>10.00 mi</td>
            </tr>
            <tr><td class="text-right"><b>Heat Index</b></td><td>78°F (26°C)</td></tr>            <tr>
            <td class="text-right"><b>Last update</b></td>
            <td> 20 Dec 10:53 pm HST </td>
            </tr>
            </tbody></table>
</div>
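Each row of the detail table above holds exactly two cells: a bold label on the left and its value on the right. One way to read it is to walk the rows and pair those cells up. This is a minimal sketch, not an official approach: it embeds the posted markup as a string so it runs offline, relies only on the `id="current_conditions_detail"` attribute visible in the post, and the function name `parse_detail_table` is hypothetical.

```python
from bs4 import BeautifulSoup

# The detail table exactly as posted (whitespace trimmed), so the example runs offline.
HTML = """
<div id="current_conditions_detail" class="pull-left">
<table><tbody>
<tr><td class="text-right"><b>Humidity</b></td><td>69%</td></tr>
<tr><td class="text-right"><b>Wind Speed</b></td><td>NE 13 mph</td></tr>
<tr><td class="text-right"><b>Barometer</b></td><td>30.10 in (1019.2 mb)</td></tr>
<tr><td class="text-right"><b>Dewpoint</b></td><td>65°F (18°C)</td></tr>
<tr><td class="text-right"><b>Visibility</b></td><td>10.00 mi</td></tr>
<tr><td class="text-right"><b>Heat Index</b></td><td>78°F (26°C)</td></tr>
<tr><td class="text-right"><b>Last update</b></td><td> 20 Dec 10:53 pm HST </td></tr>
</tbody></table>
</div>
"""


def parse_detail_table(soup):
    """Map each row's label (left cell) to its value (right cell)."""
    detail = soup.find(id="current_conditions_detail")
    conditions = {}
    for row in detail.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 2:
            continue  # skip any row that is not a label/value pair
        label = cells[0].get_text(strip=True)   # e.g. "Humidity"
        value = cells[1].get_text(strip=True)   # e.g. "69%"
        conditions[label] = value
    return conditions


soup = BeautifulSoup(HTML, "html.parser")
conditions = parse_detail_table(soup)
for label, value in conditions.items():
    print(label + ": " + value)
```

Keying the result on the label text keeps each value addressable by name, even if the rows were to appear in a different order.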

The code below works for the summary section:

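#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

# URL of the NWS forecast page. The post did not include it,
# so this is a placeholder to fill in.
URL = "..."

# Fetch the page and parse it.
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# Summary block: sky condition and temperatures.
current_summ = soup.find(id="current_conditions-summary")

# get_text() returns str in Python 3, so no .encode() is needed before concatenating.
sky = current_summ.find(class_="myforecast-current").get_text()
tempF = current_summ.find(class_="myforecast-current-lrg").get_text()
tempC = current_summ.find(class_="myforecast-current-sm").get_text()

print("HNL airport: " + sky)
print(tempF + " / " + tempC)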
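To get only the values without the headers, the label cells can be filtered out by their `text-right` class. A minimal sketch, assuming every label cell carries that class as in the posted markup; `detail_values` is a hypothetical helper name, and the embedded HTML stands in for `page.content` so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# A few rows copied from the posted markup; in the real script, `soup`
# would be built from page.content instead.
HTML = """
<div id="current_conditions_detail" class="pull-left">
<table><tbody>
<tr><td class="text-right"><b>Humidity</b></td><td>69%</td></tr>
<tr><td class="text-right"><b>Wind Speed</b></td><td>NE 13 mph</td></tr>
<tr><td class="text-right"><b>Last update</b></td><td> 20 Dec 10:53 pm HST </td></tr>
</tbody></table>
</div>
"""


def detail_values(soup):
    """Return the text of the data cells only, skipping the label cells.

    Assumes label cells are exactly those with class="text-right".
    """
    detail = soup.find(id="current_conditions_detail")
    return [
        td.get_text(strip=True)
        for td in detail.find_all("td")
        # bs4 returns "class" as a list; cells without it default to []
        if "text-right" not in td.get("class", [])
    ]


soup = BeautifulSoup(HTML, "html.parser")
values = detail_values(soup)
print(values)  # ['69%', 'NE 13 mph', '20 Dec 10:53 pm HST']
```

In the working script, `detail_values(soup)` could be called on the `soup` already built from `page.content`. A flat list like this is also a convenient shape for a single spreadsheet row.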


So, from the HTML above, what I would like to extract and print is the "69%" in <td>69%</td> for Humidity, as well as the other values in the table. Any pointers or suggestions would be much appreciated. -- Mark.
Attachments: graphic.JPG, html.JPG
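For a single item such as Humidity, another option is to search for the bold label and step across to the neighbouring cell in the same row. A sketch, assuming the label text matches exactly ("Humidity", "Wind Speed", and so on, as in the posted HTML); `value_for` is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Two rows copied from the posted markup, so the example runs offline.
HTML = """
<div id="current_conditions_detail" class="pull-left">
<table><tbody>
<tr><td class="text-right"><b>Humidity</b></td><td>69%</td></tr>
<tr><td class="text-right"><b>Wind Speed</b></td><td>NE 13 mph</td></tr>
</tbody></table>
</div>
"""


def value_for(soup, label):
    """Find the <b> with the given label text and return the next cell's text."""
    detail = soup.find(id="current_conditions_detail")
    bold = detail.find("b", string=label)
    if bold is None:
        return None  # label not present on this page
    label_cell = bold.find_parent("td")
    value_cell = label_cell.find_next_sibling("td")
    return value_cell.get_text(strip=True)


soup = BeautifulSoup(HTML, "html.parser")
print("Humidity: " + value_for(soup, "Humidity"))  # Humidity: 69%
```

Because `find_next_sibling("td")` only matches `<td>` tags, it skips the whitespace text between cells, so the line breaks in the real page source do not get in the way.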