As an excuse to start learning BeautifulSoup, I whipped together some
Python to extract that data. Enjoy!

from BeautifulSoup import BeautifulSoup
import urllib2

page = urllib2.urlopen("http://providencesunshine.com/")
soup = BeautifulSoup(page)

# map cell index to a property name
cell_mapping = {0: 'date', 1: 'plat_lot', 2: 'taxpayer', 3: 'amount',
                4: 'notes'}

# the data table has no id, so search based on attributes
table = soup.find('table', border='1', width="95%")
if not table:
    raise Exception("The HTML structure appears to have changed. "
                    "I can't figure it out.")
else:
    rows = []
    # skip the first row with headers
    for row in table.findAll('tr')[1:]:
        cell_index = 0
        this_row = {}
        for cell in row.findAll('td'):
            this_row[cell_mapping[cell_index]] = cell.find('font').contents
            cell_index += 1
        rows.append(this_row)
    print rows

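
In case it helps anyone who wants to try the row-to-dict idea without fetching the live page or installing BeautifulSoup, here is a rough Python 3 sketch that uses only the standard library's html.parser. It is not the script above: the names CELL_NAMES, RowCollector, parse_rows and SAMPLE_HTML are made up for illustration, the sample table is invented data that only assumes the same five-column layout, it reads every table in the input rather than picking one by its attributes, and it stores each cell as a plain string instead of a .contents list.

```python
from html.parser import HTMLParser

# Same column order as cell_mapping in the script above (assumed layout).
CELL_NAMES = ['date', 'plat_lot', 'taxpayer', 'amount', 'notes']


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> it sits in."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []      # one list of cell strings per <tr>
        self._row = None    # cells of the <tr> currently being read
        self._cell = None   # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return one dict per data row, skipping the first (header) row."""
    collector = RowCollector()
    collector.feed(html)
    return [dict(zip(CELL_NAMES, cells)) for cells in collector.rows[1:]]


# Invented sample data, NOT taken from the real site.
SAMPLE_HTML = """
<table border="1" width="95%">
  <tr><td><font>Date</font></td><td><font>Plat/Lot</font></td>
      <td><font>Taxpayer</font></td><td><font>Amount</font></td>
      <td><font>Notes</font></td></tr>
  <tr><td><font>01/02/2009</font></td><td><font>001-0001</font></td>
      <td><font>Example Taxpayer</font></td><td><font>100.00</font></td>
      <td><font>made-up row</font></td></tr>
</table>
"""

if __name__ == '__main__':
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```

Using zip() with a list of names does the same job as the cell_index counter, and it quietly ignores any extra cells instead of raising a KeyError. Whether that is what you want depends on how much you trust the page layout.
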
On May 23, 1:43 pm, "Allan T." <akt...@gmail.com> wrote:
> Providence is slowly opening up their data, with feeds for public
> notices and meetings on ProvidenceRI.com. I also noticed a new site
> http://ProvidenceSunshine.com that contains tax adjustments which were