Scraping across drop-down menu


Michael Ross

Oct 9, 2014, 4:17:13 PM
to web-s...@googlegroups.com
Hello,

I'm very impressed with the work you've done, and very excited to use your extension for a project of mine.

I'm looking at getting a set of data off of this website, which I feel ought to be decently easy to scrape but I can't quite figure it out.


What I'm looking for is to get the data from the table for each month for each year between 1980-2010. I haven't been able to get more than one month of data (probably because of the dropdown menus), and it appears to scrape the data from the table in a random order. Do you have any suggestions to help?

Thanks,

Michael

Mārtiņš Balodis

Oct 10, 2014, 4:41:31 AM
to Michael Ross, web-s...@googlegroups.com
Hi,
Web Scraper cannot handle drop-down menus, but there is usually a workaround: range start URLs. If you look at the site's URL, you can see that the year and month are plain numeric values, and changing them loads another year or month. When specifying the start URL for a sitemap, you can include a numeric range such as [1-12]. Web Scraper then starts with 12 URLs, one for each value in the range. Here is a sitemap that scrapes the year 2010. To cover more years, add one start URL per year (only one range is allowed in a URL).
The site could also be scraped with an element click selector that clicks the "next month" button, but the range URL solution is much better.

{"_id":"climate-weather-gc-ca","startUrl":"http://climate.weather.gc.ca/climateData/dailydata_e.html?timeframe=2&Prov=AB&StationID=1865&dlyRange=1959-05-01%7C2012-04-11&cmdB1=Go&Year=2010&Month=[1-12]&cmdB1=Go#","selectors":[{"parentSelectors":["_root"],"type":"SelectorElement","multiple":true,"id":"table-row","selector":"table.wet-boew-zebra tr:nth-of-type(n+2):not(:has(th))","delay":""},{"parentSelectors":["table-row"],"type":"SelectorText","multiple":false,"id":"day","selector":"td:nth-of-type(1)","regex":"","delay":""},{"parentSelectors":["table-row"],"type":"SelectorText","multiple":false,"id":"temp-max","selector":"td:nth-of-type(2)","regex":"","delay":""},{"parentSelectors":["table-row"],"type":"SelectorText","multiple":false,"id":"temp-min","selector":"td:nth-of-type(3)","regex":"","delay":""},{"parentSelectors":["table-row"],"type":"SelectorText","multiple":false,"id":"temp-mean","selector":"td:nth-of-type(4)","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"year","selector":"form#frmDateNav1 select.display-inline.align-center option:selected","regex":"\\d+","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"month","selector":"select#Month1.display-inline option:selected","regex":"","delay":""}]}
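
Since only one range is allowed per URL, covering 1980-2010 means writing 31 start URLs by hand. A minimal Python sketch that generates them is shown here. The base URL is copied from the sitemap above; the names `build_start_urls` and `expand_month_range` are made up for this illustration, and `expand_month_range` only mimics what the [1-12] range is described as doing, it is not Web Scraper's own code. How several start URLs are entered into a sitemap depends on your Web Scraper version, so treat the output as text to paste in rather than a finished sitemap.

```python
import re

# Base URL copied from the sitemap above; only the Year value is a placeholder.
BASE_URL = (
    "http://climate.weather.gc.ca/climateData/dailydata_e.html"
    "?timeframe=2&Prov=AB&StationID=1865"
    "&dlyRange=1959-05-01%7C2012-04-11&cmdB1=Go"
    "&Year={year}&Month=[1-12]&cmdB1=Go#"
)


def build_start_urls(first_year, last_year):
    """Return one range start URL per year, each keeping the [1-12] month range."""
    return [BASE_URL.format(year=year) for year in range(first_year, last_year + 1)]


def expand_month_range(url):
    """Illustration only: list the concrete URLs that a single [a-b] range stands for."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]
    low, high = int(match.group(1)), int(match.group(2))
    return [
        url[:match.start()] + str(value) + url[match.end():]
        for value in range(low, high + 1)
    ]


if __name__ == "__main__":
    # Print the 31 start URLs (1980 through 2010), one per line.
    for start_url in build_start_urls(1980, 2010):
        print(start_url)
```

Each printed line is one start URL for one year; under the behaviour described above, each of them would in turn be opened once per month.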


Michael Ross

Oct 10, 2014, 10:36:06 AM
to web-s...@googlegroups.com, mikerob...@gmail.com
This is wonderful! Thank you so much.