Scraping spatial data

Martyn Clark

Sep 19, 2014, 5:43:46 AM
to web-s...@googlegroups.com
Hi,

I'm experimenting with scraping spatial data from a website: http://www.majidata.go.ke/town.php?MID=MTE=&SMID=MTM=

Essentially the page contains a bunch of mapped information (polygons drawn around different towns in Kenya) that I want to extract so that I can use it in my own GIS software. I've had a go at creating selectors for these shapes but can't seem to select the elements I'm interested in. Is Web Scraper suitable for this kind of application, and has anyone had any experience scraping this kind of data from a website?

Apologies if this query is a little light in detail, happy to provide more as necessary.

Many thanks for any help in advance.

Cheers

Marty

Mārtiņš Balodis

Sep 19, 2014, 1:23:13 PM
to Martyn Clark, web-s...@googlegroups.com
Hi,
Web Scraper doesn't have selectors that can extract data from a map. The problem with these kinds of maps is that they are rendered as images, so there is nothing you could select. In this case the site loads the polygons' coordinates from the server and then passes them to the map object, which renders them as an image. The best way to extract data from a map would be to talk directly to the map object, but this would require a custom implementation for every map engine available. I'll add this as a feature request, but I won't be implementing it right now.

I looked at how the web page communicates with the server and found that it loads polygons from links like this one:
http://www.majidata.go.ke/ajax-maps.php?reg=towns&id=760


With a range start URL you could try to scrape these links and get the polygon data. Here is an example sitemap that scrapes links with ids from 300 to 760. Most of these links are empty, though. You should also change the range to 300-900 to scrape all of the polygons.

{"_id":"majidata-go-ke","startUrl":"http://www.majidata.go.ke/ajax-maps.php?reg=towns&id=[300-760]","selectors":[{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"data","selector":"body","regex":"","delay":""}]}
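For anyone who prefers to do this outside Web Scraper, the same range scrape can be sketched in Python using only the standard library. This is a hypothetical script, not part of Web Scraper: the names `endpoint_urls`, `http_get` and `fetch_polygons` and the one-second `delay` are illustrative choices, and it assumes that an empty body (or an HTTP/network error) simply means there is no polygon for that id.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterator, Tuple

# Endpoint found in the page's traffic; only the id changes between requests.
BASE_URL = "http://www.majidata.go.ke/ajax-maps.php?reg=towns&id={}"


def endpoint_urls(start: int, end: int) -> Iterator[Tuple[int, str]]:
    """Yield (id, url) for each id in the inclusive range, like a [300-900] start URL."""
    for town_id in range(start, end + 1):
        yield town_id, BASE_URL.format(town_id)


def http_get(url: str, timeout: float = 10.0) -> str:
    """Download a URL and decode the body as UTF-8 (an assumption about the encoding)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_polygons(start: int, end: int,
                   fetch: Callable[[str], str] = http_get,
                   delay: float = 1.0) -> Dict[int, str]:
    """Return {id: raw body} for ids with a non-empty response, pausing between requests."""
    results: Dict[int, str] = {}
    for town_id, url in endpoint_urls(start, end):
        try:
            body = fetch(url).strip()
        except OSError:
            # URLError, HTTPError and timeouts are all OSError subclasses;
            # treat them like an empty id (assumption).
            body = ""
        if body:
            results[town_id] = body
        if delay:
            time.sleep(delay)  # be gentle with the server
    return results
```

Calling `fetch_polygons(300, 900)` covers the full range; with the default delay that is at least ten minutes of requests. The `fetch` parameter exists so the loop can be tried against canned responses before hitting the real site.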

P.S.
You are the first one who has asked to scrape a map.

Martyn Clark

Sep 19, 2014, 9:00:45 PM
to web-s...@googlegroups.com, martyn...@my.westminster.ac.uk
Hey Mārtiņš,

Thanks for getting back to me. I had a feeling you were going to say that. I've been digging around in the website myself and can get hold of the polygons no problem; it's parsing them into something I can use that's the problem. I could probably use Python to parse them somehow, I'm just not that hot in Python!
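A minimal sketch of that parsing step in Python, turning raw responses into GeoJSON that most GIS packages can open. The thread never shows what a response actually looks like, so this rests on a loud assumption: each body is a flat run of decimal numbers alternating latitude, longitude (the order Google-style map APIs usually use). The helper names are hypothetical, and if the real bodies are JSON or contain other numbers (ids, HTML attributes), `parse_latlng_pairs` is the part to replace.

```python
import json
import re
from typing import Dict, List, Tuple

# Signed decimal numbers such as -1.2921 or 36.8219.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_latlng_pairs(raw: str) -> List[Tuple[float, float]]:
    """ASSUMPTION: the body is a flat run of numbers alternating latitude, longitude."""
    numbers = [float(n) for n in NUMBER_RE.findall(raw)]
    if len(numbers) % 2:
        raise ValueError("odd number of values; not plain lat/lng pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


def to_geojson_feature(town_id: int, raw: str) -> dict:
    """Build a GeoJSON Polygon feature; GeoJSON positions are [longitude, latitude]."""
    ring = [[lng, lat] for lat, lng in parse_latlng_pairs(raw)]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must end where they start
    return {
        "type": "Feature",
        "properties": {"id": town_id},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def to_feature_collection(polygons: Dict[int, str]) -> str:
    """Serialise {id: raw body} into a GeoJSON FeatureCollection string."""
    features = [to_geojson_feature(i, raw) for i, raw in sorted(polygons.items())]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)
```

Writing the returned string to a file with a `.geojson` extension gives something QGIS and similar tools can load directly. The swap from (lat, lng) to [lng, lat] is the step most often gotten wrong, since GeoJSON puts longitude first.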

I'm surprised that I'm the first to ask about scraping mapped data; I find it's something I'm drawn to doing A LOT. If not scraping data from maps, then mapping the data I want to scrape. For example, the majidata website has all sorts of data embedded in it, some spatial and some tabular, yet all relating to physical locations, which is what I'm interested in. I find the website incredibly non-user-friendly (sorry majidata!). I feel this is probably by design, i.e. to protect the data, which is frustrating, as it is the kind of data that I feel should be open and accessible; hence my resorting to scraping.

Thanks for your feedback though, Web Scraper is very cool and I'm really keen to play around with it a lot more.

Cheers

Martyn