International postal code support


Pete Warden

Apr 23, 2013, 12:57:09 PM
to dstk-...@googlegroups.com
I've just checked in support for a much wider range of postal codes around the world.

I'm excited because this is the first time I've been able to use messy 'found' data to construct some useful lookups. I've crawled the web looking for things that look like addresses that are accompanied by coordinates, and fed the postal code parts of those as inputs to the algorithm. For example, if I see something like this:

Over CB24 5NF, United Kingdom, at (52.317189, 0.019248)

in a format like hCard or some other structured data, I'll feed in

GB CB24 5NF 52.317189 0.019248

into the algorithm. For many postal codes around the world I have several different points for the same code, so I'm using the centroid of those. In the future it might be possible to construct rudimentary boundaries using something like Schuyler Erle's betashapes. I've got around 2 million postal codes loaded currently, some from my own web crawl and others from SimpleGeo's dump of 25 million points of interest.
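To make the centroid step concrete, here's a rough sketch in Python of how records like the one above could be grouped and averaged. This is not the actual dstk code. The function names are made up for illustration, and it assumes each record is "country, code, latitude, longitude" separated by whitespace, with the code itself allowed to contain spaces. It also takes a plain arithmetic mean of the coordinates, which is a simplification that would misbehave for points straddling the antimeridian.

```python
from collections import defaultdict


def parse_record(line):
    """Split 'GB CB24 5NF 52.317189 0.019248' into (country, code, lat, lon).

    Assumed layout (hypothetical): first token is the country, the last two
    tokens are latitude and longitude, everything in between is the code.
    """
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"expected country, code, lat, lon: {line!r}")
    country = parts[0].upper()
    code = " ".join(parts[1:-2]).upper()
    lat, lon = float(parts[-2]), float(parts[-1])
    return country, code, lat, lon


def centroids(lines):
    """Return {(country, code): (mean_lat, mean_lon)} over all input records."""
    # Running [lat_sum, lon_sum, count] per postal code.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        country, code, lat, lon = parse_record(line)
        acc = sums[(country, code)]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}


if __name__ == "__main__":
    sample = [
        "GB CB24 5NF 52.317189 0.019248",
        # Second point is invented, just to show two sightings being merged.
        "GB CB24 5NF 52.317389 0.019448",
    ]
    for (country, code), (lat, lon) in centroids(sample).items():
        print(country, code, round(lat, 6), round(lon, 6))
```

Keeping running sums rather than lists of points means new sightings of a code can be folded in cheaply as more data arrives.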

The downside of this approach is that the data is messy. Don't route your ambulances using it please! The nice part is that it gets better as we feed more points in, and I believe we'll be able to find more and more open sources of these as time goes by. It's also possible we can extend this to street numbering as we get more data.

Please give it a try, either by playing with the main datasciencetoolkit.org server or by pulling the latest dstk and dstkdata repos from GitHub onto your own copy and re-running populate_database.rb (it will take a while).

cheers,
           Pete