I've just checked in support for a much wider range of postal codes around the world.
I'm excited because this is the first time I've been able to use messy 'found' data to construct some useful lookups. I've crawled the web looking for things that look like addresses that are accompanied by coordinates, and fed the postal code parts of those as inputs to the algorithm. For example, if I see something like this:
Over CB24 5NF, United Kingdom, at (52.317189, 0.019248)
in a format like hCard or some other structured data, I'll feed in
GB CB24 5NF 52.317189 0.019248
into the algorithm. For a lot of worldwide postal codes I have several different points for the same code, so I'm using the centroid of those. In the future it might be possible to construct rudimentary boundaries using something like Schuyler Erle's betashapes. I've got around 2 million postal codes loaded currently, some from my own web crawl and others from SimpleGeo's dump of 25 million points of interest.
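To make the centroid step concrete, here's a rough sketch in Python. It is not the actual dstk code: the function names are made up for illustration, it assumes input lines shaped like the `GB CB24 5NF 52.317189 0.019248` example above (country, postal code, latitude, longitude), and it uses a plain arithmetic mean of the coordinates, which would misbehave for codes that straddle the antimeridian.

```python
from collections import defaultdict


def parse_record(line):
    """Split 'COUNTRY POSTAL CODE LAT LON' into its parts.

    The postal code may itself contain spaces (e.g. 'CB24 5NF'), so the
    country is taken from the first token and the coordinates from the
    last two; everything in between is the code.
    """
    parts = line.split()
    if len(parts) < 4:
        raise ValueError("expected country, code, lat, lon: %r" % line)
    country = parts[0]
    code = " ".join(parts[1:-2])
    lat = float(parts[-2])
    lon = float(parts[-1])
    return country, code, lat, lon


def postal_code_centroids(lines):
    """Average all points seen for each (country, code) pair."""
    # Running totals per key: [sum of lats, sum of lons, point count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        country, code, lat, lon = parse_record(line)
        acc = totals[(country, code)]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {
        key: (lat_sum / count, lon_sum / count)
        for key, (lat_sum, lon_sum, count) in totals.items()
    }
```

Calling `postal_code_centroids` on a list of such lines gives back a dictionary keyed by `(country, code)`, with one averaged `(lat, lon)` pair per postal code, so feeding in more points for a code just nudges its centroid.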
The downside of this approach is that the data is messy. Don't route your ambulances using it please! The nice part is that it gets better as we feed more points in, and I believe we'll be able to find more and more open sources of these as time goes by. It's also possible we can extend this to street numbering as we get more data.
Please, give it a try, either by playing with the main datasciencetoolkit.org server or by pulling the latest dstk and dstkdata repos from GitHub onto your own copy and re-running populate_database.rb (it will take a while).
cheers,
Pete