I searched the net for how to extract the IMD gridded data (Sai Kirshna, thanks for your program too) and came across a Python package that takes care of a lot of the complexity for us! Big thanks to Saswati Nandi, who authored it.
With a bit of looking up, I was able to bring the data out into the simple flat-table format that we all know and love.
Sharing the code of a sample extraction here:
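The core of the extraction is just flattening the gridded array (day × lat × lon) into one row per grid point per date. Here's a minimal sketch of that step in plain Python, assuming you've already pulled a year's grid into a nested list; the variable names and toy values are illustrative, not the package's actual API:

```python
import csv
import io
from datetime import date, timedelta

def grid_to_rows(values, start, lats, lons):
    """Flatten a day x lat x lon nested list into (date, lat, lon, value) rows.

    values[d][i][j] holds the reading for day `start + d` at (lats[i], lons[j]).
    """
    rows = []
    for d, day_slice in enumerate(values):
        when = (start + timedelta(days=d)).isoformat()
        for i, lat_row in enumerate(day_slice):
            for j, val in enumerate(lat_row):
                rows.append((when, lats[i], lons[j], val))
    return rows

# Toy example: 2 days on a 2x2 grid of points near Pune (illustrative coords).
vals = [
    [[1.0, 0.0], [2.5, -999.0]],   # day 1; -999 is a fill value, see below
    [[0.0, 3.2], [0.0, 1.1]],      # day 2
]
rows = grid_to_rows(vals, date(2020, 6, 1), [18.5, 18.75], [73.75, 74.0])

# Write the flat table out as CSV.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["date", "lat", "lon", "rain"])
w.writerows(rows)
print(len(rows))  # 8 rows: 2 days x 2 lats x 2 lons
```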
I extracted some data for nearby Pune, and have put together a quick viz in this Google doc:
1. There is a lot of junk data in there: for the 1950s years I checked, it was over 60%. Sentinel values like -999 are filled in where there was no reading for that particular place and date - probably because the data system being used couldn't work with nulls. So it's important to get rid of the junk data points before moving forward.
2. I 7z-zipped the flat-table CSV after removing the junk data, and whaddyaknow, the result came out smaller. (Tip for the folks managing this department at IMD. Also, check out the HDF5 format.)
3. There are 3 data items: max temp, min temp, rainfall.
4. Temp data is resolved to 0.5° lat-long points and available from 1950. Rain data is resolved to 0.25° lat-long points and available from 1901.
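Dropping the junk rows from point 1 is a one-liner once the data is flat. A small sketch, using the -999 sentinel from the post (the column layout and sample rows are just for illustration):

```python
def drop_junk(rows, sentinel=-999.0, value_col=3):
    """Return (clean_rows, junk_fraction); a row is junk if its value equals the sentinel."""
    clean = [r for r in rows if r[value_col] != sentinel]
    frac = 1 - len(clean) / len(rows) if rows else 0.0
    return clean, frac

# Toy flat-table rows: (date, lat, lon, value).
rows = [
    ("1955-01-01", 18.5, 73.75, -999.0),
    ("1955-01-01", 18.5, 74.0, 2.5),
    ("1955-01-01", 18.75, 73.75, -999.0),
]
clean, frac = drop_junk(rows)
print(len(clean), round(frac, 2))  # 1 0.67
```

Tracking the junk fraction per year, as well as dropping the rows, makes it easy to spot decades (like the 1950s) where most of the grid is fill values.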
As usual, I'm going to set up a week-long script (going year by year only, with plenty of pauses so that it doesn't overload the IMD site, mind!), extract it all, and load it up into a PostgreSQL DB / API etc. for access.
Would anyone out there like to collaborate on a visualization?
We have data in point lat-long form that is temporal with date-wise resolution.
So, something map-based combined with time series and/or animation would be good.