We've posted about DestinE earlier. This is a major platform being set up by the EU: a digital twin of the Earth, with programmatic access for users to create their own workflows and models. Unfortunately, it's not open to non-EU countries, at least right now.
They recently posted about a few webinars and I wanted to draw parallels with the CoRE stack.
https://www.youtube.com/watch?v=c9Kg7_ZOZJI - STACK and HOOK services. STACK takes the above raw access one step further by publishing the data in Zarr and using Dask for processing. We hope to do something similar by publishing the data as vector data cubes and then providing a library to interact with the data, although this is already possible in some sense through the various get_data APIs, which return structured datapoints for various spatial units. HOOK is about creating your own workflows, and this is also something we will smooth out soon, although a guide on how to do this is published at
https://core-stack.org/contribute-datasets-pipelines-and-tools/. What we need to improve is to allow anyone to set up their own local CoRE stack backend instance; we are working on this.
https://www.youtube.com/watch?v=LCVfUPxgYFA - ISLET service. This provides VMs to users to run their own stuff. This is probably something we won't get to, i.e. providing compute infra for anyone to run their own processes, given the compute costs. DestinE of course provides compute infra for the above services as well, like Google Earth Engine does.
Overall, this was good to get a perspective. Three significant humps we need to get over:
- We compute all the data so far on-demand on a tehsil by tehsil basis. We need to pre-compute on a pan-India basis. A lot of the data layers are already pan-India but the analytics aren't. We are working on this.
- All the data is historical. This is because most use-cases we've been looking into have been about problem diagnosis and planning. But we need to move towards live data, such as current or forecast weather. Still figuring this out.
- Massive clean-up of the data structure. When we started building all this, we hadn't fully realized the form it would take. There is a lot of harmonization we need to do so that lists, time-series, etc. are all stored in a standard manner in all underlying layers. This will make programming with the data easier. Currently a lot of string splitting and whatnot needs to be done.
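To give a flavour of the string splitting mentioned in the last point, here is a minimal Python sketch. The packed "year:value" format, the "NA" marker, and the function name parse_year_series are all assumptions made for illustration; they are not the actual layout of any CoRE stack layer.

```python
from typing import List, Optional, Tuple


def parse_year_series(raw: str) -> List[Tuple[int, Optional[float]]]:
    """Split a packed 'year:value,year:value' string into (year, value) pairs.

    Hypothetical format: entries are comma-separated, each entry is
    'year:value', and a missing value is written as 'NA' or left empty.
    """
    series = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas or blank entries
        year_text, _, value_text = item.partition(":")
        value_text = value_text.strip()
        value = None if value_text in ("", "NA") else float(value_text)
        series.append((int(year_text), value))
    # Sort by year only, so that None values are never compared.
    return sorted(series, key=lambda pair: pair[0])


if __name__ == "__main__":
    print(parse_year_series("2018:14.1, 2017:12.5,2019:NA"))
```

If time-series were stored as proper lists of (year, value) records in every layer, a helper like this would not be needed at all, which is the point of the harmonization.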
These are some of the major things to figure out in the new year!
Adi
-- Aaditeshwar Seth
Microsoft Chair Professor, Computer Science and Engineering, IIT Delhi
Co-founder, Gram Vaani; Co-founder, CoRE Stack