First Initiative? The "Open Data" Initiative.


Ajay Kumar

Jun 19, 2010, 1:24:32 PM
to ICTD Asia, ictd-asi...@googlegroups.com
Hi Guys,
I have discussed this with some of you already.
You must be aware of the World Bank, US and UK opening up their
data.
Ref: www.data.gov
http://data.worldbank.org
http://data.gov.uk

Now since I work in the NGO sector, I am already working on digging up
such data for our programs and research and analysis.

So all this data is of great value to such development
organisations and projects.
The problem with Indian government sites is that the data, if it is
available at all, is not in machine-readable, extractable or
programmer-friendly formats.


I am currently working on planning for our Flood/Disaster Risk
Reduction project and evaluating how we can use information to make
quick decisions etc.

So that made me search Google for "Open Data" initiatives started by
anyone in India. Some have tried, but they were never able to finish
or to make the data available to the public.

This could be a very nice project to start and to put our
energies into, bringing in volunteers along the way.
The idea is to make the data available in as many standard formats as
possible, and maybe also open up APIs and provide it as a web service.
Free, of course :)

Can someone experienced, like Indranil, tell us if it's legally allowed
for us to host government hosted data in open formats on our sites?

We did similar work for http://voteindia.in, where we downloaded
scanned copies of MLAs' legal affidavits from government sites,
hand-typed them into Excel sheets and then made them available.

I can already see
http://india.ictd.asia/opendata in place ;) with volunteers from other
countries following suit!

Oh, to start with,
1) http://www.india-water.com/ffs/index.htm - River Basin data. - Can
be used for Flood analysis.
2) http://indiabudget.nic.in/es2009-10/esmain.htm - Economic Survey of
India 2009-2010
3) Old Census Data?


Thoughts/Comments please! :)

Regards,

Ajay Kumar

http://ajuonline.net

Nandeep Mali

Jun 19, 2010, 1:34:35 PM
to ictd...@googlegroups.com, ictd-asi...@googlegroups.com
On Sat, Jun 19, 2010 at 10:54 PM, Ajay Kumar <ajuo...@gmail.com> wrote:
> [snipping good stuff]

I had a discussion with Ajay on this on IRC. This is a very good idea. I
had been thinking recently about how much of the Indian government's
educational eContent (like the NCERT textbooks) doesn't use Unicode.
This is on similar lines.

> Can someone experienced, like Indranil, tell us if it's legally allowed
> for us to host government hosted data in open formats on our sites?

IMO, that's the deal maker or breaker. If this works out, I guess there
is nothing stopping this.

Even if this is available as plain data initially, it would be good.
APIs and stuff can come in later?

---
Nandeep

Ajay Kumar

Jun 20, 2010, 1:18:00 AM
to ictd-asi...@googlegroups.com, ictd...@googlegroups.com
On 19 June 2010 23:04, Nandeep Mali <nan...@miniorb.in> wrote:
> On Sat, Jun 19, 2010 at 10:54 PM, Ajay Kumar <ajuo...@gmail.com> wrote:
> > [snipping good stuff]
>
> I had a discussion with Ajay on this on IRC. This is a very good idea. I
> had been thinking recently about how much of the Indian government's
> educational eContent (like the NCERT textbooks) doesn't use Unicode.
> This is on similar lines.

I am not sure about NCERT content, especially the books. Aren't they usually copyrighted?

> > Can someone experienced, like Indranil, tell us if it's legally allowed
> > for us to host government hosted data in open formats on our sites?
>
> IMO, that's the deal maker or breaker. If this works out, I guess there
> is nothing stopping this.

I asked around, and one lawyer friend said we can, provided we attribute the source, since it is freely available public data.
So yes, we can. I am trying to confirm this with multiple sources and to find out whether any law clearly allows it.

> Even if this is available as plain data initially, it would be good.
> APIs and stuff can come in later?

Yes. Start small. The rest can come later.

Can we start with the CWC data first? :)
 
--
Best Regards,

Ajay Kumar

http://www.ajuonline.net

Ankit Guglani

Jun 21, 2010, 1:49:47 AM
to ictd-asi...@googlegroups.com, ictd...@googlegroups.com
Is it legal?

YES, you are only reformatting publicly available data. As long as you provide a disclaimer stating that you are only reformatting data originating from the xxx.gov.in site, that you cannot vouch for the accuracy of its content, and that the data provided may not be the latest given update frequencies, you should be good. Further, we should direct viewers to the original website as the more "authoritative" source if they have any qualms about the accuracy of the data. That being said, I am *not* a lawyer.

As for the Flood website, I extracted all the data, but while writing an HTML-to-Excel parser for it I got distracted ... I might be coming to Chennai for a month in July, and setting that up is taking up a lot of my time >.> ...
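
For anyone picking up the parser step, a minimal sketch of turning an HTML table into CSV (which Excel opens directly) might look like the following. It uses only the Python standard library and assumes the page marks up its data as a plain table with properly closed `<tr>`, `<td>` and `<th>` tags. The class name, function name and sample rows are made up for illustration; they are not taken from the actual flood site.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Station</th><th>River</th><th>Level (m)</th></tr>
  <tr><td>Station A</td><td>River X</td><td>12.5</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE))
```

A real page fetched with curl would be passed in the same way as `SAMPLE`. A proper HTML parser tends to survive small markup changes better than regex alone, though regex is fine for quick one-off extraction.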

- Ankit

P.S: This is easy and useful stuff and everyone should play with curl and regex ... ^_^

Ajay Kumar

Jun 22, 2010, 2:19:31 PM
to ictd-asi...@googlegroups.com, ictd...@googlegroups.com
Created the blueprint page on the wiki.
http://ictd.asia/wiki/Open_Data_Initiative_India


Ankit, can you share the data that you were able to extract? And in what format?
And if it's incomplete, can you tell me the next steps so that I can work on that and finish it?

Ajay Kumar

Jun 24, 2010, 2:33:59 PM
to ictd-asi...@googlegroups.com, ictd...@googlegroups.com
Hi,
I gathered some more feedback from people who have either worked on similar stuff or know about it, and we more or less agree that we can play around with the data and host it, with attribution of course, since it's public data.

The next step is to devise a methodology for this. Anyone with experience in working with data and standards can help by suggesting a better way to approach it.

Please feel free to post it on the wiki page itself.

On a related note, I could use some MediaWiki expertise in organising the wiki! Do let me know if anyone wants to help admin it!

Jeff Sonstein

Jun 24, 2010, 3:28:28 PM
to ictd...@googlegroups.com, ictd-asi...@googlegroups.com
On Jun 24, 2010, at 2:33 PM, Ajay Kumar wrote:

> I gathered some more feedback from people who have either worked on similar stuff or know about it, and we more or less agree that we can play around with the data and host it, with attribution of course, since it's public data.

I'll have to trust your judgement
as I do not know Indian law

> The next step is to devise a methodology for this. Anyone with experience in working with data and standards can help by suggesting a better way to approach it.

a few things I'd suggest might be quite useful
in the intermediate- and long-term
(despite being a bit of work in the short-term)
as we have found in our recent work with
the US Federal Election Commission data/files:

1) develop relationship diagrams (linking)

related data is often stored in multiple files
even files in multiple locations with multiple access methods
and having documentation of which keys link different data
is very valuable in figuring out how to
form coherent & integrated information for the citizen-user
in applications and servers developed by the group
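
as a rough illustration of the linking point, here is a small Python sketch that joins two files on a shared key and reports records whose key has no match. The file contents, the field names and the `station_id` key are all made up for the example; real datasets will have their own keys, which is exactly what the relationship diagram should document.

```python
import csv
import io

# Made-up sample files; "station_id" is a hypothetical linking key.
STATIONS_CSV = (
    "station_id,name,river\n"
    "S1,Station A,River X\n"
    "S2,Station B,River Y\n"
)
READINGS_CSV = (
    "station_id,date,level_m\n"
    "S1,2010-06-19,12.5\n"
    "S1,2010-06-20,12.9\n"
    "S3,2010-06-19,4.1\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def link(stations, readings, key="station_id"):
    """Attach station details to each reading that shares the same key.

    Returns (linked, orphans): orphans are readings whose key does not
    appear in the stations file, which is worth knowing about early.
    """
    by_key = {row[key]: row for row in stations}
    linked, orphans = [], []
    for reading in readings:
        station = by_key.get(reading[key])
        if station is None:
            orphans.append(reading)
        else:
            linked.append({**station, **reading})
    return linked, orphans


linked, orphans = link(read_rows(STATIONS_CSV), read_rows(READINGS_CSV))
print(len(linked), "linked readings;", len(orphans), "without a known station")
```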

2) develop schemas & convert data to XML-form (defining)

clear definitions of data structure & constraints are critical
to understanding and manipulating the data
and are often either present alongside the files as "metadata"
or can be constructed through examination of the data

with the data in XML-form conforming to your schemata
transformations and queries of the data
as well as reformation of logically-linked data
into more useful files/documents
become much much easier
as does producing useful outputs for citizen-users
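
a minimal Python sketch of the defining step, under stated assumptions: the "schema" here is just a dictionary of field names and converters rather than a real XML Schema (the standard library has no XSD validator), and the field names, element names and source URL are invented for illustration. Records that fail the checks are left out of the XML output.

```python
import xml.etree.ElementTree as ET

# Hypothetical field definitions: name -> converter that raises
# ValueError when the value does not fit the expected type.
SCHEMA = {"station_id": str, "date": str, "level_m": float}


def validate(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, convert in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        try:
            convert(record[field])
        except ValueError:
            problems.append(f"bad value for {field}: {record[field]!r}")
    return problems


def to_xml(records, source_url):
    """Serialise the valid records as XML, noting where they came from."""
    root = ET.Element("readings", {"source": source_url})
    for record in records:
        if validate(record):
            continue  # skip records that do not match the definitions
        node = ET.SubElement(root, "reading")
        for field in SCHEMA:
            ET.SubElement(node, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")


# Made-up records: the second one has a non-numeric level.
records = [
    {"station_id": "S1", "date": "2010-06-19", "level_m": "12.5"},
    {"station_id": "S1", "date": "2010-06-20", "level_m": "abc"},
]
print(to_xml(records, "http://example.org/source"))
```

keeping the source URL in the output also covers the attribution requirement discussed earlier in the thread.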

3) develop open RESTful API(s) for access

this is a very powerful way to ensure
client and server code and logic is disentangled
thus enabling others not in the group
to also develop their own clients
as well as enabling the development of
unexpected (at the front end of development)
uses of the data

take Twitter as an example of this...
part of their success is because
they enabled a client-code-focused ecosystem to develop
through separating client & server code
and through keeping an open RESTful API
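
to make the API suggestion concrete, here is a small read-only sketch using only the Python standard library. The URL layout (`/readings` and `/readings/<station_id>`), the sample records and all names are assumptions for illustration, not an agreed design. The routing is kept in a plain function so it can be checked without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up sample records standing in for a real dataset.
READINGS = [
    {"station_id": "S1", "date": "2010-06-19", "level_m": 12.5},
    {"station_id": "S2", "date": "2010-06-19", "level_m": 4.1},
]


def route(path):
    """Map a URL path to (status, payload).

    Hypothetical layout: /readings lists everything,
    /readings/<station_id> filters by station.
    """
    parts = [p for p in path.split("?")[0].split("/") if p]
    if parts == ["readings"]:
        return 200, READINGS
    if len(parts) == 2 and parts[0] == "readings":
        found = [r for r in READINGS if r["station_id"] == parts[1]]
        if found:
            return 200, found
        return 404, {"error": "unknown station"}
    return 404, {"error": "not found"}


class ReadOnlyHandler(BaseHTTPRequestHandler):
    """Serve GET requests as JSON; other methods are not implemented."""

    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a local server for the handler above."""
    return HTTPServer(("127.0.0.1", port), ReadOnlyHandler)
```

calling `make_server().serve_forever()` would start it locally, after which `curl http://127.0.0.1:8000/readings/S1` should return the matching records as JSON. Because clients only depend on the URLs and the JSON shape, the storage behind `route` can change later without breaking them.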

just a couple of suggestions...

jeffs

--
“Water? Never drink it...
Do you know what fish do in that stuff?”
-- attributed to W. C. Fields --
==========
Prof. Jeff Sonstein

Thejesh GN

Jun 25, 2010, 8:05:56 AM
to ICTD Asia Discuss List
On Jun 19, 10:24 pm, Ajay Kumar <ajuonl...@gmail.com> wrote:
> Hi Guys,
> I have discussed this with some of you already.
> You must be aware of the World Bank, US and UK opening up their
> data.
> Ref: www.data.gov
> http://data.worldbank.org
> http://data.gov.uk
>
> Now since I work in the NGO sector, I am already working on digging up
> such data for our programs and research and analysis.

A similar requirement made me think about this a while back
(http://thejeshgn.com/2010/02/24/open-data-in-india/). I couldn't take time
out to work on the idea. Now that we have a team, we can. To begin
with, I have tagged all the openly available data on Delicious:

http://delicious.com/gnthej/open-data-india

Ajay Kumar

Jun 30, 2010, 9:18:28 AM
to ictd-asi...@googlegroups.com
Hi Thej,

On 25 June 2010 17:35, Thejesh GN <i...@thejeshgn.com> wrote:
> A similar requirement made me think about this a while back
> (http://thejeshgn.com/2010/02/24/open-data-in-india/).

Exactly! The same work and thought process led me to search and land on your page, which is why I pinged you :) It is also interesting that we have already met in person!

> I couldn't take time
> out to work on the idea. Now that we have a team, we can. To begin
> with, I have tagged all the openly available data on Delicious:
>
> http://delicious.com/gnthej/open-data-india

That is helpful!
Added the link to the Wiki page here: http://ictd.asia/wiki/Open_Data_Initiative_India#Data_Sources

Is there a similar way to tag and arrange such links in MediaWiki, so that the information sits in one place instead of being scattered across many?