"Publishing Open Data - Do you really need an API?"


Jury Konga

Mar 20, 2012, 8:03:19 AM
to OpenDataBC
I found this to be an interesting article
http://www.peterkrantz.com/2012/publishing-open-data-api-design/ and I
know this group will have opinions. Looking forward to the
feedback :-)

Cheers Jury

James McKinney

Mar 20, 2012, 8:56:11 AM
to opend...@googlegroups.com
I think raw/bulk downloads should be available as often as possible. APIs, as a means of distributing data, should be reserved for when the underlying data cannot be distributed, or if the API is performing some costly operation to generate its response (which not all developers may have the technical know-how to implement).

Nik Garkusha @OpenHalton

Mar 20, 2012, 12:42:04 PM
to opend...@googlegroups.com
It depends: type of data/sets, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, etc.
 
Some of the thoughts on where APIs offer advantage:
1. Querying large datasets for relevant bits of data (15GB dataset vs. 100KB slice of that data)
2. Frequent update scenarios (GPS bus tracking, current weather, etc.)
3. Exposing relationships (a slice of multiple data sets with relationships captured)
4. Powering Gov't own applications for citizens (using own APIs for visualizing/interacting w/ data)
 
You can still download data via properly designed APIs. However, not having an API should not be a roadblock to publishing data. Downloads are fine for most common scenarios, but APIs offer the next level of dynamic platform for open gov data.
 
I elaborated on some of these in my post: http://openhalton.ca/2012/03/do-you-need-an-api-it-depends/

Wrate, David LCTZ:EX

Mar 20, 2012, 12:58:09 PM
to opend...@googlegroups.com

Nik, good to hear from you again. I’m happy to see this discussion rise to the surface particularly in the face of dwindling resources and increasing demand for open data.

 

Your last point is interesting. I’m keen to talk more about how govt and the open data community can co-design such applications so that the end product isn’t one where the user response is ‘that’s nice, but it would have been better if XYZ had been considered’.

 

Anyone want to chime in on this?

 

David

Herb Lainchbury

Mar 20, 2012, 1:07:05 PM
to opend...@googlegroups.com
APIs are almost always unnecessary and IMHO should be avoided wherever possible by governments for several reasons.

1) they are expensive to create compared to providing bulk datasets

2) they are expensive to maintain

3) they won't be used - I usually won't use the extra functionality a publisher's API offers over plain old downloads unless I absolutely have no other choice.  Why?  Because I always want to minimize coupling between systems.  The less I know about how your system works the better.  Leave a bulk file at the end of a URL and I know everything I need to know about how to get the data.  I will almost always write an ETL to get the data into a platform that I know I can control and rely on.

4) they are almost always unnecessary.  There are few datasets so large that I wouldn't want to grab them whole (Kevin's openmoonmap.org comes to mind).  

5) they are too easily a diversion from releasing data... it's human nature to get "busy" building something new rather than just doing what's required.  As Ward Cunningham, inventor of the wiki, says, "Do the simplest thing that could possibly work".
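
The ETL described in point 3 can be quite small. What follows is only a minimal sketch, not anyone's actual pipeline: it assumes the publisher leaves a UTF-8 CSV with a header row at a fixed URL, and it uses SQLite as a stand-in for "a platform I can control". The function names, the table name, and the URL in the usage note are all hypothetical.

```python
import csv
import io
import sqlite3
import urllib.request


def fetch_csv(url: str) -> str:
    """Download the bulk file as text (assumes a UTF-8 encoded CSV)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Replace `table` with the rows in csv_text and return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so odd column names from the publisher still work.
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    quoted_table = '"' + table.replace('"', '""') + '"'
    placeholders = ", ".join("?" for _ in header)
    # Rows whose width does not match the header are skipped, not repaired.
    rows = [row for row in reader if len(row) == len(header)]
    with conn:
        conn.execute(f"DROP TABLE IF EXISTS {quoted_table}")
        conn.execute(f"CREATE TABLE {quoted_table} ({columns})")
        conn.executemany(
            f"INSERT INTO {quoted_table} VALUES ({placeholders})", rows
        )
    return len(rows)
```

Usage would look something like `load_csv(sqlite3.connect("local.db"), "stops", fetch_csv("https://example.org/stops.csv"))`, where the URL is a placeholder. The only thing the consumer needs to know about the publisher's system is that one URL.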

Good question Jury.  Thanks for asking it.

H

--
Herb Lainchbury
Dynamic Solutions Inc.
www.dynamic-solutions.com
http://twitter.com/herblainchbury

Kevin McArthur

Mar 20, 2012, 1:34:30 PM
to opend...@googlegroups.com
Yes, I have to agree with Herb here. I'm always going to want the data, no matter how large.

My OpenMoonMap.org site is the perfect example... it has to rely on NASA's idea of what the web mapping should look like. It goes down all the time, taking the site down with it. I can't use any of the off-the-shelf GIS tools with it because all we have is a basic WMS. I've been on a mission to get the source data for this, but so far no luck, and thus the site won't grow.

That said, I do like an API in a few very specific and limited situations.

Real-time data. GeoRSS feeds, live video feeds, etc.
Authority at point in time data. (Think a lien search, or some other similar process that requires authoritative clearance as of a very specific point in time)
APIs are also appropriate for creating incoming data streams -- for crowdsourced data input. (e.g. fish catch reporting would be a cool API)

In almost all other scenarios, APIs are generally a massive pain to work with, add a significant point of failure, and add little to no value in the open-data scenario. (We can transfer terabytes easily these days, so data size is really a non-starter)...  Last year we even published a public domain CSV->API converter, to stop folks chasing their tails on API creation. If some management process says you absolutely have to have an API for CSV type data, it should take about 5 minutes to set up, and no time/resources/money should be dedicated to this task; the underlying data should always be published.
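
To give a sense of how little work a read-only API over CSV-type data involves, here is a rough sketch using only the Python standard library. It is not the converter mentioned above, just an illustration under stated assumptions: the CSV fits in memory, has a header row, and the only "query" supported is exact matching on column values taken from the query string. All function names and the default port are hypothetical.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows, query_string):
    """Keep rows whose columns equal every key=value pair in the query string."""
    filters = {key: values[0] for key, values in parse_qs(query_string).items()}
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]


def make_handler(rows):
    """Build a request handler that serves `rows` as JSON."""
    class CsvHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            matches = filter_rows(rows, urlparse(self.path).query)
            body = json.dumps(matches).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return CsvHandler


def serve(csv_path, port=8000):
    """Serve the CSV at csv_path until interrupted."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = load_rows(f.read())
    HTTPServer(("", port), make_handler(rows)).serve_forever()
```

With a file served this way, a request such as `/?species=coho` would return only the matching rows as JSON. The CSV itself remains the source of truth and should still be published alongside.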

--

Kevin

James McKinney

Mar 20, 2012, 2:41:48 PM
to opend...@googlegroups.com
+1 Herb, Kevin. Nik points out some good examples of cases in which bulk downloads aren't timely (real-time data), don't support the type of query a user wants, or are very large. However, most datasets are not like this. There will likely be more of these as open data grows, but I think they will always remain the exception.

Rhiannon Coppin

Mar 20, 2012, 5:48:33 PM
to opend...@googlegroups.com
I've been following this discussion. For me (as a journalist), an API only makes sense for, yes, large data sets where I only want a small fraction (map information, Twitter bits, etc.), or when the data represents something ongoing and real-time, like crime occurrences that are reported daily/weekly, or other events that happen sporadically -- like board decisions on doctor misconduct or workplace safety violations.

On that note, I want to show the list a current RFP (1 week left) for "a data exploration and visualization tool" for WorkSafeBC, which is posted on bcbid.gov.bc.ca

WorkSafeBC wants a system to tie into existing database systems and provide levels of access for analysts, developers, and for "consumers." (Is that the public? Is this an OpenData problem/opportunity?)

Anyhow, I can't link to the document online (session-driven site), but I will quote from it here and also try to attach the included diagram and ask you all:
 -- is this RFP asking (specifically) for an API? Is this a case where one makes sense?
or
 -- is there an Open opportunity here if the winner of this bid builds an API as part of the project; and that API is made public?

Quoting:
The tool needs to be able to connect to multiple data sources, perform data mashups, provide advanced visualizations, accommodate geospatial analysis, and provide an easy to use interface for analysts, developers and end consumers.

Technical Specifications:
Ease of Use:  the tool’s overall ease of use for analysts, developers and consumers is paramount.  Analysts and developers should not need to know SQL or any other proprietary languages to leverage the tool.  Consumers should be able to interact easily with the data in a highly visual, easy to use interface.
Pull and Assemble Data:  the tool will allow Analysts to access and join data from multiple data stores and combine external data in the form of flat files, spreadsheets or other formats.  Analysts should be able to access a central repository containing frequently used joins, designs and objects and to access metadata captured from earlier explorations. 
Analyze and Visualize Data:  the tool will provide advanced analysis capabilities and support advanced visualization techniques.  The tool should be able to present geo-coded data on a visual map.  Formatting options for layout and design must be flexible and easy to use.
Package and Distribute Data:  the final output of the data exploration and visualization should be easy to package and distribute.  Consumers should be able to interact with the output while connected to their local area network or when working disconnected.  The output should be exportable to Excel, PowerPoint and other formats.
Mobile Capabilities:  the tool should allow consumers who work in the field or the office to receive and interact with data on mobile devices including tablets and smart phones.
Training and Support:  the training required to leverage the full capabilities of the tool should be manageable for Analysts and minimal for Consumers.  Training should be clearly accessible through e-learning, online help and through local training partners if necessary.
Architecture and Integration:  the tool should be able to integrate with the databases, cubes and semantic layers that are currently in production at WorkSafeBC and provide integration with SharePoint.   The tool and its capabilities need to be embedded within our existing applications and web pages.  The tool should be easy to deploy and scalable for enterprise use.  The initial installation is estimated at approximately 500 users.


Rhiannon

Vancouver, B.C.
WorkSafeBCMarch20120RFPimg.png

Dan Bonab

Mar 20, 2012, 8:59:24 PM
to opend...@googlegroups.com
To me, Open Data is still data, so all the innovations that have happened in the past and are happening now could be applied to open data as well. I don't think there is any question about the added value of APIs; the question is more whether the government should undertake that task or not.
Even if I download a dataset from somewhere and decide to host it myself and build an application using that data, I would most likely create some kind of API internally (a data layer, etc.) to make my codebase more manageable and easier to maintain.
Maybe for a quick hack or a visualization it is more of a hassle to work with APIs than with a simple CSV file, but if you are envisioning a growing need, a business, or an enterprise-level solution, there are different challenges to deal with, and APIs could come in very handy.
Also, data is not always a one-shot, read-only entity: there is publishing, contributing, securing, maintaining, ensuring availability and integrity, and so many other things.
Regarding the "who should do it" question, I agree with most of the people here that government shouldn't be doing it except in special cases.
The other thing is that I see people bringing up GeoRSS and feeds. Let's just make sure we don't confuse things: feeds and APIs are two different things.

Cheers

Dan



Luke Closs

Mar 22, 2012, 1:21:40 AM
to opend...@googlegroups.com, Jury Konga
I'll pile-on in this thread. I agree with the consensus so far. APIs
are nice to have, but probably shouldn't be built by the gov't except
in rare cases.

Although I would make the point that usually bulk downloads ARE APIs.
Your browser does an HTTP GET of a usually fixed URL to fetch the
resource. It's the simplest possible API. The gov't updates that
file, then you GET it.

So if an API is required, a single bulk download API request meets that
requirement IMO.


I can also talk from personal experience about building Recollect.net.
I would not base my application on any API provided by the gov't for
technical and business reasons. I wouldn't trust that the Gov't APIs
could meet the performance, efficiency and reliability requirements I
need. So I seek raw data, then write importers into my DB schema.
From here, I expose my data model into a RESTful API that we build our
app UI and connectors on.

I would use an API to import data into the app, or poll the API for
new data, but I would always seek to maintain a local copy of the data
with access designed for my specific use.
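
Treating the bulk file as the API and polling it for new data can be sketched roughly as below. This is an illustration only, not how Recollect actually works: it assumes the publisher's server sends an ETag header and honours If-None-Match, and the function names and the `state` dictionary layout are made up for the example.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None):
    """GET the bulk file; return (body, etag), or (None, etag) if unchanged."""
    request = urllib.request.Request(url)
    if etag:
        # Ask the server to skip the body if our copy is still current.
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the local copy
            return None, etag
        raise


def refresh_local_copy(state, fetch=fetch_if_changed):
    """Update state['body'] only when the publisher's file has changed.

    `state` is a dict holding 'url' and, after the first run, 'etag' and
    'body'. Returns True when new data was stored.
    """
    body, etag = fetch(state["url"], state.get("etag"))
    if body is None:
        return False
    state["body"], state["etag"] = body, etag
    return True
```

A scheduled job could call `refresh_local_copy(state)` and run the importers only when it returns True. If the server sends no ETag, every poll simply downloads the file again, which is still correct, just less efficient.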


I want to bounce this back to Jury - how does this blog post and group
consensus inform the projects that you're working on? Are you seeing
similar problems with APIs?

Luke

Jury Konga

Mar 22, 2012, 2:08:59 AM
to opend...@googlegroups.com
Luke - thx for the response and kudos to the group for a spirited discussion. I'll point to Herb's "thx for asking" as a sign that there is value in having the discussion. The outcome - I like Nik's comment "it depends" :-)

Luke, to your question: I believe that there's huge value in the open data community collaborating with and assisting municipalities (all govs) with designing their open data portals - after all, the data catalogues are primarily meant for your use and that of the research community. A significant decision is being made in the ICT community around in-house data centers versus the cloud or a hybrid, and I thought the article had an element that could contribute to the decision-making process.

I'm hearing from the group that Gov should stay out of the API biz. That's great if that's the optimum scenario for you - so, back to you, is that the optimum .... or maybe "it depends" :-)

Cheers Jury

Sent from my iPhone

Luke Closs

Mar 22, 2012, 2:13:49 AM
to opend...@googlegroups.com
On Wed, Mar 21, 2012 at 11:08 PM, Jury Konga <jko...@sympatico.ca> wrote:
> Luke to your question, I believe that there's huge value in the open data community collaborating with and assisting municipalities (all govs) with designing their open data portals - after all, the data catalogues are primarily meant for your use and those of the research community. A significant decision is being in the ICT community around inhouse data centers versus the cloud or a hybrid and I thought the article had an element that could contribute to the decision making process.
>
> I'm hearing from the group that Gov should stay out of the API biz. That's great if that's the optimum scenario for you - so, back to you, is that the optimum .... or maybe "it depends" :-)

Gov'ts should provide the "simplest possible API" - the single bulk
download file. No coding fancy things, just put the data at a URL and
keep it up to date.

Only when it's extremely necessary - as noted elsewhere in this thread
- should a more complex API be considered.


But "Bulk download" vs "API" is totally orthogonal to "In-house" vs
"Cloud". A URL is a URL is a URL. I don't care if the bulk download
comes from a municipal server or from amazon s3 cloud hosting. Focus
on the simplest possible API - the bulk download file.

Jury Konga

Mar 22, 2012, 8:23:40 AM
to opend...@googlegroups.com
Thx for feedback Luke. The discussion continues ... here's a blog from Sunlight Labs http://sunlightlabs.com/blog/2012/government-do-you-really-need-an-api/

Sent from my iPad
