I think raw/bulk downloads should be available as often as possible. APIs, as a means of distributing data, should be reserved for when the underlying data cannot be distributed, or if the API is performing some costly operation to generate its response (which not all developers may have the technical know-how to implement).
It depends: type of data/sets, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, etc.
Some of the thoughts on where APIs offer advantage:
1. Querying large datasets for relevant bits of data (15GB dataset vs. 100KB slice of that data)
2. Frequent update scenarios (GPS bus tracking, current weather, etc.)
3. Exposing relationships (a slice of multiple data sets with relationships captured)
4. Powering Gov't's own applications for citizens (using own APIs for visualizing/interacting w/ data)
You can still download data via APIs that are properly designed. However, not having an API should not be a roadblock to publishing data. Downloads are fine for most common scenarios, but APIs offer the next level of dynamic platform for open gov data.
Nik, good to hear from you again. I’m happy to see this discussion rise to the surface, particularly in the face of dwindling resources and increasing demand for open data.
Your last point is interesting. I’m keen to talk more about how govt and the open data community can co-design such applications so that the end product isn’t one where the user response is ‘that’s nice but it would have been better if XYZ had been considered’.
Anyone want to chime in on this?
David
From: opendatabc@googlegroups.com [mailto:opendatabc@googlegroups.com] On Behalf Of Nik Garkusha @OpenHalton
Sent: Tuesday, March 20, 2012 9:42 AM
To: opendatabc@googlegroups.com
Subject: [OpenDataBC] Re: "Publishing Open Data - Do you really need an API?"
It depends: type of data/sets, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, etc.
Some of the thoughts on where APIs offer advantage:
1. Querying large datasets for relevant bits of data (15GB dataset vs. 100KB slice of that data)
2. Frequent update scenarios (GPS bus tracking, current weather, etc.)
3. Exposing relationships (a slice of multiple data sets with relationships captured)
4. Powering Gov't's own applications for citizens (using own APIs for visualizing/interacting w/ data)
You can still download data via APIs that are properly designed. However, not having an API should not be a roadblock to publishing data. Downloads are fine for most common scenarios, but APIs offer the next level of dynamic platform for open gov data.
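Nik's first point contrasts a 15GB download with a 100KB slice served by an API. Where only a bulk file is published, a consumer can still pull out a slice without holding the whole file in memory by streaming it row by row. A minimal Python sketch, assuming a CSV with a header row; the column names and sample values are hypothetical:

```python
import csv
import io
from typing import Iterable, Iterator


def filter_rows(lines: Iterable[str], column: str, value: str) -> Iterator[dict]:
    """Yield only the rows whose `column` equals `value`.

    csv.DictReader consumes its input lazily, so memory use stays roughly
    constant however large the bulk file is.
    """
    for row in csv.DictReader(lines):
        if row.get(column) == value:
            yield row


if __name__ == "__main__":
    # Hypothetical sample standing in for a large bulk download.
    sample = io.StringIO(
        "stop_id,route,arrival\n"
        "1,10,08:00\n"
        "2,14,08:05\n"
        "3,10,08:10\n"
    )
    for row in filter_rows(sample, "route", "10"):
        print(row["stop_id"], row["arrival"])
```

For a real file, pass an open handle such as `open("bulk.csv", newline="")`. The scan still reads every byte of the file, which is the cost Nik has in mind when an API does the filtering server-side instead.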
APIs are almost always unnecessary and IMHO should be avoided wherever possible by governments for several reasons.
1) they are expensive to create compared to providing bulk datasets
2) they are expensive to maintain
3) they won't be used - I usually won't use the extra functionality a publisher's API offers over plain old downloads unless I absolutely have no other choice. Why? Because I always want to minimize coupling between systems. The less I know about how your system works the better. Leave a bulk file at the end of a URL and I know everything I need to know about how to get the data. I will almost always write an ETL to get the data into a platform that I know I can control and rely on.
4) they are almost always unnecessary. There are few datasets so large that I wouldn't want to grab them whole (Kevin's openmoonmap.org comes to mind).
5) they are too easily a diversion from releasing data... it's human nature to get "busy" building something new rather than just doing what's required. As Ward Cunningham, inventor of the Wiki, says, "Do the simplest thing that could possibly work".
On Tue, Mar 20, 2012 at 5:56 AM, James McKinney <ja...@opennorth.ca> wrote:
> I think raw/bulk downloads should be available as often as possible. APIs, as a means of distributing data, should be reserved for when the underlying data cannot be distributed, or if the API is performing some costly operation to generate its response (which not all developers may have the technical know-how to implement).
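Herb's third point describes writing an ETL that copies a bulk file into a platform he controls. A minimal sketch of that pattern using only the Python standard library: fetch the CSV from its URL, normalize the header names, and upsert the rows into a local SQLite table. The table name, the `id`/`name`/`city` columns and the keying on `id` are assumptions for illustration, not anything a real publisher is known to use.

```python
import csv
import io
import sqlite3
import urllib.request
from typing import Dict, List


def extract(url: str) -> str:
    """Fetch the bulk file. All the consumer needs to know is its URL."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def transform(text: str) -> List[Dict[str, str]]:
    """Parse the CSV and normalize header names (trimmed, lower-case, underscores)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in raw.items()
            if key is not None  # skip overflow fields from malformed rows
        })
    return rows


def load(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> None:
    """Upsert into a local table so re-running the job refreshes rather than duplicates.

    The table and its columns (id, name, city) are hypothetical; the
    ON CONFLICT ... DO UPDATE syntax needs SQLite 3.24 or newer.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facilities "
        "(id TEXT PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO facilities (id, name, city) VALUES (:id, :name, :city) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, city = excluded.city",
        rows,
    )
    conn.commit()
```

A full run would look like `load(sqlite3.connect("local_copy.db"), transform(extract(url)))`, with `url` pointing at wherever the publisher keeps the file. Because the load is an upsert, re-running it after the publisher updates the file refreshes the local copy instead of duplicating rows.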
Yes, I have to agree with Herb here. I'm always going to want the data, no matter how large.
My OpenMoonMap.org site is the perfect example... it has to rely on NASA's idea of what the web mapping should look like. It goes down all the time, taking the site down with it. I can't use any of the off-the-shelf GIS tools with it because all we have is a basic WMS. I've been on a mission to get the source data for this, but so far no luck, and thus the site won't grow.
That said, I do like an API in a few very specific and limited situations.
Real-time data. GeoRSS feeds, live video feeds, etc.
Authority at point-in-time data. (Think a lien search, or some other similar process that requires authoritative clearance as of a very specific point in time.)
APIs are also appropriate for creating incoming data streams -- for crowdsourced data input. (e.g. fish catch reporting would be a cool API)
In almost all other scenarios, APIs are generally a massive pain to work with, add a significant point of failure, and add little to no value in the open-data scenario. (We can transfer terabytes easily these days, so data size is really a non-starter)... Last year we even published a public-domain CSV->API converter to stop folks chasing their tails on API creation. If some management process says you absolutely have to have an API for CSV-type data, it should take about 5 minutes to set up, and no time/resources/money should be dedicated to this task; the underlying data should always be published.
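Kevin puts the cost of wrapping CSV data in an API at about five minutes. As a rough illustration of how small that can be, here is a sketch using only the Python standard library: a read-only JSON endpoint over CSV rows, where each query-string parameter acts as an exact-match filter. This is not Kevin's converter (its code isn't in the thread); the function names are hypothetical, and there is no authentication, paging or caching, so treat it as a sketch rather than a production service.

```python
import csv
import io
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, List
from urllib.parse import parse_qs, urlparse


def load_rows(text: str) -> List[Dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def make_handler(rows: List[Dict[str, str]]):
    class CsvApiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Each query parameter is an exact-match filter, e.g. /?city=Victoria;
            # a request with no parameters returns every row.
            params = parse_qs(urlparse(self.path).query)
            matches = [
                row for row in rows
                if all(row.get(key) == values[0] for key, values in params.items())
            ]
            body = json.dumps(matches).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging

    return CsvApiHandler


def start_server(rows: List[Dict[str, str]], port: int = 8000) -> ThreadingHTTPServer:
    """Serve `rows` as JSON on localhost in a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(rows))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping the rows in memory and filtering per request is the simplest thing that works for small CSVs; the bulk file itself should still be published alongside it, as Kevin says.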
> APIs are almost always unnecessary and IMHO should be avoided wherever possible by governments for several reasons.
> 1) they are expensive to create compared to providing bulk datasets
> 2) they are expensive to maintain
> 3) they won't be used - I usually won't use the extra functionality a publisher's API offers over plain old downloads unless I absolutely have no other choice. Why? Because I always want to minimize coupling between systems. The less I know about how your system works the better. Leave a bulk file at the end of a URL and I know everything I need to know about how to get the data. I will almost always write an ETL to get the data into a platform that I know I can control and rely on.
> 4) they are almost always unnecessary. There are few datasets so large that I wouldn't want to grab them whole (Kevin's openmoonmap.org comes to mind).
> 5) they are too easily a diversion from releasing data... it's human nature to get "busy" building something new rather than just doing what's required. As Ward Cunningham, inventor of the Wiki, says, "Do the simplest thing that could possibly work".
> Good question Jury. Thanks for asking it.
> H
+1 Herb, Kevin. Nik points out some good examples of cases in which bulk downloads aren't timely (real-time data), don't support the type of query a user wants, or are very large. However, most datasets are not like this. There will likely be more of these as open data grows, but I think they will always remain the exception.
> Yes, I have to agree with Herb here. I'm always going to want the data, no matter how large.
> My OpenMoonMap.org site is the perfect example... it has to rely on NASA's idea of what the web mapping should look like. It goes down all the time, taking the site down with it. I can't use any of the off-the-shelf GIS tools with it because all we have is a basic WMS. I've been on a mission to get the source data for this, but so far no luck, and thus the site won't grow.
> That said, I do like an API in a few very specific and limited situations.
> Real-time data. GeoRSS feeds, live video feeds, etc.
> Authority at point-in-time data. (Think a lien search, or some other similar process that requires authoritative clearance as of a very specific point in time.)
> APIs are also appropriate for creating incoming data streams -- for crowdsourced data input. (e.g. fish catch reporting would be a cool API)
> In almost all other scenarios, APIs are generally a massive pain to work with, add a significant point of failure, and add little to no value in the open-data scenario. (We can transfer terabytes easily these days, so data size is really a non-starter)... Last year we even published a public-domain CSV->API converter to stop folks chasing their tails on API creation. If some management process says you absolutely have to have an API for CSV-type data, it should take about 5 minutes to set up, and no time/resources/money should be dedicated to this task; the underlying data should always be published.
> --
> Kevin
> On 12-03-20 10:07 AM, Herb Lainchbury wrote:
I've been following this discussion. For me (as a journalist), an API only makes sense for, yes, large data sets where I only want a small fraction (map information, Twitter bits, etc.), or when the data represents something ongoing or real-time, like crime occurrences that are reported daily/weekly, or other events that happen sporadically, like board decisions on doctor misconduct or workplace safety violations.
On that note, I want to show the list a current RFP (1 week left) for "a data exploration and visualization tool" for WorkSafeBC, which is posted on bcbid.gov.bc.ca.
WorkSafeBC wants a system to tie into existing database systems and provide levels of access for analysts, developers, and for "consumers." (Is that the public? Is this an OpenData problem/opportunity?)
Anyhow, I can't link to the document online (session-driven site), but I will quote from it here, try to attach the included diagram, and ask you all: Is this RFP asking (specifically) for an API? Is this a case where one makes sense? Or is there an Open opportunity here if the winner of this bid builds an API as part of the project, and that API is made public?
Quoting:
> The tool needs to be able to connect to multiple data sources, perform data mashups, provide advanced visualizations, accommodate geospatial analysis, and provide an easy to use interface for analysts, developers and end consumers.
> *Technical Specifications:*
> *Ease of Use:* the tool’s overall ease of use for analysts, developers and consumers is paramount. Analysts and developers should not need to know SQL or any other proprietary languages to leverage the tool. Consumers should be able to interact easily with the data in a highly visual, easy to use interface.
> *Pull and Assemble Data:* the tool will allow Analysts to access and join data from multiple data stores and combine external data in the form of flat files, spreadsheets or other formats. Analysts should be able to access a central repository containing frequently used joins, designs and objects and to access metadata captured from earlier explorations.
> *Analyze and Visualize Data:* the tool will provide advanced analysis capabilities and support advanced visualization techniques. The tool should be able to present geo-coded data on a visual map. Formatting options for layout and design must be flexible and easy to use.
> *Package and Distribute Data:* the final output of the data exploration and visualization should be easy to package and distribute. Consumers should be able to interact with the output while connected to their local area network or when working disconnected. The output should be exportable to Excel, PowerPoint and other formats.
> *Mobile Capabilities:* the tool should allow consumers who work in the field or the office to receive and interact with data on mobile devices including tablets and smart phones.
> *Training and Support:* the training required to leverage the full capabilities of the tool should be manageable for Analysts and minimal for Consumers. Training should be clearly accessible through e-learning, online help and through local training partners if necessary.
> *Architecture and Integration:* the tool should be able to integrate with the databases, cubes and semantic layers that are currently in production at WorkSafeBC and provide integration with SharePoint. The tool and its capabilities need to be embedded within our existing applications and web pages. The tool should be easy to deploy and scalable for enterprise use. The initial installation is estimated at approximately 500 users.
On Tue, Mar 20, 2012 at 11:41 AM, James McKinney <ja...@opennorth.ca> wrote:
> +1 Herb, Kevin. Nik points out some good examples of cases in which bulk downloads aren't timely (real-time data), don't support the type of query a user wants, or are very large. However, most datasets are not like this. There will likely be more of these as open data grows, but I think they will always remain the exception.
> On 2012-03-20, at 1:34 PM, Kevin McArthur wrote:
To me, Open Data is still data, so all the innovations that have happened in the past and are happening now could be applied to open data as well. I don't think there is any question about the added value of APIs; the question is more whether the government should undertake that task or not. Even if I download a dataset from somewhere, decide to host it myself and build an application using that data, I would more likely try to create some kind of API internally (a data layer, etc.) to make my codebase more manageable and easier to maintain.
Maybe for a short quick hack or a visualization it is more of a hassle to work with APIs than with a simple CSV file, but if you are envisioning a growing need, a business or an enterprise-level solution, there are different challenges you have to deal with, and APIs can come in very handy. Also, data is not always a one-shot, read-only entity: there is publishing, contributing, securing, maintaining, making sure of availability, integrity and so many other things.
Regarding the "who should do it" question, I agree with most of the people here that government shouldn't be doing it except in special cases. The other thing is that I see people bring up GeoRSS and feeds; let's just make sure we don't confuse things: feeds and APIs are two different things.
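Dan's point is that even with a downloaded dataset, he would put an internal API (a data layer) between the data and the application. A minimal sketch of that idea: the application depends only on a small interface declared with `typing.Protocol`, and a CSV-backed class implements it, so moving to a database or a remote API later means writing another class with the same method. The `Inspection` record and its fields are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Inspection:
    """Hypothetical record type; the field names are illustrative only."""
    site: str
    year: int
    violations: int


class InspectionSource(Protocol):
    """The internal API the rest of the application codes against."""

    def by_year(self, year: int) -> List[Inspection]: ...


class CsvInspectionSource:
    """One implementation, backed by a downloaded bulk CSV."""

    def __init__(self, text: str) -> None:
        self._rows = [
            Inspection(row["site"], int(row["year"]), int(row["violations"]))
            for row in csv.DictReader(io.StringIO(text))
        ]

    def by_year(self, year: int) -> List[Inspection]:
        return [r for r in self._rows if r.year == year]


def total_violations(source: InspectionSource, year: int) -> int:
    """Application logic: knows the interface, not where the data lives."""
    return sum(r.violations for r in source.by_year(year))
```

Only `CsvInspectionSource` knows the file layout, which keeps the rest of the codebase insulated from it; whether the government or the consumer builds this layer is the separate question Dan raises.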
Cheers
Dan
From: James McKinney <ja...@opennorth.ca>
To: opendatabc@googlegroups.com
Sent: Tuesday, March 20, 2012 11:41:48 AM
Subject: Re: [OpenDataBC] "Publishing Open Data - Do you really need an API?"
+1 Herb, Kevin. Nik points out some good examples of cases in which bulk downloads aren't timely (real-time data), don't support the type of query a user wants, or are very large. However, most datasets are not like this. There will likely be more of these as open data grows, but I think they will always remain the exception.
I'll pile on in this thread. I agree with the consensus so far. APIs are nice to have, but probably shouldn't be built by the gov't except in rare cases.
Although I would make the point that usually bulk downloads ARE APIs. Your browser does an HTTP GET of a usually fixed URL to fetch the resource. It's the simplest possible API. The gov't updates that file, then you GET it.
So if an API is required, a single bulk download API request meets that requirement IMO.
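Luke's point is that a bulk download already is an API: an HTTP GET on a stable URL. To poll such a file for updates without re-downloading it when nothing has changed, a consumer can use a standard HTTP conditional GET with `If-None-Match`. A minimal sketch, assuming the server returns an `ETag` header (many static hosts do, but it is not guaranteed; without one, every poll is simply a full download). The function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_request(url: str, etag: Optional[str] = None) -> urllib.request.Request:
    """A plain GET of the bulk file's URL, made conditional when we hold an ETag."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url: str, etag: Optional[str] = None) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, etag); body is None when the server replies 304 Not Modified."""
    try:
        with urllib.request.urlopen(build_request(url, etag)) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return None, etag
        raise
```

A poller stores the returned ETag and passes it back on the next call; a `None` body means the local copy is still current. The same code works whether the URL points at a municipal server or at cloud storage.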
I can also talk from personal experience about building Recollect.net. I would not base my application on any API provided by the gov't for technical and business reasons. I wouldn't trust that the Gov't APIs could meet the performance, efficiency and reliability requirements I need. So I seek raw data, then write importers into my DB schema. From here, I expose my data model into a RESTful API that we build our app UI and connectors on.
I would use an API to import data into the app, or poll the API for new data, but I would always seek to maintain a local copy of the data with access designed for my specific use.
I want to bounce this back to Jury - how does this blog post and group consensus inform the projects that you're working on? Are you seeing similar problems with APIs?
Luke - thx for response and kudos to the group for a spirited discussion. I'll point to Herb's "thx for asking" as there being value in having the discussion. The outcome - I like Nik's comment "it depends" :-)
Luke to your question, I believe that there's huge value in the open data community collaborating with and assisting municipalities (all govs) with designing their open data portals - after all, the data catalogues are primarily meant for your use and those of the research community. A significant decision is being made in the ICT community around in-house data centers versus the cloud or a hybrid, and I thought the article had an element that could contribute to the decision-making process.
I'm hearing from the group that Gov should stay out of the API biz. That's great if that's the optimum scenario for you - so, back to you, is that the optimum .... or maybe "it depends" :-)
Cheers Jury
Sent from my iPhone
On Mar 22, 2012, at 1:21 AM, Luke Closs <lukecl...@gmail.com> wrote:
> I'll pile on in this thread. I agree with the consensus so far. APIs are nice to have, but probably shouldn't be built by the gov't except in rare cases.
> Although I would make the point that usually bulk downloads ARE APIs. Your browser does an HTTP GET of a usually fixed URL to fetch the resource. It's the simplest possible API. The gov't updates that file, then you GET it.
> So if an API is required, a single bulk download API request meets that requirement IMO.
> I can also talk from personal experience about building Recollect.net. I would not base my application on any API provided by the gov't for technical and business reasons. I wouldn't trust that the Gov't APIs could meet the performance, efficiency and reliability requirements I need. So I seek raw data, then write importers into my DB schema. From here, I expose my data model into a RESTful API that we build our app UI and connectors on.
> I would use an API to import data into the app, or poll the API for new data, but I would always seek to maintain a local copy of the data with access designed for my specific use.
> I want to bounce this back to Jury - how does this blog post and group consensus inform the projects that you're working on? Are you seeing similar problems with APIs?
On Wed, Mar 21, 2012 at 11:08 PM, Jury Konga <jko...@sympatico.ca> wrote:
> Luke to your question, I believe that there's huge value in the open data community collaborating with and assisting municipalities (all govs) with designing their open data portals - after all, the data catalogues are primarily meant for your use and those of the research community. A significant decision is being made in the ICT community around in-house data centers versus the cloud or a hybrid, and I thought the article had an element that could contribute to the decision-making process.
> I'm hearing from the group that Gov should stay out of the API biz. That's great if that's the optimum scenario for you - so, back to you, is that the optimum .... or maybe "it depends" :-)
Gov'ts should provide the "simplest possible API" - the single bulk download file. No coding fancy things, just put the data at a URL and keep it up to date.
Only when it's extremely necessary - as noted elsewhere in this thread - should a more complex API be considered.
But "Bulk download" vs "API" is totally orthogonal to "In-house" vs "Cloud". A URL is a URL is a URL. I don't care if the bulk download comes from a municipal server or from amazon s3 cloud hosting. Focus on the simplest possible API - the bulk download file.