(http://razor.occams.info/blog/2009/12/09/open-government-directive-evaluation-on-principles/)
Open Government Directive Evaluation on Principles
Last week I reviewed the House's Statement of Disbursement electronic
document along the dimensions of principles of open government data. The
Open Government Directive (OGD) talks about how agencies should go about
the process of opening data. Here is a review of what the OGD says,
organized by open data principle.
I'll put the conclusion up front: The OGD addresses nearly all of the
open government data principles that have been put forward, and even
adds two of its own: being pro-active about data release and creating
accountability by designating an official responsible for data quality
(more on these at the end). So from this perspective, the OGD is pretty
spot-on. It is very strong in public input, public review, and
interagency coordination, which are normally the weakest spots of
government data (but, on the other hand, this isn't data, this is a
goal, so the proof will be in the pudding). It could have been stronger
in the areas of machine processability & promoting analysis, and
explaining what is appropriate for data licensing (ideally, none).
Here are the details:
Information is not meaningfully public if it is not available on the
Internet for free.
The OGD says "each agency shall take prompt steps to expand access to
information by making it available online in open formats." The OGD
itself doesn't say free, but executive branch policy already requires
that public information not be sold to the public at more than the
marginal cost of distribution, which is about as good as one might
expect. So we'll count this principle as asserted by the OGD.
-- GOOD
Data Should Be Primary. Primary data is data as collected at the source,
with the finest possible level of granularity, not in aggregate or
modified forms.
In the OGD's appendix, where it outlines further details for agencies, it
says agencies should release data "as granular as possible".
-- GOOD
Timely.
The OGD says, "Timely publication of information is an essential
component of transparency. Delays should not be viewed as an inevitable
and insurmountable consequence of high demand."
-- GOOD
Accessible. Data are available to the widest range of users for the
widest range of purposes, meaning use an open standard, with a bulk
download, and with documentation.
Machine processable: Data are reasonably structured to allow automated
processing.
The OGD specifically defines "open format" (which is the subject of the
directive) as something that is platform independent and machine
readable. Now, here the OGD slips a little because it redefines "open"
but actually leaves out open standards. I don't think that was
intentional, so we'll give the OGD credit for mentioning open standards
even though it didn't exactly. It mentions "downloadable" but not in
bulk, and there is no mention of documentation in the OGD. We can't tell
what the OGD meant by "machine readable"; I now think of this term as a
sloppy form of "machine processable". It would have helped if the OGD
specifically noted that the point is to support analysis and reuse of
the data.
I used to use "machine readable" until someone corrected me that really
any format can be read by a machine. The question is what the machine
can do with it: to what degree can the data be meaningfully processed by
a machine? So now I use machine-processable.
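To make the readable/processable distinction concrete, here is a minimal
sketch in Python. The records, office names, and amounts are invented for
illustration and do not come from any real disbursement file. A machine
can "read" both forms below, but only the structured one can be summed
or grouped without guessing at the meaning of the text.

```python
import csv
import io

# Hypothetical data, invented for this illustration only.
# The same facts published two ways: as a sentence and as a table.
PROSE = "Office A spent $1,200.50 on supplies and Office B spent $300 on travel."

CSV_TEXT = """office,category,amount
Office A,supplies,1200.50
Office B,travel,300.00
"""


def total_amount(csv_text):
    """Sum the 'amount' column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount"]) for row in reader)


def amounts_by_office(csv_text):
    """Group amounts by the 'office' column of a CSV string."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["office"]] = totals.get(row["office"], 0.0) + float(row["amount"])
    return totals


if __name__ == "__main__":
    # The prose is readable as bytes, but its structure has to be guessed.
    print(len(PROSE), "characters of prose, no fields to query")
    # The CSV carries its structure with it, so analysis is a few lines.
    print(total_amount(CSV_TEXT))
    print(amounts_by_office(CSV_TEXT))
```

Doing the same analysis on the prose version would need ad hoc pattern
matching that breaks as soon as the wording changes, which is the
practical difference the term "machine-processable" is meant to capture.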
-- WEAK
Non-discriminatory: Data are available to anyone, with no requirement of
registration.
Non-proprietary: Data are available in a format over which no entity has
exclusive control.
License-free. Dissemination of the data is not limited by intellectual
property law or other terms.
The OGD says data must be "made available to the public without
restrictions that would impede the re-use of that information." Here we
could have really benefited from some simple but concrete guidance.
-- WEAK
Promote analysis: Data published by the government should be in formats
and approaches that promote analysis and reuse of that data.
There is a sense in which this is implicit in the OGD, but maybe it is
the goggles through which I read it. The OGD fails to say explicitly
that analysis is the whole point of open government data.
-- FAIL
Public input: The public is in the best position to determine what
information technologies will be best suited for the applications the
public intends to create for itself.
Public review: There should be a means for the public to interact with
the data publisher during and after the data has been made. The public
may have questions or may find errors. The process of creating the data
should also be transparent.
These principles are perhaps the least commonly addressed, and yet they
are among the most prominent aspects of the OGD. The OGD requires agencies
to allow the public to give feedback on data quality, data
prioritization, and other aspects of the agency's OGD plan. In fact, the
OGD says, "Each agency shall respond to public input received on its
Open Government Webpage on a regular basis."
In addition, the OGD will form a working group (described next) that
will discuss "ideas to promote participation and collaboration,
including how to ... take advantage of the expertise and insight of people
both inside and outside the Federal Government, and form high-impact
collaborations with researchers, the private sector, and civil society."
In the appendix where it outlines further goals for agencies, the OGD
says, "Your agency should also identify key audiences for its
information and their needs, and endeavor to publish high-value
information for each of those audiences in the most accessible forms and
formats."
-- EXCELLENT
Interagency coordination: Interoperability makes data more valuable by
making it easier to derive new uses from combinations of data. To the
extent two data sets refer to the same kinds of things, the creators of
the data sets should strive to make them interoperable.
The OGD will establish a working group led by the Deputy Director for
Management at OMB, the Federal Chief Information Officer, and the
Federal Chief Technology Officer to provide a forum to share best
practices for data collection, aggregation, validation, and
dissemination throughout the government, to coordinate implementations
of federal spending transparency, and to provide a forum for sharing
best practices for participation.
-- EXCELLENT
Provenance and trust: Published content should be digitally signed or
include attestation of publication/creation date, authenticity, and
integrity.
Permanent Web Address: The file should have a stable location.
Safe file formats: Government bodies publishing data online should
always seek to publish using data formats that do not include executable
content.
Globally Unique Identifiers: This concept, important on the world wide
web, is that any document, resource, data record, or entity mentioned in
a database, or some might say every paragraph in a document, should have
a unique identification that others can use to point to or cite it
elsewhere.
Linked Open Data: This is a method for publishing databases in a
standard format for interconnectivity with other databases without the
expense of wide agreement on unified inter-agency or global data standards.
These get into some of the more precise details of data format. I might
have liked to see provenance & trust addressed, but I am not sure
whether I would really expect these principles to be included in a
high-level 120-day plan, at least not at this point. So their absence is not
something I hold against the OGD. Still:
-- FAIL
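As a small illustration of the integrity part of the provenance and trust
principle above, here is a sketch in Python of checking a downloaded file
against a digest the publisher has posted. The function names are made up
for this example, and nothing here is prescribed by the OGD. Note that a
bare hash only shows the bytes were not altered; it attests to
authenticity only if the digest itself comes through a trusted channel,
which is what a real digital signature would provide.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Check downloaded bytes against a digest posted by the publisher.

    This verifies integrity only. It is an assumption of this sketch
    that the published digest was obtained from a trusted source.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(data), expected)


if __name__ == "__main__":
    # Hypothetical file contents, invented for illustration.
    original = b"office,category,amount\nOffice A,supplies,1200.50\n"
    posted = sha256_hex(original)  # what the agency would publish alongside the file
    print(matches_published_digest(original, posted))         # unchanged file
    print(matches_published_digest(original + b"x", posted))  # altered file
```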
Other Notes
The OGD talks about being proactive with data release.
The OGD also adds accountability: "Each agency ... shall designate a
high-level senior official to be accountable for the quality and
objectivity of, and internal controls over, the Federal spending
information publicly disseminated."
- Josh Tauberer
- CivicImpulse / GovTrack.us
http://razor.occams.info |
www.govtrack.us |
civicimpulse.com