http://www.recovery.gov/?q=node/317
--
Joseph Lorenzo Hall
ACCURATE Postdoctoral Research Associate
UC Berkeley School of Information
Princeton Center for Information Technology Policy
http://josephhall.org/
I was always pretty good at debate :-)
(I do actually have more specific technical ideas
for the data consumption / response side but there
is a severe impedance mismatch in the interfaces I've
found so far to the open government folks who
could conceivably help and, I'm a person of extremely
modest means so I have some difficulty forcing my
foot in the door with demos and such.)
More seriously: thanks.
-t
However, let's be clear that this isn't black box transparency. OMB was
attempting to ensure that no agency can get away with non-disclosure by
saying that it can't do a feed. OMB said, well, if in fact you can't do a
feed then provide the information in a structured format so that we can
all scrape the information from the websites.
BTW, we also need to better understand whether this guidance is in
addition to the first guidance or replaces the first one. It seems the
first one was clear that there were to be feeds. This guidance seems to
supplement that, not supplant it.
And thanks for updating your blog about OMB Watch's positions.
On another content matter, I am concerned that the first guidance called
only for summaries of contracts. I was hoping that this guidance would
broaden that to provide the complete contract (redacted where necessary)
along with the RFP. But there is no mention of that in this version.
This is an example of the type of content I'm worried we will not be
getting.
1742 Connecticut Ave., N.W.
Washington, D.C. 20009
TEL: 202-234-8494; FAX: 202-234-8584
On Wed, 2009-04-08 at 23:38 -0400, Gary Bass wrote:
> We'll be sure to mention this issue in our comments to OMB. Totally
> agree about the Excel spreadsheets vs feeds.
>
> However, let's be clear that this isn't black box transparency. OMB was
> attempting to ensure that no agency can get away with non-disclosure by
> saying that it can't do a feed. OMB said, well, if in fact you can't do a
> feed then provide the information in a structured format so that we can
> all scrape the information from the websites.
That is orthogonal to the black-box issue.
The format of the data is not the issue. I don't
know how to say it more clearly. And I don't mean
to "own" the black-box issue but I think it is perfectly
easy to understand what the people who raised it are saying.
You can say that OMB's regulations here aren't making
the black box problem more than incrementally worse -
that's ultimately uncontroversial. I don't think you
can say that this isn't black box transparency, though.
-t
"Recipient reporting required by Section 1512 of the Recovery Act will be collected centrally." - page 2, Section 1.5
"OMB intends to oversee the development a central collection system for the information required to be reported by Section 1512 of the Act." - page 24, Section 2.14
Nabors was careful to use the same "collection" phrasing in his testimony.
For those programs where the State is the primary recipient of Recovery Act funds, Federal
agencies should provide States the flexibility to determine the optimal approach for collecting
and transmitting to the Federal government data required by Section 1512 of the Recovery Act.
For example, a State may prefer to create a central point of contact responsible for transmitting
all Section 1512 data to the Federal government’s central collection solution (or to the individual
Federal agency, if appropriate). Alternatively, a State may prefer to have individual State
agencies or recipients separately report to the Federal government rather than relying on a single
point of contact to consolidate the information centrally for transmission. In all cases, however,
Federal agencies should expect the State to assign a responsible office to oversee Section 1512
data collection to ensure quality, completeness, and timeliness of data submissions. This State
office should play a critical role in assisting Federal agency efforts to obtain quality, complete,
and timely data submissions. (emphasis added)
> *From:* Greg Elin <ge...@sunlightfoundation.com>
> My interpretation of the Updated Guidance parallels Gary's. The
> alternative to feeds is there only for those who cannot provide the
> feeds.
for me, the important issues are that (a) the only reliable
communications channel right now is email to OMB, (b) the feed
guidelines have not been changed or clarified at all, and they still are
not even required to be discoverable, let alone carry data in some
well-defined format, and (c) the updated guidelines now say they are
working on a web-based submission form, which would basically improve
the email submission process and further dilute the idea of feeds.
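to make the discoverability point in (b) concrete: one common convention is feed autodiscovery, where a page advertises its feeds with `<link rel="alternate">` elements in the HTML head. The sketch below is only an illustration of that convention and is not something the guidance requires; the sample page and its paths are made up.

```python
from html.parser import HTMLParser

# Media types commonly used to advertise Atom and RSS feeds.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> elements that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def discover_feeds(html_text):
    """Return the feed URLs advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


# Hypothetical agency page, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>Example agency recovery page</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="/recovery/weekly.atom">
</head><body></body></html>"""

print(discover_feeds(SAMPLE_PAGE))
```

a page without such a link yields an empty list, which is roughly the situation a manual search runs into when feeds are not required to be discoverable.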
> On the centralization issue, OMB's language is "collected centrally".
yes, but only they can do the collecting because nobody else knows what
is out there. i manually searched all known sites
(http://isd.ischool.berkeley.edu/stimulus/feeds/agencies.html) for feeds
and did not find too many
(http://isd.ischool.berkeley.edu/stimulus/feeds/feeds.html). without
radically improved guidelines centering on feeds, the feeds are really
of no practical use. we tried, but we have given up. my personal
favorite: NASA's unifeeds (one feed per report):
http://www.nasa.gov/recovery/reports/weekly/index.html
> IMHO, this language choice is significant implying the use of
> aggregation tools--picking up or "collecting" the information--instead
> of obligated entities "reporting" the information. The data just has
> to end up in one place, but there could be a variety of routes in
> which the data gets there, including data feeds and published web pages.
sure, and in theory that could still happen. but from an architectural
point of view, the worst thing that could happen is to establish
redundant channels, ending up with having to reconcile reports available
through more than one channel. the initial and updated guidance both
establish redundant channels, email and feeds. since the feed guidance
was very weak in the initial guidance and was not changed at all, while
the email guidance was used to do the actual collection and now talks
about being augmented with a web form, i think it is reasonable to
assume that feeds may still be allowed, but are just a side-effect.
in my opinion, the only viable way to make feeds work and to expose them
as a reliable and robust source of information would be to require
recovery.gov to also do their collection via feeds. this would be a very
strong incentive to make feeds work.
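a rough sketch of what "having to reconcile" means once the same report can arrive through two channels. The report ids and amounts are invented for illustration, and the comparison rule (exact equality of values) is an assumption; real reports would need a far more careful comparison.

```python
def reconcile(email_reports, feed_reports):
    """Compare two channels keyed by report id.

    Returns (merged, conflicts): reports that are consistent or present in
    only one channel go into merged; ids whose values differ between the
    channels are listed in conflicts for manual review.
    """
    merged, conflicts = {}, []
    for rid in sorted(set(email_reports) | set(feed_reports)):
        e = email_reports.get(rid)
        f = feed_reports.get(rid)
        if e is not None and f is not None and e != f:
            conflicts.append((rid, e, f))
        else:
            merged[rid] = e if e is not None else f
    return merged, conflicts


# Hypothetical data: report id -> reported amount.
email_reports = {"usda-w14": 1200, "nasa-w14": 300}
feed_reports = {"usda-w14": 1250, "doe-w14": 75}

merged, conflicts = reconcile(email_reports, feed_reports)
print(merged)
print(conflicts)
```

with a single channel the conflicts list cannot arise at all, which is the architectural argument for not running email and feeds side by side.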
> I read that again as trying to avoid a single black-box solution, but
> pushing people toward web-based solutions as much as possible.
we were very excited when we saw the initial guidance and saw feeds
being mentioned. however, now that we have two documents and can compare
how recovery.gov worked over the past few weeks, how the guidance was
updated, and what kind of data is available via recovery.gov and agency
feeds, i don't think that the feeds will go anywhere. i'd love to be
proven wrong, but given the current trajectory, we'll end up with opaque
data collection and recovery.gov being the only entity having access to
the reporting channels in the back-end.
i still think that even though feeds are very likely out as the way
reporting is done, they should be used by recovery.gov for
publishing data. this is what we currently provide by republishing
scraped data, we have feeds for individual agencies
(http://isd.ischool.berkeley.edu/stimulus/feeds/usda/weekly.atom), and a
feed for all weekly reports scraped from the recovery.gov site
(http://isd.ischool.berkeley.edu/stimulus/feeds/weekly-site.atom). we
generate these by parsing excel and republishing the data as
feed-packaged XHTML/XML. (my apologies for the latter being so big; we
still have to implement feed paging...)
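for readers wondering what feed paging would involve, here is a minimal sketch, assuming the rows have already been extracted from the spreadsheets (the Excel parsing itself is not shown). It is not the code behind the feeds above. The `?page=` URL scheme, the row fields, and the function names are hypothetical; the `rel="next"` link follows the paging convention of RFC 5005. The output omits elements such as `author` that a fully valid Atom feed would need.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM}}}{tag}"


def build_page(rows, page, page_size, base_url):
    """Build one page of an Atom feed from already-extracted report rows."""
    start = page * page_size
    chunk = rows[start:start + page_size]
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = "weekly reports (sketch)"
    ET.SubElement(feed, atom("id")).text = base_url
    ET.SubElement(feed, atom("updated")).text = max(
        (r["updated"] for r in chunk), default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, atom("link"), rel="self",
                  href=f"{base_url}?page={page}")
    if start + page_size < len(rows):
        # More rows remain, so point readers at the following page.
        ET.SubElement(feed, atom("link"), rel="next",
                      href=f"{base_url}?page={page + 1}")
    for r in chunk:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = r["id"]
        ET.SubElement(entry, atom("title")).text = r["title"]
        ET.SubElement(entry, atom("updated")).text = r["updated"]
    return ET.tostring(feed, encoding="unicode")


# Hypothetical rows standing in for data parsed out of a spreadsheet.
rows = [
    {"id": f"urn:example:report:{i}", "title": f"Report {i}",
     "updated": f"2009-04-{i + 1:02d}T00:00:00Z"}
    for i in range(5)
]
print(build_page(rows, 0, 2, "http://example.org/weekly.atom"))
```

a client then follows the `next` links until a page has none, so no single document has to carry every report.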
cheers,
erik wilde tel:+1-510-6432253 - fax:+1-510-6425814
dr...@berkeley.edu - http://dret.net/netdret
UC Berkeley - School of Information (ISchool)
> To join the CAR Google Group, visit http://www.coalitionforanaccountablerecovery.org/.
http://www.coalitionforanaccountablerecovery.org/sites/default/files/OMB_Watch_CAR_Recovery_Data_Architecture-Final.pdf
(dated march 5) is still the most recent version, it seems.
i am concerned that the architecture presented on page 11 is actually
less transparent (at least in the way it's shown) than the architecture
proposed in the recovery act architecture document, because it portrays
recovery.gov as the only way to get to any data. in my personal
terminology, this image is all about "openness", and not at all about
"transparency". the vision we had (and the one outlined in the recovery
act architecture document) has transparency as the primary goal, making
the data sources available directly to anybody interested in them. which
means that any report produced at any level should be directly
accessible to anybody interested. now, this could still mean that
recovery.gov could act as a hosting platform for these reports, but that
would be a pure implementation detail. logically speaking, data entered
by any recipient would be exposed directly to anybody interested in it.
i know we had some discussion around this in the break-out session in
the meeting in washington, and it was agreed that the way the
information flow was depicted in that figure was a bit unfortunate. i
think it would be worth the effort to clearly split a logical view
of the information flow and the implementation view of the information
flow, so that it becomes clear that recovery.gov should not be the
centralized system that it looks like in this figure.
it was my understanding that in theory, anybody should be able to
completely replicate the functionality of recovery.gov by directly
tapping into the information sources providing reports. in such a case,
recovery.gov would just play a role such as amazon's S3, just providing
hosting for data that is produced and then made available in a robust
and secure way. they would only host feeds that would be populated by
agencies and contractors publishing reports.
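to illustrate the replication idea: if every source published an Atom feed, anybody could build their own aggregate view with very little code. The sketch below works on feed documents that have already been fetched; the two sample feeds, their ids, and the rule "newest `updated` wins" are assumptions made for the example. Comparing `updated` values as strings only works if all timestamps use the same UTC format.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.w3.org/2005/Atom"}


def entries(feed_xml):
    """Yield id, title and updated for each entry in an Atom document."""
    root = ET.fromstring(feed_xml)
    for e in root.findall("a:entry", NS):
        yield {
            "id": e.findtext("a:id", namespaces=NS),
            "title": e.findtext("a:title", namespaces=NS),
            "updated": e.findtext("a:updated", namespaces=NS),
        }


def aggregate(feed_documents):
    """Merge entries from several feeds, keeping the newest version per id."""
    seen = {}
    for doc in feed_documents:
        for item in entries(doc):
            prev = seen.get(item["id"])
            if prev is None or item["updated"] > prev["updated"]:
                seen[item["id"]] = item
    return sorted(seen.values(), key=lambda i: i["updated"], reverse=True)


# Two hypothetical source feeds, invented for this example.
FEED_A = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><id>urn:example:r1</id><title>report 1</title>
<updated>2009-04-01T00:00:00Z</updated></entry>
<entry><id>urn:example:r2</id><title>report 2</title>
<updated>2009-04-03T00:00:00Z</updated></entry>
</feed>"""

FEED_B = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><id>urn:example:r2</id><title>report 2 (revised)</title>
<updated>2009-04-05T00:00:00Z</updated></entry>
<entry><id>urn:example:r3</id><title>report 3</title>
<updated>2009-04-02T00:00:00Z</updated></entry>
</feed>"""

for item in aggregate([FEED_A, FEED_B]):
    print(item["updated"], item["id"], item["title"])
```

nothing in this depends on who hosts the feeds, which is why hosting on recovery.gov would be a pure implementation detail.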
i also would be careful when asking for APIs (section 3.2). this is
getting technical again, but since you want services, you should also be
clear about what kind of service, and i think we really don't want APIs
(http://dret.typepad.com/dretblog/2009/02/apis-considered-harmful.html).
asking for APIs is like asking for SOAP services or SPARQL endpoints,
and i think we should not ask for either of these. i think we should
specifically ask for feeds and meaningful services built around these
feeds, because feeds are the only web service that almost any person can
directly use (by using a feed reader).
i'll try to write up some blog post about the "transparency vs.
openness" issue, but i think this is what makes the current CAR draft a
bit imprecise: it does not clearly separate between how reporting should
be done and what guidelines should exist for that; and what kind of
services oversight groups and normal citizens might want to use based
on that reporting architecture, and how those services should be provided.
cheers,
dret.