big problems: Appropriators May Undercut Legislative Transparency at hearing tomorrow

Daniel Schuman

May 30, 2012, 1:36:14 PM
to openhous...@googlegroups.com

Appropriators May Undercut Legislative Transparency

Daniel Schuman
May 30, 2012, 1:26 p.m.

House Appropriators may deal a tremendous blow to prospects for improving public access to legislative information. In a draft report expected to accompany the Legislative Branch Appropriations Bill for 2013, scheduled for a full committee vote tomorrow, appropriators misunderstand how data can be "authenticated," and kick responsibility for improving public access to legislative data to a non-public task force with no set reporting date. Unless corrected, this draft report represents a tremendous step backward for transparency, and fails to seriously grapple with the history of efforts to free legislative information for widespread public use.

Legislative Information Should Be Widely Disseminated

The purpose of THOMAS is to bring legislative information to as many people as possible; preservation and authentication are best handled through other long-established methods that THOMAS was never intended to address. The lack of authentication of THOMAS data does not present a problem for most users. Rather, the largest problem with THOMAS is that the data is not provided in a form that can be easily copied, placing a significant burden on citizens who wish to make sophisticated use of the information. The THOMAS website directly provides nearly a million users each month with an "inauthentic" version of information about legislative activities, a practice that will continue unabated under the draft committee report. While THOMAS often links to a GPO document that is "authenticated," its display of bill text, legislative summaries, cosponsor data, and other information is not certified as being correct, and often changes because of the Library's errors in how it publishes the data.

To the extent that THOMAS information should be authentic, the report does not engage with best practices for authenticating data on the Internet. The authenticity of data can be verified securely and reliably using metadata external to the data itself. In fact, this is precisely how GPO's FDSys currently authenticates XML documents of the US government, including its legislation, regulations, and laws. GPO accompanies each document it publishes with a "PREMIS" metadata file that includes the information needed to cryptographically verify the document's authenticity. For example, here's the PREMIS file accompanying HR 6289. Worries about authenticity are a red herring.
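As a rough illustration of verification through external metadata, the Python sketch below hashes a document and compares the result with a digest published separately from the data. The function names, the sample document, and the choice of SHA-1 as the default are assumptions made for the example; this is not GPO's actual tooling.

```python
import hashlib
import hmac
import io


def stream_digest(stream, algorithm="sha1", chunk_size=65536):
    """Hash a binary stream in chunks so large documents need not fit in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def is_unmodified(stream, published_digest, algorithm="sha1"):
    """Compare the computed digest with one published separately from the data."""
    actual = stream_digest(stream, algorithm)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, published_digest.strip().lower())


# Hypothetical document and digest, standing in for a published file
# and the value carried in its accompanying metadata file.
document = b"<bill><title>Example</title></bill>"
published = hashlib.sha1(document).hexdigest()

print(is_unmodified(io.BytesIO(document), published))         # True
print(is_unmodified(io.BytesIO(document + b"x"), published))  # False
```

A matching digest shows only that the copy in hand is identical to the one the metadata describes; trust in the metadata itself still comes from obtaining it from the publisher.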

Bulk Access is a Separate Question from Authenticity 

Bulk access to THOMAS data is a simpler and less controversial step than this draft report contemplates. The underlying information is already publicly available on the THOMAS website, and third parties already are scraping the data from the site to make it available in bulk. It simply makes sense for the Library to meet the needs of the public directly through providing the data in bulk itself. This merely opens up another avenue to access info that's already being released. It would also eliminate any errors created through the scraping process.

The Draft Report Creates a Secret, Never-Ending Process

The draft report would require the establishment of a task force to examine and report back on a number of issues raised in the report regarding bulk access to legislative data. This is seriously flawed in several major ways.

First, bulk access is about granting the public better access to legislative information. It stands to reason that the public should be included in all discussions. However, the proposed task force does not include any non-governmental participants. A number of individuals and organizations are expert in these matters, and should be full participants.

Second, the draft report imposes no deadline for a report from the task force. The last time Appropriators required a task force on a similar matter, four years ago, it never reached any conclusions or reported back. Without a deadline, the same will happen here.

Third, the task force's report should be provided to the public as well as to the committee at the time it is completed. Draft reports should also be made available for public comment.

Fourth, the report language is terribly overbroad: it prohibits the establishment of bulk data downloads of legislative information until the task force reports back. Making use of modern technology to provide information in better ways should be encouraged, not prohibited. Information is already being provided to the public in bulk regarding certain legislative activity. Would this report language stop the GPO from providing bulk access to the Congressional Record, as it does now? Would it prohibit the House of Representatives from providing bulk access through its innovative docs.house.gov portal? If so, that would be a disaster for transparency.

Finally, the idea of a task force to assess these questions ignores that these issues were already addressed by the Library of Congress in a 2008 memo. The memo explained that the XML database containing bill metadata was expected to be ready for release in bulk by May 2008. It also stated that "CRS... will continue to identify and analyze ... the following policy matters for the Committee's consideration," including "data accuracy" and "data permanence and authentication." Where are the results of CRS's analysis? What is the strategic plan for THOMAS referenced in the memo? Where is the study promised that would engage in "an examination of permanence and authentication of legislative data, along with any attendant issues, risks and workload?"

Simply put, the draft committee report's establishment of a task force is another recipe for delay. We saw this four years ago, the last time the Library was pressed to make improvements on this issue. The time is long past for action, and the Appropriations Committee will be judged on whether it makes another plan to make a plan, or whether it establishes real deadlines for progress. THOMAS itself was created in a matter of months when the Speaker of the House decided it was a priority. Bulk access to legislative data will also come about when legislators decide that being transparent is more important than establishing a task force to talk about it.


Daniel

Daniel Schuman
Director | Advisory Committee on Transparency
Policy Counsel | The Sunlight Foundation
o: 202-742-1520 x 273 | c: 202-713-5795 | @danielschuman

Josh Tauberer

May 30, 2012, 1:56:13 PM
to openhous...@googlegroups.com, Daniel Schuman
This is unfortunate, and of course technically incorrect.

This is patently false:

> There currently is no comparable technology for the application and
> verification of digital signatures on XML documents.

And this is just incoherent:

> Are there other data models or alternative that can enhance
> congressional openness and transparency without relying on bulk data
> downloads in XML?

(= "Are there other data models ... that can enhance congressional
openness and transparency without ... data")

Congress is using XML without digital signatures widely already, for
bill text, for House votes, and for Senate votes. THOMAS isn't digitally
signed (it's not on https:), as Daniel noted. All of a sudden digital
signatures became so important?

I'm sympathetic when excuses are about cost and priorities. But this is
just techno-nonsense. Just being a citizen for a moment, I'm horrified
that a committee would publish such an insulting three paragraphs.

- Josh Tauberer (@JoshData)

http://razor.occams.info

On 05/30/2012 01:36 PM, Daniel Schuman wrote:
>
> Appropriators May Undercut Legislative Transparency
>
> Daniel Schuman <http://sunlightfoundation.com/people/dschuman/>
>
> May 30, 2012, 1:26 p.m.
> http://sunlightfoundation.com/blog/2012/05/30/appropriators-may-undercut-legislative-transparency/
>
> House Appropriators may deal a tremendous blow to prospects for
> improving public access to legislative information. In a draft report
>
> <http://appropriations.house.gov/UploadedFiles/LEGBRANCH-FY13-FULLCOMMITTEEREPORT.pdf>
> expected to accompany the Legislative Branch Appropriations Bill for
> 2013, scheduled for a full committee vote tomorrow
> <http://sunlightfoundation.com/blog/2012/05/25/full-committee-markup-on-leg-approps-set-for-thursday/>,
> appropriators misunderstand how data can be "authenticated," and
> kick responsibility for improving public access to legislative data
> to a non-public task force with no set reporting date. Unless
> corrected, this draft report represents a tremendous step backward
> for transparency, and fails to seriously grapple with the history of
> efforts to free legislative information for widespread public use.
>
> *Legislative Information Should Be Widely Disseminated*
> <http://www.gpo.gov/fdsys/search/pagedetails.action?browsePath=111%2Fhr%2F%5B6200%3B6299%5D&granuleId=&packageId=BILLS-111hr6289ih&fromBrowse=true>.
> Worries about authenticity are a red herring.
>
> *Bulk Access is a Separate Question from Authenticity *
>
> Bulk access to THOMAS data is a simpler and less controversial step
> than this draft report contemplates. The underlying information is
> already publicly available on the THOMAS website, and third parties
> already are scraping the data from the site to make it available in
> bulk. It simply makes sense for the Library to meet the needs of the
> public directly through providing the data in bulk itself. This
> merely opens up another avenue to access info that's already being
> released. It would also eliminate any errors created through the
> scraping process.
>
> *The Draft Report Creates a Secret, Never-Ending Process*
>
> The draft report would require the establishment of a task force to
> examine and report back on a number of issues raised in the report
> regarding bulk access to legislative data. This is seriously flawed
> in several major ways.
>
> First, bulk access is about granting the public better access to
> legislative information. It stands to reason that the public should
> be included in all discussions. However, the proposed task force does
> not include any non-governmental participants. A number of
> individuals and organizations are expert in these matters, and should
> be full participants.
>
> Second, the draft report imposes no deadline for a report from the
> task force. The last time Appropriators required a task force on a
> similar matter
> <http://sunlightfoundation.com/blog/2012/02/09/put-thomas-on-the-fast-track/>,
> four years ago, it never reached any conclusions or reported back.
> Without a deadline, the same will happen here.
>
> Third, the task force's report should be provided to the public as
> well as to the committee at the time it is completed. Draft reports
> should also be made available for public comment.
>
> Fourth, the report language is terribly overbroad: it prohibits the
> establishment of bulk data downloads of legislative information prior
> to the reporting back of the task force. Making use of modern
> technology to provide information in better ways should be something
> that is encouraged, not prohibited. Information is already being
> provided to the public in bulk regarding certain legislative
> activity. Would this report language stop the GPO from providing bulk
> access to the Congressional Record
> <http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=CREC>,
> as it does now? Would it prohibit the House of Representatives from
> providing bulk access through its innovative docs.house.gov
> <http://docs.house.gov/> portal? If so, that would be a disaster for
> transparency.
>
> Finally, the idea of a task force to assess these questions ignores
> that these issues were already addressed by the Library of Congress
> in a 2008 memo
> <http://sunlightfoundation.com/blog/2012/05/18/two-steps-forward-on-improving-public-access-to-legislative-information/>. The
> memo explained that the XML database containing bill metadata was
> expected to be able to be released in bulk by May 2008. It also
> stated that "CRS... will continue to identify and analyze ... the
> following policy matters for the Committee's consideration,"
> including "data accuracy" and "data permanence and authentication."
> Where are the results of CRS's analysis? What is the strategic plan
> for THOMAS referenced in the memo? Where is the study promised that
> would engage in "an examination of permanence and authentication of
> legislative data, along with any attendant issues, risks and
> workload?"
>
> Simply put, the draft committee report's establishment of a task
> force is another recipe for delay. We saw this four years ago, the
> last time the Library was pressed to make improvements on this issue.
> The time is long past for action, and the Appropriations Committee
> will be judged on whether it makes another plan to make a plan, or
> whether it establishes real deadlines for progress. THOMAS itself was
> created in a matter of months when the Speaker of the House decided
> it was a priority. Bulk access to legislative data will also come
> about when legislators decide that being transparent is more
> important than establishing a task force to talk about it.
>
>
> Daniel
>
> Daniel Schuman Director | Advisory Committee on Transparency
> <http://transparencycaucus.org/> Policy Counsel | The Sunlight
> Foundation <http://sunlightfoundation.com/> o: 202-742-1520 x 273 |
> c: 202-713-5795 | @danielschuman
>
> -- You received this message because you are subscribed to the Google
> Groups "Open House Project" group. To post to this group, send email
> to openhous...@googlegroups.com. To unsubscribe from this group,
> send email to openhouseproje...@googlegroups.com. For more
> options, visit this group at
> http://groups.google.com/group/openhouseproject?hl=en.

Gregory Slater

May 30, 2012, 5:43:09 PM
to openhous...@googlegroups.com, Daniel Schuman

Could you give a link to govtrack on this? - greg slater

Daniel Schuman

May 30, 2012, 6:16:19 PM
to Gregory Slater, openhous...@googlegroups.com
It seems like a lot of folks are calling appropriators about this. Here are some talking points. Remember, the hearing is tomorrow (Thursday) at 11am.

THOMAS Talking Points

Daniel Schuman
May 30, 2012, 6:08 p.m.

This Thursday at 11am, the House Appropriations Committee will mark up the legislative branch appropriations bill and an accompanying committee report. The report contains unfortunate language, based on misunderstandings of technology and policy, that would undermine how legislative information is made available to the public.

Many people are calling the members of the committee to complain, with a particular focus on Rep. Ander Crenshaw, who chairs the subcommittee, and Reps. Hal Rogers and Norman Dicks, the Chairman and Ranking Member of the full committee. (Rep. Mike Honda, who is the ranking member of the subcommittee, has been supportive of bulk access to legislative data for years, and should be congratulated for his efforts.) Here are some helpful talking points.

  • The current draft of the legislative branch committee report needs to be changed. It imposes harmful new barriers on public access to legislative information that will undermine transparency. It does so by stopping bulk access to some legislative information, such as the Congressional Record, and undermining important efforts like the House's transparency portal docs.house.gov.

  • The report will also indefinitely delay any efforts to open up new legislative information to bulk access. It does so by creating a "task force" to study the issue, much like the one created four years ago. There's no date by which the task force must report, and no member of the public has been invited to serve. This is not progress; it is death by bureaucracy.

  • What the committee should do is require bulk access to THOMAS data within 120 days of the appropriation bill's passage. It should also create an advisory committee to guide the evolution of THOMAS.

  • The concerns raised in the committee report about the authenticity of data from THOMAS are a red herring. Legislative information is already publicly available; we are only asking for it to be more accessible -- in bulk. This is an uncontroversial, inexpensive, and common practice across the government.

  • The THOMAS website was created in a matter of months when the Speaker of the House decided it was a priority. The House's leadership thinks it's a priority, members of the public think it's a priority, and so do many members of Congress. It's time for the Appropriations Committee to make bulk access to THOMAS a priority as well.


Daniel

Daniel Schuman
Director | Advisory Committee on Transparency
Policy Counsel | The Sunlight Foundation
o: 202-742-1520 x 273 | c: 202-713-5795 | @danielschuman


Eric Mill

May 30, 2012, 10:45:48 PM
to openhous...@googlegroups.com, Daniel Schuman
Also, GPO has been publishing an accompanying PREMIS file for every document of the US government, for years now, as part of FDSys. This XML contains, among other things, a SHA-1 cryptographic hash of the original document, so you can, you know, verify authenticity. Just pick a bill and you'll always see the link to an "Authenticity Metadata" file.

GPO is already doing this for the laws of the United States, they're part of the legislative branch, and PREMIS itself was invented by a committee chaired by the Library of Congress!
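For anyone who wants to try the check described above, here is a minimal Python sketch that pulls the digest out of a PREMIS-style fixity block and compares it with a hash of the document. The sample XML is a hypothetical, heavily simplified record written for this example, not a copy of a real FDSys file, and the helper names are made up. The code matches elements by local name so it does not depend on which PREMIS namespace version a given file declares.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical document and simplified PREMIS-style record (not real FDSys output).
SAMPLE_DOCUMENT = b"<bill>An example bill text</bill>"
SAMPLE_PREMIS = """<premis xmlns="info:lc/xmlns/premis-v2">
  <object>
    <objectCharacteristics>
      <fixity>
        <messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>
        <messageDigest>{digest}</messageDigest>
      </fixity>
    </objectCharacteristics>
  </object>
</premis>""".format(digest=hashlib.sha1(SAMPLE_DOCUMENT).hexdigest())


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def read_fixity(premis_xml):
    """Return (algorithm, digest) pairs from every fixity element found."""
    pairs = []
    for elem in ET.fromstring(premis_xml).iter():
        if local_name(elem.tag) != "fixity":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        if "messageDigestAlgorithm" in fields and "messageDigest" in fields:
            pairs.append((fields["messageDigestAlgorithm"], fields["messageDigest"]))
    return pairs


def matches_premis(document, premis_xml):
    """True if at least one digest could be checked and none mismatched."""
    checked = 0
    for algorithm, digest in read_fixity(premis_xml):
        name = algorithm.replace("-", "").lower()  # "SHA-1" -> "sha1"
        if name not in hashlib.algorithms_available:
            continue  # skip algorithms this Python build cannot compute
        if hashlib.new(name, document).hexdigest() != digest.lower():
            return False
        checked += 1
    return checked > 0


print(matches_premis(SAMPLE_DOCUMENT, SAMPLE_PREMIS))
```

With a real download, the document bytes and the metadata file would be read from disk instead of the two sample constants.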

ARRRGGHHHH.
