Appropriators Should Consider Public Access to Leg Info at Friday Mark-up
Daniel Schuman <http://sunlightfoundation.com/people/dschuman/> May 17,
2012, 6:12 p.m.
http://sunlightfoundation.com/blog/2012/05/17/appropriators-should-co...
Public access to legislative information could get a boost this Friday
at a House
subcommittee hearing<http://appropriations.house.gov/UploadedFiles/HMTG-112-HMKP-AP24-2012...>.
The Legislative Branch Appropriations subcommittee will be marking up Congress'
budget for FY 2013<http://appropriations.house.gov/News/DocumentSingle.aspx?DocumentID=2...>,
which will present the opportunity to require that the data behind THOMAS
be made available to the public in a better format.
Why does this matter? Simply speaking, our democracy is founded upon an
informed public acting through its elected officials to make policy. THOMAS
makes this possible, but its limitations make it difficult.
Developers and programmers have worked to overcome THOMAS's limitations,
creating websites like OpenCongress <http://www.opencongress.org/> and
GovTrack.us <http://www.govtrack.us/> that together have nearly twice as
many visitors as THOMAS, building mobile apps like Sunlight's "Congress"
Android app <http://sunlightlabs.com/blog/2009/sunlights-first-android-app/>,
which has been downloaded 400,000 times, integrating the data into news
coverage (as at the *New York Times* <http://developer.nytimes.com/docs/congress_api>),
and running special-purpose sites like
WashingtonWatch.com <http://www.washingtonwatch.com/>.
Unfortunately, weaknesses in how THOMAS makes the data available limit
what can be accomplished by even the most talented developer. No one
expects THOMAS to do everything, but it suffers from basic problems. Its
web page addresses break after 15 minutes, it doesn't provide redlines of
bills, you can't get alerts when legislation is moving, and it does a poor
job of integrating relevant legislative data. There's a laundry list of
improvements here<http://www.opencongress.org/wiki/THOMAS_bulk_data_access#Ideas_for_Up...>.
In addition, there are other services that THOMAS shouldn't have to provide
but that should exist, whether as simple as connecting relevant CRS reports to
legislation or as dynamic as adding an interactive social media layer.
These are examples of the benefits of opening up the data that drives
THOMAS. Beneath the 1990s web interface is an up-to-date database of bills,
bill status information, legislative summaries, and much more. Releasing
the data in a developer-friendly format (i.e. structured data made
available in bulk) would empower innovators to improve upon the services
THOMAS provides, and to go in entirely new directions, all at no cost to
the public.
When the THOMAS website went
live<http://www.loc.gov/today/pr/1995/95-002.html> on
January 5, 1995, it was the result of a bipartisan effort to grant
"citizens across the country and around the world ... access, via the
Internet, to congressional information." THOMAS significantly improved how
legislative information was made available
online<http://www.helsinki.fi/science/optek/1993/n4/bradley.txt> --
it provided additional materials in a centralized location, and did not
charge the public for access -- with a pledge that over time
"enhancement[s] will be made to THOMAS to upgrade its features."
While citizens around the world gained access to some congressional
information, enhancements to THOMAS's capabilities have been limited in
scope. Its limitations kindled a desire in users to be able to build their
own tools to make use of legislative data. These efforts have been severely
hampered because THOMAS doesn't give the public access to its underlying
database, instead releasing its information piecemeal through thousands of
webpages.
This challenge was partially overcome by technologists like Josh Tauberer,
who in 2004 launched GovTrack.us, which he describes in his great new
book *Open
Government Data <http://opengovdata.io/>* as "one of the first websites
world-wide to offer comprehensive parliamentary tracking for free and with
the intention to be used by everyday citizens." But there's a catch. The
unstructured way the THOMAS data was released required him to find some way
to gather and organize the data.
He turned to screen scraping, which involves "programmatically loading up
web pages, looking at their HTML source, and extracting information using
simple pattern matching." Jim Harper at Washington Watch, which tracks
bills and government spending, also uses screen scraping. They've run into
similar problems: screen scrapers don't catch all the data, they're a pain
to build, they easily break, and can suffer from a time lag. All of this
could easily be fixed by publicly releasing the structured database behind
THOMAS.
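To make the fragility concrete, here is a minimal sketch of the pattern-matching approach described above. It is not Tauberer's or Harper's actual code, and the HTML fragment, bill numbers, and titles are invented for illustration; they are not real THOMAS markup or real bills. The sketch runs against an inline string rather than a live page.

```python
import re

# Hypothetical HTML fragment, invented for this example.
# It is NOT the actual markup served by THOMAS.
SAMPLE_HTML = """
<ul>
  <li><b>H.R. 1</b> - Example Appropriations Act <i>Status: Referred to committee</i></li>
  <li><b>S. 2</b> - Example Transparency Act <i>Status: Passed Senate</i></li>
</ul>
"""

# A "simple pattern" tied to the exact tags and wording of the page.
BILL_PATTERN = re.compile(
    r"<li><b>(?P<number>[^<]+)</b> - (?P<title>[^<]+?) "
    r"<i>Status: (?P<status>[^<]+)</i></li>"
)


def scrape_bills(html):
    """Return one dict per bill that the pattern manages to match."""
    return [match.groupdict() for match in BILL_PATTERN.finditer(html)]


bills = scrape_bills(SAMPLE_HTML)
for bill in bills:
    print(bill["number"], "|", bill["title"], "|", bill["status"])

# A purely cosmetic redesign (bold tags renamed) silently breaks the scraper:
# no error is raised, the data simply stops arriving.
redesigned = SAMPLE_HTML.replace("<b>", "<strong>").replace("</b>", "</strong>")
print(len(scrape_bills(redesigned)))
```

The second call finds nothing even though the underlying bill information has not changed, which is the kind of breakage the scrapers above have to guard against.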
In fact, releasing the database -- often referred to as providing "bulk
access to data" -- is a longstanding open data principle that has been
called for by many people over the years.
In May 2007, a coalition of organizations and experts released the *Open
House Report<http://www.theopenhouseproject.com/the-open-house-project-report/3-le...>
*, which recommended (among other things) the creation of a "Legislation
Database."
"Congress should make available to the public a well-supported database of
all bill status and summary information currently accessible through the
Library of Congress. This database, as well as its supporting files, should
be in a structured, non-proprietary format such as XML. "
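As a rough illustration of what a structured format buys developers, the sketch below reads bill records from a small XML document using Python's standard library. The element and attribute names (`bills`, `bill`, `number`, `title`, `status`) and the sample records are assumptions made up for this example; the report does not specify a schema, and this is not a format published by the Library of Congress.

```python
import xml.etree.ElementTree as ET

# Hypothetical bulk-data file, invented for this example.
SAMPLE_XML = """
<bills>
  <bill number="H.R. 1">
    <title>Example Appropriations Act</title>
    <status>Referred to committee</status>
  </bill>
  <bill number="S. 2">
    <title>Example Transparency Act</title>
    <status>Passed Senate</status>
  </bill>
</bills>
"""


def load_bills(xml_text):
    """Parse the XML and return one dict per <bill> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "number": bill.get("number"),
            "title": bill.findtext("title"),
            "status": bill.findtext("status"),
        }
        for bill in root.iter("bill")
    ]


records = load_bills(SAMPLE_XML)
for record in records:
    print(record["number"], "|", record["title"], "|", record["status"])
```

Because the fields are named in the data itself, the reader does not depend on how any web page happens to be laid out.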
This recommendation was embraced by Representative Mike
Honda<http://www.theopenhouseproject.com/2008/02/01/congressman-honda-on-th...>,
then Chairman of the House Legislative Branch Appropriations Subcommittee.
In November of 2007, a committee staffer asked the Library of Congress
"to report
back on solutions to provide raw legislative data to the
public<https://groups.google.com/forum/?fromgroups#!topic/openhouseproject/D...>,
as well as the resources required to accomplish this." No such report has
been released by the Library to the public.
Around the same time, legislative language was
inserted<http://www.gpo.gov/fdsys/pkg/CPRT-111JPRT47494/pdf/CPRT-111JPRT47494-...>
into
an explanatory statement accompanying the Omnibus Appropriations Act of
2009 (P.L. 111-8) that declared "There is support for enhancing public
access to legislative documents, bill status, summary information, and
other legislative data through more direct methods such as bulk data
downloads and other means of no-charge digital access to legislative
databases."
This direct endorsement of bulk access to legislative data did not yield
measurable results from the Library of Congress, which is responsible for
the THOMAS database. Nor did the myriad meetings, phone calls, and
letters from congressional staff to the Library.
Over time, there has been a shift of responsibility for THOMAS to the Law
Library from other parts of the Library of Congress, as announced in their
January 5, 2010 holiday
newsletter<http://sunlightfoundation.com/blog/2010/01/06/tip-of-the-hat-to-thomas/>.
Although the newsletter raised hopes that the "analysis of the system's
functionality and content based on user feedback" would lead to
improvements in access to the underlying data, no movement on this issue
was forthcoming. Even so, the public and members of Congress have continued
to press forward on the issue.
For example, in May 2010, I had the opportunity to
testify<http://sunlightfoundation.com/blog/2011/05/11/sunlight-testimony-bulk...>
on
behalf of Sunlight before the House Legislative Appropriations
subcommittee. We called on Congress to:
Grant the public access to legislative documents, bill status and summary
information, and other legislative data no later than 120 days after the
start of FY 2012. We also ask for the immediate creation of an advisory
committee, composed of relevant legislative agency employees and members of
the public, that will meet regularly to address the public's need for
access to this information, and the means by which it is provided.
In September 2010, Rep. Foster introduced
legislation<http://sunlightfoundation.com/blog/2010/09/30/rep-foster-introduces-b...>
to
improve public access to THOMAS. The bill would have provided bulk access
to bill summary and other THOMAS data, created an advisory committee to
make recommendations on improving THOMAS, and urged the Library to work
towards providing bulk access to the full text of the legislation. The
session ended before there was an opportunity for action.
Even though the 112th Congress brought a change in leadership in the
House, bipartisan interest in making this information available to the
public continued. Indeed, over the years appropriators, overseers, and
leadership have pushed the ball forward. In June of 2011, the Committee on
House Administration held a
hearing<http://sunlightfoundation.com/blog/2011/06/20/moving-congress-online-...>
on making congressional documents available electronically as a
transparency and cost-savings measure. One of the panelists, Cornell's Tom
Bruce, advocated that the House focus on providing legislative data in bulk
and in a timely fashion.
In December, Reps. Cantor and Hoyer co-hosted a Congressional
Hackathon<http://sunlightfoundation.com/blog/2011/12/08/in-hackwetrust-the-hous...>,
which brought together nearly 300 developers and policy wonks to discuss
how to use technology to make the legislative branch more open. Out of that
meeting came three action
items<http://majorityleader.gov/uploadedfiles/hackathonreport.pdf>,
the first of which was "providing legislative data in a bulk format to
enable third-party developers to create more dynamic interfaces for
legislative information."
By the middle of the month, the Committee on House Administration set forth
standards<http://sunlightfoundation.com/blog/2011/12/16/house-to-be-more-open-o...>
for
the electronic posting of House and committee documents and data. In
January, the House launched a groundbreaking transparency
portal<http://sunlightfoundation.com/blog/2012/01/13/house-launches-transpar...>.
It provides a one stop website where the public can access all House bills,
amendments, resolutions for floor consideration, and conference reports in
XML, as well as information on floor proceedings and more. Information will
ultimately be published online in real time and archived for perpetuity. So
far, only documents considered by the full House are available online, but
it's expected that Committee documents will be available by the beginning
of 2013.
The House transparency portal is a tremendous breakthrough, but it does
have significant limitations. Because it came online in 2012, it doesn't
capture the historical information contained in the THOMAS database. As a
House resource, it doesn't have Senate records. And it doesn't contain bill
summaries, related bills, and other information prepared by the Library of
Congress and GPO that are made available through THOMAS. These
limitations can be overcome in time, and the portal clearly points the way
to the future, especially if the Library of Congress doesn't act.
On February 2, the House held a full day Legislative Data and Transparency
Conference<http://cha.house.gov/about/contact-us/legislative-data-conference>,
which brought together nearly all of the key players in making
congressional information available to the public. On behalf of Sunlight, I
delivered a talk on benchmarks for measuring success for legislative data
transparency<http://assets.sunlightfoundation.com.s3.amazonaws.com/policy/papers/B...>,
which clearly included a call for THOMAS data to be made available in bulk.
Surprisingly, the Library of Congress' representative, when directly asked
about THOMAS, indicated the issue wasn't even on the radar. Three days
later, the Sunlight Foundation submitted
comments<http://sunlightfoundation.com/blog/2012/02/09/put-thomas-on-the-fast-...>
to
the House Legislative Branch Appropriations Committee on the importance of
making legislative data available to the public, as did Josh Tauberer and
OpenCongress.
By April, a coalition of 30 organizations wrote a letter to
legislators<http://sunlightfoundation.com/blog/2012/04/10/improve-public-access-t...>
asking
Congress to provide bulk access to THOMAS and create an advisory body. Part
of the letter reads as follows:
We estimate that for every person that goes directly to the THOMAS website,
at least two people visit a third-party website. But even these sites must
rely on legislative information generated and maintained by Congress, which
is only available through the difficult-to-use THOMAS website. There will
always be a need for a congressionally-mandated website, but Congress
should ensure that the innovative and transformative uses of legislative
information by third parties is grounded upon accurate and timely data. And
that means providing bulk access to everyone.
So here we are in May. The three best legislative opportunities to require
bulk access to THOMAS this legislative year, in increasing order of
difficulty, are in the Leg Branch Approps Subcommittee mark-up on Friday,
the full committee mark-up, and in the final vote on the House floor. (The
Senate also provides an opportunity, but the House traditionally has led on
these issues.)
It's time to fulfill the promise of citizen access to legislative
information. Congress should require bulk access to THOMAS legislative data
no later than 120 days after passage of the appropriations bill, and create an
advisory committee that meets regularly to look at public access to
legislative information and is composed of people inside and outside of
government. It would make information that's already required to be
publicly available much more useful to everyone, and impose (at most) a
minimal cost.
THOMAS was created by Congress to make legislative information freely
available to the public, but the Library has not kept up with best
practices. Congress should break the logjam and keep the promise of making
free legislative information available to everyone in a way that encourages
the public to make the most of it.
Daniel
Daniel Schuman
Director | Advisory Committee on Transparency<http://transparencycaucus.org/>
Policy Counsel | The Sunlight Foundation <http://sunlightfoundation.com/>
o: 202-742-1520 x 273 | c: 202-713-5795 | @danielschuman