Thomas reform: Support from the Committee on House Administration

Rob Pierson

Nov 14, 2007, 6:44:49 PM

to openhous...@googlegroups.com

Speaking of progress...

Today I met with staff from the Committee on House Administration (CHA) and the Library of Congress (LoC) about Thomas, the legislative information system.

The LoC, which provides the Thomas legislative search system, spoke about their plans to improve the search capabilities of Thomas and its congressional correlate, the Legislative Information System. They also spoke about their plans to make Thomas more like LIS.

CHA expressed its support for exploring ways that Thomas could more effectively provide legislative information to the public, and requested that the Library report back to them with solutions to provide raw legislative data to the public, as well as the resources required to accomplish this.

CHA has been very supportive in this effort, and their assistance has been invaluable. I'll continue to update the group as this effort moves forward.

Josh Tauberer

Nov 15, 2007, 9:52:36 PM

to openhous...@googlegroups.com

Rob Pierson wrote:
> Speaking of progress...
>
> Today I met with staff from the Committee on House Administration (CHA)
> and the Library of Congress (LoC) about Thomas, the legislative
> information system.
>
> The LoC, which provides the Thomas legislative search system, spoke
> about their plans to improve the search capabilities of Thomas and its
> congressional correlate, the Legislative Information System. They also
> spoke about their plans to make Thomas more like LIS.

Hey, Rob, that's great news. Thanks for the update and your efforts, and
also thanks to your boss, Congressman Honda, for taking an interest in
this. Some comments in particular:

> CHA expressed its support for exploring ways that Thomas could more
> effectively provide legislative information to the public, and requested
> that the Library report back to them with solutions to provide raw
> legislative data to the public, as well as the resources required to
> accomplish this.

Wonderful! I'm glad CHA sees that there could be new ways technology can
be put to good use.

As for resources, just my two cents: from the technical end only,
providing raw legislative data to the public is probably the least
resource-intensive thing THOMAS can add to its capabilities, and yet it
can also have one of the biggest impacts.

Providing 'structured data', i.e. an API or XML, for information on
legislation should be a very low-resource new capability. Web surfers
will never access this information directly --- it would instead be
fetched periodically by a handful of independent websites. So compared
to THOMAS serving what I imagine is at least 100,000 people a day,
providing some additional structured data to, say, 50 websites a day is
basically nothing. But the benefit to the public would be enormous
because these websites can start to provide all sorts of new
capabilities to the public (for more, see the OHP report and several of
my blog posts on the OHP website on this topic).
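
To put rough numbers on that comparison, here is a minimal Python sketch. The 100,000 daily visitors and 50 websites are the guesses from the paragraph above; the pages-per-visit and fetches-per-site figures are purely illustrative assumptions, not measurements of THOMAS traffic.

```python
def structured_data_share(visitors, pages_per_visit, sites, fetches_per_site):
    """Return the fraction of daily requests that come from structured-data clients."""
    human_requests = visitors * pages_per_visit
    bulk_requests = sites * fetches_per_site
    return bulk_requests / (human_requests + bulk_requests)


# Guesses from the message: 100,000 visitors and 50 data-consuming sites a day.
# Assumed for illustration only: 5 pages per visit, 200 fetches per site.
share = structured_data_share(
    visitors=100_000, pages_per_visit=5, sites=50, fetches_per_site=200
)
print(f"Structured-data share of daily requests: {share:.1%}")
```

Under these assumed inputs the structured-data traffic comes to roughly 2% of all requests; different assumptions would of course move that figure.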

Depending on how you look at it, providing this may actually
*add resources back to THOMAS*. For instance, when GovTrack fetches
information from THOMAS, it uses THOMAS resources, but in turn this lets
GovTrack serve additional citizens (with no further impact on THOMAS). I
did the math a few months ago and figured that, very roughly, for
every page GovTrack fetches from THOMAS, GovTrack (and additional
'downstream' sites) serves that information to 30 people. That basically
means GovTrack is multiplying THOMAS's resources (but currently
only by a very small amount, of course). The net impact of providing
'structured data' may very well be that it frees up THOMAS resources in
the long run.... Again, depending on how you look at it. :)
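
The multiplier argument can be sketched as simple arithmetic. The 30 views per fetched page is the very rough estimate above; the number of pages fetched and the share of downstream views that would otherwise have landed on THOMAS are hypothetical inputs chosen only to show the calculation.

```python
def downstream_views(pages_fetched, views_per_fetch=30):
    """Page views served by GovTrack and downstream sites from fetched pages."""
    return pages_fetched * views_per_fetch


def net_requests_avoided(pages_fetched, views_per_fetch=30,
                         share_otherwise_on_thomas=1.0):
    """Requests THOMAS is spared, after subtracting what the fetching costs.

    share_otherwise_on_thomas is an assumption: the fraction of downstream
    views that would have been made on THOMAS had the mirror not existed.
    """
    avoided = downstream_views(pages_fetched, views_per_fetch) * share_otherwise_on_thomas
    return avoided - pages_fetched


pages = 1_000  # hypothetical number of pages fetched from THOMAS in a day
print(downstream_views(pages))
print(net_requests_avoided(pages, share_otherwise_on_thomas=0.5))
```

The net figure is positive whenever views_per_fetch times share_otherwise_on_thomas exceeds 1. With 30 views per fetch, that requires more than about 1 in 30 downstream views to be ones THOMAS would otherwise have served itself.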

> CHA has been very supportive in this effort, and their assistance has
> been invaluable. I'll continue to update the group as this effort moves
> forward.

Great, thanks to CHA too! As I mentioned to you off-list, the places
where the House has already used leading technology, like bill and roll
call vote XML, have been right on. Just gotta spread the love (of XML)
to other areas of Congress.

--
- Josh Tauberer

http://razor.occams.info

"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)

Rob Pierson

Nov 16, 2007, 1:10:12 PM

to openhous...@googlegroups.com

Josh,

You're right - I'm also quite pleased with the forward-thinking efforts the House has in place for votes and bills. (See xml.house.gov for more info.) Hopefully those efforts will be replicated by the LoC and Senate. If there are any Senate staff interested in working on this with me, please email me off-list.

I wanted to clarify one point that may not have been clear in my earlier email. CHA has not provided an official opinion on Thomas providing raw legislative data. Their interest in researching the merits and feasibility of offering this information, however, is appreciated, and I think I speak for a lot of us when I express our appreciation to them and others in the House for their work in making information available in a structured format.

Josh, perhaps you or someone else on the list could speak to the broad technical details of how APIs or XML demand resources from a webserver. For instance, I used the script creator at http://www.opencongress.org/tools/bill_status?bill_id=110-h4173 to create a bill update widget which I was playing with on our site. (I'm not sure if I'll keep it up, but it's pretty cool!)

http://honda.house.gov/legislation/110-h1492.shtml

Does a query go to an XML legislative database (govtrack's, I believe) and to the script generator (Open Congress' site) every time someone loads the page? If this and other sorts of widgets were used on more websites and the XML came from the official Thomas web server, would there be concerns about Thomas' webserver load?

In your email you suggested a scenario in which Thomas' XML database would be replicated on other servers - is that how these sorts of widget setups are generally implemented?

Josh Tauberer

Nov 16, 2007, 3:48:22 PM

to openhous...@googlegroups.com

Rob Pierson wrote:
> Josh, perhaps you or someone else on the list could speak to the broad
> technical details of how APIs or XML demand resources from a webserver.
> For instance, I used the script creator at
> http://www.opencongress.org/tools/bill_status?bill_id=110-h4173 to
> create a bill update widget which I was playing with on our site. (I'm
> not sure if I'll keep it up, but it's pretty cool!)

Sure, that's a good example of how an API or public XML database can be
put to use. For background, what this lets people do is embed a little
widget on their webpage that shows the status of a bill. As a bill moves
through Congress, the widget automatically updates.

> Does a query go to an XML legislative database (govtrack's, I believe)
> and to the script generator (Open Congress' site) every time someone
> loads the page? If this and other sorts of widgets were used on more
> websites and the XML came from the official Thomas web server, would
> there be concerns about Thomas' webserver load?

In this case, no, the widget does not impact THOMAS's webserver load. In
general it would depend on the provider of the widget, but there are
steps THOMAS could take to ensure that any API or XML database would not
be used in a way that more than marginally impacts their load, if that
is a concern.

The OpenCongress widget works like this, from start to finish:

Twice each day, GovTrack updates its unauthoritative XML database of
legislation by downloading pages from THOMAS. (Since THOMAS does not
provide structured data access, GovTrack has to reconstruct the
information from the pages on THOMAS in a pretty inelegant way, and
sometimes it doesn't work. This is currently the only point where
THOMAS's resources are impacted, and it is the point where an API or XML
database would be used.)

Following the updates, various websites including OpenCongress duplicate
the database onto their own web servers by copying the files from
GovTrack. (If THOMAS had an API, these websites would go to THOMAS
directly for their daily updates.)

When a person visits a web page that has included this widget, that
person's web browser contacts the OpenCongress web server. OpenCongress
then queries its own database, without contacting THOMAS or GovTrack,
and sends back to the visitor the content of the widget.
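
The three steps above can be sketched as a toy simulation in Python. This is only an illustration of the data flow, not the actual GovTrack or OpenCongress code: the class names, methods, and placeholder statuses are all hypothetical, and the bill IDs are simply the ones mentioned in this thread.

```python
class Source:
    """Stands in for THOMAS; counts every request it serves."""

    def __init__(self, records):
        self.records = dict(records)
        self.requests = 0

    def fetch(self, bill_id):
        self.requests += 1
        return self.records[bill_id]


class Aggregator:
    """Stands in for GovTrack; rebuilds its database from the source."""

    def __init__(self, source, bill_ids):
        self.source = source
        self.bill_ids = list(bill_ids)
        self.database = {}

    def update(self):
        # The only step that touches the source.
        self.database = {b: self.source.fetch(b) for b in self.bill_ids}


class Mirror:
    """Stands in for a site like OpenCongress; serves widgets from its own copy."""

    def __init__(self, aggregator):
        self.aggregator = aggregator
        self.database = {}

    def sync(self):
        # Copy the aggregator's data; the source is not contacted.
        self.database = dict(self.aggregator.database)

    def widget(self, bill_id):
        # Answer a visitor's browser from the local copy only.
        return f"{bill_id}: {self.database[bill_id]}"


# Statuses are placeholders, not real bill statuses.
thomas = Source({"110-h4173": "placeholder status",
                 "110-h1492": "placeholder status"})
govtrack = Aggregator(thomas, ["110-h4173", "110-h1492"])
opencongress = Mirror(govtrack)

for _ in range(2):        # two updates a day
    govtrack.update()
    opencongress.sync()

for _ in range(10_000):   # hypothetical number of widget page loads
    opencongress.widget("110-h4173")

print(thomas.requests)    # 2 updates x 2 bills = 4, however many widgets load
```

The point of the sketch is that the source's request count depends only on how often the aggregator updates and how many bills it tracks, not on how many people view the widget.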

> In your email you suggested a scenario in which Thomas' XML database
> would be replicated on other servers - is that how these sorts of widget
> setups are generally implemented?

Mostly it depends on what the source of the data wants. If the source
says it's okay to build widgets that interact directly with their
servers --- for instance, Google Maps mash-ups accessing Google directly
for their maps --- then that's what people do. On the other hand, for
GovTrack data, I tell people to replicate the data on their servers, and
so that's what they do.

Hope that helps.

--
- Josh Tauberer

http://razor.occams.info

"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
