Hey, Rob, that's great news. Thanks for the update and your efforts, and
also thanks to your boss, Congressman Honda, for taking an interest in
this. Some comments in particular:
> CHA expressed its support for exploring ways that Thomas could more
> effectively provide legislative information to the public, and requested
> that the Library report back to them with solutions to provide raw
> legislative data to the public, as well as the resources required to
> accomplish this.
Wonderful! I'm glad CHA sees that there could be new ways technology can
be put to good use.
As for resources, just my two cents: from the technical end alone,
providing raw legislative data to the public is probably the least
resource-intensive capability THOMAS could add, and yet it could also
have one of the biggest impacts.
Providing 'structured data', i.e. an API or XML, for information on
legislation should be a very low-resource new capability. Web surfers
will never access this information directly; instead, it would be
fetched periodically by a handful of independent websites. So compared
to THOMAS serving what I imagine is at least 100,000 people a day,
providing some additional structured data to, say, 50 websites a day
is basically nothing. But the benefit to the public would be enormous,
because these websites could then offer the public all sorts of new
capabilities (for more on this, see the OHP report and several of my
blog posts on the topic on the OHP website).
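To make 'structured data' a bit more concrete, here is a minimal sketch of what a downstream site might do with an XML record for a bill: read a few fields directly, with no HTML scraping. The element and attribute names (`bill`, `title`, `status`, `action`, and so on) are hypothetical, invented for illustration; they are not THOMAS's or GovTrack's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record for one bill; this schema is invented for illustration.
SAMPLE = """
<bill session="110" type="h" number="1234">
  <title>Example Transparency Act</title>
  <status>referred</status>
  <actions>
    <action date="2007-11-01">Introduced in House</action>
    <action date="2007-11-02">Referred to committee</action>
  </actions>
</bill>
"""

def summarize_bill(xml_text: str) -> dict:
    """Pull a few fields out of a (hypothetical) bill record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": f"{root.get('type')}{root.get('number')}-{root.get('session')}",
        "title": root.findtext("title"),
        "status": root.findtext("status"),
        # The last <action> element is taken as the most recent one.
        "latest_action": root.findall("actions/action")[-1].text,
    }

print(summarize_bill(SAMPLE))
```

Each of those 50 or so sites would fetch records like this once or twice a day and do whatever it likes with them locally.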
Actually, depending on how you look at it, providing this may even
*add resources back to THOMAS*. For instance, when GovTrack fetches
information from THOMAS, it uses THOMAS resources, but in turn this lets
GovTrack serve additional citizens (with no further impact on THOMAS). I
did the math a few months ago and figured that, very, very roughly, for
every page GovTrack fetches from THOMAS, GovTrack (and additional
'downstream' sites) serves that information to 30 people. That basically
means GovTrack is multiplying THOMAS's resources (though currently only
by a very small amount, of course). In the long run, the net impact of
providing 'structured data' may very well be that it frees up THOMAS
resources... Again, depending on how you look at it. :)
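The 'multiplier' argument is just arithmetic. Here is a small sketch using the rough figure of 30 readers per fetched page from above. It rests on a strong assumption (mine, not a measurement): that every downstream reader would otherwise have gone to THOMAS directly. The fetch count in the example is an arbitrary illustrative number.

```python
def thomas_pages_saved(pages_fetched: int, readers_per_fetch: int = 30) -> int:
    """Net page views THOMAS avoids serving, ASSUMING each downstream reader
    would otherwise have loaded the page from THOMAS directly."""
    readers_served = pages_fetched * readers_per_fetch  # served downstream, no THOMAS cost
    return readers_served - pages_fetched               # minus the fetches THOMAS did serve

# Illustrative only: 1,000 fetches reach ~30,000 readers, a net saving of 29,000 page views.
print(thomas_pages_saved(1_000))
```

Under that assumption the saving is (30 - 1) page views per fetch; if only some readers would have visited THOMAS, the saving shrinks accordingly.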
> CHA has been very supportive in this effort, and their assistance has
> been invaluable. I'll continue to update the group as this effort moves
> forward.
Great, thanks to CHA too! As I mentioned to you off-list, the places where
the House has already used leading technology, like bill and roll call
vote XML, have been right on. Just gotta spread the love (of XML) to
other areas of Congress.
--
- Josh Tauberer
"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
Sure, that's a good example of how an API or public XML database can be
put to use. For background, what this lets people do is embed a little
widget on their webpage that shows the status of a bill. As a bill moves
through Congress, the widget automatically updates.
> Does a query go to an XML legislative database (govtrack's, I believe)
> and to the script generator (Open Congress' site) every time someone
> loads the page? If this and other sorts of widgets were used on more
> websites and the XML came from the official Thomas web server, would
> there be concerns about Thomas' webserver load?
In this case, no, the widget does not impact THOMAS's webserver load. In
general it would depend on the provider of the widget, but there are
steps THOMAS could take to ensure that any API or XML database is not
used in a way that impacts its load more than marginally, if that is a
concern.
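One possible step (my assumption; I'm not naming any specific mechanism THOMAS would use) is per-client rate limiting on the API, so that no single consumer can fetch more than its share. A minimal token-bucket sketch follows; the capacity and refill rate are arbitrary example values, and the injectable `clock` is only there so the behavior can be demonstrated deterministically.

```python
import time
from collections import defaultdict
from typing import Callable, Dict

class TokenBucketLimiter:
    """Per-client token bucket: each client may burst up to `capacity`
    requests, then averages at most `refill_per_sec` requests per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens: Dict[str, float] = defaultdict(lambda: capacity)  # start full
        self.last_seen: Dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        last = self.last_seen.get(client_id, now)
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens[client_id] = min(
            self.capacity,
            self.tokens[client_id] + (now - last) * self.refill_per_sec)
        self.last_seen[client_id] = now
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

# Demo with a fake clock: a burst of 2 is allowed, the 3rd request is refused.
fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=2, refill_per_sec=0.5, clock=lambda: fake_now[0])
print([limiter.allow("downstream-site") for _ in range(3)])  # [True, True, False]
```

A token bucket allows short bursts (e.g. a site's twice-daily sync) while capping the sustained rate. Asking consumers to replicate the data, as GovTrack does, is another way to keep load predictable.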
The OpenCongress widget works like this, from start to finish:
Twice each day, GovTrack updates its unauthoritative XML database of
legislation by downloading pages from THOMAS. (Since THOMAS does not
provide structured data access, GovTrack has to reconstruct the
information from the pages on THOMAS in a pretty inelegant way, and
sometimes it doesn't work. This is currently the only point where
THOMAS's resources are impacted, and it is the point where an API or
XML database would come into play.)
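For contrast, reconstructing data from HTML pages looks roughly like the sketch below. The page fragment and the "Latest Major Action:" label are hypothetical stand-ins, not THOMAS's actual markup. The point is that a scraper depends on presentation details and gets nothing back as soon as they change.

```python
import re
from typing import Optional, Tuple

# Hypothetical fragment of a human-oriented bill status page.
PAGE = "<p><b>Latest Major Action:</b> 11/2/2007 Referred to House committee.</p>"

# Tied to the exact label text and tags: any redesign of the page breaks it.
LATEST_ACTION = re.compile(
    r"<b>Latest Major Action:</b>\s*(\d+/\d+/\d{4})\s+(.*?)</p>")

def scrape_latest_action(html: str) -> Optional[Tuple[str, str]]:
    """Return (date, action text), or None if the page no longer matches."""
    m = LATEST_ACTION.search(html)
    if m is None:
        return None  # layout changed or field missing: nothing can be extracted
    return m.group(1), m.group(2)

print(scrape_latest_action(PAGE))
# A cosmetic change (<b> -> <strong>) is enough to break extraction:
print(scrape_latest_action(PAGE.replace("<b>", "<strong>")))
```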
Following the updates, various websites including OpenCongress duplicate
the database onto their own web servers by copying the files from
GovTrack. (If THOMAS had an API, these websites would go to THOMAS
directly for their daily updates.)
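The replication step amounts to copying whatever files changed since the last sync. Here is a sketch that uses two local directories in place of the network transfer. GovTrack's actual mirroring mechanism isn't described here, so comparing content hashes is my assumption.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def mirror(source: Path, dest: Path) -> list:
    """Copy new or changed files from source into dest; return relative paths copied."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # unchanged since the last sync: skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied

# Demo: the first sync copies the file, the second finds nothing new.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    upstream, local = Path(a), Path(b)
    (upstream / "bills").mkdir()
    (upstream / "bills" / "h1234.xml").write_text("<bill/>")
    print(mirror(upstream, local))  # ['bills/h1234.xml']
    print(mirror(upstream, local))  # []
```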
When a person visits a web page that has included this widget, that
person's web browser contacts the OpenCongress web server. OpenCongress
then queries its own database, without contacting THOMAS or GovTrack,
and sends back to the visitor the content of the widget.
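The serving side can be sketched as a function that answers each widget request entirely from the site's local copy of the data. SQLite, the table layout, and the HTML snippet are assumptions for illustration, not OpenCongress's actual implementation.

```python
import sqlite3
from html import escape

def setup_demo_db() -> sqlite3.Connection:
    """A stand-in for the site's local mirror of the legislation database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE bills (bill_id TEXT PRIMARY KEY, title TEXT, status TEXT)")
    db.execute("INSERT INTO bills VALUES (?, ?, ?)",
               ("h1234-110", "Example Transparency Act", "Referred to committee"))
    return db

def render_widget(db: sqlite3.Connection, bill_id: str) -> str:
    """Build the widget HTML from the local copy; nothing is requested from THOMAS or GovTrack."""
    row = db.execute("SELECT title, status FROM bills WHERE bill_id = ?",
                     (bill_id,)).fetchone()
    if row is None:
        return '<div class="bill-widget">Unknown bill</div>'
    title, status = row
    return f'<div class="bill-widget"><b>{escape(title)}</b>: {escape(status)}</div>'

print(render_widget(setup_demo_db(), "h1234-110"))
```

Because the lookup is local, the widget "automatically updates" whenever the site's own copy is refreshed, and visitor traffic never reaches THOMAS.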
> In your email you suggested a scenario in which Thomas' XML database
> would be replicated on other servers - is that how these sorts of widget
> setups are generally implemented?
Mostly it depends on what the source of the data wants. If the source
says it's okay to build widgets that interact directly with its servers
(Google Maps mash-ups, for example, access Google directly for their
maps), then that's what people do. On the other hand, for GovTrack data,
I tell people to replicate the data on their own servers, and so that's
what they do.
Hope that helps.
--
- Josh Tauberer
> On Nov 15, 2007 9:52 PM, Josh Tauberer <taub...@govtrack.us
> http://razor.occams.info