HTTP REST-based (i.e. GET) API for getting records, or just HTTP/FTP
access to files directly. No SOAP, web services, or whatever. Simple
simple simple.
> Is a complete dump
> of the entire database necessary?
It would be a good idea, for sure. Considering THOMAS has records for
somewhere in the ballpark of 200,000 bills (probably around 1GB of data,
based on my own database), if you want all of it, no one is going to be
happy with 200,000 uncompressed HTTP requests (esp. at their current
maximum permitted rate of one per second). If you're trying to get a
new project going, you might want the whole database.
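
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes only the figures quoted above (about 200,000 records, one request per second) and ignores transfer time, retries, and downtime, so the real figure would likely be higher.

```python
# Back-of-the-envelope crawl time, using the estimates quoted in this message.
RECORDS = 200_000          # approximate number of bill records (an estimate)
REQUESTS_PER_SECOND = 1    # maximum permitted request rate

def crawl_time_hours(records: int, rate_per_second: float) -> float:
    """Hours needed to fetch every record once at a fixed request rate."""
    return records / rate_per_second / 3600

hours = crawl_time_hours(RECORDS, REQUESTS_PER_SECOND)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
# prints "55.6 hours, about 2.3 days"
```

So a single full pass takes more than two days of continuous requests, which is the case for offering a bulk dump.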
> * How could sites be made aware of changes to the system? Rather
> than accessing every bill record every night, is there a way that
> sites could only access records that had been updated (i.e. new
> cosponsors, bill action, etc).
That's an absolute must. That's one of the biggest problems I have with
GovTrack. Not all bill updates are reflected in the Daily Digest, and
there's no other way to get a list of changed bills. (The D.D. is also
not machine readable...)
That could be done simply by updating a file with the last modified time
of each record any time a record is modified, or by making a dynamic
page that gives all modified records within a given time frame.
(Critically, these pages should at the very least cover 7 days of
changes in one request and not require paging through 1-50, 51-100, etc.
That's so annoying.) This *could* be done in RSS, which would sort of
make use of standard date formats and things, so long as it refers to
records unambiguously, and that might give it a dual use for
individuals. But, that might be unnecessary.
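
A minimal sketch of the first option, a flat file of last-modified times that a client filters locally. The tab-separated layout, the record IDs, and the function name are all assumptions made up for illustration; nothing like this file exists today.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical manifest: one "record-id<TAB>last-modified" line per record.
# The IDs and the tab-separated layout are illustrative assumptions.
MANIFEST = """\
h110-1234\t2007-03-01T14:05:00+00:00
s110-22\t2007-03-06T09:30:00+00:00
h110-99\t2007-02-10T18:00:00+00:00
"""

def changed_since(manifest: str, since: datetime) -> list[str]:
    """Return IDs of records whose last-modified time is at or after `since`."""
    changed = []
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record_id, stamp = line.split("\t")
        if datetime.fromisoformat(stamp) >= since:
            changed.append(record_id)
    return changed

# Ask for everything modified in the 7 days before a fixed "now".
now = datetime(2007, 3, 7, tzinfo=timezone.utc)
print(changed_since(MANIFEST, now - timedelta(days=7)))
# prints ['h110-1234', 's110-22']
```

One request for the manifest covers the whole window, so there is no paging, and the client then fetches only the listed records.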
> * Is it important that RSS feeds be made available for search terms?
> For example, an RSS feed for all new bills that contain the word
> Iraq in the text.
This shouldn't be a point that slows down anything else. RSS feeds by
LIV terms (as I do) are a good starting place, but certainly full text
search feeds would be nice. Not sure whether it's realistic in terms of
computation or cost, though.
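
For a sense of what a full text search feed involves, here is a rough sketch that filters records by a term and emits RSS 2.0. The bill records, the example.org link, and the function name are invented for illustration. The naive substring scan is also an assumption: it rereads every bill's text for every feed, which is where the cost concern comes from, and a real service would presumably use a search index instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical bill records; IDs, titles, and text are made up for illustration.
BILLS = [
    {"id": "h110-1", "title": "Reconstruction funding", "text": "funds for Iraq reconstruction"},
    {"id": "s110-2", "title": "Highway safety", "text": "grants for highway safety programs"},
]

def search_feed(bills: list[dict], term: str) -> str:
    """Build an RSS 2.0 document listing bills whose text contains `term`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"New bills mentioning {term}"
    ET.SubElement(channel, "link").text = "http://example.org/feeds"  # placeholder URL
    ET.SubElement(channel, "description").text = f"Bills whose text contains {term}"
    for bill in bills:
        # Case-insensitive substring match: simple, but linear in the text size.
        if term.lower() in bill["text"].lower():
            item = ET.SubElement(channel, "item")
            # The guid is an opaque record ID, not a URL, hence isPermaLink="false".
            ET.SubElement(item, "guid", isPermaLink="false").text = bill["id"]
            ET.SubElement(item, "title").text = bill["title"]
    return ET.tostring(rss, encoding="unicode")

print(search_feed(BILLS, "Iraq"))
```

Using the record ID as the guid is what lets a feed item refer to a bill unambiguously.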
--
- Josh Tauberer
"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)
Chris Baker wrote:
> I'd like to see the data as plastic as
> possible, and to me that means RDF.
Oh, so been there, done that!
Lately, because of the rate at which these open data things are
improving, and the fact that the LOC people that I talked to don't even
seem to have any interest in public open data, my take is that the best
hope for seeing progress is to suggest the simplest way to go forward.
That means XML, REST, etc.
(Have you seen these?
http://www.govtrack.us/sparql.xpd
http://www.govtrack.us/source.xpd )
(And, btw, I take friendly issue with your blog entry that the WaPo is
leading the way in 21st century democracy with their votes database.
::grin::)
Well, we're *trying* to do that. But Josh got there first, and I hope
he knows how many folks in the media appreciate his efforts.
As to the questions raised, I'd prefer an entire database dump, or at
least sections of the database that can be regularly updated, sort of
the way the Federal Election Commission does with its data. Plenty of
other things can be extended from that, including RSS and other stuff,
so if requiring the LoC to have it delays things, then I second Josh's
recommendation. Simplicity works best.
Derek Willis
washingtonpost.com
1. (and this has been mentioned here before) "simple" gets
implemented and "complex" does not. that's an oversimplification, of
course, but we've seen examples: html was simple enough that it
enabled the web, while rdf is complex and has been much slower to
create the semantic web; the government's "GILS" (Government Information
Locator Service) was well thought out, but mostly un-implemented; etc.
2. separation of data from applications is *always* better for
preservation purposes. when agencies instantiate their information in
databases it is just too easy for them to leave out data that doesn't
fit the database design and too tempting to use software-specific
functionality that gets lost in translation to any other system.
this leads me only to generic suggestions, not specific ones:
a. software-neutral and OS-neutral formats for distribution and
preservation (i.e., xml)
b. "minimal-level" rules for mark-up and metadata to help ensure that
some core information *always* gets produced and saved -- even if it
is *possible* to produce much more complex and demanding and
expensive information.
c. flexibility: standards should allow for complete, comprehensive
markup without limitations (field size, character-encoding, etc.) and
should allow for change over time as our needs change.
James A. Jacobs
Data Services Librarian Emeritus
University of California San Diego
You may be interested in this article on beltway blogging. David All is mentioned.
Beltway Blogroll: National Journal's Cover Story On Blogs
Mike Stern
-A web service with XML.
- How could sites be made aware of changes to the system? Rather than
accessing every bill record every night, is there a way that sites could
only access records that had been updated (i.e. new cosponsors, bill
action, etc).
-They don't need to be made aware of changes, they just need to
develop a system for re-checking the web service and identifying new
content.
Queries don't need to be babysat -- if only some records were
available at any time, there would inevitably be a time when you'd
need the old ones that were no longer available.
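
A sketch of that re-check-and-compare approach on the client side: keep a hash of each record from the last pass and compare it with what comes back this time. The record IDs, the sample text, and the function names are hypothetical; how the records are fetched from the web service is left out.

```python
import hashlib

def fingerprint(record_text: str) -> str:
    """Stable hash of a record's content, used to detect changes."""
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()

def find_updates(seen: dict[str, str], fetched: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare freshly fetched {id: text} against stored {id: hash}.

    Returns (new_ids, changed_ids) and updates `seen` in place so the
    next pass compares against this one.
    """
    new_ids, changed_ids = [], []
    for record_id, text in fetched.items():
        digest = fingerprint(text)
        if record_id not in seen:
            new_ids.append(record_id)
        elif seen[record_id] != digest:
            changed_ids.append(record_id)
        seen[record_id] = digest
    return new_ids, changed_ids

# Two passes over made-up records: one gains a cosponsor, one is new.
seen: dict[str, str] = {}
find_updates(seen, {"h110-1": "cosponsors: 3", "s110-2": "cosponsors: 1"})
print(find_updates(seen, {"h110-1": "cosponsors: 4",
                          "s110-2": "cosponsors: 1",
                          "h110-3": "introduced"}))
# prints (['h110-3'], ['h110-1'])
```

The trade-off is that every record still has to be fetched on each pass; the comparison only tells you which ones to reprocess.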
- Is it important that RSS feeds be made available for search terms?
For example, an RSS feed for all new bills that contain the word Iraq
in the text.
Not really--I can understand why a small slice of the population
would want it, but I don't think it's really needed by the broader
population. If you have a web service, you have everything, and you
can tease out terms, simplifying information for others.
- What's the work around for this need today?
There's a couple of good web services created by a few states
already. They built them in-house and are willing to share them freely
with others. LoC doesn't need to pay out a bunch of money for this --
it's freely available to them now, and they just need to make a few
changes to suit their needs.