All,
Now that we have a reasonable number of interested folks signed up to
this mailing list (with many more invited!), I would like to start the
design conversation around Bill Status. The goal is to arrive at a
design approach that has broad acceptance in this community, or to
conclude that it isn't possible or worth it. (I believe it is both
possible and worth it, but I'm open to being convinced otherwise.)
Suggested facts as our starting points
Fact 1) States currently - by and large - have their legislative
websites "scraped" by third parties.
Fact 2) Scrapers are brittle. Any change to a website can break a
scraper. This is bad both for the person doing the scraping and for
the legislature's IT staff. The latter will be on the receiving end of
complaints from third parties and may - especially if they also
scrape! - find themselves stuck with a bad layout or bad link
structure because of the knock-on impact if they make changes.
Fact 3) Links on legislative websites are also - by and large -
brittle. Links break, especially across sessions/bienniums, when links
to older documents stop working or (worse) bring you to the wrong
document. A common example is bill numbers, e.g. "House Bill 95", as
bill numbers are typically re-used every biennium (or at some other
boundary).
Fact 4) We know (broadly speaking) how to fix (2) and (3). Fact 2 can
be addressed with machine-readable data formats, i.e. data.gov-style
"feeds" of CSV, XML, JSON and notification formats such as RSS/Atom.
Fact 3 can be addressed with Permanent URLs (PURLs). (See Rick
Jelliffe's PRESTO framework for example. Also see the URL structures
on kslegislature.org and legislation.gov.uk.)
Fact 5) The NIEM model (niem.gov) can be applied (to a degree yet to
be determined!) here. Bill Status information flowing from legislative
websites is clearly "information exchange" in the NIEM sense. If we
can leverage NIEM, we end up leveraging expertise, tooling etc.
Fact 6) Figuring out a standard representation for Bill Status is
*hard*. Not because XML or REST or SOAP/WSDL or JSON is hard but
because the data model is hard. There are very significant differences
across states in terms of the workflow associated with bills and a very
rich and diverse lexicon to describe bill state and operations
performed on bills. Indeed even the concept of a "bill" is subject to
different interpretations in different legislatures.
Fact 7) Most legislative experts working in legislatures have a
natural human bias towards seeing the world through the eyes of their
own legislature's terminology and workflow. In trying to arrive at a
50-state data model for bill status, we need to respect the fact that
legislatures are all individual snowflakes and that no one
legislature's current model for handling bill status is intrinsically
better than any other legislature's model.
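To make Facts 3 and 4 a little more concrete, here is a minimal Python
sketch of a permanent URL that carries the session in its path, plus a
JSON status record keyed on that URL. Everything in it is an assumption
for illustration only: the example.org base, the path layout, the field
names and the session label are hypothetical and are not taken from any
legislature's actual site.

```python
import json


def bill_purl(base, state, session, chamber, number):
    """Build a hypothetical permanent URL for a bill.

    The session is part of the path, so a bill number that is re-used
    in a later session gets a different URL instead of overwriting the
    old one.
    """
    return f"{base}/{state}/{session}/{chamber}/bill/{number}"


def bill_status_record(state, session, chamber, number, status):
    """Return a machine-readable status record as a JSON string.

    The field names are illustrative assumptions, not a proposed
    standard.
    """
    record = {
        "id": bill_purl("https://example.org", state, session, chamber, number),
        "state": state,
        "session": session,
        "bill": f"{chamber.upper()} {number}",
        "status": status,
    }
    return json.dumps(record, sort_keys=True)
```

Calling bill_purl with the same bill number but two different session
labels gives two different URLs, which is the property Fact 3 is
asking for.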
I would like to propose 5 possible approaches to a 50-state bill
status data model. I will go into each one in a separate post but will
briefly summarize them here. (I'm just brainstorming at the moment and
not favoring any one approach. I will include pros/cons for each in
the subsequent posts.)
Note also that I am concentrating on Bill Status here - as opposed to
Bill History and Bill Status codes. I.e. for now, I'm looking at a
model for capturing the state of a bill at one point in time as
opposed to looking at the set of actions (history) that have
accumulated on a bill.
Approach 1) Full Vertical Mapping Model
This is a "meta model" approach in which we identify an abstract model
to which all 50 states can map their bill status in an Object
Oriented way. It might, for example, have concepts like Measure (as a
base class for Bills, Resos, Executive Orders...). It might have "Made
Public" as a base class for "first reading", "introduced", "tabled"
etc.
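As a rough illustration of what the vertical approach could look like,
here is a small Python sketch of such a class hierarchy. The class
names simply follow the examples in the paragraph above; the Status
base class and the is_public helper are my own assumptions, and none of
this is a proposed standard.

```python
class Measure:
    """Abstract base for anything a legislature acts on."""

    def __init__(self, identifier):
        self.identifier = identifier


class Bill(Measure):
    pass


class Resolution(Measure):
    pass


class Status:
    """Hypothetical root of the abstract status model."""


class MadePublic(Status):
    """Abstract concept that state-specific terms map up to."""


class FirstReading(MadePublic):
    pass


class Introduced(MadePublic):
    pass


class Tabled(MadePublic):
    pass


def is_public(status):
    """True if a state-specific status maps up to MadePublic."""
    return isinstance(status, MadePublic)
```

A consumer that only understands the abstract model can call
is_public(Introduced()) or is_public(FirstReading()) and get the same
answer without knowing either state's vocabulary.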
Approach 2) Full Horizontal Mapping Model
We pick an existing model in use in some legislature as an exemplar
and map horizontally, concept for concept. I.e. the model might say
"first reading" and an individual legislature might map to that from
"introduced" etc.
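A horizontal mapping is essentially a lookup table per legislature.
The Python sketch below assumes a tiny exemplar vocabulary and one
hypothetical "State A" mapping. Both tables are invented for
illustration; only the "introduced" to "first reading" pair comes from
the description above.

```python
# Hypothetical exemplar vocabulary borrowed from one legislature.
EXEMPLAR_TERMS = {"first reading", "in committee"}

# Hypothetical mapping from State A's own terms to the exemplar's terms.
STATE_A_TO_EXEMPLAR = {
    "introduced": "first reading",
    "referred to committee": "in committee",
}


def to_exemplar(native_term, mapping=STATE_A_TO_EXEMPLAR):
    """Translate a native status term to the exemplar term.

    In a *full* horizontal mapping every native term must have a
    counterpart, so an unknown term raises KeyError rather than being
    silently dropped.
    """
    term = mapping[native_term.strip().lower()]
    if term not in EXEMPLAR_TERMS:
        raise ValueError(f"{term!r} is not in the exemplar vocabulary")
    return term
```

With these tables, to_exemplar("Introduced") returns "first reading",
and any term State A has not mapped fails loudly.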
Approach 3) Partial Vertical Mapping Model
Variation on (1) but with only a subset of a state's bill status model
mapped to the interchange model. It might, for example, have
"introduced", "in committee", "for final action", "sent to governor" but
leave out more granular workflow states like "third reading in the
house of origin" or "committee of the whole - opposite chamber" etc.
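One way to picture the partial variant is a small, fixed set of coarse
states, with anything more granular simply left out of the interchange.
The Python sketch below uses the four coarse states named above; the
native-term table and the choice to return None for unmapped terms are
assumptions for illustration.

```python
from enum import Enum


class CoarseStatus(Enum):
    INTRODUCED = "introduced"
    IN_COMMITTEE = "in committee"
    FOR_FINAL_ACTION = "for final action"
    SENT_TO_GOVERNOR = "sent to governor"


# Hypothetical table for one state: only some native terms are mapped.
NATIVE_TO_COARSE = {
    "introduced": CoarseStatus.INTRODUCED,
    "referred to committee": CoarseStatus.IN_COMMITTEE,
    "enrolled and presented to governor": CoarseStatus.SENT_TO_GOVERNOR,
}


def to_interchange(native_term):
    """Return the coarse status, or None when the native term is too
    granular to be part of the interchange model."""
    return NATIVE_TO_COARSE.get(native_term.strip().lower())
```

Here to_interchange("third reading in the house of origin") returns
None, which is the "left out" case, while the state remains free to
keep using that term internally.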
Approach 4) Partial Horizontal Mapping Model
Variation on (2) with only a subset of a state's bill status model
mapped to the exemplar model. It might, for example, have "first
reading" but not have "on general orders".
Approach 5) Percentage Complete + Probability Model
In this model, rather than enumerate particular status terms/codes we
would instead standardize a "progress bar" model with an associated
confidence metric. I.e. "bill is 60% through the process, 60%
confident". The idea behind this approach would be to avoid all the
complexities of mapping workflow states and yet give consumers of the
data an intuitive (and necessarily probabilistic) feel for how far
along the bill is in the process and how likely it is that the
provided percentage is accurate.
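The data structure for this approach would be very small. A minimal
Python sketch, assuming a percentage in the range 0 to 100 and a
confidence in the range 0 to 1 (the class name, field names and ranges
are all hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillProgress:
    percent_complete: float  # 0 to 100: how far through the process
    confidence: float        # 0 to 1: how much the publisher trusts it

    def __post_init__(self):
        # Reject values outside the assumed ranges.
        if not 0 <= self.percent_complete <= 100:
            raise ValueError("percent_complete must be between 0 and 100")
        if not 0 <= self.confidence <= 1:
            raise ValueError("confidence must be between 0 and 1")

    def describe(self):
        """Render the 'progress bar' as a short sentence."""
        return (
            f"bill is {self.percent_complete:.0f}% through the process, "
            f"{self.confidence * 100:.0f}% confident"
        )
```

BillProgress(60, 0.6).describe() gives the "60% through the process,
60% confident" sentence from the example above. How a legislature would
actually compute either number is the hard part and is not addressed by
this sketch.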
Again, let me stress that I am just brainstorming the possibilities
here to elicit feedback. Some of the above most certainly will not
fly and I'm pretty sure there are other good possibilities missing.
Let's get a conversation going! I'll monitor and contribute and will
gladly take on the job of summarizing emerging consensus and driving
on. I will also happily step back if somebody else on this list wants
to grab the reins and drive. Just let me know.
Sean