Towards a Bill Status Data Model

Oct 21, 2011, 11:59:53 AM
to NIEM EDemocracy

Now that we have a reasonable number of interested folks signed up to
this mailing list (with many more invited!), I would like to start the
design conversation around Bill Status. The goal is to arrive at a
design approach that has broad acceptance in this community or to
conclude that it isn't possible or worth it. (I believe it is both
possible and worth it but I'm open to being convinced otherwise.)

Suggested facts as our starting points

Fact 1) States currently - by and large - have their legislative
websites "scraped" by third parties.

Fact 2) Scrapers are brittle. Any change to a website can break a
scraper. This is bad both for the person doing the scraping and for
the legislature's IT staff. The latter will be on the receiving end of
complaints from third parties and may - especially if they also
scrape! - find themselves stuck with a bad layout or bad link
structure because of the knock-on impact if they make changes.

Fact 3) Links on legislative websites are also - by and large -
brittle. Links break. Especially across sessions/bienniums when links
to older documents stop working or (worse) bring you to the wrong
document. A common example is bill numbers e.g. "House Bill 95". As
the bill numbers are typically re-used every biennium (or some other

Fact 4) We know (broadly speaking) how to fix (2) and (3). Fact 2 can
be addressed with machine-readable data formats. I.e.g style
"feeds" of CSV, XML, JSON and notification formats such as RSS/ATOM.
Fact 3 can be addressed with Permanent URLs (PURLs). (See Rick
Jelliffe's PRESTO framework for example. Also see the URL structure on Also, see the URL structure on
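To make the permanent-URL idea in Fact 4 a little more concrete, here is a minimal Python sketch. The URL layout, the `legislature.example.gov` host and the `bill_purl` helper are all hypothetical; they are not taken from PRESTO or from any real legislature. The only point is that putting the session in the path stops "House Bill 95" from colliding when the number is re-used in a later session.

```python
def bill_purl(state: str, session: str, chamber: str, number: int) -> str:
    """Build a permanent URL for a bill.

    Hypothetical layout: /<state>/<session>/bills/<chamber>b<number>.
    The session is part of the path, so a re-used bill number gets a
    different URL in each session.
    """
    base = "https://legislature.example.gov"  # placeholder host, an assumption
    return f"{base}/{state.lower()}/{session}/bills/{chamber.lower()}b{number}"


# The same bill number in two sessions resolves to two distinct URLs.
old_url = bill_purl("KS", "2009-2010", "H", 95)
new_url = bill_purl("KS", "2011-2012", "H", 95)
print(old_url)
print(new_url)
```

Whatever the actual layout ends up being, the design choice that matters is that nothing in the URL depends on the current state of the website.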

Fact 5) The NIEM model can be applied (to a degree yet to
be determined!) here. Bill Status information flowing from legislative
websites is clearly "information exchange" in the NIEM sense. If we
can leverage NIEM, we end up leveraging expertise, tooling etc.

Fact 6) Figuring out a standard representation for Bill Status is
*hard*. Not because XML or REST or SOAP/WSDL or JSON is hard but
because the data model is hard. There are very significant differences
across states in terms of the workflow associated with bills and a very
rich and diverse lexicon to describe bill state and operations
performed on bills. Indeed even the concept of a "bill" is subject to
different interpretations in different legislatures.

Fact 7) Most legislative experts working in legislatures have a
natural human bias towards seeing the world through the eyes of their
legislatures' terminology and workflow. In trying to arrive at a 50-
state data model for bill status, we need to respect the fact that
legislatures are all individual snowflakes and that no one
legislature's current model for handling bill status is intrinsically
better than any other legislature's model.

I would like to propose 5 possible approaches to a 50-state bill
status data model. I will go into each one in a separate post but will
briefly summarize them here. (I'm brainstorming them here at the
moment. Not favoring any one approach. Will include pros/cons for each
in the subsequent posts)

Note also that I am concentrating on Bill Status here - as opposed to
Bill History and Bill Status codes. I.e. for now, I'm looking at a
model for capturing the state of a bill at one point in time as
opposed to looking at the set of actions (history) that have
accumulated on a bill.

Approach 1) Full Vertical Mapping Model

This is a "meta model" approach in which we identify an abstract model
to which all 50 states can map their bill status in an Object
Oriented way. It might, for example, have concepts like Measure (as a
base class for Bills, Resos, Executive Orders...). It might have "Made
Public" as a base class for "first reading", "introduced", "tabled".
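As a rough illustration of the vertical approach, the class hierarchy might be sketched in Python like this. Only Measure, the three measure kinds, "Made Public" and its three example statuses come from the description above; the `Status` root class, the constructor fields and the `is_made_public` helper are assumptions added for the example.

```python
class Status:
    """Abstract root for every bill status (assumed, not from the post)."""


class MadePublic(Status):
    """Abstract status: the measure has become publicly visible."""


class FirstReading(MadePublic):
    """One legislature's concrete term for being made public."""


class Introduced(MadePublic):
    """Another legislature's concrete term for being made public."""


class Tabled(MadePublic):
    """A third concrete term that maps to the same abstract status."""


class Measure:
    """Abstract base for anything a legislature acts on."""

    def __init__(self, identifier: str, status: Status):
        self.identifier = identifier
        self.status = status


class Bill(Measure):
    pass


class Resolution(Measure):
    pass


class ExecutiveOrder(Measure):
    pass


def is_made_public(measure: Measure) -> bool:
    """A consumer asks about the abstract status, not the local term."""
    return isinstance(measure.status, MadePublic)


print(is_made_public(Bill("HB 95", Introduced())))
```

A consumer only needs to know the abstract layer; each legislature keeps its own vocabulary as subclasses.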

Approach 2) Full Horizontal Mapping Model

We pick an existing model in use in some legislature as an exemplar
and map horizontally, concept for concept. I.e. the model might say
"first reading" and an individual legislature might map to that from
"introduced" etc.
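A horizontal mapping is essentially a lookup table from local terms to the exemplar's terms. The sketch below assumes a tiny exemplar vocabulary and a hypothetical local legislature; apart from "introduced" mapping to "first reading", the table entries and the `to_exemplar` helper are made up for illustration.

```python
from typing import Mapping

# Terms used by the exemplar legislature (a small assumed subset).
EXEMPLAR_TERMS = frozenset({"first reading", "in committee", "on general orders"})

# One legislature's local terms mapped concept for concept (hypothetical).
LOCAL_TO_EXEMPLAR = {
    "introduced": "first reading",
    "referred to committee": "in committee",
}


def to_exemplar(local_term: str,
                mapping: Mapping[str, str] = LOCAL_TO_EXEMPLAR) -> str:
    """Translate a local status term into the exemplar's term.

    In a full mapping every local term must have a target, so an
    unmapped term raises KeyError rather than being silently dropped.
    """
    exemplar_term = mapping[local_term.strip().lower()]
    if exemplar_term not in EXEMPLAR_TERMS:
        raise ValueError(f"{exemplar_term!r} is not an exemplar term")
    return exemplar_term


print(to_exemplar("Introduced"))
```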

Approach 3) Partial Vertical Mapping Model

Variation on (1) but with only a subset of a state's bill status model
mapped to the interchange model. It might, for example, have
"introduced", "in committee", "for final action", "sent to governor" but
leave out more granular workflow states like "third reading in the
house of origin" or "committee of the whole - opposite chamber" etc.
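A partial mapping could be sketched as a small enumeration plus a lookup that is allowed to come back empty. The four interchange statuses are the ones named above; the local terms in the table and the choice to return `None` for anything left out are assumptions (a legislature could just as well roll granular states up to the nearest coarse status).

```python
from enum import Enum
from typing import Optional


class InterchangeStatus(Enum):
    """The deliberately small set of shared statuses."""
    INTRODUCED = "introduced"
    IN_COMMITTEE = "in committee"
    FOR_FINAL_ACTION = "for final action"
    SENT_TO_GOVERNOR = "sent to governor"


# Hypothetical local terms for one legislature.
LOCAL_TO_INTERCHANGE = {
    "introduced": InterchangeStatus.INTRODUCED,
    "referred to committee": InterchangeStatus.IN_COMMITTEE,
    "enrolled and presented to governor": InterchangeStatus.SENT_TO_GOVERNOR,
}


def to_interchange(local_status: str) -> Optional[InterchangeStatus]:
    """Return the shared status, or None when the local status is one
    of the granular workflow states the interchange model leaves out."""
    return LOCAL_TO_INTERCHANGE.get(local_status.strip().lower())


print(to_interchange("Referred to Committee"))
print(to_interchange("third reading in the house of origin"))
```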

Approach 4) Partial Horizontal Mapping Model

Variation on (2) with only a subset of a state's bill status model
mapped to the exemplar model. It might, for example, have "first
reading" but not have "on general orders".

Approach 5) Percentage Complete + Probability Model

In this model, rather than enumerate particular status terms/codes we
would instead standardize a "progress bar" model with an associated
confidence metric. I.e. "bill is 60% through the process, 60%
confident". The idea behind this approach would be to avoid all the
complexities of mapping workflow states and yet give consumers of the
data an intuitive (and necessarily probabilistic) feel for how far
along the bill is in the process and how likely it is that the
provided percentage is accurate.
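For the "progress bar" idea, the whole model could be as small as two numbers. Here is a minimal Python sketch; the `ProgressEstimate` name, the field names and the choice to store both values as fractions between 0 and 1 are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgressEstimate:
    """How far along a bill is, plus how much to trust that figure."""
    fraction_complete: float  # 0.0 = just introduced, 1.0 = process finished
    confidence: float         # 0.0 = pure guess, 1.0 = certain

    def __post_init__(self):
        # Reject values outside the 0..1 range for both fields.
        for name in ("fraction_complete", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def describe(self) -> str:
        """Render the estimate the way a consumer might display it."""
        return (f"bill is {self.fraction_complete:.0%} through the process, "
                f"{self.confidence:.0%} confident")


estimate = ProgressEstimate(fraction_complete=0.6, confidence=0.6)
print(estimate.describe())
```

How the two numbers would actually be computed is the hard part and is left open here.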

Again, let me stress that I am just brainstorming the possibilities
here to elicit feedback. Some of the above most certainly will not
fly and I'm pretty sure there are other good possibilities missing.

Let's get a conversation going! I'll monitor and contribute and will
gladly take on the job of summarizing emerging consensus and driving
on. I will also happily step back if somebody else on this list wants
to grab the reins and drive. Just let me know.


Karen Suhaka

Oct 22, 2011, 12:55:03 PM
I'm going to jump right in.  Sean, great first pass at possible approaches.

As a third party vendor, and as a member of the general public, a Full Vertical Mapping Model (or partial but adequate) would be the best for me to consume.  Possibly paired with the percentage/probability.  That gives me the best chance at understanding the status, without needing to understand the many nuances of whichever state I'm looking at.

I'm guessing the full vertical approach would also require the most effort, both in up front work and collaboration.  Making it perhaps the least practical.  But maybe we could take baby steps, defining just three or four statuses to start and taking a crack at mapping various states into those simple terms?  Then we can refine until we've captured the most important points, without getting lost in the weeds.

Horizontal would be almost as good, from a citizen's point of view.  If there is indeed a state typical enough to start from.  Much less up front work would be required, but lots of collaboration, and also quite a bit of graciousness would be needed. 

Of course I really dig the last option, but I think the data can lead us there without requiring that much teamwork.  Just a couple devoted analysts and a good database!

My two cents.



Shay Wilson

Oct 24, 2011, 5:18:23 PM
I'll keep it brief as I'm not well studied in data structures, but here's my opinion anyway.

Information being as valuable as it is, I hate to throw out anything, so I'd prefer either of the full mappings, with legislatures choosing to skip statuses that they never use.  It also keeps things simple where you have a one-to-one status mapping, and you avoid the mistake of two groups mapping the same specific status to two slightly different general statuses.

Would it be in the scope of this project to identify a standardized and convenient way to advertise the statuses that would be used by a given legislature?

As an aside, how many legislatures operate under some sort of uniform rules vs. a statutory or constitutional mandate over how they deal with the passing of legislation?  I ask because conceivably everything I know about what status a bill can be in and how it flows from one to the other can be changed on the fly.

Nov 15, 2011, 11:13:42 AM
to NIEM EDemocracy

On Oct 22, 10:55 am, Karen Suhaka wrote:
> As a third party vendor, and as a member of the general public, a Full
> Vertical Mapping Model (or partial but adequate) would be the best for me to
> consume.  Possibly paired with the percentage/probability.  That gives me
> the best chance at understanding the status, without needing to understand
> the many nuances of whichever state I'm looking at.

Ok. Thanks for the input. I like the sound of "partial but adequate".
It speaks to the pragmatic approach we need to take here. Find the
80/20 point that I'm sure exists.

The percentage/probability one (Tom Bruce commented on this too) is
not something we can have directly in the data as it probably crosses
the line into interpretation/opinion too much. It might, however, be a
great place for third party vendors to add value, as folks like you
will be in a position to add interpretation as part of your value-add.

Thanks again for the input.
