Legislative Data Model


sean.m...@gmail.com

24 Oct 2011, 10:28:47
to NIEM EDemocracy
All,

(The e-mail below is forwarded for Tom Bruce, who is having some
difficulties with Google Groups "come back later" messages.)

Sean

----------------------------------
Sean and all:

On Oct 21, 11:59 am, "sean.mcgr...@propylon.com"
<sean.mcgr...@gmail.com> wrote:

> > Fact 2) Scrapers are brittle. Any change to a website can break a
> > scraper. This is bad both for the person doing the scraping and for
> > the legislature's IT staff. The latter will be on the receiving end of
> > complaints from third parties and may - especially if they also
> > scrape! - find themselves stuck with a bad layout or bad link
> > structure because of the knock-on impact if they make changes.
Yes. Though legislative staff are, IME, impervious to complaints.

> > Fact 3) Links on legislative websites are also - by and large -
> > brittle. Links break. Especially across sessions/bienniums when links
> > to older documents stop working or (worse) bring you to the wrong
> > document. A common example is bill numbers, e.g. "House Bill 95". As
> > bill numbers are typically re-used every biennium (or some other
> > boundary), such links become ambiguous over time.
Yeah. This is just bad identifier design, or more accurately, a
problem that stems from not yet adapting legacy identifier designs to
the new environment that arises from exposure to the Web.


> > Fact 4) We know (broadly speaking) how to fix (2) and (3). Fact 2 can
> > be addressed with machine-readable data formats, e.g. data.gov-style
> > "feeds" of CSV, XML, JSON and notification formats such as RSS/ATOM.
> > Fact 3 can be addressed with Permanent URLs (PURLs). (See Rick
> > Jelliffe's PRESTO framework for example. Also see the URL structure on
> > kslegislature.org. Also, see the URL structure on legislation.gov.uk)
This isn't just a matter of fixing URIs, though that would be a very
good and welcome start. There are also problems of things simply not
having identifiers at all under legacy systems, and often of some
confusion about what identifiers identify and why. Some work is
needed to tie existing identifiers to Web-capable identifier schemes
(it's unlikely that the old identifiers will be legislated out of
existence overnight, and many have useful semantics that won't survive
Web exposure but are worth retaining).
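The permanent-URL fix for Fact 3 can be made concrete with a small sketch. Assuming a hypothetical hostname and path scheme (loosely in the spirit of PRESTO and legislation.gov.uk, not any real site's structure), the key move is to qualify every re-usable identifier with its session:

```python
# Sketch of a session-qualified permanent URL scheme. The hostname and path
# layout are hypothetical -- this is not any real legislature's URL structure.
def permanent_url(state: str, session: str, measure_type: str, number: int) -> str:
    """Build a link that stays unambiguous even when bill numbers are
    re-used each biennium, by always qualifying with the session."""
    return f"https://legislature.example.gov/{state}/{session}/{measure_type}/{number}"

# "House Bill 95" alone is ambiguous across bienniums; qualified, it is not:
hb95_2009 = permanent_url("ks", "2009-2010", "hb", 95)
hb95_2011 = permanent_url("ks", "2011-2012", "hb", 95)
```

The same pattern preserves the useful semantics of the legacy identifier ("HB 95") while making it survivable on the Web.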



> > Fact 5) The NIEM model (niem.gov) can be applied (to a degree yet to
> > be determined!) here. Bill Status information flowing from legislative
> > websites is clearly "information exchange" in the NIEM sense. If we
> > can leverage NIEM, we end up leveraging expertise, tooling etc.
> >
> > Fact 6) Figuring out a standard representation for Bill Status is
> > *hard*. Not because XML or REST or SOAP/WSDL or JSON is hard but
> > because the data model is hard. There are very significant differences
> > across states in terms of the workflow associated with bills and a very
> > rich and diverse lexicon to describe bill state and operations
> > performed on bills. Indeed even the concept of a "bill" is subject to
> > different interpretations in different legislatures.
Yes. There's a question here as to whether you want to build one
standard, or an abstract interchange standard that is not as fully
expressive or head-compatible as purpose-built approaches done by
individual legislatures. I tend to favor the latter -- a standard
intended for interchange, plus each legislature kinda doing its own
thing, with an eye to best practices. CEN MetaLex is a reasonable
point of departure for an interchange standard for markup, and would
point the way toward a data model; we did, at one point in the past,
make it work with the US Code but have never worked with it for a
lifecycle system that would start with drafting. You will probably
never get buy-in on a standard, at least not across-the-board in a
normal lifetime.

The key, I think, is careful documentation and reconciliation. Good
ER diagrams -- or better still, RDF-based models -- from each
legislature would quickly reveal points held in common and areas where
detailed modelling is best left to local extension.
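The documentation-and-reconciliation step can be pictured in miniature. In this toy sketch (all vocabularies and mappings invented for illustration), each legislature documents how its local status terms map onto candidate abstract terms; intersecting the results reveals the common core, and anything unmapped stays a local extension:

```python
# Toy reconciliation sketch: each legislature's (hypothetical) local status
# vocabulary mapped to shared abstract terms; None marks terms with no
# sensible abstract counterpart.
models = {
    "state_a": {"introduced": "made_public", "first reading": "made_public",
                "in committee": "in_committee", "on general orders": None},
    "state_b": {"filed": "made_public", "referred": "in_committee",
                "engrossed": None},
}

# The candidate interchange core is what every legislature can express:
shared = set.intersection(*[
    {abstract for abstract in m.values() if abstract} for m in models.values()
])
# Terms mapped to None are left to local extension.
```

An RDF-based version of the same exercise would do this with shared classes and properties rather than string sets, but the analytical move is identical.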

> > Fact 7) Most legislative experts working in legislatures have a
> > natural human bias towards seeing the world through the eyes of their
> > legislatures' terminology and workflow. In trying to arrive at a 50-
> > state data model for bill status, we need to respect the fact that
> > legislatures are all individual snowflakes and that no one
> > legislature's current model for handling bill status is intrinsically
> > better than any other legislature's model.
See above.

> >
> > I would like to propose 5 possible approaches to a 50-state bill
> > status data model. I will go into each one in a separate post but will
> > briefly summarize them here. (I'm brainstorming them here at the
> > moment. Not favoring any one approach. Will include pros/cons for each
> > in the subsequent posts)
> >
> > Note also that I am concentrating on Bill Status here - as opposed to
> > Bill History and Bill Status codes. I.e. for now, I'm looking at a
> > model for capturing the state of a bill at one point in time as
> > opposed to looking at the set of actions (history) that have
> > accumulated on a bill.
Right, although something that provides points of contact between a
process/workflow model and a set of document models would probably
enable the construction of history more or less automagically.
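The "automagic" history construction amounts to this: if each status change is recorded as a dated event, history is just the ordered event sequence and current status is its last element. A minimal sketch with invented data:

```python
# Sketch: given timestamped status-change events, bill history falls out of
# a sort, and current status is the latest event. Data is illustrative only.
from datetime import date

events = [
    (date(2011, 2, 1), "introduced"),
    (date(2011, 1, 12), "prefiled"),
    (date(2011, 3, 8), "in committee"),
]
history = sorted(events)          # chronological bill history
current_status = history[-1][1]   # latest event is the current status
```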

> >
> > Approach 1) Full Vertical Mapping Model
> >
> > This is a "meta model" approach in which we identify an abstract model
> > to which all 50 state's can map their bill status in an Object
> > Oriented way. It might, for example have concepts like Measure (as a
> > base class for Bills, Resos, Executive Orders...) it might have "Made
> > Public" as a base class for "first reading", "introduced", "tabled"
> > etc.
Useful as a reference or interchange standard, and as a first step in
figuring out what to do. Also, you'll want to roll in some existing
models -- UK did a decent one for organizations, FOAF is useful for
some stuff, DCMI, etc. etc. Not the final answer, but good for a
process of analysis that others could make more fully expressive.
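The vertical meta-model reads naturally as a class hierarchy. A sketch under the names Sean proposed (Measure, "Made Public"), with everything else invented for illustration:

```python
# Sketch of the "full vertical" meta-model as an OO hierarchy. Class names
# follow the proposal above; the concrete subclasses are illustrative.
from dataclasses import dataclass

@dataclass
class Measure:
    """Abstract parent of Bills, Resolutions, Executive Orders, ..."""
    identifier: str

@dataclass
class Bill(Measure):
    pass

@dataclass
class Resolution(Measure):
    pass

class MadePublic:
    """Abstract event; a state's "first reading" or "introduced" maps here."""

class FirstReading(MadePublic):
    pass

class Introduced(MadePublic):
    pass

# A legislature maps its local concept up to the shared abstraction, e.g.
# its "introduced" event is a kind of MadePublic event.
```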


> > Approach 2) Full Horizontal Mapping Model
> >
> > We pick an existing model in use in some legislature as an exemplar
> > and map horizontally, concept for concept. I.e. the model might say
> > "first reading" and an individual legislature might map to that from
> > "introduced" etc.
Let's think of all the political reasons why Alabama would never use
anything made in Massachusetts, let alone France. This is, however, a
useful consolidation step after doing 1.

> > Approach 3) Partial Vertical Mapping Model
> >
> > Variation on (1) but with only a subset of a state's bill status model
> > mapped to the interchange model. It might, for example, have
> > "introduced", "in committee", "for final action", "sent to governor" but
> > leave out more granular workflow states like "third reading in the
> > house of origin" or "committee of the whole - opposite chamber" etc.
Right. For me this has a kind of inevitability. There are certain
"legislative events" in the document lifecycle that are going to be
fairly universal, and others that while interesting to some kinds of
audience are not worth pursuing even within one legislature -- I am
thinking, in particular, of all the fine-grained shoving and hauling
that goes on around procedural rules and so on, and would need
extensibility into finer granularity anyway.
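The partial vertical mapping is essentially a lookup table that deliberately loses granularity. A sketch, with all state names invented rather than proposed:

```python
# Sketch of a partial vertical mapping: a coarse interchange vocabulary, with
# fine-grained local states collapsed onto it or deliberately left out.
# All term names are illustrative, not a proposed standard vocabulary.
from typing import Optional

LOCAL_TO_COARSE = {
    "introduced": "introduced",
    "third reading in the house of origin": "for_final_action",
    "committee of the whole - opposite chamber": "in_committee",
    "sent to governor": "sent_to_governor",
}

def interchange_status(local_status: str) -> Optional[str]:
    """Return the coarse interchange state, or None when a local state is
    intentionally outside the interchange model (a local-extension point)."""
    return LOCAL_TO_COARSE.get(local_status)
```

The `None` cases are exactly the fine-grained procedural shoving and hauling that would be handled by local extensibility rather than the core model.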


> > Approach 4) Partial Horizontal Mapping Model
> >
> > Variation on (2) with only a subset of a state's bill status model
> > mapped to the exemplar model. It might, for example, have "first
> > reading" but not have "on general orders".
Right; useful to the extent that the horizontal approach is
worthwhile, only more practicable I think.



> > Approach 5) Percentage Complete + Probability Model
> >
> > In this model, rather than enumerate particular status terms/codes we
> > would instead standardize a "progress bar" model with an associated
> > confidence metric. I.e. "bill is 60% through the process, 60%
> > confident". The idea behind this approach would be to avoid all the
> > complexities of mapping workflow states and yet give consumers of the
> > data an intuitive (and necessarily probabilistic) feel for how far
> > along the bill is in the process and how likely it is that the
> > provided percentage is accurate.
Hoo boy. Just thinking about the number of ways that this would be
abused for political purposes makes my head spin, assuming it said
anything useful in the first place. I invite you to imagine the
amount of collective legislative intern/zealot energy that could be
applied to spinning this given a do-nothing legislature and an
imminent election, for instance. Not to mention that apples to apples
comparisons across bills within one legislature ("look at what my
party brought home for the voters this year, as opposed to those other
schleps") or across legislatures ("how's that immigration thing going,
anyway?") are going to be difficult.


> >
> > Again, let me stress that I am just brainstorming the possibilities
> > here to elicit feedback. Some of the above most certainly will not
> > fly and I'm pretty sure there are other good possibilities missing.
Despite the inevitable witticisms above, I think this is a great list,
but I'd like to look at it a little differently. All of the methods,
except the progress bar, strike me as useful modes of analysis to
bring to bear on the problem at different stages of the
standards-development process. None will suffice to carry the whole
project, which I think also needs a little pragmatism. So, I'd
suggest:

a) The target might more pragmatically be imagined as a core
interchange standard, plus a series of data-modeling "playbooks" aimed
at specific techniques for improving local practices and making them
more Web-of-Data-friendly; those local models would extend the core,
principally toward greater granularity. This partakes of the
limited-horizontal approach to modeling of documents and other things,
like people, and a limited-vertical approach to modeling legislative
process.

b) The overarching strategy would be to develop a things-and-people
model, and then to develop a process model, and then tie the two
together at specific "known-good" points. Here I am imagining that a
partial-vertical approach would reveal a set of common milestones in
each process -- big landmarks. Bill "status" -- a property of a
"thing" called a "bill" -- would be described by a limited vocabulary
very carefully controlled in terms of those milestones. Local
milestones could be tied to landmarks by a series of relationships
(very simple "followsAfter" or "isSuccessorTo" or what have you), as
could documents in a workflow by similar means. The latter is needed
in situations where, in fact, you can have multiple competing drafts
of stuff scattered around in three committees each with two or three
competing approaches internally. The trick is to separate the idea of
bill "status" from the problem of bill tracing (for legislative
history) or bill tracking (for process management or whatever other
purpose). They need not be the same.
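The landmarks-plus-relationships idea can be sketched as a small triple store, with local milestones tied to core landmarks by the simple ordering relations named above (all identifiers invented for illustration):

```python
# Sketch of tying (hypothetical) local milestones to shared landmarks using
# simple ordering relations, stated as (subject, predicate, object) triples.
triples = [
    ("ks:on_general_orders", "followsAfter",  "core:in_committee"),
    ("ks:final_action",      "isSuccessorTo", "ks:on_general_orders"),
    ("ks:final_action",      "mapsTo",        "core:for_final_action"),
]

def successors(node, triples):
    """Local milestones declared to come after the given node."""
    return [s for s, p, o in triples
            if o == node and p in ("followsAfter", "isSuccessorTo")]
```

Bill tracing and tracking then become graph traversals over these relations, while bill "status" stays a separately controlled property, as suggested.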

c) You might get your progress meter out of the "landmarks" approach,
but you didn't hear me say that.

d) For the RDF junkies in the crowd, I'll say that I don't think you
can build these models without substantial use of reification. My aim
in making this statement is to increase the pool of people throwing
darts at me from my colleagues here at Cornell to a much wider world
;).
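For readers who haven't met reification: the move is to give a statement its own identifier so that further statements (who asserted it, when, with what authority) can be made about it. A library-free sketch of the idea, with all names invented:

```python
# Minimal illustration of RDF-style reification, without an RDF library.
# A status assertion is itself given an identifier ("stmt1") so provenance
# statements can be attached to it. All identifiers are hypothetical.
statement = {
    "id": "stmt1",
    "subject": "bill:HB95",
    "predicate": "hasStatus",
    "object": "core:in_committee",
}

# Statements *about* the statement -- exactly what plain triples can't say:
about_statement = [
    ("stmt1", "assertedBy", "ks:chief_clerk"),
    ("stmt1", "assertedOn", "2011-10-21"),
]
```

In real RDF this is the `rdf:Statement` / `rdf:subject` / `rdf:predicate` / `rdf:object` vocabulary; the provenance use case above is why bill-status data would lean on it.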


There's much more to say about this, but I think I've rambled on too
long already.

All the best,
Tb.

--
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
Thomas R. Bruce
Director, Legal Information Institute
Cornell Law School
http://www.law.cornell.edu/
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+