-James
> --
> You received this message because you are subscribed to the Google Groups "Open State Project" group.
> To post to this group, send email to fifty-sta...@googlegroups.com.
> To unsubscribe from this group, send email to fifty-state-pro...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/fifty-state-project?hl=en.
>
>
I actually find the way OpenStates has chosen to model this incomplete, and that leads to this sort of problem. It is an understandable design, but it is legislator-centric, and the more important entity in a legislature is the legislative seat. I think it makes sense to model legislators as being attached to and detached from seats, since that is exactly what happens when people are elected and later leave office or move to another office. This is obviously a "next major revision" kind of proposal, and I may have to put it on the issues list as such.
For example, back in 2010, we had a legislator who resigned to become Lieutenant Governor, and someone else later won a special election for the seat. This would be modeled as:
Senate seat 15 ->
legislator001: name: Abel Maldonado; start: Dec 1, 2008; end: Apr 27, 2010;
legislator999: name: Vacancy; start: Apr 27, 2010; end: Aug 23, 2010;
legislator002: name: Sam Blakeslee; start: Aug 23, 2010; end: NULL;
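The seat-centric listing above can be sketched in Python as an ordered list of tenures per seat, with the vacancy stored as an explicit entry, so that "who held Senate seat 15 on a given date?" becomes a direct lookup. This is a minimal sketch, not an OpenStates structure: the `Tenure` class and `holder_on` function are hypothetical, and it assumes end dates are exclusive (a tenure ending Apr 27, 2010 does not cover that day).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Tenure:
    """One holder's time in a seat; end=None means the holder is still serving."""
    leg_id: str
    name: str
    start: date
    end: Optional[date]  # exclusive: the first day this tenure no longer applies


# Hypothetical seat-centric record for Senate seat 15, using the dates above.
senate_seat_15: List[Tenure] = [
    Tenure("legislator001", "Abel Maldonado", date(2008, 12, 1), date(2010, 4, 27)),
    Tenure("legislator999", "Vacancy", date(2010, 4, 27), date(2010, 8, 23)),
    Tenure("legislator002", "Sam Blakeslee", date(2010, 8, 23), None),
]


def holder_on(tenures: List[Tenure], day: date) -> Optional[Tenure]:
    """Return the tenure covering `day`, or None if nothing is recorded for it."""
    for tenure in tenures:
        if tenure.start <= day and (tenure.end is None or day < tenure.end):
            return tenure
    return None


print(holder_on(senate_seat_15, date(2010, 6, 1)).name)  # Vacancy
```

Because the vacancy is a stored entry, a `None` result means "no data" rather than "vacant", which is exactly the distinction the legislator-centric layout has trouble expressing.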
As far as I can tell, in the OpenStates system this would be modeled something like this:
leg: leg_id: legislator001; name: Abel Maldonado; old_roles: [{chamber: upper; district: 15; term: 2009-2010; start_date: NULL; end_date: NULL}]
leg: leg_id: legislator002; name: Sam Blakeslee; roles: [{chamber: upper; district: 15; term: 2009-2010; start_date: NULL; end_date: NULL}]
It is simply not possible to tell what happened here. If the start_date and end_date fields were not always NULL, this would be clearer. In fact, as I look at it, I realize that if the start_date and end_date fields were always accurate, the OpenStates design would be sufficient.
Of course, one would still want to have:
leg: leg_id: legislator999; name: Vacancy; roles: [{chamber: upper; district: 15; term: 2009-2010; start_date: Apr 27, 2010; end_date: Aug 23, 2010}]
If one has only the two legislators above, then even with the start_date and end_date fields properly filled in, the lack of a legislator for April through August could just be an error. It is hard to turn the negative fact "no legislator was listed for the seat for that period" into the positive fact "the seat was vacant for that period".
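If the dates were filled in, gaps between consecutive roles for the same seat could at least be surfaced as candidate vacancies for someone to confirm. A minimal sketch, assuming role dicts shaped like the OpenStates examples above but with real `date` values, and assuming a single-member seat (overlapping roles in multi-member districts would need different handling); `candidate_vacancies` is a hypothetical helper.

```python
from datetime import date

# Hypothetical roles shaped like the OpenStates roles above, but with
# start_date/end_date filled in (end_date=None means still serving).
roles = [
    {"leg_id": "legislator001", "chamber": "upper", "district": "15",
     "start_date": date(2008, 12, 1), "end_date": date(2010, 4, 27)},
    {"leg_id": "legislator002", "chamber": "upper", "district": "15",
     "start_date": date(2010, 8, 23), "end_date": None},
]


def candidate_vacancies(roles, chamber, district):
    """Return (start, end) gaps between consecutive roles for one single-member seat.

    A gap only shows that no holder is recorded; it may be a real vacancy or
    missing data, so each result still needs confirming before it is
    published as a vacancy.
    """
    seat_roles = sorted(
        (r for r in roles if r["chamber"] == chamber and r["district"] == district),
        key=lambda r: r["start_date"],
    )
    gaps = []
    for prev, nxt in zip(seat_roles, seat_roles[1:]):
        if prev["end_date"] is not None and prev["end_date"] < nxt["start_date"]:
            gaps.append((prev["end_date"], nxt["start_date"]))
    return gaps


print(candidate_vacancies(roles, "upper", "15"))
```

This does not solve the negative-to-positive problem; it only narrows the search to specific date ranges worth checking.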
I have filter methods that try to characterize the data; their results for roles, on data pulled on 6/21/2011, are below. If the dates are going to be filled in, there is obviously work to do.
cheers - ray
state: ak, checkLegislatorRolesHaveDates found roles # 475, having dates # 0 and missing dates # 475
state: az, checkLegislatorRolesHaveDates found roles # 425, having dates # 0 and missing dates # 425
state: ca, checkLegislatorRolesHaveDates found roles # 1092, having dates # 0 and missing dates # 1092
state: dc, checkLegislatorRolesHaveDates found roles # 60, having dates # 0 and missing dates # 60
state: fl, checkLegislatorRolesHaveDates found roles # 398, having dates # 0 and missing dates # 398
state: in, checkLegislatorRolesHaveDates found roles # 659, having dates # 0 and missing dates # 659
state: la, checkLegislatorRolesHaveDates found roles # 372, having dates # 0 and missing dates # 372
state: md, checkLegislatorRolesHaveDates found roles # 1089, having dates # 0 and missing dates # 1089
state: mi, checkLegislatorRolesHaveDates found roles # 465, having dates # 0 and missing dates # 465
state: mn, checkLegislatorRolesHaveDates found roles # 995, having dates # 0 and missing dates # 995
state: nc, checkLegislatorRolesHaveDates found roles # 465, having dates # 0 and missing dates # 465
state: nj, checkLegislatorRolesHaveDates found roles # 673, having dates # 0 and missing dates # 673
state: nv, checkLegislatorRolesHaveDates found roles # 390, having dates # 0 and missing dates # 390
state: oh, checkLegislatorRolesHaveDates found roles # 656, having dates # 0 and missing dates # 656
state: pa, checkLegislatorRolesHaveDates found roles # 1376, having dates # 0 and missing dates # 1376
state: sd, checkLegislatorRolesHaveDates found roles # 406, having dates # 0 and missing dates # 406
state: tx, checkLegislatorRolesHaveDates found roles # 1478, having dates # 0 and missing dates # 1478
state: ut, checkLegislatorRolesHaveDates found roles # 352, having dates # 0 and missing dates # 352
state: va, checkLegislatorRolesHaveDates found roles # 1251, having dates # 0 and missing dates # 1251
state: vt, checkLegislatorRolesHaveDates found roles # 539, having dates # 0 and missing dates # 539
state: wa, checkLegislatorRolesHaveDates found roles # 622, having dates # 0 and missing dates # 622
state: wi, checkLegislatorRolesHaveDates found roles # 1204, having dates # 0 and missing dates # 1204
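The filter methods that produced the counts above are not shown. A check in the same spirit might look like the sketch below, which is hypothetical: `check_roles_have_dates` and `report` are illustrative names, a role counts as "having dates" if either start_date or end_date is set (an assumption), and legislators are assumed to carry `roles` and `old_roles` as lists, as in the examples earlier in the thread.

```python
def check_roles_have_dates(legislators):
    """Count roles that carry any date versus roles with neither date set.

    Assumes each legislator is a dict whose "roles" and "old_roles" are lists
    of role dicts, as in the examples above; real API records may differ.
    """
    having = missing = 0
    for leg in legislators:
        for role in leg.get("roles", []) + leg.get("old_roles", []):
            if role.get("start_date") or role.get("end_date"):
                having += 1
            else:
                missing += 1
    return having + missing, having, missing


def report(state, legislators):
    """Format the counts in the same shape as the output lines above."""
    total, having, missing = check_roles_have_dates(legislators)
    return (f"state: {state}, checkLegislatorRolesHaveDates found roles # {total}, "
            f"having dates # {having} and missing dates # {missing}")


# Sample data mirroring the two legislators in the example.
sample = [
    {"leg_id": "legislator001",
     "old_roles": [{"chamber": "upper", "district": "15", "term": "2009-2010",
                    "start_date": None, "end_date": None}]},
    {"leg_id": "legislator002",
     "roles": [{"chamber": "upper", "district": "15", "term": "2009-2010",
                "start_date": None, "end_date": None}]},
]
print(report("ca", sample))
```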
There are certainly two ways to model this, and it is true that the
current system is not ideal for finding vacancies directly.
A seat-based system would present its own trade-offs, since the more
natural question is usually "how did this legislator's career
evolve?", and organizing the data by legislator makes that question
much easier to answer.
In our own usage, and that of most people we've talked to, the
legislator-centric model is better suited to quickly answering those
questions, especially since many legislators hold different seats
over the course of their careers (due to redistricting and moves
between chambers).
Organizing by seat is also quite challenging once you take
multi-seat districts into account: some districts elect 2-5 people
with no distinction between them, so saying "District 8 is currently
served by Bob Smith" is less correct than saying "Bob Smith currently
serves in district 8."
We're unlikely to switch to a seat-based model for these and other reasons.
You're correct, however, in noticing that start_date/end_date are not
well used; they exist to help show resignations, etc.
At the moment we haven't identified good resources for finding these
dates, as official sites often leave former legislators up for days
or even months after they leave office due to resignation or death.
The approach we are taking is simple for now: when we're notified
that someone we have as active is no longer in office, we'll attempt
to find the date they left office and mark them accordingly. This has
worked decently at the federal level but will require some effort to
scale to the state level.
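The "mark them as no longer in office" step could amount to closing out the legislator's current roles with the discovered end date. A minimal sketch, assuming `roles` holds current roles and `old_roles` is a plain list as in the examples earlier in the thread (the real storage may key past roles differently); `mark_left_office` is a hypothetical helper.

```python
from datetime import date


def mark_left_office(legislator, end_date):
    """Close a legislator's current roles as of end_date (hypothetical helper).

    Assumes "roles" holds current roles and "old_roles" is a list of past
    roles, as in the examples earlier in the thread.
    """
    closed = [{**role, "end_date": end_date} for role in legislator.get("roles", [])]
    legislator["old_roles"] = legislator.get("old_roles", []) + closed
    legislator["roles"] = []
    return legislator


leg = {"leg_id": "legislator001", "name": "Abel Maldonado",
       "roles": [{"chamber": "upper", "district": "15", "term": "2009-2010",
                  "start_date": None, "end_date": None}]}
mark_left_office(leg, date(2010, 4, 27))
print(leg["old_roles"][0]["end_date"])  # 2010-04-27
```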
-James
On Sat, Jun 25, 2011 at 6:32 PM, Ray Kiddy <r...@ganymede.org> wrote:
On Jun 23, 2011, at 3:14 PM, Dane wrote:
I just noticed as I'm getting familiar with the Missouri legislator scraper that three seats are vacant. Should this scraper somehow catch these seats and note that they are vacant? The district information is still valid and maybe useful, like the term/chamber/district information.
Thanks...
<snip>
Implementing the start and end dates might be tricky in a few states (where it isn't obvious the change occurred) and necessitate a manual remunge, unless you just rely on the run dates of the scrapers and flag changes whenever there is different legislator bio info ... Then you get into multiple DB table snapshots and fuzzy matching. I'm all in favor of fuzzifying matches (given that it's sort of black magic to me), but I'd really like to see that put to use on transparency data contributors.
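Relying on scraper run dates, as described above, can at least bracket when a change happened: a legislator present in one run and absent in the next left office somewhere between the two run dates. A minimal sketch, assuming each snapshot is a dict keyed by a stable legislator id; `diff_snapshots`, the ids, and the run dates are hypothetical, and matching is exact on id (the fuzzy matching mentioned above is out of scope here).

```python
from datetime import date


def diff_snapshots(prev, prev_run, curr, curr_run):
    """Compare two scraper snapshots keyed by legislator id.

    Returns departures and arrivals bracketed by the two run dates (the
    change happened somewhere in between), plus ids whose bio info changed.
    Matching is exact on id; no fuzzy matching is attempted.
    """
    departed = [(i, prev_run, curr_run) for i in sorted(set(prev) - set(curr))]
    arrived = [(i, prev_run, curr_run) for i in sorted(set(curr) - set(prev))]
    changed = sorted(i for i in set(prev) & set(curr) if prev[i] != curr[i])
    return departed, arrived, changed


prev = {"leg_a": {"name": "Bob Smith", "district": "8"},
        "leg_b": {"name": "Alice Jones", "district": "9"}}
curr = {"leg_a": {"name": "Robert Smith", "district": "8"},
        "leg_c": {"name": "Carol White", "district": "9"}}
print(diff_snapshots(prev, date(2011, 6, 14), curr, date(2011, 6, 21)))
```

The bracket only bounds the date; the true end date still has to come from another source, and a long gap between runs gives a correspondingly loose bound.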
I'm thinking of something like:
# simple listing of all seats, useful for mapping districts, etc.
upper_chamber_seats = ['1', '2', '3', '4', '5']  # ... and so on
# if the value is a dict, it maps each seat name to the number of
# simultaneous holders
upper_chamber_seat_occupancy = {'1': 2, '2': 2, '3': 3}
# if the value is a number, assume all seats in this chamber have the
# same number of occupants
upper_chamber_seat_occupancy = 2
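Combined with a list of current holders, a setting like this would let a scraper report open seats such as the three vacancies noticed in Missouri. A minimal sketch, assuming the int-or-dict convention proposed above; `expected_holders`, `open_slots`, and the `holders` mapping are hypothetical, and a dict setting is assumed to name every seat.

```python
def expected_holders(seats, occupancy):
    """Expand the occupancy setting into a seat -> holder-count mapping.

    An int applies to every seat; a dict must name every seat in `seats`
    (a missing seat raises KeyError rather than guessing a default).
    """
    if isinstance(occupancy, int):
        return {seat: occupancy for seat in seats}
    return {seat: occupancy[seat] for seat in seats}


def open_slots(seats, occupancy, holders):
    """Return seats with fewer current holders than expected, and how many are open.

    `holders` is a hypothetical mapping of seat name -> list of current legislator ids.
    """
    open_counts = {}
    for seat, expected in expected_holders(seats, occupancy).items():
        missing = expected - len(holders.get(seat, []))
        if missing > 0:
            open_counts[seat] = missing
    return open_counts


print(open_slots(['1', '2', '3'], {'1': 2, '2': 2, '3': 3},
                 {'1': ['leg_a', 'leg_b'], '2': ['leg_c']}))  # {'2': 1, '3': 3}
```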
Compiling these lists will take some time (and, to be honest, I'm
inclined to delay a full sweep of all 50 states until redistricting
takes place, since we're so close), but would they be useful?
-james
On Tue, Jun 28, 2011 at 12:33 AM, Gregory Combs <gco...@gmail.com> wrote:
> Legislator-centric works well for me, since I tie in connections to nimsp and votesmart directly ... I wind up constructing a separate district/seat schema myself, since districts change only once every ten years (unless you're feeling frisky in TX). I just use clues from open states to give me the legislator id with a district matching the one I'm inquiring about ... It's relational, but if there's no legislator for that seat, I can still hit up the district to check its map boundaries, etc. Granted, I'm not particularly concerned about who once had the seat, but rather who, if anyone, has it now.
>
> Implementing the start and end dates might be tricky in a few states (where it isn't obvious the change occurred) and necessitate a manual remunge, unless you just rely on the run dates of the scrapers and flag changes whenever there is different legislator bio info ... Then you get into multiple DB table snapshots and fuzzy matching. I'm all in favor of fuzzifying matches (given that it's sort of black magic to me), but I'd really like to see that put to use on transparency data contributors.
>
> --
> To view this discussion on the web visit https://groups.google.com/d/msg/fifty-state-project/-/oK85wQ73sAsJ.