Trying a more direct route in NH


James Turner

Sep 16, 2009, 8:25:13 PM
to Fifty State Project
Since some data (previous year's legislators, for example) isn't
available on the NH General Court web site, I thought I'd try a more
direct route. I talked to my state senator (who I happen to know),
and he gave me an introduction to the staff who operate the site. I'm
going to see if I can convince them to set up something like a RESTful
interface to the data. No idea if I'll be able to, but they didn't
say no immediately.

I was wondering if anyone else had tried this route, trying to get
cleaner data to begin with by working with the state gov?

James

Greg Osuri

Sep 17, 2009, 2:54:24 AM
to fifty-sta...@googlegroups.com
Any other states still available? I might have some time to write a parser.

Sent from my iPhone

mindleak

Sep 18, 2009, 4:50:38 PM
to Fifty State Project


On Sep 17, 2:54 am, Greg Osuri <greg_os...@sbilabs.com> wrote:
> Any other states still available? I might have some time to write a parser.
>
> Sent from my iPhone
>

There still are, including some juicy states that would be very nice
to get covered like New York.

-- Michael Stephens

Adam Nelson

Sep 18, 2009, 5:05:21 PM
to fifty-sta...@googlegroups.com
New York has all sorts of problems. Thom Neale has more information
about the latest on that front (twn...@gmail.com).
--
Adam Nelson

http://unhub.com/varud

Joe Germuska

Sep 19, 2009, 8:34:35 AM
to fifty-sta...@googlegroups.com
On Sep 18, 2009, at 4:05 PM, Adam Nelson wrote:
> New York has all sorts of problems. Thom Neale has more information
> about the latest on that front (twn...@gmail.com).


Is there a state better suited to "the more direct route" than New
York? The NY Senate office seems very technologically enlightened.

http://www.nysenate.gov/department/cio

Or maybe Thom has already tried that and hence the "all sorts of
problems?"

Joe

--
Joe Germuska
J...@Germuska.com * http://blog.germuska.com

"Participation. That's what's gonna save the human race." --Pete Seeger

Adam Nelson

Sep 19, 2009, 10:21:51 AM
to fifty-sta...@googlegroups.com
There are three people involved:

Thom Neale who did some programming work (although I haven't kept in
touch so I'm not sure what the status is there)
Benjamin Kallos <kal...@gmail.com>, a lawyer who recently started a
legal request to get more data.
Andrew Hoppin, CIO of the state Senate, is already aware of these two
people and their involvement.

Keep in mind though that the NY Senate had a meltdown over the past 6
months - so that created its own ripples.

Unfortunately, I haven't put in very much time at all into this since
June. I had thought I would be able to make time but I had to take a
contract creating an advertising agency platform in order to pay for
an upcoming wedding :-)

Regards,
Adam

Thom

Sep 20, 2009, 11:42:09 AM
to Fifty State Project
Hi Everybody,

Actually, in my experience scraping NY for archival data presents some
minor problems, but scraping the current session is a snap. The
assembly's website serves everything up in plain text, and the bills
are formatted pretty consistently. I have some not-too-pretty code
that parses bills from the assembly website just fine, and all that
really remains is plugging it into the 50 states utils. If anyone is
up to take on that task, email me and I can send you what I have. It
would be an easy kill. I feel embarrassed for not getting NY up and
running already.

Thom

Thom

Sep 20, 2009, 11:55:41 AM
to Fifty State Project
Yes, they have developed an API that serves up legislation data from
the current session at a decent level of granularity--in either XML or
JSON: http://open.nysenate.gov/openleg/

They report that it updates once per hour (approximately), and they
plan to add additional functionality over time.

Just from looking at output for a few bills, the service looks pretty
spotty at the moment. It doesn't currently supply all the fields or
granularity this project requires. Better to stick with scraping
www.assembly.state.ny.us for now, and I have some decent code that can
fetch and parse that site. I'm pretty sure the stuff I have on github
is old and not very good, but I'll happily email my current stuff to
anyone who is interested in plugging it into the fifty states utils.

Thom




Dan McCreary

Sep 21, 2009, 8:34:00 AM
to Fifty State Project
Hi,

We have found that working with the source systems and generating XML
directly from these systems usually gives much better results than
trying to scrape HTML that may change each time the web site format is
updated. It is also much more resilient to change. For the Library
of Congress NDIIPP project this is our preferred approach, but it is
slow going. Not all states use XML yet, so there might need to be some
conversion process from other formats into well-formed XML formats.
We are also attempting to create a controlled vocabulary for mapping
the most common fields into canonical values. Since states have very
different systems it is almost impossible to use a single XML Schema.
Each state will have to provide the XPath expressions to get these
values out of their XML documents.
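
A minimal sketch of the per-state XPath idea, assuming Python's standard xml.etree.ElementTree. The state keys, element names, canonical field names, and sample documents are all hypothetical and only illustrate how each state could supply its own paths to a shared set of fields:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-state maps: canonical field name -> path into that
# state's own XML layout. None of these element names come from a real feed.
STATE_XPATHS = {
    "nh": {
        "bill_id": "number",
        "title": "title",
        "sponsor": "sponsors/sponsor/name",
    },
    "ny": {
        "bill_id": "header/billno",
        "title": "header/summary",
        "sponsor": "header/sponsor",
    },
}

# Made-up sample documents with two different layouts.
NH_SAMPLE = (
    "<bill><number>HB 1</number><title>Relative to wildflowers</title>"
    "<sponsors><sponsor><name>Smith</name></sponsor></sponsors></bill>"
)
NY_SAMPLE = (
    "<bill><header><billno>A100</billno>"
    "<summary>An act concerning parks</summary>"
    "<sponsor>Jones</sponsor></header></bill>"
)


def extract_canonical(state, xml_text):
    """Return a dict of canonical fields using the state's own paths."""
    root = ET.fromstring(xml_text)
    paths = STATE_XPATHS[state]
    # findtext returns the text of the first match, or None if nothing matches.
    return {field: root.findtext(path) for field, path in paths.items()}


if __name__ == "__main__":
    print(extract_canonical("nh", NH_SAMPLE))
    print(extract_canonical("ny", NY_SAMPLE))
```

Both layouts come out with the same keys, so downstream code would not need to know which state a record came from.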

The key is to work with each state to create some stable RESTful web
service that generates XML documents for each bill.

I hope this helps.

- Dan McCreary

Thom

Sep 21, 2009, 12:14:53 PM
to Fifty State Project
In NY scraping HTML is not a problem because the info is served as
preformatted plain text. The HTML can be sliced off, and the format of
the bill text has changed little in decades. Some robust regular
expressions are quicker and easier to implement than an XML solution
(at least at this point and in this state).
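
A rough sketch of that approach, assuming Python's standard re and html modules. The sample page, the bill layout, and the two patterns are made up for illustration; this is not the actual scraper code and not the real format served by the assembly site:

```python
import html
import re

# Hypothetical page: preformatted bill text wrapped in a little HTML.
SAMPLE_PAGE = """<html><body><pre>
                STATE OF NEW YORK
        ________________________________

                       1234

                   IN ASSEMBLY

AN ACT to amend the parks law, in relation to wildflowers
</pre></body></html>"""

TAG_RE = re.compile(r"<[^>]+>")
# Assumed layout: the bill number sits alone on a line and the title
# line starts with "AN ACT".
NUMBER_RE = re.compile(r"^[ \t]*(\d+)[ \t]*$", re.MULTILINE)
TITLE_RE = re.compile(r"^AN ACT\s+(.+)$", re.MULTILINE)


def strip_html(page):
    """Slice off the tags and decode entities, leaving the plain text."""
    return html.unescape(TAG_RE.sub("", page))


def parse_bill(page):
    """Pull the bill number and title out of the preformatted text."""
    text = strip_html(page)
    number = NUMBER_RE.search(text)
    title = TITLE_RE.search(text)
    return {
        "number": number.group(1) if number else None,
        "title": title.group(1).strip() if title else None,
    }


if __name__ == "__main__":
    print(parse_bill(SAMPLE_PAGE))
```

A tag-stripping regex like this is only reasonable because the content is preformatted plain text; it would not be a safe way to handle arbitrary HTML.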

James Turner

Sep 25, 2009, 3:47:22 PM
to fifty-sta...@googlegroups.com
Just an FYI, I'm meeting Monday afternoon with the NH State House web guy, who is really excited about the idea of providing data in more consumable formats and is in the process of offering XML access.  He's looking to me for ideas, and I have some, but is there anything you folks can suggest?

James Turner
Correspondent for the Christian Science Monitor
Contributing Editor, O'Reilly Media

tur...@blackbear.biz
603-513-2383

James Turner

Sep 29, 2009, 11:31:43 PM
to fifty-sta...@googlegroups.com
Just as a followup, I met on Monday with one of the two guys who run the General Court website for NH. We chatted for about 25 minutes, and he pretty much asked me what I wanted him to implement...

We settled on a RESTful interface of the form /bills/, /bills/2009/, /bills/2009/hr1 (all bills in summary format, all 2009 bills in summary format, and full details for hr1 including voting).  The one thing that REST doesn't have an easy answer for is searching; I suggested something like /bills/2009/?title=wildflowers to find all 2009 bills with wildflowers in the title.  He's going to try to deliver the data in JSON and XML.  It's a side project for him, so it may take a bit, but he seemed really struck by the idea and is looking to me (and us) to provide guidance.
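
To make the URL scheme concrete, here is a small sketch of how a client might build those paths, assuming Python's standard urllib.parse. The host name and the bills_url helper are hypothetical; nothing has been implemented on the NH side yet, so treat this purely as an illustration of the proposed layout:

```python
from urllib.parse import urlencode

# Placeholder host; the real service does not exist yet.
BASE = "http://example.org"


def bills_url(year=None, bill_id=None, **filters):
    """Build a URL following the /bills/, /bills/<year>/, /bills/<year>/<id> layout.

    Keyword filters become a query string, as in ?title=wildflowers.
    """
    if bill_id is not None and year is None:
        raise ValueError("bill_id requires a year")
    parts = ["bills"]
    if year is not None:
        parts.append(str(year))
    if bill_id is not None:
        parts.append(bill_id.lower())
    url = BASE + "/" + "/".join(parts)
    # Collections end with a slash; a single bill does not.
    if bill_id is None:
        url += "/"
    if filters:
        url += "?" + urlencode(filters)
    return url


if __name__ == "__main__":
    print(bills_url())
    print(bills_url(2009))
    print(bills_url(2009, "hr1"))
    print(bills_url(2009, title="wildflowers"))
```

Keeping search as a query string on the collection URL means the path hierarchy stays the same whether or not a filter is applied.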

James Turner
Correspondent for the Christian Science Monitor
Contributing Editor, O'Reilly Media

tur...@blackbear.biz
603-513-2383


Eric Mill

Sep 30, 2009, 3:00:52 PM
to fifty-sta...@googlegroups.com
One suggestion - encourage them to join the Fifty State Project
mailing list too. If they knew more about what we were doing, and saw
what kind of data we're pulling, that might help them figure out what
to put out.

-- Eric

James Turk

Sep 30, 2009, 3:07:02 PM
to fifty-sta...@googlegroups.com
This is a really great outcome. Having him join the list, if he is
willing, is a great idea so he can bounce ideas off the community as a
whole, and I'm sure there's a lot we can learn from him as well.

Google and Pew are attempting to get developers from all 50 Secretary
of State offices to work on their Voting Information Project. It'd be
nice if we could do something similar here and get developers from
different state gov'ts talking, even informally.

-James
--
James Turk
Sunlight Labs Web Developer | www.sunlightfoundation.com
jt...@sunlightfoundation.com