themiddleclass.org data


didier deshommes

Aug 25, 2008, 1:56:16 PM
to watchdog-...@googlegroups.com
Hi there,
I've taken a look at the data at http://www.themiddleclass.org/csv and
here's how I propose the data returned be organized:

{
    rep_name: rep_name,
    issue: {
        issue_name: issue_name,
        middle_class_position: middle_class_position,
        issue_area: issue_area,
        rep_vote: rep_vote
    },
    rep_score: rep_score,
    rep_party: rep_party,
    rep_state: rep_state,
    year: year
}

What do people think about this?
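
To make the proposal concrete, here is a rough Python sketch of turning one flat CSV row into that nested layout. The column names and the sample rows are placeholders I made up for illustration; the real headers at /csv may well differ, so treat the keys as assumptions.

```python
import csv
import io

# Hypothetical sample input: column names and values are placeholders,
# not the actual contents of the themiddleclass.org CSV.
SAMPLE_CSV = """rep_name,rep_party,rep_state,rep_score,year,issue_name,issue_area,middle_class_position,rep_vote
Rep A,P1,XX,0,2007,Example Bill One,Example Area,Support,Yes
Rep B,P2,YY,0,2007,Example Bill One,Example Area,Support,No
"""


def row_to_record(row):
    """Map one flat CSV row onto the nested layout proposed above."""
    return {
        "rep_name": row["rep_name"],
        "issue": {
            "issue_name": row["issue_name"],
            "middle_class_position": row["middle_class_position"],
            "issue_area": row["issue_area"],
            "rep_vote": row["rep_vote"],
        },
        "rep_score": row["rep_score"],
        "rep_party": row["rep_party"],
        "rep_state": row["rep_state"],
        "year": row["year"],
    }


def parse(csv_text):
    """Yield one nested record per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield row_to_record(row)


records = list(parse(SAMPLE_CSV))
```

All values stay as strings here; converting rep_score or year to integers would be a separate decision.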

In addition, if you go to any legislator's page, (e.g.
http://www.themiddleclass.org/legislator/daniel-akaka-405), you'll see
a list of topics associated with each bill. One could find out which
topics are "popular" with a candidate by looking at the number of times
a topic's name occurs on their page. It would be even easier to find
which "issue area" a legislator holds dear by looking at
issue[issue_area] for a given rep_name over the years. Thoughts?
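
As a sketch of the second idea, something like the following could tally issue areas per legislator. It assumes records shaped like the structure above; the sample records are invented placeholders, and `issue_area_counts` is just a name I picked.

```python
from collections import Counter, defaultdict

# Placeholder records in the proposed nested layout (not real data).
records = [
    {"rep_name": "Rep A", "year": "2006", "issue": {"issue_area": "Area One"}},
    {"rep_name": "Rep A", "year": "2007", "issue": {"issue_area": "Area One"}},
    {"rep_name": "Rep A", "year": "2007", "issue": {"issue_area": "Area Two"}},
    {"rep_name": "Rep B", "year": "2007", "issue": {"issue_area": "Area Two"}},
]


def issue_area_counts(records):
    """Count how often each issue_area appears for each rep_name."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["rep_name"]][rec["issue"]["issue_area"]] += 1
    return counts


counts = issue_area_counts(records)
top_for_rep_a = counts["Rep A"].most_common(1)
```

Keep in mind the counts only cover the bills the site chose to score, so they describe the site's selection as much as the legislator.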

didier

Shahin

Aug 27, 2008, 12:58:38 AM
to Watchdog Volunteers
I see no problem with the data structure, but I'll yield to the
experts.

> One could find out which
> topics are "popular" with a candidate by looking at the number of times
> a topic's name occurs on its page. It would be even easier to find
> which "issue area" a legislator holds dear by looking at
> issue[issue_area] for a given rep_name over the years. Thoughts?

I like the idea of parsing the legislators' pages, voting histories
are pretty sweet.

I don't think associating a politician with issues by tag frequency
tells us anything intrinsic about the politician. It definitely tells
us something about themiddleclass.org's priorities since they compile
the lists, but:

- they're biased, and
- their domain of issue areas doesn't include major things like, say,
national security or civil liberties

It could be a cool exercise, but we'd have to be very cautious about
how we interpret/present the results. Am I making sense here?

Shahin

didier deshommes

Aug 29, 2008, 11:49:05 AM
to watchdog-...@googlegroups.com
On Wed, Aug 27, 2008 at 12:58 AM, Shahin <ssan...@gmail.com> wrote:
> I like the idea of parsing the legislators' pages, voting histories
> are pretty sweet.
>
> I don't think associating a politician with issues by tag frequency
> tells us anything intrinsic about the politician. It definitely tells
> us something about themiddleclass.org's priorities since they compile
> the lists, but:
>
> - they're biased, and
> - their domain of issue areas doesn't include major things like, say,
> national security or civil liberties

Agreed, maybe it would be more appropriate to associate them with
their "middle class bias" or show it as a list of their "top
middle-class issues". I suspect themiddleclass.org is choosy in the
sort of issues they consider "middle-class" but nevertheless it would
be interesting to see.

didier


Aaron Swartz

Sep 2, 2008, 8:11:23 PM
to watchdog-...@googlegroups.com
On Mon, Aug 25, 2008 at 1:56 PM, didier deshommes <dfde...@gmail.com> wrote:
>
> Hi there,
> I've taken a look at the data at http://www.themiddleclass.org/csv and
> here's how I propose the data returned be organized:
>
> {
>     rep_name: rep_name,
>     issue: {
>         issue_name: issue_name,
>         middle_class_position: middle_class_position,
>         issue_area: issue_area,
>         rep_vote: rep_vote
>     },
>     rep_score: rep_score,
>     rep_party: rep_party,
>     rep_state: rep_state,
>     year: year
> }
>
> What do people think about this?

Looks good. It might make sense to break it up a bit more and have one
dictionary per (rep, issue) combination instead of sorting them by rep and
then issue. Also, do they use any kind of IDs? We should be sure to
include those if they exist. It'd be annoying if issue_name was the
only identifier they provided.
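
One way to read "one dictionary per (rep, issue) combination" is a flat record with no nesting. A minimal sketch, assuming the nested layout from the original proposal (the `flatten` helper and the sample values are hypothetical):

```python
def flatten(record):
    """Merge the nested 'issue' dict into the top level of the record."""
    flat = {key: value for key, value in record.items() if key != "issue"}
    flat.update(record["issue"])
    return flat


# Placeholder record in the proposed nested layout (not real data).
nested = {
    "rep_name": "Rep A",
    "issue": {
        "issue_name": "Example Bill One",
        "middle_class_position": "Support",
        "issue_area": "Example Area",
        "rep_vote": "Yes",
    },
    "rep_score": "0",
    "rep_party": "P1",
    "rep_state": "XX",
    "year": "2007",
}

flat = flatten(nested)
```

If the source ever provides IDs, they would simply become extra keys on each flat record.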

didier deshommes

Sep 3, 2008, 1:52:04 PM
to watchdog-...@googlegroups.com
On Tue, Sep 2, 2008 at 8:11 PM, Aaron Swartz <m...@aaronsw.com> wrote:
> Looks good. It might make sense to break it up a bit more and have one
> dictionary per (rep, issue) combination instead of sorting them by rep and
> then issue.

So you're saying it might be better to have something like this for a
representative:
rep =
{
    specific_issue_name: rep_vote,
    rep_name: rep_name,
    rep_score: rep_score,
    rep_party: rep_party,
    rep_state: rep_state,
    year: year
}

The details associated with each issue would be stored like so:
issue_facts =
{
    issue_name:
    {
        issue_area: area,
        middle_class_position: position
    }
}

Is this right? Note that $rep would be yielded at each iteration while
$issue_facts would only need to be computed/loaded once.
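
Here is a rough sketch of that split, assuming flat input rows with the (hypothetical) column names used below. `build_issue_facts` runs once, while `iter_reps` yields one dict per rep and year with issue names as keys for the votes. The rows are invented placeholders.

```python
# Placeholder rows; column names are assumptions about the CSV layout.
ROWS = [
    {"rep_name": "Rep A", "rep_party": "P1", "rep_state": "XX", "rep_score": "0",
     "year": "2007", "issue_name": "Example Bill One", "issue_area": "Area One",
     "middle_class_position": "Support", "rep_vote": "Yes"},
    {"rep_name": "Rep A", "rep_party": "P1", "rep_state": "XX", "rep_score": "0",
     "year": "2007", "issue_name": "Example Bill Two", "issue_area": "Area Two",
     "middle_class_position": "Oppose", "rep_vote": "No"},
    {"rep_name": "Rep B", "rep_party": "P2", "rep_state": "YY", "rep_score": "0",
     "year": "2007", "issue_name": "Example Bill One", "issue_area": "Area One",
     "middle_class_position": "Support", "rep_vote": "No"},
]


def build_issue_facts(rows):
    """Collect per-issue details once, keyed by issue_name."""
    facts = {}
    for row in rows:
        facts.setdefault(row["issue_name"], {
            "issue_area": row["issue_area"],
            "middle_class_position": row["middle_class_position"],
        })
    return facts


def iter_reps(rows):
    """Yield one dict per (rep_name, year); votes are keyed by issue name."""
    reps = {}
    for row in rows:
        key = (row["rep_name"], row["year"])
        rep = reps.setdefault(key, {
            "rep_name": row["rep_name"],
            "rep_score": row["rep_score"],
            "rep_party": row["rep_party"],
            "rep_state": row["rep_state"],
            "year": row["year"],
        })
        rep[row["issue_name"]] = row["rep_vote"]
    yield from reps.values()


issue_facts = build_issue_facts(ROWS)
reps = list(iter_reps(ROWS))
```

One caveat with this layout: issue names share a namespace with keys like rep_name, so an issue literally named "year" would clobber a field. Putting the votes under a separate "votes" key would avoid that.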

> Also, do they use any kind of IDs? We should be sure to
> include those if they exist. It'd be annoying if issue_name was the
> only identifier they provided.

There are no IDs associated with any of this data. We could associate
an ID with each issue_name by listing issue names for *all* the years
so far and assigning them an increasing number (1,2,3,etc).
Thoughts?
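
A minimal sketch of that numbering, assuming we can get the issue names grouped by year (the `assign_issue_ids` helper and the sample names are hypothetical). Walking the years in order and never renumbering known names keeps old IDs stable when a new year is added.

```python
def assign_issue_ids(names_by_year, existing=None):
    """Give each issue name an increasing integer ID, starting at 1.

    names_by_year maps a year to an iterable of issue names.
    existing is an optional {issue_name: id} dict from an earlier run;
    names already in it keep their IDs.
    """
    ids = dict(existing or {})
    next_id = max(ids.values(), default=0) + 1
    for year in sorted(names_by_year):
        # Sort names within a year so the result does not depend on input order.
        for name in sorted(names_by_year[year]):
            if name not in ids:
                ids[name] = next_id
                next_id += 1
    return ids


# Placeholder issue names (not real data).
ids = assign_issue_ids({
    2007: ["B bill", "A bill"],
    2008: ["A bill", "C bill"],
})
# Adding a later year keeps the earlier IDs unchanged.
ids_later = assign_issue_ids({2009: ["D bill", "A bill"]}, existing=ids)
```

This still treats the name as the real identifier, so a renamed or retitled issue would get a fresh ID.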

didier


Aaron Swartz

Sep 19, 2008, 5:40:51 PM
to watchdog-...@googlegroups.com
I'm gonna write them and ask for IDs, since this is silly.