Municipal Finance Dataset going from V 1.0 to V 2.0


Joey Coleman

Mar 29, 2011, 9:01:01 AM
to OpenHamilton
The Municipal Finance Dataset is now live on OGDI
(http://datadotgc.cloudapp.net/DataBrowser/def/OpenHamiltonElectFinancials#param=NOFILTER--DataView--Results)

We're presently at version 1.0 and need to start planning for version
2.0, which will expand to include much more information. In the
meantime, we'll keep adding information to the 1.x releases. Once we
get contributor information into the dataset, we'll officially reach 2.0.

Please use this thread to discuss what data we should be adding by the
end of the first weekend in April, and how we get organized to
hackfest the contribution information.

Thus far, these are the fields we have in the dataset:

*Candidate's name, with first and last in separate columns (firstname
and lastname)
**Self-explanatory: who the candidate is

*pdflink
**The URL to the original pdf allowing developers to link to the
original disclosure documents.

*filedreturn
**a binary check (y/n) to allow developers to skip null records for
candidates who failed to file returns.

*ward
**M for Mayor, Ward number for Wards

*running for
**At present only lists Council. Will be used to allow separation of
public school board, Catholic school board, and city council
candidates.

*incumbent
**A binary (y/n) that allows developers to identify incumbents

*winner
**A binary (y/n) that allows developers to identify winners of races

*spendinglimit
**The spending limit for the race the candidate registered in

*spentwithin
**The amount of money a candidate spent on expenses restricted by the
spending limit.

*spentwithout
**The amount of money a candidate spent on expenses not restricted by
the spending limit and not counted towards the limit

*campaigndeficitsurplus
**The amount of money remaining at the end of a candidate's campaign.
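
For anyone consuming the dataset from code, here is a minimal sketch of the field list above as a typed record. It assumes the rows arrive as dictionaries keyed by column name (for example from csv.DictReader), that "running for" is stored under a column called runningfor, and that money cells are plain numbers or blank; none of that is confirmed in this thread, and the helper names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def parse_yn(value: str) -> bool:
    """Map the dataset's y/n flags to booleans."""
    return value.strip().lower() == "y"


def parse_money(value: str) -> Optional[float]:
    """Return None for blank cells (e.g. candidates who did not file)."""
    cleaned = value.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


@dataclass
class CandidateRecord:
    firstname: str
    lastname: str
    pdflink: str
    filedreturn: bool
    ward: str                 # "M" for Mayor, otherwise the ward number
    runningfor: str           # assumed column name for "running for"
    incumbent: bool
    winner: bool
    spendinglimit: Optional[float]
    spentwithin: Optional[float]
    spentwithout: Optional[float]
    campaigndeficitsurplus: Optional[float]


def record_from_row(row: dict) -> CandidateRecord:
    """Build a typed record from one row keyed by column name."""
    return CandidateRecord(
        firstname=row["firstname"].strip(),
        lastname=row["lastname"].strip(),
        pdflink=row["pdflink"].strip(),
        filedreturn=parse_yn(row["filedreturn"]),
        ward=row["ward"].strip(),
        runningfor=row["runningfor"].strip(),
        incumbent=parse_yn(row["incumbent"]),
        winner=parse_yn(row["winner"]),
        spendinglimit=parse_money(row["spendinglimit"]),
        spentwithin=parse_money(row["spentwithin"]),
        spentwithout=parse_money(row["spentwithout"]),
        campaigndeficitsurplus=parse_money(row["campaigndeficitsurplus"]),
    )
```

Checking filedreturn first lets a consumer skip the null records, which is what that column is there for.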

Joey Coleman

Mar 29, 2011, 9:04:20 AM
to OpenHamilton
I'll add the total number of votes received, the percentage of votes
received, and, for victors, the number of votes in the margin of
victory. This will take us to version 1.1.
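
A rough sketch of how those three figures could be derived from raw vote counts for one race. The input shape (a dict of candidate name to votes) and the output keys are assumptions for illustration, not the actual dataset columns, and the margin is taken as the gap between the winner and the runner-up.

```python
def vote_stats(votes_by_candidate: dict) -> dict:
    """Compute votes, percentage of votes, and the winner's margin for one race."""
    if not votes_by_candidate:
        return {}
    total = sum(votes_by_candidate.values())
    ranked = sorted(votes_by_candidate.items(), key=lambda kv: kv[1], reverse=True)
    winner_name, winner_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0

    stats = {}
    for name, votes in votes_by_candidate.items():
        stats[name] = {
            "votes": votes,
            "percent": round(100.0 * votes / total, 2) if total else 0.0,
            # Only the victor gets a margin of victory; everyone else gets None.
            "margin": winner_votes - runner_up_votes if name == winner_name else None,
        }
    return stats
```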

In terms of getting to version 2.0, are people available next
Wednesday (April 6) for a hackfest at Think|haus?

Matt Grande

Mar 29, 2011, 9:49:03 AM
to OpenHamilton
Hi all,

I had some free time at work this morning, so I worked on parsing the
donors to campaigns from the PDFs. Don't tell my boss.

All my work thus far has been on Bob Bratina's finances, simply
because it's machine written and has a lot of test data. Here's what
I did.

Step 1: Split the PDF and get the relevant pages using pdftk. Why did
I do this? See step 2.

>pdftk BobBratina.pdf cat 4 output BB1.pdf
>pdftk BobBratina.pdf cat 5 output BB2.pdf
>pdftk BobBratina.pdf cat 6 output BB3.pdf
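
If this has to be repeated for every candidate, the three commands above could be generated in a loop. This is only a sketch: it assumes pdftk is on the PATH, and the function names and the output naming scheme (prefix plus a running number) are made up to match the BB1.pdf, BB2.pdf, BB3.pdf pattern.

```python
import subprocess


def split_commands(source_pdf: str, pages: list, prefix: str) -> list:
    """Build one 'pdftk <in> cat <page> output <out>' command per page."""
    return [
        ["pdftk", source_pdf, "cat", str(page), "output", f"{prefix}{i}.pdf"]
        for i, page in enumerate(pages, start=1)
    ]


def run_split(source_pdf: str, pages: list, prefix: str) -> None:
    """Run the commands; raises CalledProcessError if pdftk fails."""
    for cmd in split_commands(source_pdf, pages, prefix):
        subprocess.run(cmd, check=True)


# Example (needs pdftk installed and the PDF in the current directory):
# run_split("BobBratina.pdf", [4, 5, 6], "BB")
```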

Step 2: I ran each of the files through FreeOCR.com. I had to do them
separately because FreeOCR only seems to recognize the first page.
(Side note: I know some people recommended Tesseract, but I couldn't
get it working under Windows. Any suggestions?)
I saved the output files as money1.txt (https://gist.github.com/892382),
money2.txt (https://gist.github.com/892385), and money3.txt
(https://gist.github.com/892386).

Step 3: I did some preliminary fix-ups to help my parser. The only
things done were changing a lowercase v to an uppercase V in one
postal code, and adding a blank line where another postal code was
missing.
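
The lowercase-letter fix in Step 3 could probably be automated. Below is a minimal sketch, not the parser linked in this thread: it assumes postal codes follow the usual Canadian letter-digit-letter digit-letter-digit pattern, and the function name is made up. It will not repair other OCR mistakes (such as O read as 0), and a missing code still has to be handled by hand.

```python
import re
from typing import Optional

# Letter-digit-letter, optional space, digit-letter-digit (either case).
POSTAL_CODE = re.compile(r"\b([A-Za-z]\d[A-Za-z])\s?(\d[A-Za-z]\d)\b")


def normalize_postal_code(text: str) -> Optional[str]:
    """Return the first postal code in text as 'A1A 1A1', or None if absent."""
    match = POSTAL_CODE.search(text)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}".upper()
```

A None result flags the lines where the OCR output has no recognizable postal code at all.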

Step 4: I ran the files through my parser (https://gist.github.com/892387)
and performed any necessary data changes.

At the end of this exercise, I have a JSON file
(https://gist.github.com/892389) with all of the campaign contributions
to Bob Bratina (in excess of $100).

(And just one thing to notice: Mary Kozar seems to have donated to the
maximum limit... twice)
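
Repeat donors like this could be flagged automatically once the contributions are in JSON. A rough sketch, assuming each contribution is a dict with "name" and "amount" keys (the real structure of the JSON file may differ) and that the contribution limit is passed in rather than hard-coded:

```python
from collections import defaultdict


def donors_over_limit(contributions: list, limit: float) -> dict:
    """Sum contributions per donor and return those whose total exceeds limit."""
    totals = defaultdict(float)
    for contribution in contributions:
        # Collapse case and whitespace so OCR spacing differences still match.
        key = " ".join(contribution["name"].lower().split())
        totals[key] += contribution["amount"]
    return {name: total for name, total in totals.items() if total > limit}
```

Anything this returns is only a candidate for review; a hit may just as easily be a transcription or filing error as a real over-contribution.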

Joey Coleman

Mar 29, 2011, 5:19:59 PM
to OpenHamilton
How's next Wednesday for everyone?

Also, think|haus offers the space every Wednesday for open data if
anyone is interested in coming out tomorrow evening.

- Joey

Ryan

Mar 30, 2011, 8:44:37 AM
to OpenHamilton
On Mar 29, 9:49 am, Matt Grande <matt.gra...@gmail.com> wrote:
> (And just one thing to notice: Mary Kozar seems to have donated to the
> maximum limit... twice)

I contacted Mayor Bratina to ask about this. It turns out to be a
copying error on the original submitted form. Bratina's accountant
caught and corrected it and resubmitted the form, but apparently the
city posted the original uncorrected form.

Regards,
Ryan

Milton Friesen

Mar 30, 2011, 4:15:19 PM
to openha...@googlegroups.com, Ryan
Have you guys seen this data feed from a German politician's cell phone?

http://www.zeit.de/datenschutz/malte-spitz-data-retention

Open Data City did the visualization on the data (http://www.opendatacity.de/)

Milton
--

_____________
Milton Friesen
Cardus :: research for the common good
Ingenuity Arts :: adaptive leadership
Profile
Recent Article
Adaptive Cities

Twitter: @ingenuityarts
Skype: ingenuityskype


Matt Grande

Mar 31, 2011, 11:24:41 AM
to OpenHamilton
Does OGDI have a way to create sub-tables, or is it all essentially
flat files? Do we have a table set up to manage specific donations?

On Mar 30, 4:15 pm, Milton Friesen <ingenuitya...@gmail.com> wrote:
> Have you guys seen this data feed from a German politicians cell phone?
>
> http://www.zeit.de/datenschutz/malte-spitz-data-retention
>
> Open Data City did the visualization on the data (http://www.opendatacity.de/)
>
> Milton
>
> On Wed, Mar 30, 2011 at 8:44 AM, Ryan <edi...@raisethehammer.org> wrote:
> > On Mar 29, 9:49 am, Matt Grande <matt.gra...@gmail.com> wrote:
> > > (And just one thing to notice: Mary Kozar seems to have donated to the
> > > maximum limit... twice)
>
> > I contacted Mayor Bratina to ask about this. It turns out to be a
> > copying error on the original submitted form. Bratina's accountant
> > caught and corrected it and resubmitted the form, but apparently the
> > city posted the original uncorrected form.
>
> > Regards,
> > Ryan
>
> --
>
> _____________
> Milton Friesen
> Cardus <http://www.cardus.ca> :: research for the common good
> Ingenuity Arts <http://www.ingenuityarts.com> :: adaptive leadership
> Profile <http://www.cardus.ca/organization/team/milton/>
> Recent Article <http://www.cardus.ca/comment/article/2203/>
> Adaptive Cities <https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0Bw...>
>
> Twitter: @ingenuityarts
> Skype: ingenuityskype

NikGarkusha

Apr 4, 2011, 2:37:26 PM
to OpenHamilton
Matt -

sorry for the delayed response.

OGDI at its core is a "NoSQL" type data service where you can store
any type of *entities* (in our case we're just loading simple CSV
fields, which get mapped to entities defined as "fields" and are also
recorded in a container-type entity defined as a "table"). It's all
"entities", and there are no explicit relationships defined between
entities, only data that gets stored in the data service.

The SDK (the visual UI that we currently use to present/visualize/map/
chart the data) is just the web front-end to the data service. The
service itself is exposed via APIs and services like OData, which can
be queried any way you want, but it's not your traditional table
[join] table on a field [ID]=[ID] type of query. It's about filtering,
sorting, etc., but for one specific "container" entity only.

So, the long answer is no: there's no 1-step query across multiple
entities that were defined as "tables", because OGDI is not using
database-like relationships, nested tables, stored procedures, etc.,
all for the purposes of being more flexible and faster than a
database.
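
In practice that means any "join" has to happen on the client after querying each table separately. A minimal sketch of what that could look like, with the two tables already fetched as lists of dicts; the candidateid key and the field names are hypothetical, since no donations table exists yet:

```python
from collections import defaultdict


def join_contributions(candidates: list, contributions: list,
                       key: str = "candidateid") -> list:
    """Attach each candidate's contributions, joining on a shared key."""
    by_key = defaultdict(list)
    for contribution in contributions:
        by_key[contribution[key]].append(contribution)
    # Copy each candidate row and add its (possibly empty) contribution list.
    return [
        dict(candidate, contributions=by_key.get(candidate[key], []))
        for candidate in candidates
    ]
```

Each table would be filtered and sorted on the service side first, so the client only joins the rows it actually needs.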

Nik

Ryan McGreal, Raise the Hammer

Apr 4, 2011, 3:04:16 PM
to openha...@googlegroups.com
On Mon, April 4, 2011 2:37 pm, NikGarkusha wrote:
> OGDI at its core is a "NoSQL" type data service

What are you using for the backend datastore?

NikG

Apr 4, 2011, 3:17:29 PM
to openha...@googlegroups.com
Not sure if my previous message just disappeared, but here it is:

OGDI uses an all-Azure cloud backend, specifically Azure Tables for the data service.
The code (open source) and more tech details are on CodePlex: http://ogdi.codeplex.com/

Joey Coleman

May 31, 2011, 1:13:31 AM
to openha...@googlegroups.com
I've started listing the donations by candidate.

I'll be focusing on the winners first.

Here's a first map of Mayor Bratina's donations: http://www.google.com/fusiontables/DataSource?snapid=S199998fCjM

I've colour coded the dots by donation amount. The location is based upon Postal Code, not street address.

I have a bit of work to do with the data. Thought this would be a fun teaser.

- Joey

Bill Dunphy

May 31, 2011, 8:56:26 AM
to openha...@googlegroups.com
Very nice Joey - wouldn't you know that every single one of the Oakville, Mississauga and Toronto donations comes from... developers. (With the exception of a VP from Tim Hortons - but arguably their interest is also development....)

b
--
Bill Dunphy
Twitter- Typist



Joey Coleman

May 31, 2011, 9:11:32 AM
to openha...@googlegroups.com
Not at all surprised. It'll likely be the same for all other candidates.
 
Interestingly, his official statement says "Cerello, Clark" of Dundas donated twice in the amount of $750.00 each time. http://www.hamilton.ca/NR/rdonlyres/65D83F6B-BB25-424E-9E15-84D3DD0E05DF/0/BobBratina.pdf
 
I have to add the corporate in-kind donations to the list, standardize the street address information (thank you Google Refine), and then I'll have the first dataset complete.
 
In terms of the map, I have to manually work on the postal codes with multiple donors (the dots overlap and only one is visible). Then I'll create a nice HTML bubble for each and use JavaScript to create a legend for the dots.
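
One possible way to avoid doing the overlapping dots by hand is to nudge donors who share a postal code onto a small circle around the geocoded point. This is only a sketch: it assumes each row already has postal_code, lat and lon fields (hypothetical names), and the offset radius is an arbitrary value that would need tuning for the map's zoom level.

```python
import math
from collections import defaultdict


def spread_overlapping(points: list, radius_deg: float = 0.0005) -> list:
    """Offset points that share a postal code so every marker is visible."""
    groups = defaultdict(list)
    for point in points:
        groups[point["postal_code"]].append(point)

    result = []
    for members in groups.values():
        count = len(members)
        for i, point in enumerate(members):
            lat, lon = point["lat"], point["lon"]
            if count > 1:
                # Place the i-th donor evenly around a circle of radius_deg.
                angle = 2 * math.pi * i / count
                lat += radius_deg * math.sin(angle)
                lon += radius_deg * math.cos(angle)
            result.append(dict(point, lat=lat, lon=lon))
    return result
```

Postal codes with a single donor are left exactly where they were geocoded.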
 
Once this is complete, I'll publish the map publicly.
 
- Joey

Bill Dunphy

May 31, 2011, 9:37:40 AM
to openha...@googlegroups.com
Looking forward to it.

b

Ryan McGreal, Raise the Hammer

May 31, 2011, 9:53:43 AM
to openha...@googlegroups.com, openha...@googlegroups.com
On Tue, May 31, 2011 9:11 am, Joey Coleman wrote:
> Interestingly, his official statement says "Cerello, Clark" of Dundas
> donated twice in the amount of $750.00 each time.

I just checked the PDF, and it lists a donation of $300.00 followed by a
donation of $450. If the dataset says differently, it's an input error.

Joey Coleman

May 31, 2011, 10:00:40 AM
to openha...@googlegroups.com
Thanks Ryan,
 
I must have read it wrong among all the other numbers. Checked my table and it's correct as well.
 
Sleep may be something I consider tonight.
 
- Joey
