Hansard Parser

J Wells

Mar 24, 2012, 6:53:41 PM
to openaust...@googlegroups.com
Hi all,

I went to the openaustralia.org site the other day and noticed some missing speeches, then saw the Twitter post about the parser.

I've made a few changes to the parser that got it working again on my local machine for all of February and March. It looks like the APH XML format has changed to a slightly cleaner output.

It might be worth someone with a better understanding of the code base testing the results, but hopefully it'll help. For the late-February data I needed to add 'Robert John Carr' with the alt name 'Bob Carr' to the senators/people CSV files to fix an error that was occurring.

I also noticed, after all the Hansard data was loaded, that I wasn't getting any speeches listed on the senators' or members' pages. It looks like there might be a different search system used for that rather than pulling the data directly out of MySQL, but I wasn't sure, so perhaps I missed something.

Regards,
Justin

Henare Degan

Mar 24, 2012, 11:09:56 PM
to openaust...@googlegroups.com
On 25 March 2012 09:53, J Wells <def...@gmail.com> wrote:
> Hi all,
>
> I went to the openaustralia.org site the other day and noticed some missing speeches, then saw the Twitter post about the parser.
>
> I've made a few changes to the parser that got it working again on my local machine for all of February and March. It looks like the APH XML format has changed to a slightly cleaner output.
>
> It might be worth someone with a better understanding of the code base testing the results, but hopefully it'll help. For the late-February data I needed to add 'Robert John Carr' with the alt name 'Bob Carr' to the senators/people CSV files to fix an error that was occurring.

Very awesome, Justin!

I've parsed the first day of sitting for 2012 (2012-02-07) and I found it wasn't getting these speeches under Questions without Notice: http://parlinfo.aph.gov.au/parlInfo/search/display/display.w3p;db=CHAMBER;id=chamber%2Fhansardr%2Fabbbcf3b-f1cf-40d7-bf3e-751a75751885%2F0029;query=Id%3A%22chamber%2Fhansardr%2Fabbbcf3b-f1cf-40d7-bf3e-751a75751885%2F0001%22

Any ideas?

Also, the Speaker is not being picked up as a person in the Personal Explanations debates (i.e. his speeches just appear in the body text): http://parlinfo.aph.gov.au/parlInfo/search/display/display.w3p;db=CHAMBER;id=chamber%2Fhansardr%2Fabbbcf3b-f1cf-40d7-bf3e-751a75751885%2F0060;query=Id%3A%22chamber%2Fhansardr%2Fabbbcf3b-f1cf-40d7-bf3e-751a75751885%2F0001%22

BTW, here's the ticket related to this latest round of APH breakages: http://tickets.openaustraliafoundation.org.au/browse/OA-499

In my quick testing I also picked up another bug with the parser, but I think it predates this latest problem: http://tickets.openaustraliafoundation.org.au/browse/OA-502

> I also noticed, after all the Hansard data was loaded, that I wasn't getting any speeches listed on the senators' or members' pages. It looks like there might be a different search system used for that rather than pulling the data directly out of MySQL, but I wasn't sure, so perhaps I missed something.

That uses the Xapian search index, so the index needs to be kept up to date (see `twfy/search/index.pl`). That's probably not clear in the install instructions, so if you can suggest improvements, that'd be great.

Cheers,

Henare

Justin Wells

Mar 25, 2012, 2:44:06 AM
to openaust...@googlegroups.com

The Speaker should be picked up now too. However, I noticed that on 28 Feb (and possibly other dates) it's assigning the Speaker to the wrong person: Jenkins instead of Slipper. I'm not quite sure why that is yet; I'll keep looking.

Henare Degan

Apr 12, 2012, 6:41:38 PM
to openaust...@googlegroups.com
Hey Justin,

I might try to take a look at getting the parser running again this weekend. Did you make any more progress you want me to look at integrating?

Cheers,

Henare

--
You received this message because you are subscribed to the Google Groups "OpenAustralia Community" group.
To view this discussion on the web visit https://groups.google.com/d/msg/openaustralia-dev/-/ZFhSLfP9nBgJ.

To post to this group, send email to openaust...@googlegroups.com.
To unsubscribe from this group, send email to openaustralia-...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/openaustralia-dev?hl=en.

Justin Wells

Apr 13, 2012, 6:38:03 PM
to openaust...@googlegroups.com
Hi Henare,

I've added a small change to get Questions without Notice working:
https://github.com/JWells/openaustralia-parser/commit/b35c3da6507e443720dfa1d666976d70eeb41d88

The issue with the Speaker showing as the wrong person appears to be because the Speaker hasn't been updated in the list of people in representatives.csv. I haven't committed anything for that, as I'm still not sure how it works and what else it might affect.

I did take a brief look at the issue with the tabling of documents but wasn't able to find the cause. I haven't had time to get back for another look.

Regards,
Justin

Henare Degan

Jun 24, 2012, 3:48:59 AM
to openaust...@googlegroups.com
On 14 April 2012 08:38, Justin Wells <def...@gmail.com> wrote:
> I've added a small change to get Questions without Notice working:
> https://github.com/JWells/openaustralia-parser/commit/b35c3da6507e443720dfa1d666976d70eeb41d88
>
> The issue with the Speaker showing as the wrong person appears to be because the Speaker hasn't been updated in the list of people in representatives.csv. I haven't committed anything for that, as I'm still not sure how it works and what else it might affect.

I've merged these changes, updated the sitting calendar, updated the Speaker, replaced a few senators and loaded all of 2012 to date.

I've also added you to the list of contributors, thanks Justin! http://www.openaustralia.org/about/

Cheers,

Henare