Send theyworkforyou-wales mailing list submissions to
theyworkfo...@lists.mysociety.org
To subscribe or unsubscribe via the World Wide Web, visit
https://secure.mysociety.org/admin/lists/mailman/listinfo/theyworkforyou-wales
or, via email, send a message with subject or body 'help' to
theyworkforyou...@lists.mysociety.org
You can reach the person managing the list at
theyworkforyo...@lists.mysociety.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of theyworkforyou-wales digest..."
Today's Topics:
1. Re: theyworkforyou-wales Digest, Vol 5, Issue 1 (K THOMPSON)
----------------------------------------------------------------------
Message: 1
Date: Fri, 25 Jun 2010 23:01:40 +0000 (GMT)
From: K THOMPSON <kthomp...@btinternet.com>
Subject: Re: [TheyWorkForYou-Wales] theyworkforyou-wales Digest, Vol
5, Issue 1
To: theyworkfo...@lists.mysociety.org
Message-ID: <509069....@web87111.mail.ird.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"
Thanks to all who responded; it sounds as though the computer skills necessary to assist in this are a little beyond me! It seems the bilingual nature of proceedings in the Assembly makes things difficult; a recent review (http://news.bbc.co.uk/1/hi/wales/wales_politics/8692117.stm?) suggests that soon all spoken business will be in English. If there are any simple tasks to assist in 'theyworkforyou' Welsh Assembly - please let me know!
Caebrwyn
________________________________
From: "theyworkforyou...@lists.mysociety.org" <theyworkforyou...@lists.mysociety.org>
To: theyworkfo...@lists.mysociety.org
Sent: Friday, 25 June, 2010 12:00:03
Subject: theyworkforyou-wales Digest, Vol 5, Issue 1
Today's Topics:
1. welsh assembly (K THOMPSON)
2. Re: welsh assembly (Duncan Parkes)
3. Re: welsh assembly (Sam Knight)
4. Re: welsh assembly (Duncan Parkes)
----------------------------------------------------------------------
Message: 1
Date: Thu, 24 Jun 2010 19:44:41 +0000 (GMT)
From: K THOMPSON <kthomp...@btinternet.com>
Subject: [TheyWorkForYou-Wales] welsh assembly
To: theyworkfo...@lists.mysociety.org
Message-ID: <988224....@web87101.mail.ird.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi
has anyone made any progress for Theyworkforyou on the Welsh Assembly Government or National Assembly for Wales? How can contributions be made?
Caebrwyn
------------------------------
Message: 2
Date: Thu, 24 Jun 2010 20:47:30 +0100
From: Duncan Parkes <duncan...@gmail.com>
Subject: Re: [TheyWorkForYou-Wales] welsh assembly
To: TheyWorkForYou for the Welsh Assembly
    <theyworkfo...@lists.mysociety.org>
Message-ID:
    <AANLkTilcR_F3erlOOZyBo...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi Caebrwyn
> has anyone made any progress for Theyworkforyou on the Welsh Assembly
> Government or National Assembly for Wales? How can contributions be made?
> Caebrwyn
I don't think so, I'm afraid. I started working on this over a year
ago when I was just a volunteer for mySociety rather than an employee,
but I just ran out of time. I will try to go through the code I have
written, tidy it up, and make it available so that anyone else who
wants to can help with it.
Sorry for the lack of progress!
Duncan
------------------------------
Message: 3
Date: Fri, 25 Jun 2010 00:08:37 +0100
From: Sam Knight <samkn...@gmail.com>
Subject: Re: [TheyWorkForYou-Wales] welsh assembly
To: TheyWorkForYou for the Welsh Assembly
    <theyworkfo...@lists.mysociety.org>
Message-ID:
    <AANLkTimVoxahfiu8qclc-...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
I've managed to make some progress but the way the speeches are laid out on
the site makes it difficult to scrape, i.e. it's bilingual and not always
displayed in the same order. If anyone has found a simple dataset in one
language I could manage it.
On Thu, Jun 24, 2010 at 8:47 PM, Duncan Parkes <duncan...@gmail.com> wrote:
> Hi Caebrwyn
>
> > has anyone made any progress for Theyworkforyou on the Welsh Assembly
> > Government or National Assembly for Wales? How can contributions be made?
> > Caebrwyn
>
> I don't think so, I'm afraid. I started working on this over a year
> ago when I was just a volunteer for mySociety rather than an employee,
> but I just ran out of time. I will try to go through the code I have
> written, tidy it up, and make it available so that anyone else who
> wants to can help with it.
>
> Sorry for the lack of progress!
>
> Duncan
------------------------------
Message: 4
Date: Fri, 25 Jun 2010 00:15:17 +0100
From: Duncan Parkes <duncan...@gmail.com>
Subject: Re: [TheyWorkForYou-Wales] welsh assembly
To: TheyWorkForYou for the Welsh Assembly
    <theyworkfo...@lists.mysociety.org>
Message-ID:
    <AANLkTil3jvaJUGUwQebjE...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
The order is what was actually spoken on the left and the translation on
the right. Sometimes this changes mid-speech. I've not yet found an example
mid-paragraph.
My half-built scraper works out which language by a technique roughly as
sophisticated as counting the els...
Cheers
Duncan
On 25 Jun 2010 00:09, "Sam Knight" <samkn...@gmail.com> wrote:
I've managed to make some progress but the way the speeches are laid out on
the site make it difficult to scrape. i.e it's bilingual and not always
displayed in the same order. If anyone has found a simple dataset in one
language I could manage it.
On Thu, Jun 24, 2010 at 8:47 PM, Duncan Parkes <duncan...@gmail.com>
wrote:
> >
> > Hi Caebrwyn
> >
> > > has anyone made any progress for Theyworkforyou on the Welsh Assembly
> > > Gove...
_______________________________________________
theyworkforyou-wales mailing list
theyworkfo...@lists.mysociety.org
https://secure.mysociety.org/admin/lists/mailman/listinfo/theyworkforyou-wales
End of theyworkforyou-wales Digest, Vol 5, Issue 1
**************************************************
End of theyworkforyou-wales Digest, Vol 5, Issue 2
**************************************************
Sam, why is language a challenge here? Can you not just scrape the blob
of text (ignoring language) - can the resulting interface just display
both side-by-side?
There are language detection libraries out there, e.g.
http://googlesystem.blogspot.com/2008/03/google-launched-another-ajax-api-this.html
About the provision of translation, Siân is correct. The original speech
will always be transcribed and published. But there will only be a
published translation for Welsh speeches into English (not vice versa).
This decision will of course disadvantage people who wish to use
Welsh; here's more background for the curious:
http://hedyn.net/y_cofnod_llawn
Carl
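For anyone curious what very crude language detection might look like,
here is a minimal sketch in Python. It is not the code from any existing
scraper: the marker word lists, the function name guess_language and the
two sample sentences are all made up for illustration, and a real library
would do much better on short or mixed passages.

```python
import re

# Hypothetical marker words: a few very common function words per language.
# Illustrative only, not a tuned vocabulary. Words that exist in both
# languages (such as "a", "i", "o") are left out on purpose.
WELSH_MARKERS = {"yn", "y", "yr", "ac", "ar", "mae", "bod", "ei", "eu",
                 "wedi", "gan", "sydd", "hynny", "ydym"}
ENGLISH_MARKERS = {"the", "of", "and", "is", "that", "it", "for", "with",
                   "this", "are", "was", "be", "have", "we"}

# Runs of letters only (no digits, punctuation or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def guess_language(text):
    """Return "cy", "en" or "unknown" by counting marker words."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    welsh = sum(w in WELSH_MARKERS for w in words)
    english = sum(w in ENGLISH_MARKERS for w in words)
    if welsh == english:
        return "unknown"
    return "cy" if welsh > english else "en"


# Made-up sample sentences, one per column of a bilingual page.
left = "Mae'r Cynulliad yn trafod y gyllideb heddiw."
right = "The Assembly is debating the budget today."
print(guess_language(left), guess_language(right))
```

Because the guess is made per block of text, it does not matter which
column a language appears in, or whether the order changes part-way
through a speech.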
I don't think language is a big problem. It was solvable before, and
now that the translations only appear when the original was in Welsh, it's
obvious which is the English and which is the Welsh (though not
translating the English into Welsh makes producing a Welsh version of
the site impossible).
>> Should I follow this up?
Absolutely! I'm going to have another go at parsing it - I got quite a
long way before, but just ran out of spare time.
The problems I had before were mostly due to inconsistencies in the
markup rather than the Welsh/English thing. I've just had a quick look
and I /think/ it's got better. We'll see!
Cheers,
Duncan
I meant to say that the most important thing is that they understand
what makes a good format for publishing it. It needs to be published
in such a way that it's easy for a computer to read and understand,
not just a human. I'm sure we have some guidelines somewhere on how
to do that well.
Cheers,
Duncan
I do see a distinction between the interface and the content here.
There would be a warm welcome for a Welsh-language interface; what do
you think? Could it be run in the same fashion as the Pledgebank interface?
http://www.cy.pledgebank.com/
(ah, this translation also needs updating...)
I'd like to volunteer to translate the interface for TWFY Welsh Assembly
into Welsh - if you'll take it. I'm sure there will be other people
interested in contributing too. I'm happy to coordinate it.
If it's anything like Pledgebank it would be done with gettext/po files
- correct? It's the same method I've used before for WordPress localisation.
I wanted to lodge this offer early. On the content side the recent
change in translation provision *may* turn out to be a momentary glitch
in service FWIW.
> The problems I had before were mostly due to inconsistencies in the
> markup rather than the Welsh/English thing. I've just had a quick look
> and I /think/ it's got better. We'll see!
This is tremendously exciting, thanks Duncan!
Carl
The archive can be read here:
https://secure.mysociety.org/admin/lists/pipermail/theyworkforyou-wales/
A couple of old posts by me:
https://secure.mysociety.org/admin/lists/pipermail/theyworkforyou-wales/2009-July/000002.html
https://secure.mysociety.org/admin/lists/pipermail/theyworkforyou-wales/2009-December/000031.html
I would just use the existing XML data for the bodies TheyWorkForYou covers
as guidance. And ask Duncan to put the code he's already written somewhere :)
ATB,
Matthew
I was in contact with someone at the Assembly last year about a possible
project they were considering about outputting the Record of Proceedings
in some form of machine readable format. As far as I know, the project
was approved but there were lots of other unrelated things that also had
to be done, so I don't know where it's currently at. I'll write again
and ask next week.
For interest, here's basically what I wrote to them in July 2009 or so:
| Your project sounds like it would be of great value. Coincidentally,
| we've recently set up a mailing list for volunteers to discuss being
| able to get TheyWorkForYou for the Welsh Assembly up and running, and
| if the Assembly could provide machine readable data (such as XML, but
| the format isn't that important) in the first place, that would make
| it much easier than having to parse HTML or PDF and convert it into
| machine-readable data.
| Whatever information you have to be made available in a
| machine-readable format, I'm sure people would be glad if you provided
| it (you'd be better than the UK Parliament, the Scottish Parliament,
| or the Northern Ireland Assembly if you provided machine-readable data
| of your proceedings :) ). All I'd say at this stage is that wherever
| possible, you link things together with IDs. For example, give each
| Assembly Member an ID, and in the machine-readable data for a day's
| proceedings, mark each speech with that ID so anyone can pick out who said what.
| Also, give each speech its own ID so it can be referred to by other
| people (and perhaps by you when someone refers back to a previous
| speech or answer they gave, for example). I'm afraid I'm not an expert
| on Welsh procedure, but say you're discussing a Bill and someone
| proposes an Amendment, having each bit of the Bill marked up with some
| form of ID means the Amendment can "know" which bit of the Bill it is
| amending by referring to those IDs. Just having that sort of mindset
| when you approach any data will help.
|
| From TheyWorkForYou's point of view, having machine-readable data on
| Assembly Members (not just current, but historical including
| ministerial positions, party changes, etc.), proceedings, and things
| like that would be most useful, but I'm sure someone can do something
| with whatever you can produce :)
ATB,
Matthew
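To make the "link things together with IDs" advice concrete, here is a
minimal sketch in Python. The element and attribute names (proceedings,
member, speech, speakerid, refersto), the IDs and the placeholder
speeches are all invented for illustration; this is not TheyWorkForYou's
actual XML schema, nor anything the Assembly publishes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: every ID, name and speech below is invented.
MEMBERS = {"m1": "Example Member A", "m2": "Example Member B"}
SPEECHES = [
    {"id": "2010-06-24.1", "speaker": "m1", "lang": "cy",
     "text": "Placeholder Welsh speech."},
    {"id": "2010-06-24.2", "speaker": "m2", "lang": "en",
     "text": "Placeholder English reply."},
    {"id": "2010-06-24.3", "speaker": "m1", "lang": "en",
     "text": "Placeholder follow-up.", "refers_to": "2010-06-24.1"},
]


def build_day(date, members, speeches):
    """Build one day's proceedings, with an ID on every member and speech."""
    root = ET.Element("proceedings", date=date)
    people = ET.SubElement(root, "members")
    for member_id, name in members.items():
        ET.SubElement(people, "member", id=member_id).text = name
    for speech in speeches:
        el = ET.SubElement(root, "speech", id=speech["id"],
                           speakerid=speech["speaker"], lang=speech["lang"])
        # A speech can point back at an earlier speech by its ID.
        if "refers_to" in speech:
            el.set("refersto", speech["refers_to"])
        el.text = speech["text"]
    return root


def speeches_by(root, member_id):
    """Pick out who said what: IDs of all speeches by one member."""
    return [el.get("id") for el in root.iter("speech")
            if el.get("speakerid") == member_id]


root = build_day("2010-06-24", MEMBERS, SPEECHES)
print(ET.tostring(root, encoding="unicode"))
print(speeches_by(root, "m1"))
```

With data shaped like this, finding everything a member said is a simple
attribute match rather than guesswork based on names in running text.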
Not sure it was ever fully finished. If you wanted to help there too...
:-)
> I'd like to volunteer to translate the interface for TWFY Welsh Assembly
> into Welsh - if you'll take it. I'm sure there will be other people
> interested in contributing too. I'm happy to coordinate it.
Of course we'd take it. :) I would have to say that the codebase has
probably next to no i18n support, so it would be quite some effort just
to get to the point of having a .po file to translate - but it would
certainly be doable.
> If it's anything like Pledgebank it would be done with gettext/po files
> - correct? It's the same method I've used before for WordPress localisation.
Yes.
ATB,
Matthew
On 3 July 2010 23:25, Sam Knight <samkn...@gmail.com> wrote:
> As I am fairly new to this mailing list, are there any standards that are
> required for the project? So far I'm using Ruby outputting to an XML feed.
No standard exactly, but it would probably be best to do things in
Python as that's what all the other scrapers and infrastructure for
running them are in. Have you got much written already? I have some
code already written in Python for parsing Welsh Assembly pages,
though it's unfinished and needs work.
Are you happy to work in Python? If so, I suggest we work together on
this and host it temporarily on GitHub. I'll try to tidy my stuff up a
bit and put it up on my GitHub account. What I've got at the moment is
a start on parsing a page of The Record. I've not really done anything
on scraping the pages. Ideally we should try to do all this roughly
like the scrapers for the other parliaments.
Can we chat on XMPP? I'm duncan...@gmail.com
Cheers,
Duncan