GWT-based word alignment APP


Edward Gao

Feb 22, 2010, 10:00:56 AM
to NAACL 2010 Mechanical Turk Workshop
Hi All,

Here is the word alignment interface I used to collect word
alignment data:

http://alt-aligner.appspot.com


It takes five URL parameters:

srcsent, tgtsent : the source and target sentences, each a
space-separated string of words, UTF-8, URL-encoded.
idxonsrc : "true" or "false"; whether users should label words in the
source sentence ("true") or the target sentence ("false").
idx : zero-based indices of the words you want users to label,
underscore-separated, e.g. 0_1_3_5 means you want users to label the
first, second, fourth and sixth words.
assignmentId : used by MTurk; appended automatically.
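
A minimal sketch of how such a URL could be assembled in Python, assuming the parameters behave as described above. The function name build_aligner_url and its arguments are made up for illustration and are not part of the app. assignmentId is left out because MTurk appends it.

```python
from urllib.parse import urlencode

# Base address of the aligner, as given in this post.
BASE_URL = "http://alt-aligner.appspot.com/"


def build_aligner_url(src_tokens, tgt_tokens, label_source, indices):
    """Build an aligner URL (hypothetical helper, not part of the app).

    src_tokens, tgt_tokens: lists of already tokenized words.
    label_source: True to label source words, False for target words.
    indices: zero-based positions of the words to label.
    """
    params = {
        "srcsent": " ".join(src_tokens),
        "tgtsent": " ".join(tgt_tokens),
        "idxonsrc": "true" if label_source else "false",
        "idx": "_".join(str(i) for i in indices),
    }
    # urlencode percent-encodes the UTF-8 bytes and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


url = build_aligner_url(["这些", "被"], ["these", "well"], True, [0, 1])
print(url)
```

Note that urlencode also escapes punctuation such as "," as %2C; that decodes to the same string as a literal comma in the query.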

A sample URL:

http://alt-aligner.appspot.com/?srcsent=%E8%BF%99%E4%BA%9B+%E8%A2%AB+%E4%BA%BA%E4%BB%AC+%E6%89%80+%E7%86%9F%E7%9F%A5+%E7%9A%84+%E5%8D%A1%E9%80%9A+%E5%BD%A2%E8%B1%A1+%E4%BB%A5+%E5%85%B6+%E7%8B%AC%E6%9C%89+%E7%9A%84+%E9%AD%85%E5%8A%9B+%E5%86%8D+%E4%B8%80+%E6%AC%A1+%E8%AE%A9+%E4%B8%96%E4%BA%BA+%E7%9A%84+%E7%9B%AE%E5%85%89+%E8%81%9A%E9%9B%86+%E5%88%B0+%E9%A6%99%E6%B8%AF+,&tgtsent=With+their+unique+charm+,+these+well+-+known+cartoon+images+once+again+caused+Hong+Kong+to+be+a+focus+of+worldwide+attention+.&idxonsrc=true&idx=0_1_2_3_4_5_6_7_8_9_10_11_12_13_14_15_17_16_19_18_21_20_23_22_&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=1KBVYL71878F2DT1VT6YTS1CSVL692

When Submit is clicked, the page posts back a result like this:

1-5 2-4 3-3 4-6 5-7 6-7 7-0 8-10 9-11 9-12 9-13 9-14 9-15 9-16 10-0
11-8

In each pair the first index is the source word and the second is the
target word. Both are 1-based (unlike idx, which is zero-based); 0
means the word is aligned to the null word.
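
A small sketch of how the posted result could be read back in Python, assuming it arrives as a whitespace-separated string of source-target pairs as shown above. The helper names parse_alignment and to_zero_based are illustrative only.

```python
def parse_alignment(result):
    """Turn "1-5 2-4 7-0" into [(1, 5), (2, 4), (7, 0)].

    Indices stay 1-based, with 0 standing for the null word.
    """
    links = []
    for pair in result.split():
        src, tgt = pair.split("-")
        links.append((int(src), int(tgt)))
    return links


def to_zero_based(links):
    """Drop links to the null word and shift the rest to zero-based."""
    return [(s - 1, t - 1) for s, t in links if s != 0 and t != 0]


links = parse_alignment("1-5 7-0 9-11")
print(links)
print(to_zero_based(links))
```

Converting to zero-based indices is optional; it is only convenient if you want the output to line up with the idx parameter.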

Hope it helps.

Qin

Edward Gao

Feb 23, 2010, 11:53:50 AM
to NAACL 2010 Mechanical Turk Workshop
I have shared the GWT source code on Google Code:

http://code.google.com/p/alt-aligner/

The whole Eclipse project is in the SVN trunk, and you can publish it
directly to appspot with the GWT Eclipse plugin.
