Simple text editing task


wajdi zaghouani

Feb 19, 2010, 11:30:02 AM
to NAACL 2010 Mechanical Turk Workshop
Hi all,

I have a very simple task that consists of text editing/correction.

The goal of my HITs is to edit 500 lines in a file in the following
format:

23|3|4|ٱللَّغْوِ|the vain talk|GEN|Genitive
23|3|5|ٱللَّوِغْوِ|the vain talk|ACC|Accusative

The Worker is only expected to edit/correct the last two fields, which
in this case are GEN|Genitive or ACC|Accusative.

I created a simple HIT and asked the Workers to copy and paste all 500
lines into the comments box and put their answers there, but when I
looked at the results I found that all the lines had been joined
together like this:

23|3|4|ٱللَّغْوِ|the vain talk|GEN|Genitive 23|3|5|ٱللَّوِغْوِ|the
vain talk|ACC|Accusative.......


Is there an easy way to give the annotator a text file to edit and get
it back without the format being altered?


Thank you so much for your help.


Wajdi Zaghouani

Chris Callison-Burch

Feb 19, 2010, 12:00:20 PM
to naacl-m...@googlegroups.com
I would recommend making each of your "|"-delimited fields into a column in your CSV input file.

Your first row would be the names of the fields:
sent_id,start_index,end_index,foriegn_text,english_text,case,case_name

Then you put your data into those fields, instead of showing it unstructured to the Turkers. Specifically you would reformat your data to be

23,3,4,ٱللَّغْوِ,the vain talk,GEN,Genitive
23,3,5,ٱللَّوِغْوِ,the vain talk,ACC,Accusative

Be careful to escape any commas in your text with their HTML equivalents (&#44; for a comma). Then you can display the text to Turkers as

${foriegn_text}, ${english_text}, and ${case_name}.

This way they don't even have to see your sentence index information (which obviously doesn't mean much to them and won't help them solve your real task of identifying the case). I'd recommend giving them a drop-down menu of case names, with your prediction selected. The drop-down means that they don't have to type in the name of the case, so you'll never get a spelling error. Don't make them correct both GEN and Genitive to ACC and Accusative. Just have them do one, and then correct the other yourself with a Perl script.
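The reformatting described earlier in this message (pipe-delimited lines to a CSV file with a header row) could be sketched roughly as follows. This is only an illustration in Python: the function name pipe_to_csv is made up, the column names are the ones suggested above (spelling included), and replacing commas with &#44; follows the escaping advice above rather than anything the CSV format itself requires.

```python
import csv
import io

# Column names as proposed in this thread (spelling kept as posted).
FIELDS = ["sent_id", "start_index", "end_index",
          "foriegn_text", "english_text", "case", "case_name"]


def pipe_to_csv(pipe_text):
    """Convert pipe-delimited lines into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in pipe_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
        # Swap literal commas for the HTML entity, as suggested above.
        writer.writerow([p.replace(",", "&#44;") for p in parts])
    return out.getvalue()
```

Writing the returned text to a UTF-8 file should give an input file in the shape shown above.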

The key here is to get Turkers to do the thinking (identifying case) and not to bog them down with things that will distract them (like data formatting issues). When their answers are returned to you, all information will come in a CSV file, which you can easily convert back into your own internal format.
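Converting the results back could look something like the sketch below. It assumes that the results file prefixes input columns with "Input." and form answers with "Answer.", and that the drop-down was named case_name; check both against your actual results file. The NOM/Nominative entry in the lookup table is also an assumption, since only GEN and ACC appear in the examples above.

```python
import csv
import io

# Case name -> short code. Only GEN and ACC appear in the thread;
# the NOM entry is an assumed addition for illustration.
CASE_CODES = {"Nominative": "NOM", "Genitive": "GEN", "Accusative": "ACC"}


def results_to_pipe(results_csv_text):
    """Rebuild pipe-delimited lines from a results CSV (assumed layout)."""
    lines = []
    for row in csv.DictReader(io.StringIO(results_csv_text)):
        name = row["Answer.case_name"]   # the Turker's drop-down choice
        code = CASE_CODES[name]          # derive the code ourselves
        fields = [
            row["Input.sent_id"],
            row["Input.start_index"],
            row["Input.end_index"],
            # Undo the comma escaping applied before upload.
            row["Input.foriegn_text"].replace("&#44;", ","),
            row["Input.english_text"].replace("&#44;", ","),
            code,
            name,
        ]
        lines.append("|".join(fields))
    return "\n".join(lines)
```

Because the short code is looked up from the chosen name, the Turker only ever has to correct one of the two fields.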


--Chris

wajdi zaghouani

Feb 19, 2010, 12:46:42 PM
to NAACL 2010 Mechanical Turk Workshop
Thanks Chris,

Yes, that would be the best option, I guess. But I am not sure how to
implement it through the limited web-based template offered when I
create the HIT.

I think that would require installing the AWS tools?

--Wajdi

Chris Callison-Burch

Feb 19, 2010, 12:55:11 PM
to naacl-m...@googlegroups.com
You can do it all through the web-based template. You can edit the HTML directly, add HTML form elements for collecting data, and even add JavaScript if you need it (you probably don't for this project). The important things are:

1) Preprocessing your data file appropriately so it has the columns that you need.

2) Writing some HTML forms that collect the data that you need (the returned variables will be the form elements' name= or id= fields), and that display the data that you want Turkers to inspect (with ${columnname} variables being the way that you access it).

3) Reconstructing the information that you need using the returned results CSV file (note: all your input CSV columns get returned in the results file, and you can include columns that the Turkers never see, which can ease the task of reconstructing your data).

Really #2 is the only one that uses the web interface, but you need to be aware of #1 and #3 when you're designing things.
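For step 2, the HTML fragment can also be generated rather than typed by hand. The sketch below is one possible way to do that in Python; build_template and the list of case names are hypothetical, and putting the predicted case first as the selected option is just one simple way to get "your prediction selected" without JavaScript. It assumes the template editor substitutes ${...} variables from the CSV columns, as described in step 2.

```python
# Case names offered in the drop-down (assumed list for illustration).
CASE_NAMES = ["Nominative", "Genitive", "Accusative"]


def build_template(case_names=CASE_NAMES):
    """Return an HTML fragment to paste into the web-based template."""
    lines = [
        # ${...} placeholders are filled in from the input CSV columns.
        "<p>Word: ${foriegn_text}</p>",
        "<p>Gloss: ${english_text}</p>",
        # The element's name becomes the column name in the results.
        '<select name="case_name">',
        # The current prediction comes first and is pre-selected.
        '  <option value="${case_name}" selected>'
        "${case_name} (current guess)</option>",
    ]
    lines += [f'  <option value="{n}">{n}</option>' for n in case_names]
    lines.append("</select>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_template())
```

The sentence index columns are not displayed at all, but they still come back in the results file because they are part of the input CSV.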

--Chris

Chris Callison-Burch

Feb 19, 2010, 4:50:53 PM
to naacl-m...@googlegroups.com
Another thing: CrowdFlower makes it a lot easier to design the HTML forms. You might consider using their services. --C