Fwd: Discriminating between useful and spurious Turk labels

Chris Callison-Burch

Feb 20, 2010, 9:52:24 AM

to naacl-m...@googlegroups.com, Delip Rao

Here's a pointer from one of the Hopkins folks to some open source code for estimating the quality of Turkers' labels. --C

Begin forwarded message:

> From: Delip Rao <deli...@gmail.com>
> Date: February 20, 2010 9:46:30 AM EST
> To: Chris Callison-Burch <c...@cs.jhu.edu>
> Subject: Discriminating between useful and spurious Turk labels
>
> http://code.google.com/p/get-another-label/

Vamshi Ambati

Feb 20, 2010, 3:33:57 PM

to naacl-m...@googlegroups.com

Has anyone had any luck using CrowdFlower? I find it quite
confusing to create and post tasks.

My gripes -
- The form creation without an 'edit html' feature isn't that useful
after all. Not all tasks fit into the 'judgment' style. Or maybe I
am just not able to figure out a way of doing so.
- The interface calls every task a 'judgment' and this is also
confusing to me as I am posting 'translation tasks' which are clearly
not judgments.
- The 'calibration' feature makes it immensely difficult to post
tasks. I am ordering 'one unit' at a time, and each unit takes less
than 20 seconds to complete. But the interface for some reason thinks
that I am posting 10 units at a time and scolds me for being 'cheap'.
This is even after going through the 'Advanced settings' in the
calibration to explicitly say that it is '1 unit' and not 10.

The list goes on, but I am curious to know what others think.

Thanks
-Vamshi

WangRui

Feb 22, 2010, 10:57:15 PM

to vam...@cs.cmu.edu, naacl-m...@googlegroups.com

Hi Ves,

Sorry for the late reply. I hope it's still useful for you.

> - The form creation without an 'edit html' feature isn't that useful
> after all. Not all tasks fit into the 'judgment' style. Or maybe I
> am just not able to figure out a way of doing so.

In fact, it does support HTML tags (e.g. <li>), although sometimes you need to refresh or go back several times before they are parsed. Also, I just noticed that there is a button on the top left, "Edit in CML Editor" (a newly added feature?), which allows you to edit the source code.

> - The interface calls every task a 'judgment' and this is also
> confusing to me as I am posting 'translation tasks' which are clearly
> not judgments.

Hehe, I agree.

> - The 'calibration' feature makes it immensely difficult to post
> tasks. I am ordering 'one unit' at a time, and each unit takes less
> than 20 seconds to complete. But the interface for some reason thinks
> that I am posting 10 units at a time and scolds me for being 'cheap'.
> This is even after going through the 'Advanced settings' in the
> calibration to explicitly say that it is '1 unit' and not 10.

I think "calibration" is just a tool to help you to estimate the cost or assign your "proper" salary to the workers. You may ignore the results and manually set it to 10, and just continue.

I think CrowdFlower has another powerful feature for controlling the quality of the collected data, but it might not be very effective for people collecting text. I'm still exploring these functions now.

Best,

Rui

Chris Van Pelt

Feb 23, 2010, 2:04:51 AM

to naacl-m...@googlegroups.com, vam...@cs.cmu.edu

Hey everyone,

My name is Chris Van Pelt and I created CrowdFlower. I would be
thrilled if any of you are able to leverage our platform. I'll
address the existing questions inline below:

2010/2/22 WangRui <rw...@coli.uni-sb.de>:

> Hi Ves,
>
> Sorry for the late reply. I hope it's still useful for you.
>
>> - The form creation without an 'edit html' feature isn't that useful
>> after all. Not all tasks fit into the 'judgment' style. Or maybe I
>> am just not able to figure out a way of doing so.
>
> In fact, it does support HTML tags (e.g. <li>, etc.),
> although sometimes you need to refresh/go back several times to make it
> "parsed". Also, I just noticed that there is a button on the top left "Edit
> in CML Editor" (a newly added feature?) which allows you to edit the source
> code.
>

We just pushed the "Edit in CML Editor" button today. This gives you
a syntax-highlighted HTML / CML editor for your task. Any valid HTML
and all of our custom CML tags are allowed here. JavaScript is also
allowed, and we provide the MooTools framework by default. You can
learn more about CML here:

http://crowdflower.com/docs/cml

>> - The interface calls every task a 'judgment' and this is also
>> confusing to me as I am posting 'translation tasks' which are clearly
>> not judgments.
>
> Hehe, I agree.

A judgment on CrowdFlower actually refers to an individual response
from a worker. Every CrowdFlower job starts with a set of "Units".
"Units" are rows from the spreadsheet or feed you use to seed your
task. CrowdFlower combines multiple Units into a single HIT on Turk;
the number is set by the "units_per_assignment" parameter in the
advanced settings of the calibration page. Each Unit can have multiple
questions associated with it. Questions are usually simple HTML form
elements generated by the CML helper tags. A judgment is an
individual worker's response to the set of questions for a given Unit,
so a single HIT on Turk will generate multiple judgments from a single
worker. The CML for a translation task might look like:

{{something_to_translate}}
<cml:textarea label="Translation to spanish" />
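
As a rough illustration of how Units, HITs, and judgments relate,
here is a small Python sketch. It is not CrowdFlower code: the
Judgment class and both helper functions are hypothetical names made
up for this example, and only "units_per_assignment" corresponds to
an actual setting.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One worker's response to the questions for one unit (hypothetical model)."""
    worker_id: str
    unit_id: int
    answers: dict  # question name -> worker's response


def group_into_assignments(unit_ids, units_per_assignment):
    """Split unit ids into consecutive chunks, one chunk per HIT."""
    if units_per_assignment < 1:
        raise ValueError("units_per_assignment must be at least 1")
    return [
        unit_ids[i:i + units_per_assignment]
        for i in range(0, len(unit_ids), units_per_assignment)
    ]


def complete_assignment(worker_id, assignment, respond):
    """One worker finishing one HIT yields one judgment per unit in it."""
    return [Judgment(worker_id, uid, respond(uid)) for uid in assignment]


units = list(range(1, 8))  # seven rows from the seed spreadsheet
assignments = group_into_assignments(units, units_per_assignment=3)
judgments = complete_assignment(
    "worker_a", assignments[0], lambda uid: {"translation": f"text {uid}"}
)
print(assignments)     # [[1, 2, 3], [4, 5, 6], [7]]
print(len(judgments))  # 3
```

With units_per_assignment set to 1, every HIT would hold a single
Unit and each completed HIT would produce exactly one judgment.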

>
>> - The 'calibration' feature makes it immensely difficult to post
>> tasks. I am ordering 'one unit' at a time, and each unit takes less
>> than 20 seconds to complete. But the interface for some reason thinks
>> that I am posting 10 units at a time and scolds me for being 'cheap'.
>> This is even after going through the 'Advanced settings' in the
>> calibration to explicitly say that it is '1 unit' and not 10.
>
> I think "calibration" is just a tool to help you to estimate the cost or
> assign your "proper" salary to the workers. You may ignore the results and
> manually set it to 10, and just continue.

You can explicitly set units_per_assignment from the advanced
settings, which it looks like you have done. The behavior you saw may
have occurred because you returned to the calibration page after the
job had already been calibrated. Currently the numbers are reset each
time the calibration page is accessed.

CrowdFlower is most powerful when gold is defined. Gold is simply a
set of predefined correct answers that are intermixed with the unknown
questions. Gold works best in multiple-choice scenarios. In the
case of translation you may be able to add an objective
multiple-choice question alongside the actual translation. At the very
least, a second approval step could be used to ensure translation
quality. When using gold, you must currently have at least 2
units_per_assignment, so that the known answers can be mixed
in efficiently.
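
To make the idea of gold concrete, here is a minimal Python sketch of
mixing known-answer units into assignments and scoring workers on
them. This is an assumption-laden illustration, not how CrowdFlower
is implemented: the function names are hypothetical, and placing
exactly one gold unit per assignment is a choice made for this
example only.

```python
import random


def mix_in_gold(unknown_ids, gold_ids, units_per_assignment, seed=0):
    """Build assignments that each hold one gold unit at a random position.

    Assumption for this sketch: exactly one gold unit per assignment.
    """
    if units_per_assignment < 2:
        raise ValueError("gold needs at least 2 units per assignment")
    rng = random.Random(seed)
    per_hit = units_per_assignment - 1  # slots left for unknown units
    assignments = []
    for n, start in enumerate(range(0, len(unknown_ids), per_hit)):
        chunk = list(unknown_ids[start:start + per_hit])
        gold = gold_ids[n % len(gold_ids)]  # cycle through the gold units
        chunk.insert(rng.randrange(len(chunk) + 1), gold)
        assignments.append(chunk)
    return assignments


def gold_accuracy(judgments, gold_answers):
    """Fraction of gold units each worker answered correctly."""
    hits, totals = {}, {}
    for worker, unit_id, answer in judgments:
        if unit_id in gold_answers:
            totals[worker] = totals.get(worker, 0) + 1
            hits[worker] = hits.get(worker, 0) + (answer == gold_answers[unit_id])
    return {worker: hits[worker] / totals[worker] for worker in totals}


gold_answers = {"g1": "yes", "g2": "no"}
assignments = mix_in_gold(["u1", "u2", "u3"], ["g1", "g2"], units_per_assignment=2)
judgments = [
    ("w1", "g1", "yes"), ("w1", "u1", "no"), ("w1", "g2", "no"),
    ("w2", "g1", "no"), ("w2", "g2", "no"),
]
print(gold_accuracy(judgments, gold_answers))  # {'w1': 1.0, 'w2': 0.5}
```

Workers whose accuracy on gold falls below some threshold could then
be filtered out before their answers on the unknown units are used.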

>
> I think CrowdFlower has another powerful feature of controlling the quality
> of the collected data, but it might not be very effective for people
> collecting texts. I'm still exploring these functions now.
>

The primary advantage of CrowdFlower is the ability to define a
quality control / training set of questions. Another big advantage is
access to labor pools outside of Mechanical Turk. Please don't
hesitate to reach out to me with any questions about the platform.

Best,

Chris
