Trial data

Domagoj Alagić

Oct 28, 2016, 8:54:36 AM

to SemEval-2017 Task 7: Detection and Interpretation of English Puns

Hello there!

I am rather confused by the size of the published trial dataset. Was it really planned to have a single positive and a single negative example for each (subtask, pun class) pair? The task summary states the following:

Participants will be provided with two data sets:
Data set 1: Homographic puns.
The first data set will contain several thousand short contexts (jokes, slogans, aphorisms, etc.). In some of these contexts, a single word will be used as a homographic pun; in the rest, there will be no pun.

Data set 2: Heterographic puns.
The second data set will be similar to the first, except that the puns will be heterographic rather than homographic.

On top of that, on the "Data and resources" tab it is stated:

Machine-readable data for implementing Hempelmann's pun-based model of sound similarity will be made freely available to participants.

Am I missing something, or are these datasets yet to be provided? Or should we get them elsewhere? Sorry if I am asking dumb questions :)

Thanks!

Best,

Domagoj Alagić

Tristan Miller

Oct 28, 2016, 9:12:37 AM

to semeva...@googlegroups.com

Dear Domagoj,

On 28/10/16 02:54 PM, Domagoj Alagić wrote:
> I am rather confused by the size of the published trial dataset. Was it
> really planned to have a *single* positive and negative example per each
> of (subtask, pun class) pair? Task summary stated the following:
>
> Participants will be provided with two data sets:
> Data set 1: Homographic puns.
> The first data set will contain *several thousand* short contexts
> (jokes, slogans, aphorisms, etc.). In some of these contexts, a
> single word will be used as a homographic pun; in the rest, there
> will be no pun.
>
> Data set 2: Heterographic puns.
> The second data set will be similar to the first, except that the
> puns will be heterographic rather than homographic.

The parts of the page you quote refer to the test data, not the trial
data. The trial data is intended only to illustrate the file format,
and so contains only a couple examples. The test data, consisting of
several thousand examples, will be released once the shared task begins.

> On top of that, on the "Data and resources" tab it is stated:
>
> Machine-readable data for implementing Hempelmann's pun-based model
> of sound similarity will be made freely available to participants.

Whoops, looks like that is indeed an oversight. I'll see about getting
that data posted soon.

Regards,
Tristan

--
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 162 5296 | Web: https://www.ukp.tu-darmstadt.de/

Domagoj Alagić

Oct 28, 2016, 9:24:24 AM

to SemEval-2017 Task 7: Detection and Interpretation of English Puns

Dear Tristan,

thank you for your fast reply!

Just to make sure you understood me correctly: so, there will be no development data of any kind for this task, i.e., we have to wait for the evaluation period to get our hands on more data?

Best,

Domagoj

Tristan Miller

Oct 28, 2016, 9:56:20 AM

to semeva...@googlegroups.com

Dear Domagoj,

On 28/10/16 03:24 PM, Domagoj Alagić wrote:
> thank you for your fast reply!
>
> Just to make sure you understood me correctly: so, there will be *no
> development data* of any kind for this task, i.e., we have to wait for
> the evaluation period to get our hands on more data?

There will be no "development data" other than the trial data already
posted. In particular, there won't be any training data, due to the
difficulty of sourcing a sufficient number and variety of puns. We
anticipate that knowledge-based approaches might work best for our
tasks, though we would of course be delighted to be proved wrong.

Domagoj Alagić

Nov 3, 2016, 9:16:42 AM

to SemEval-2017 Task 7: Detection and Interpretation of English Puns

Dear Tristan,

is it allowed to manually create a small development set? It would be used for picking the best unsupervised approach for the task. Sorry if I missed that information on the task webpages.

Best,

Domagoj

Tristan Miller

Nov 3, 2016, 9:44:39 AM

to semeva...@googlegroups.com

Greetings.

On 03/11/16 02:16 PM, Domagoj Alagić wrote:
> is it allowed to manually create a small development set? It would be
> used for picking the best unsupervised approach for the task. Sorry if I
> missed that information on the task webpages.

Sure, you can do whatever you want as long as you document it in your
paper submission.
