Dear Domagoj,
On 28/10/16 02:54 PM, Domagoj Alagić wrote:
> I am rather confused by the size of the published trial dataset. Was it
> really planned to have a *single* positive and negative example for each
> (subtask, pun class) pair? The task summary stated the following:
>
> Participants will be provided with two data sets:
> Data set 1: Homographic puns.
> The first data set will contain *several thousand* short contexts
> (jokes, slogans, aphorisms, etc.). In some of these contexts, a
> single word will be used as a homographic pun; in the rest, there
> will be no pun.
>
> Data set 2: Heterographic puns.
> The second data set will be similar to the first, except that the
> puns will be heterographic rather than homographic.
The parts of the page you quote refer to the test data, not the trial
data. The trial data is intended only to illustrate the file format,
and so contains only a couple of examples. The test data, consisting of
several thousand examples, will be released once the shared task begins.
> On top of that, on the "Data and resources" tab it is stated:
>
> Machine-readable data for implementing Hempelmann's pun-based model
> of sound similarity will be made freely available to participants.
Whoops, looks like that is indeed an oversight. I'll see about getting
that data posted soon.
Regards,
Tristan
--
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 162 5296 | Web: https://www.ukp.tu-darmstadt.de/