WSI Evaluator Tool Document Ordering (UPDATED)

Michael

Feb 17, 2013, 8:08:31 AM

to semeval-2013-ws...@googlegroups.com

Hey there,

I got a little bit confused by the WSI Java evaluation tool and what implicit information my input actually contains. I provide the following clustering:

subTopicID resultID
1.1 1.1
1.4 1.2
1.5 1.3
1.4 1.4
1.1 1.5
1.7 1.6
1.1 1.7
1.6 1.8
1.6 1.9
1.2 1.10
1.4 1.11
1.5 1.12
1.1 1.13
1.6 1.14
1.4 1.15
1.10 1.16
... ....

When I swap lines two and three (1.1 1.1 with 1.4 1.2) but change nothing else, the output of the WSI evaluation tool changes (see "Resulting list" below): clusters 1 and 2 swap labels, which produces a different resulting list.

Why is that the case? As I understand it, the line position of a cluster-document mapping in the file only influences the position of a document within its cluster.

And how can the order of the clusters themselves be encoded? Swapping cluster labels (e.g. renaming 1.1 to 1.10) seems to have no effect on the resulting document order.

Thanks a lot, Michael

============= Query 134 : "aida" ==============

================== 10 snippet clusters: ====================

The cluster 1 contains the snippets: [1, 5, 7, 13, 25, 26, 28, 31, 32, 38, 40, 46, 48, 53, 55, 58, 59, 70, 71, 74, 76, 77, 85, 86, 91, 92, 98]

The cluster 3 contains the snippets: [3, 12, 23, 34, 36, 41, 47, 50, 51, 54, 57, 62, 68, 75, 81, 83, 94]

The cluster 2 contains the snippets: [2, 4, 11, 15, 27, 29, 33, 42, 52, 60, 61, 67, 69, 82, 88, 99]

The cluster 6 contains the snippets: [10, 24, 56, 63, 78, 84, 90, 93, 95, 100]

The cluster 5 contains the snippets: [8, 9, 14, 17, 21, 39, 43, 66, 87]

The cluster 7 contains the snippets: [16, 18, 35, 45, 64, 65]

The cluster 4 contains the snippets: [6, 20, 79, 80]

The cluster 8 contains the snippets: [19, 30, 37, 49]

The cluster 10 contains the snippets: [72, 89, 96, 97]

The cluster 9 contains the snippets: [22, 44, 73]

[ INFO ] WSIEvaluator -

================== Starting Evaluation ==================

Resulting list: [1, 2, 3, 6, 8, 10, 16, 19, 22, 72, 5, 4, 12, 20, 9, 24, 18, 30, 44, 89, 7, 11, 23, 79, 14, 56, 35, 37, 73, 96, 13, 15, 34, 80, 17, 63, 45, 49, 97, 25, 27, 36, 21, 78, 64, 26, 29, 41, 39, 84, 65, 28, 33, 47, 43, 90, 31, 42, 50, 66, 93, 32, 52, 51, 87, 95, 38, 60, 54, 100, 40, 61, 57, 46, 67, 62, 48, 69, 68, 53, 82, 75, 55, 88, 81, 58, 99, 83, 59, 94, 70, 71, 74, 76, 77, 85, 86, 91, 92, 98]

============= Query 134 : "aida" ==============

================== 10 snippet clusters: ====================

The cluster 2 contains the snippets: [1, 5, 7, 13, 25, 26, 28, 31, 32, 38, 40, 46, 48, 53, 55, 58, 59, 70, 71, 74, 76, 77, 85, 86, 91, 92, 98]

The cluster 3 contains the snippets: [3, 12, 23, 34, 36, 41, 47, 50, 51, 54, 57, 62, 68, 75, 81, 83, 94]

The cluster 1 contains the snippets: [2, 4, 11, 15, 27, 29, 33, 42, 52, 60, 61, 67, 69, 82, 88, 99]

The cluster 6 contains the snippets: [10, 24, 56, 63, 78, 84, 90, 93, 95, 100]

The cluster 5 contains the snippets: [8, 9, 14, 17, 21, 39, 43, 66, 87]

The cluster 7 contains the snippets: [16, 18, 35, 45, 64, 65]

The cluster 4 contains the snippets: [6, 20, 79, 80]

The cluster 8 contains the snippets: [19, 30, 37, 49]

The cluster 10 contains the snippets: [72, 89, 96, 97]

The cluster 9 contains the snippets: [22, 44, 73]

[ INFO ] WSIEvaluator -

================== Starting Evaluation ==================

Resulting list: [2, 1, 3, 6, 8, 10, 16, 19, 22, 72, 4, 5, 12, 20, 9, 24, 18, 30, 44, 89, 11, 7, 23, 79, 14, 56, 35, 37, 73, 96, 15, 13, 34, 80, 17, 63, 45, 49, 97, 27, 25, 36, 21, 78, 64, 29, 26, 41, 39, 84, 65, 33, 28, 47, 43, 90, 42, 31, 50, 66, 93, 52, 32, 51, 87, 95, 60, 38, 54, 100, 61, 40, 57, 67, 46, 62, 69, 48, 68, 82, 53, 75, 88, 55, 81, 99, 58, 83, 59, 94, 70, 71, 74, 76, 77, 85, 86, 91, 92, 98]

Ted Pedersen

Feb 17, 2013, 10:00:26 AM

to semeval-2013-ws...@googlegroups.com

Hi Michael,

Thanks for getting some discussion going! To be clear I'm a
participant who is running somewhat behind, so I'm not a reliable
source of information here. :)

But I'm wondering whether the numeric value assigned to a cluster really
has any significance. Does it matter if a cluster is called 1 or 2 as
long as the contents are the same? Does your overall "score" change as
a result of these different "labels"?

I have been meaning to use the evaluation program a bit, so thanks for
spurring me along a little with this.

Cordially,
Ted

> --
> You received this message because you are subscribed to the Google Groups
> "Semeval-2013 Task 11: WSI & Disambiguation within An Application" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to semeval-2013-wsi-in-a...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Radu ION

Feb 18, 2013, 1:58:21 AM

to semeval-2013-ws...@googlegroups.com

Hello everybody,

To answer Ted's question: the integer ID assigned to a cluster has no significance: you could name your first cluster 1, 56 or 98 or whatever. What really matters is what hits are put in a cluster because the F1, ARI and Jaccard will search for a Gold Standard cluster that has the highest overlap with your cluster. There is a paper cited on the task's page which explains the evaluation measures.
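The label invariance can be illustrated with a small sketch. It assumes a pair-counting form of the Jaccard index (pairs of hits grouped together in both clusterings, divided by pairs grouped together in either); this is an illustration under that assumption, not the evaluator's own code, and the names `same_cluster_pairs` and `jaccard_index` as well as the toy data are hypothetical.

```python
from itertools import combinations

def same_cluster_pairs(assignment):
    """Return the set of item pairs that share a cluster in `assignment` (item -> label)."""
    return {
        (a, b)
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }

def jaccard_index(system, gold):
    """Pair-counting Jaccard: pairs together in both / pairs together in either."""
    sys_pairs = same_cluster_pairs(system)
    gold_pairs = same_cluster_pairs(gold)
    union = sys_pairs | gold_pairs
    return len(sys_pairs & gold_pairs) / len(union) if union else 1.0

# Toy data (hypothetical): the same grouping under two different sets of labels
gold = {1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
system = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
relabelled = {1: 56, 2: 56, 3: 56, 4: 98, 5: 98}

# Only co-membership is compared, so renaming the clusters leaves the score unchanged
print(jaccard_index(system, gold) == jaccard_index(relabelled, gold))
```

Because the score is computed from which hits end up together, the integers used as cluster names never enter the calculation.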

On an unrelated topic, I have noticed that the Gold Standard annotation of clusters does not contain all the 100 hits of a query (at least for the trial data).

The next question is then: how are the remaining hits clustered? I would guess one hit per cluster, but if that were true, all the scores (F1, ARI, Jaccard) would be directly influenced by this decision (indeed, our application does not cluster unrelated hits, but this does not really help in the evaluation ;) ).

We ran this simple test on the trial data: we removed from the "results.txt" file all the hits that were not in "STRel.txt" (the Gold Standard) and made a clustering run. We were surprised to find that all the scores went up by a very large margin (30-40%). It follows that the decision of which hits go into the Gold Standard will have a significant effect on the evaluation scores.

Proposal: would it be possible to represent all the hits in the Gold Standard of the real test set, such that unrelated hits are put in their own clusters (one hit per cluster)? This way you could also measure whether a candidate clustering application jumps ahead and constructs forced clusters.

Thank you,

Radu

Michael

Feb 18, 2013, 11:37:10 AM

to semeval-2013-ws...@googlegroups.com

Hey everybody,

@Radu: Thanks for confirming the observation by Ted and me that the cluster ID label has no influence on the evaluation tool. And I agree that this makes total sense for the cluster quality measures F1, ARI, etc.

Nevertheless, as far as I understand, when it comes to the subtopic-related measures (s-recall/s-prec), the cluster order does matter, because it makes a difference in what order the clusters get flattened.

For the within-cluster order, the evaluation tool respects the given snippet order, as it reads the input file top-down, line by line. But for the cluster order I can't figure out how it works.
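To make the dependence on ordering concrete, here is a minimal sketch of subtopic recall at rank K, assuming the usual definition (distinct gold subtopics covered by the first K snippets, divided by the total number of gold subtopics). The function name `s_recall_at_k` and the gold annotation are hypothetical, and this is not taken from the evaluator itself.

```python
def s_recall_at_k(ranked_snippets, subtopic_of, k):
    """Fraction of all gold subtopics covered by the first k snippets of the flattened list."""
    all_subtopics = set(subtopic_of.values())
    covered = {subtopic_of[s] for s in ranked_snippets[:k] if s in subtopic_of}
    return len(covered) / len(all_subtopics)

# Hypothetical gold annotation: snippet number -> subtopic
subtopic_of = {1: "opera", 2: "opera", 3: "cruise", 4: "musical"}

# Same snippets, two different flattened orders, evaluated at K = 2
print(s_recall_at_k([1, 2, 3, 4], subtopic_of, 2))  # covers 1 of 3 subtopics
print(s_recall_at_k([1, 3, 4, 2], subtopic_of, 2))  # covers 2 of 3 subtopics
```

The two calls differ only in the order of the flattened list, yet the second covers more subtopics in the top two positions, which is why the cluster order is relevant for these measures.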

Cheers, Michael


Daniele Vannella

Feb 21, 2013, 8:16:28 AM

to semeval-2013-ws...@googlegroups.com

Hi Michael

> For the within-cluster order, the evaluation tool respects the given snippet order, as it reads the input file top-down line-by-line. But for the cluster order I can't figure out how it works.

The order of the clusters depends only on the order of the file, and not on the labels.

For example:

subTopicID resultID
1.4 1.2
1.1 1.3
1.4 1.4
1.1 1.5

In this case the order of the clusters is:

cluster 1 [2, 4]
cluster 2 [3, 5]

where 1.4 and 1.1 are the labels of the clusters.

If you swap lines 1 and 2, the order of the clusters will change.
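The behaviour described above can be sketched as follows. This is a minimal illustration, assuming clusters are ranked by the first line on which their label appears; the function name `cluster_order` is hypothetical and the code is not the evaluator's actual implementation.

```python
def cluster_order(lines):
    """Group result IDs by cluster label, keeping clusters in order of first appearance."""
    clusters = {}  # a dict preserves insertion order (Python 3.7+)
    for line in lines:
        label, result_id = line.split()
        # "1.2" -> snippet number 2 (the part after the query ID)
        clusters.setdefault(label, []).append(int(result_id.split(".")[1]))
    return list(clusters.items())

rows = ["1.4 1.2", "1.1 1.3", "1.4 1.4", "1.1 1.5"]
for position, (label, snippets) in enumerate(cluster_order(rows), start=1):
    print(f"cluster {position} (label {label}): {snippets}")
```

With the rows as given, label 1.4 comes first with snippets [2, 4] and label 1.1 second with [3, 5]; swapping the first two rows reverses that order without changing the contents of either cluster.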

Best,
Daniele

Date: Mon, 18 Feb 2013 08:37:10 -0800
From: mic...@informatik.uni-mannheim.de
To: semeval-2013-ws...@googlegroups.com
Subject: Re: WSI Evaluator Tool Document Ordering (UPDATED)


Roberto Navigli

Feb 23, 2013, 3:26:26 AM

to semeval-2013-ws...@googlegroups.com

Hi Michael,

You are absolutely right. This is already noted on the task information page:

"In order to perform the flattening procedure, WSD/WSI must provide snippets in each cluster already sorted by the confidence according to which the snippet belongs to the cluster, and must rank clusters according to their diversity."

http://www.cs.york.ac.uk/semeval-2013/task11/index.php?id=task-description
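The two "Resulting list" outputs in the first post are consistent with a round-robin flattening: the first snippet of every cluster in cluster order, then the second snippet of every cluster, and so on. The sketch below assumes that reading, which is inferred from the posted output rather than from the tool's source; the function name `flatten` is hypothetical.

```python
from itertools import zip_longest

def flatten(clusters):
    """Interleave clusters round-robin, skipping clusters that have run out of snippets."""
    missing = object()  # sentinel so that real snippet values are never filtered out
    return [
        snippet
        for rank in zip_longest(*clusters, fillvalue=missing)
        for snippet in rank
        if snippet is not missing
    ]

# Leading snippets of clusters 1-4 from the first run in this thread (truncated for brevity)
clusters = [[1, 5, 7], [2, 4, 11], [3, 12, 23], [6, 20]]
print(flatten(clusters))
```

Under this reading, ranking the clusters and sorting the snippets inside each cluster both change the flattened list, which is why both are left to the participating system.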

All the best,
Roberto Navigli
