results evaluation

bar...@gmail.com

May 1, 2009, 10:58:40 AM
to PAN'09 Competition on Plagiarism Detection
Hi
there are quite a lot of texts that don't have any plagiarized text in
them at all.
now, since:
recall = (# chars of correctly detected passages) / (# chars of all plagiarized passages)

precision = (# chars of correctly detected passages) / (# chars of all reported passages)

granularity = (# detected passages that overlap with plagiarized passages) / (# distinct plagiarized passages detected)

then:
if i don't detect any plagiarism (which is great) i get:
recall = 0/0
precision = 0/0
granularity = 0/0

if i do declare some passage (e.g., of size 1000 chars) i get:
recall = 0/0
precision = 0/1000
granularity = 0/1

A) how do you handle the divisions by zero above? what are the results
of these calculations?
B) if you treat them (precision, recall and granularity) as 0, then
the average score over the whole corpus would be seriously damaged.
C) if you treat them (precision, recall and granularity) as 1
(perfect score), then the detection is meaningless...
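
The two cases above can be reproduced with a minimal sketch. It assumes passages are given as (start, end) character offsets with the end exclusive; the function name char_precision_recall and the choice to return None for a 0/0 ratio are illustrative assumptions, not the organizers' evaluation code.

```python
def char_precision_recall(reported, plagiarized):
    """Character-level precision and recall for one document.

    Passages are (start, end) character offsets, end exclusive.
    A ratio with a zero denominator is returned as None (undefined).
    """
    reported_chars = {i for start, end in reported for i in range(start, end)}
    true_chars = {i for start, end in plagiarized for i in range(start, end)}
    correct = len(reported_chars & true_chars)

    precision = correct / len(reported_chars) if reported_chars else None
    recall = correct / len(true_chars) if true_chars else None
    return precision, recall


# Document without plagiarism, nothing reported: both ratios are 0/0.
print(char_precision_recall([], []))           # (None, None)
# Same document, one 1000-char false alarm: precision 0/1000, recall 0/0.
print(char_precision_recall([(0, 1000)], []))  # (0.0, None)
```

Returning None keeps the undefined cases visible, so that whoever averages over the corpus has to decide explicitly how to treat them.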

Martin Potthast

May 4, 2009, 4:09:34 AM
to pan09-co...@googlegroups.com
Hi Barak,

sorry for my late answer!

First of all, precision and recall are normalized to the interval [0, 1] where 0 means worst quality and 1 best/perfect quality. Granularity is not normalized and it should be minimized, i.e., a granularity of 1 means best quality; the higher the granularity the worse the quality.

> then:
> if i don't detect any plagiarism (which is great) i get:

This would not be great since there is definitely plagiarism to be found. So, if you report nothing you will receive an overall score of 0.
 
> recall = 0/0
> precision = 0/0
> granularity = 0/0

With the above explanation in mind, you can see that recall and precision take the worst possible values. Granularity is undefined (0/0) in this case, and for the calculation of the overall performance it will be set to 1, i.e., granularity = max(1, granularity). Although this means you achieved the best granularity, you will still get an overall score of 0 because of the bad precision and recall values.
 
> if i do declare some passage (e.g., of size 1000 chars) i get:
> recall = 0/0
> precision = 0/1000
> granularity = 0/1

I presume that with "some passage" you refer to a passage which has not actually been plagiarized, i.e., a false-positive. The overall performance would therefore be the same as above, 0.

Anyway, note that you miscalculated the granularity: since the 1000-char passage you reported was not actually plagiarized, the denominator of the granularity would be 0, too.
 
I hope I could clarify the evaluation measures, but if not, don't hesitate to get back to me.

Best,
Martin

barak hagbi

May 4, 2009, 4:52:41 AM
to pan09-co...@googlegroups.com
Hi Martin !

1. what do you mean by "there is definitely plagiarism to be found" ? in the intrinsic corpus you published there are 1545 out of 3091 texts that don't contain any plagiarism at all. would it be different in the "real competition corpus" ?

2. i think i totally misunderstood the granularity measure. please explain more. for example, if a text contains 20 plagiarized passages, and my algorithm reports 10 passages as plagiarized, and let's say that 4 of them actually overlap (in part) the true plagiarized passages, then wouldn't granularity be 4/10 = 0.4? how can granularity be more than 1?

3. what do you mean by "a granularity of 1 means best quality; the higher the granularity the worse the quality"? isn't granularity in [0,1]?

Martin Potthast

May 4, 2009, 5:25:50 AM
to pan09-co...@googlegroups.com
Hi Barak,

> 1. what do you mean by "there is definitely plagiarism to be found"? in the intrinsic corpus you published there are 1545 out of 3091 texts that don't contain any plagiarism at all. would it be different in the "real competition corpus"?

There is definitely plagiarism to be found in the whole corpus, but there are of course suspicious documents which do not contain plagiarism.
And now I see your point: If you report nothing on a document which contains no plagiarism then you did nothing wrong and therefore your performance was perfect rather than flawed. Thanks for pointing this out, we'll find a solution for this.
 
> 2. i think i totally misunderstood the granularity measure. please explain more. for example, if a text contains 20 plagiarized passages, and my algorithm reports 10 passages as plagiarized, and let's say that 4 of them actually overlap (in part) the true plagiarized passages, then wouldn't granularity be 4/10 = 0.4? how can granularity be more than 1?

Here are some examples from your setting:
If each of the 4 passages overlaps with a different one of the 20 plagiarized passages: 4/4 = 1, i.e., best granularity.
If 2 of the 4 passages overlap with the same plagiarized passage: 4/3 = 1.33
If 3 of the 4 passages overlap with the same plagiarized passage: 4/2 = 2
If all 4 passages overlap with the same plagiarized passage: 4/1 = 4

So, the denominator of the granularity refers to the number of distinct plagiarized passages which you hit, not the number of plagiarized passages which are actually there. The numerator, on the other hand, refers to the number of passages you reported that overlap with plagiarized passages, counted without removing those that hit the same plagiarized passage.

Granularity is not about the coverage of plagiarism, but about how you report it, i.e., whether a single plagiarized passage is reported in small pieces or not.
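
The four examples above can be checked with a minimal sketch. It assumes each reported passage that overlaps plagiarism overlaps exactly one plagiarized passage, identified here by a hypothetical label; the function name granularity and the None return for the 0/0 case are illustrative choices, not the official scorer.

```python
def granularity(hits):
    """hits: one entry per reported passage that overlaps plagiarism,
    naming the plagiarized passage it overlaps."""
    if not hits:
        return None  # 0/0: no reported passage overlaps any plagiarism
    # overlapping reported passages / distinct plagiarized passages hit
    return len(hits) / len(set(hits))


print(granularity(["a", "b", "c", "d"]))  # 1.0
print(granularity(["a", "a", "b", "c"]))  # 1.3333333333333333
print(granularity(["a", "a", "a", "b"]))  # 2.0
print(granularity(["a", "a", "a", "a"]))  # 4.0
```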
 
> 3. what do you mean by "a granularity of 1 means best quality; the higher the granularity the worse the quality"? isn't granularity in [0,1]?

No, as you can see from the above, the minimum value is 1 and there is no definite upper bound. In the extreme case, if you report a single plagiarized passage P word for word you'd get a granularity of |P|/1 = |P|, where |P| is the number of words of the passage.
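
The extreme case can be illustrated with a rough sketch that works on character spans. It assumes passages are (start, end) offsets with the end exclusive and uses fixed 10-character pieces as a stand-in for words; the name granularity_from_spans is hypothetical. A reported passage that straddles several plagiarized passages counts once in the numerator here, which is a simplification.

```python
def granularity_from_spans(reported, plagiarized):
    """Overlapping reported passages / distinct plagiarized passages hit."""
    overlapping = 0
    hit = set()
    for r_start, r_end in reported:
        # indices of plagiarized passages this reported passage overlaps
        matches = [k for k, (p_start, p_end) in enumerate(plagiarized)
                   if r_start < p_end and p_start < r_end]
        if matches:
            overlapping += 1
            hit.update(matches)
    return overlapping / len(hit) if hit else None


passage = (100, 150)
# Report the passage in five 10-character pieces instead of as one span.
pieces = [(s, s + 10) for s in range(100, 150, 10)]
print(granularity_from_spans(pieces, [passage]))     # 5.0
print(granularity_from_spans([passage], [passage]))  # 1.0
```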

Best,
Martin



--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de - netspeak.webis.de

If you do things right, people won't be sure you've done anything at all.

barak hagbi

May 7, 2009, 10:26:36 AM
to pan09-co...@googlegroups.com
thanks for the clarification!
but i still have some questions about the calculation of the corpus average:
1. what are the precision/recall/granularity scores for a document that is not plagiarized at all, and that i correctly reported as such?
2. if a long document (50000 chars long) has no plagiarism, but i wrongly reported that it has 10 chars of plagiarism, my precision/recall/granularity scores would all be 0 for this document, even though my mistake was rather small in relation to the document length. this score would affect the corpus average severely! taking into account that about 50% of the documents don't have plagiarized sections in them at all, that could be a real bias in the average.

the overall average is important, since you evaluate our results by it. so it is important to understand exactly how you calculate it.

Martin Potthast

May 11, 2009, 5:09:38 AM
to pan09-co...@googlegroups.com
Hi Barak,

thank you for your mail, and for pointing out these problems. I really appreciate this.

On Thu, May 7, 2009 at 4:26 PM, barak hagbi <bar...@gmail.com> wrote:
> thanks for the clarification!
> but i still have some questions about the calculation of the corpus average:
> 1. what are the precision/recall/granularity scores for a document that is not plagiarized at all, and that i correctly reported as such?
> 2. if a long document (50000 chars long) has no plagiarism, but i wrongly reported that it has 10 chars of plagiarism, my precision/recall/granularity scores would all be 0 for this document, even though my mistake was rather small in relation to the document length. this score would affect the corpus average severely! taking into account that about 50% of the documents don't have plagiarized sections in them at all, that could be a real bias in the average.

I think the new measures cover both of these problems. They now measure precision and recall at the case level, so that point 1 won't be a problem. With respect to point 2, a false alarm will not hurt you that much anymore.

> the overall average is important, since you evaluate our results by it. so it is important to understand exactly how you calculate it.

It was given in the rules, but there is no good reason not to show the formula next to the measures. The formula for the overall score is now given there as well.