Evaluation Metrics


Isabel Segura

Feb 20, 2013, 5:19:41 AM
to ddiextract...@googlegroups.com
Dear participants, 

You can find details on evaluation metrics for our tasks 9.1 and 9.2 here: 


Best, 

Isabel

Kevin B. Cohen

Feb 20, 2013, 12:12:39 PM
to Isabel Segura, ddiextract...@googlegroups.com
Dear Isabel,

I have a question about Section 2 in the document on metrics for Task
9.2. In Section 2 it says:

"...only relations are evaluated."

Does this mean that only lines where the prediction value is 1 are
considered in calculating scores?

Also, the document suggests that an interaction is correct only if
both the prediction (1 or 0) and the type are correct. Am I
interpreting the document correctly? Is there no plan to calculate
partial correctness, in the case where the prediction is correct but
the type is not?

Thanks,

Kevin

--
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen

Isabel Segura

Feb 20, 2013, 12:59:11 PM
to Kevin B. Cohen, ddiextract...@googlegroups.com
Dear Kevin,

 
"...only relations are evaluated."

I mean that entities will not be evaluated in Task 9.2 (naturally, since they will be provided in the dataset).


Regarding the evaluation for the DDI extraction task, our evaluation script will output two sets of scores:

1) Strict evaluation: a pair is correct only if both the prediction and the type are correct.
2) Partial evaluation: a pair counts as correct when the prediction label is correct, regardless of the type.

Additionally, we will calculate precision, recall and F-measure for each type of DDI.
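
As an illustration of the two scoring modes and the per-type scores, here is a minimal Python sketch. It is not the official evaluation script, which has not been distributed. The data layout (a dict mapping a pair id to a `(prediction, type)` tuple), the function names `prf` and `evaluate`, the example type labels, and the choice to count a positive pair with the wrong type as both a false positive and a false negative under strict scoring are all assumptions made for the example.

```python
def prf(gold_items, pred_items):
    """Precision, recall and F1 computed from two sets of items."""
    tp = len(gold_items & pred_items)   # items found in both sets
    fp = len(pred_items - gold_items)   # predicted but not in gold
    fn = len(gold_items - pred_items)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def evaluate(gold, pred):
    """gold, pred: dict of pair_id -> (prediction, type); hypothetical layout."""
    # Only pairs labelled as interactions (prediction == 1) are scored.
    gold_pos = {(pid, t) for pid, (ddi, t) in gold.items() if ddi == 1}
    pred_pos = {(pid, t) for pid, (ddi, t) in pred.items() if ddi == 1}

    results = {
        # Strict: pair id and type must both match.
        "strict": prf(gold_pos, pred_pos),
        # Partial: only the pair id must match; the type is ignored.
        "partial": prf({pid for pid, _ in gold_pos},
                       {pid for pid, _ in pred_pos}),
    }
    # Per-type scores: restrict both sets to one type at a time.
    for t in sorted({t for _, t in gold_pos | pred_pos}):
        results[t] = prf({x for x in gold_pos if x[1] == t},
                         {x for x in pred_pos if x[1] == t})
    return results


# Toy data; pair ids and labels are made up for the example.
gold = {"p1": (1, "effect"), "p2": (1, "mechanism"),
        "p3": (0, None), "p4": (1, "advice")}
pred = {"p1": (1, "effect"), "p2": (1, "effect"),
        "p3": (1, "int"), "p4": (0, None)}

results = evaluate(gold, pred)
for name, (p, r, f) in results.items():
    print(f"{name:10s} P={p:.3f} R={r:.3f} F1={f:.3f}")
```

On this toy data, pair `p2` is detected but given the wrong type, so it counts towards the partial scores and not towards the strict ones.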

I am sorry for the ambiguity in these documents. I will try to improve them as soon as I can.

Please let me know about any error, omission or ambiguity, so that I can correct it.

Thanks, 

Isabel






--
Isabel Segura Bedmar
Despacho 2.2.A.10
Telf: 91 624 99 88
Departamento de Informática
Universidad Carlos III de Madrid,

http://www.inf.uc3m.es/component/comprofiler/userprofile/isegura

Behrouz Bokharaeian

Feb 20, 2013, 1:33:10 PM
to Isabel Segura, ddiextract...@googlegroups.com

Dear Isabel,

I have a question. You have provided two datasets in two folders: a DrugBank folder (572 files) and a Medline folder (142 files).

What is the difference between the two sets of training files?

Can you explain why you have provided two datasets?

Thank you


--
You received this message because you are subscribed to the Google Groups "ddiextraction_semeval" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ddiextraction_se...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Isabel Segura

Feb 20, 2013, 1:50:20 PM
to Behrouz Bokharaeian, ddiextract...@googlegroups.com
Dear Behrouz, 


· DrugBank contains 572 documents describing drug-drug interactions from the DrugBank database.

· MedLine contains 142 abstracts on the subject of drug-drug interactions.

You can find detailed information on the corpus, guidelines, etc. at the following link: http://www.cs.york.ac.uk/semeval-2013/task9/


Best,


Isabel


Alberto Lavelli

Feb 21, 2013, 5:33:19 AM
to Isabel Segura, ddiextract...@googlegroups.com
Hi Isabel.

thanks for the documents about the evaluation.
The documents clearly describe the evaluation metrics but it seems to
us that you are not distributing the scripts for the automatic
evaluation of system performance. Is that correct, or are we missing
something?

thanks
alberto



Isabel Segura

Feb 21, 2013, 5:58:19 AM
to Alberto Lavelli, ddiextract...@googlegroups.com
Dear Alberto, 

We will distribute these scripts when the evaluation period is finished. 

Best, 
Isabel

Alberto Lavelli

Feb 21, 2013, 6:20:42 AM
to Isabel Segura, ddiextract...@googlegroups.com
But they would be useful now, when we are comparing different
algorithms/configurations on the training set to decide which one to
apply on the test set.



Isabel Segura

Feb 21, 2013, 6:25:23 AM
to Alberto Lavelli, ddiextract...@googlegroups.com
OK, I understand. I will try to distribute them as soon as we can.