I've been going over the evaluation results, and I just want to make
sure I understand the system scores for the numeric and the
coarse-grained results. I see the note about Spearman's rho and
Kendall's tau in the PDF file, but it doesn't clearly align with either
set of results, so I wanted to double-check which measure was used on
which set of results. Could you clarify that?
Thanks!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Thanks for clarifying this. I think it's falling into place now. Let
me use an example from the results to make sure I understand.
For numeric scoring, there is a system called pred.en.submit that
reported 174 answers. Its Spearman's rho was .27 and its Kendall's tau
was .18. Is that correct?
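As a sanity check on what rho measures, here is a minimal Python sketch of Spearman's rho, assuming the usual tie-aware definition (the Pearson correlation of average ranks). The function names are hypothetical and this is not the organizers' scoring script; `scipy.stats.spearmanr` implements the same definition if a library version is preferred.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one list is constant")
    return cov / (sx * sy)


def spearman_rho(system_scores, gold_scores):
    """Hypothetical helper: rho over paired system/gold scores."""
    return pearson(average_ranks(system_scores), average_ranks(gold_scores))


print(spearman_rho([1, 1, 2], [1, 2, 3]))  # toy data with a tie
```

With ties, the tied items simply share a rank (e.g. `average_ranks([10, 20, 20, 30])` gives `[1.0, 2.5, 2.5, 4.0]`), so ties by themselves do not force rho toward zero.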
Moving across that row of the table, there are values of 16.19, 14.93,
21.64, and 14.66. What are those?
For the same system in the coarse-grained results, it answered 118
times, and there are values of .356, .346, .5, and .275. What are those?
Thanks!
Ted
Thanks for these clarifications, this is quite helpful.
I didn't catch on until just now that the (-0.01) and (-0.01)
reported for duluth-1 (for example) were the actual Spearman's and
Kendall's values! That's quite unexpected. After realizing this, I
thought it might have something to do with ties, but the gold standard
also has ties, so that didn't seem like a likely explanation.
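Whether ties can drag a score toward zero depends on how they are counted. Below is a minimal sketch of Kendall's tau-b, which skips pairs tied in both rankings and lets pairs tied in only one ranking shrink the denominator. It is an assumption that the official scorer used tau-b rather than another variant; `scipy.stats.kendalltau` also defaults to tau-b.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Hypothetical helper: tie-aware Kendall's tau-b for paired scores."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: counts toward neither term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_cd = concordant + discordant
    denom = sqrt((n_cd + ties_x_only) * (n_cd + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b undefined: no untied pairs")
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 1, 2], [1, 2, 3]))  # toy data with a tie in x
```

On the toy call, one tie in the system scores still leaves tau-b at about 0.816, since the remaining pairs agree. That is consistent with the hunch that ties alone would not pull duluth-1 down to -0.01.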
For example, the top 12 (most literal) pairs from duluth-1 are put into
5 ranks:
100 EN_V_OBJ develop methods
99 EN_V_SUBJ fans want
98 EN_V_OBJ wait minute
98 EN_V_OBJ raise bar
98 EN_V_OBJ foot bill
97 EN_V_SUBJ economist call
97 EN_V_OBJ take plunge
97 EN_V_OBJ pay visit
97 EN_V_OBJ help children
96 EN_V_OBJ spread word
96 EN_V_OBJ double number
96 EN_V_OBJ collect data
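To make the bucketing concrete, this small sketch groups the twelve duluth-1 lines quoted above by score. It assumes each line has the layout `score TYPE word word`; `group_by_score` is a hypothetical helper, not part of any task tooling.

```python
from collections import defaultdict

# The duluth-1 lines exactly as quoted above.
DULUTH1_TOP = """\
100 EN_V_OBJ develop methods
99 EN_V_SUBJ fans want
98 EN_V_OBJ wait minute
98 EN_V_OBJ raise bar
98 EN_V_OBJ foot bill
97 EN_V_SUBJ economist call
97 EN_V_OBJ take plunge
97 EN_V_OBJ pay visit
97 EN_V_OBJ help children
96 EN_V_OBJ spread word
96 EN_V_OBJ double number
96 EN_V_OBJ collect data
"""


def group_by_score(listing):
    """Map each score to its (relation, pair) entries, highest score first."""
    groups = defaultdict(list)
    for line in listing.strip().splitlines():
        score, rel, *words = line.split()
        groups[int(score)].append((rel, " ".join(words)))
    return dict(sorted(groups.items(), reverse=True))


ranks = group_by_score(DULUTH1_TOP)
for score, pairs in ranks.items():
    print(score, len(pairs))
```

This prints five score levels (100, 99, 98, 97, 96) holding 1, 1, 3, 4, and 3 pairs respectively.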
From the gold standard, these are the top 13 pairs, organized in 7 ranks:
98 EN_ADJ_NN small island
98 EN_ADJ_NN early version
97 EN_V_OBJ help people
96 EN_ADJ_NN red wine
96 EN_ADJ_NN rechargeable battery
95 EN_V_OBJ provide service
95 EN_V_OBJ provide information
95 EN_V_OBJ collect data
95 EN_ADJ_NN cheap price
94 EN_ADJ_NN statistical analysis
93 EN_V_OBJ obtain information
93 EN_ADJ_NN little girl
Interestingly enough, only one pair (collect data) appears in both
sets :) so I think the hypothesis about the disordered buckets in
duluth-1 holds up well.
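The overlap can be checked mechanically. This sketch parses both listings quoted above (assuming the `score TYPE word word` layout) and intersects them on the (relation type, pair) key; `parse_pairs` is a hypothetical helper.

```python
# The two listings exactly as quoted above.
DULUTH1_TOP = """\
100 EN_V_OBJ develop methods
99 EN_V_SUBJ fans want
98 EN_V_OBJ wait minute
98 EN_V_OBJ raise bar
98 EN_V_OBJ foot bill
97 EN_V_SUBJ economist call
97 EN_V_OBJ take plunge
97 EN_V_OBJ pay visit
97 EN_V_OBJ help children
96 EN_V_OBJ spread word
96 EN_V_OBJ double number
96 EN_V_OBJ collect data
"""

GOLD_TOP = """\
98 EN_ADJ_NN small island
98 EN_ADJ_NN early version
97 EN_V_OBJ help people
96 EN_ADJ_NN red wine
96 EN_ADJ_NN rechargeable battery
95 EN_V_OBJ provide service
95 EN_V_OBJ provide information
95 EN_V_OBJ collect data
95 EN_ADJ_NN cheap price
94 EN_ADJ_NN statistical analysis
93 EN_V_OBJ obtain information
93 EN_ADJ_NN little girl
"""


def parse_pairs(listing):
    """Return the set of (relation, pair) keys, ignoring the scores."""
    pairs = set()
    for line in listing.strip().splitlines():
        _score, rel, *words = line.split()
        pairs.add((rel, " ".join(words)))
    return pairs


common = parse_pairs(DULUTH1_TOP) & parse_pairs(GOLD_TOP)
print(sorted(common))
```

The intersection contains a single key, `('EN_V_OBJ', 'collect data')`.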
Anyway, I just wanted to say thanks and confirm your observations
here. Quite interesting; I'll keep thinking about this.
Cordially,
Ted
On Tue, May 3, 2011 at 7:43 AM, Organizer DISCo Workshop 2011