Anyone care to cross-check the most frequent sense numbers for this
data?
I am computing it just by counting up the number of instances that
belong to the most frequent sense and dividing by the number of
instances.
For the test data I get 79.84, which is 3873/4851
For the test+train data I get 78.55, which is 21313/27132
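In case it helps with the cross-check, here is a rough sketch of that
counting (the script name is made up, and I'm assuming the key files use
the usual "lexelt instance-id sense-id" columns, ignoring any extra
sense tags on a line):

#!/usr/bin/perl
# count_mfs.pl -- for each lexelt, count the instances tagged with its
# most frequent sense, then divide by the total number of instances.
use strict;
use warnings;

my %count;    # $count{lexelt}{sense} = number of instances
my $total = 0;

while (<>) {
    my ($lexelt, $instance, $sense) = split;
    next unless defined $sense;
    $count{$lexelt}{$sense}++;
    $total++;
}

my $correct = 0;
for my $lexelt (keys %count) {
    # most frequent sense for this lexelt
    my ($top) = sort { $count{$lexelt}{$b} <=> $count{$lexelt}{$a} }
                keys %{ $count{$lexelt} };
    $correct += $count{$lexelt}{$top};
}

printf "%d/%d = %.2f\n", $correct, $total, 100 * $correct / $total;

% perl count_mfs.pl ../keys/senseinduction_test.key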
Anyone able to confirm/deny?
Thanks!
Ted
On 2007/04/18, tped...@d.umn.edu wrote:
>
> I am computing it just by counting up the number of instances that
> belong to the most frequent sense and dividing by the number of
> instances.
I have slightly different results. I compute the mfs by:
- creating an "mfs solution" with the create_mfs_key.pl script
- running scorer2 on that key file
> For the test data I get 79.84, which is 3873/4851
% perl create_mfs_key.pl ../keys/senseinduction_train.key ../keys/senseinduction_test.key > mfs_test.key
% ./scorer2 mfs_test.key ../keys/senseinduction_test.key
Fine-grained score for "mfs_test.key" using key "../keys/senseinduction_test.key":
precision: 0.787 (3816.00 correct of 4851.00 attempted)
recall: 0.787 (3816.00 correct of 4851.00 in total)
attempted: 100.00 % (4851.00 attempted of 4851.00 in total)
So it gives me 78.7, which is 3816/4851
> For the test+train data I get 78.55, which is 21313/27132
% perl create_mfs_key.pl ../keys/senseinduction_train.key ../keys/senseinduction.key > mfs_full.key
% ./scorer2 mfs_full.key ../keys/senseinduction.key
Fine-grained score for "mfs_full.key" using key "../keys/senseinduction.key":
precision: 0.785 (21286.00 correct of 27132.00 attempted)
recall: 0.785 (21286.00 correct of 27132.00 in total)
attempted: 100.00 % (27132.00 attempted of 27132.00 in total)
It gives 78.5, which is 21286/27132
best,
aitor
You can download the script here:
http://ixa2.si.ehu.es/semeval-senseinduction/create_mfs_key.pl
best
aitor
Yes, I see what you are doing here, thanks for the clarification. So
you are using the most frequent sense from the training data to tell
you what the most frequent sense in the test data will be, and you are
also using the most frequent sense in the training data to tell you
what the most frequent sense in the train+test data should be.
I'm using the most frequent sense in the test data to tell me what the
most frequent sense in the test data should be, and I'm using the most
frequent sense in the train+test data to tell me what the most
frequent sense in the train+test data should be. :)
The use of training data to determine a most frequent sense that is
then applied to the test data very clearly corresponds to a supervised
mfs... The use of training data to determine an mfs that is then
applied to train+test seems a little bit of a hybrid between
supervised and unsupervised, I guess...
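Just to make the contrast concrete, the supervised variant might be
sketched like this (again, the script name and the three-column key
format are my own assumptions, and instances with multiple gold senses
aren't handled):

#!/usr/bin/perl
# supervised_mfs.pl -- learn each lexelt's most frequent sense from the
# training key, tag every test instance with it, and count how many
# tags match the test gold standard.
use strict;
use warnings;

die "usage: $0 train.key test.key\n" unless @ARGV == 2;
my ($train_key, $test_key) = @ARGV;

# 1. Count senses per lexelt in the training data.
my %count;
open my $tr, '<', $train_key or die "$train_key: $!";
while (<$tr>) {
    my ($lexelt, $instance, $sense) = split;
    $count{$lexelt}{$sense}++ if defined $sense;
}
close $tr;

# 2. Pick the most frequent training sense for each lexelt.
my %mfs;
for my $lexelt (keys %count) {
    ($mfs{$lexelt}) = sort { $count{$lexelt}{$b} <=> $count{$lexelt}{$a} }
                      keys %{ $count{$lexelt} };
}

# 3. Score that single sense against the test gold standard.
my ($correct, $total) = (0, 0);
open my $te, '<', $test_key or die "$test_key: $!";
while (<$te>) {
    my ($lexelt, $instance, $sense) = split;
    next unless defined $sense;
    $total++;
    $correct++ if defined $mfs{$lexelt} and $mfs{$lexelt} eq $sense;
}
close $te;

printf "%d/%d = %.2f\n", $correct, $total, 100 * $correct / $total;

% perl supervised_mfs.pl ../keys/senseinduction_train.key ../keys/senseinduction_test.key

That is the train-derived mfs applied to the test instances, whereas my
numbers above come from reading the mfs off the very data being scored.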
I am thinking that perhaps my mfs is an unsupervised variant, meaning
that we don't have any notion of a separate set of training data. We
get our values of mfs from the very data that we cluster (where we
happen to have some gold standard for that data available). In that
view, I think that 1 cluster per word should degenerate to this
unsupervised mfs, at least in terms of a precision/recall score. That
ties back to an earlier bit of mail about mfs and 1 cluster per word.
I think in the supervised framework they might not be equivalent, but
in the unsupervised it seems like they could be...
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
On 2007/04/18, Ted Pedersen wrote:
> Yes, I see what you are doing here, thanks for the clarification. So
> you are using the most frequent sense from the training data to tell
> you what the most frequent sense in the test data will be, and you are
> also using the most frequent sense in the training data to tell you
> what the most frequent sense in the train+test data should be.
Actually, we only use the "tag all test instances with the most frequent
sense of training" approach to have a baseline in the supervised
evaluation. The other approach, "tag train+test instances with the most
frequent sense of training", was only for comparing with the results you
gave; we don't use it.
> [...]
> I am thinking that perhaps my mfs is an unsupervised variant, meaning
> that we don't have any notion of a separate set of training data. We
> get our values of mfs from the very data that we cluster (where we
> happen to have some gold standard for that data available). In that
> view, I think that 1 cluster per word should degenerate to this
> unsupervised mfs, at least in terms of a precision/recall score. That
> ties back to an earlier bit of mail about mfs and 1 cluster per word.
> I think in the supervised framework they might not be equivalent, but
> in the unsupervised it seems like they could be...
Yes, in the unsupervised evaluation they should give the same results. After
all, we are always creating a 1 cluster per word solution, be it the mfs or
any other random cluster (the unsupervised evaluation script doesn't really
care about it).
best,
aitor