Well, my excitement has subsided a bit. :) It seems like F1
correlates pretty strongly with the number of clusters found, so if the
number of clusters is larger, F1 is generally larger (although not
always). I guess we can see that in the results, with the singleton
baseline getting an F1 of 1.000.
I am running some random baselines to try to understand things a bit,
and at least in that case F1 increases as the number of clusters
increases, leading me to wonder how much F1 is actually
showing us...

1 of 2 random senses assigned (balanced distribution)
average F1 = 0.54890625
average Rand Index = 0.5006200396825399
average Adj Rand Index = -3.7523664631867494E-4
average Jaccard Index = 0.2698606769455676
============ average number of created clusters: 2.0
============ average cluster size: 32.0

1 of 5 random senses assigned (balanced distribution)
average F1 = 0.56734375
average Rand Index = 0.5612549603174602
average Adj Rand Index = 0.0011575900428277936
average Jaccard Index = 0.14526247612439977
============ average number of created clusters: 5.0
============ average cluster size: 12.799999999999976

1 of 10 random senses assigned (balanced distribution)
average F1 = 0.59671875
average Rand Index = 0.5810218253968255
average Adj Rand Index = 1.7524984876489096E-4
average Jaccard Index = 0.08182754500715404
============ average number of created clusters: 9.98
============ average cluster size: 6.41422222222221

1 of 25 random senses assigned (balanced distribution)
average F1 = 0.66890625
average Rand Index = 0.5924454365079366
average Adj Rand Index = -0.0015843012763914102
============ average number of created clusters: 23.2
============ average cluster size: 2.769600269448325

1 of 50 random senses assigned (balanced distribution)
average F1 = 0.761875
average Rand Index = 0.5972817460317459
average Adj Rand Index = 0.0010363484473483394
average Jaccard Index = 0.01997375346134799
============ average number of created clusters: 35.88
============ average cluster size: 1.790505301429098
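
For anyone who wants to poke at this kind of random baseline themselves, here is a rough Python sketch. It is not the scorer that produced the numbers above. It assigns each instance one of k senses uniformly at random and computes the Rand Index, Adjusted Rand Index, and Jaccard Index from pair counts. F1 is left out because its exact definition is not spelled out here. The gold labelling (64 instances in four uneven senses) and all function names are made up for illustration.

```python
import random
from itertools import combinations


def pair_counts(gold, pred):
    """Classify every pair of items by whether the two items share a
    cluster in the gold labelling and/or in the predicted labelling."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = pred[i] == pred[j]
        if same_gold and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(gold, pred):
    # Fraction of pairs on which the two labellings agree (needs >= 2 items).
    tp, fp, fn, tn = pair_counts(gold, pred)
    return (tp + tn) / (tp + fp + fn + tn)


def jaccard_index(gold, pred):
    # Like the Rand Index, but ignores pairs separated in both labellings.
    tp, fp, fn, _ = pair_counts(gold, pred)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def adjusted_rand_index(gold, pred):
    # Pair-counting form of the chance-corrected Rand Index (needs >= 2 items).
    tp, fp, fn, tn = pair_counts(gold, pred)
    total = tp + fp + fn + tn
    expected = (tp + fp) * (tp + fn) / total
    maximum = ((tp + fp) + (tp + fn)) / 2
    if maximum == expected:
        return 1.0  # both labellings are all-singletons or one big cluster
    return (tp - expected) / (maximum - expected)


def random_baseline(n_items, k, rng):
    """Assign each item one of k senses uniformly at random."""
    return [rng.randrange(k) for _ in range(n_items)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical gold standard: 64 instances in four uneven senses.
    gold = [0] * 30 + [1] * 20 + [2] * 10 + [3] * 4
    for k in (2, 5, 10, 25, 50):
        runs = [random_baseline(len(gold), k, rng) for _ in range(100)]
        ri = sum(rand_index(gold, p) for p in runs) / len(runs)
        ari = sum(adjusted_rand_index(gold, p) for p in runs) / len(runs)
        jac = sum(jaccard_index(gold, p) for p in runs) / len(runs)
        clusters = sum(len(set(p)) for p in runs) / len(runs)
        print(f"k={k:2d} RI={ri:.4f} ARI={ari:.4f} "
              f"Jaccard={jac:.4f} clusters={clusters:.2f}")
```

The point of the pair-counting form is that it makes it easy to see which measures reward simply splitting things up: only the adjusted index subtracts the agreement expected by chance.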