We've uploaded a new version of the scorer script to the task's
website; please download and use this version. We modified the way we
compute the agreement by chance. If you are familiar with the kappa
metric, you will know that this component can be interpreted in
different ways; our previous interpretation made the overall kappa
scores unnecessarily pessimistic. We'll explain how the computation
is done in the task description paper.
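
For anyone less familiar with kappa, here is a minimal sketch of why the
chance-agreement term matters. It is not the scorer's actual computation
(that will be in the paper); the function names, labels, and the two
interpretations shown (marginal-based as in Cohen's kappa, and uniform
over labels) are illustrative assumptions only.

```python
from collections import Counter

def observed_agreement(gold, pred):
    # Fraction of items on which the two label sequences agree.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def chance_from_marginals(gold, pred):
    # Cohen-style chance agreement: product of each side's label
    # proportions, summed over all labels.
    n = len(gold)
    cg, cp = Counter(gold), Counter(pred)
    return sum((cg[k] / n) * (cp[k] / n) for k in set(gold) | set(pred))

def chance_uniform(gold, pred):
    # Alternative interpretation: every label is equally likely by chance.
    return 1.0 / len(set(gold) | set(pred))

def kappa(p_o, p_e):
    # kappa = (observed - chance) / (1 - chance)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical toy data, not taken from the task.
gold = ["pos"] * 8 + ["neg"] * 2
pred = ["pos"] * 7 + ["neg", "neg", "pos"]

p_o = observed_agreement(gold, pred)
kappa_marginals = kappa(p_o, chance_from_marginals(gold, pred))
kappa_uniform = kappa(p_o, chance_uniform(gold, pred))
print(p_o, kappa_marginals, kappa_uniform)
```

With the same observed agreement, a larger chance-agreement estimate
gives a lower kappa, which is how one interpretation can look more
pessimistic than another.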
Best,
Lucia