Hi,
We've resolved the discrepancies in the data to the point where we can call it final. The final version, emerging.test.annotated, is the one available for download now. The score table is:
SYSTEM         F1 (ENTITY)  F1 (SURFACE)
arcada         39.98        37.77
drexel_cci     26.30        25.26
flytxt         38.35        36.31
mic-cis        37.06        34.25
sjtu_adapt     40.42        37.62
spinningbytes  40.78        39.33
uh_ritual      41.86        40.24
Sorry this took a few iterations. Establishing a new task and getting its evaluation right is often tricky, and we appreciate your bearing with us through this. As the results show, the task is a real challenge for NLP systems, and generalizing effectively has proven tough. We now have the analyses in the system papers and a dataset to work from when tackling it.
All the best, and congratulations to all!
Leon