Hi all!
First of all, I am absolutely blown away by the amount of interest we've had in this task. Thank you for participating and for all of your comments and bug reports! Thank you also for bearing with us while we work through our Codalab and leaderboard problems. I have some updates and reminders regarding the evaluation period:
Regarding extra submissions for subtasks 1 and 2: IF you have run out of submissions because you would like to submit 5 results for each subtask, please send me an email. Finish your subtask 1 submissions first, then use your best-performing files when making your subtask 2 tries. We monitor all extra submissions and keep a backup of every submission on Codalab; if your subtask 1 files change during your subtask 2 submissions after you have exhausted your subtask 1 tries, your submission will be removed and you may be disqualified. In other words, once you've used your 5 subtask 1 submissions, those files may NOT change during your subtask 2 submissions. You may NOT exceed 5 subtask 1 submissions and 5 subtask 2 submissions in total. Please do not cheat, and PLEASE do not wait until the last minute to send me an email; it takes time to find your submissions, log them, and add extra tries.
Regarding the leaderboard: The leaderboard is currently hiding evaluation results and rankings, and any rankings you do see are not necessarily in the correct order. Please just submit your best-scoring results to the leaderboard, and we will announce the official rankings and scores next week.
Regarding subtask 3 files: The overlapping relation issue has been solved the same way we solved overlapping token tags: by repeating the sentence once for each overlapping relation label. The new files have been uploaded to GitHub and are now on the training tab for you to evaluate against if you would like. If you are still evaluating your subtask 1 or 2 submissions on the training tab on Codalab, make sure you are using the most recent version of the dev data (available on GitHub). The change shouldn't affect your model performance, but you will need the right version, since some sentences are now repeated to handle overlapping relation labels.
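To make the repetition scheme concrete, here is a minimal sketch in Python. The sentence, record layout, and relation labels are hypothetical illustrations, not the actual file format; the point is only that a sentence with N overlapping relations now appears N times, each copy carrying a single relation.

```python
# Minimal sketch of the flattening described above. The record layout,
# sentence, and relation labels are hypothetical, not the real data format.

sentence = {
    "text": "Vitamin D aids calcium absorption and supports bone health.",
    "relations": [
        ("Vitamin D", "calcium absorption", "POS_EFFECT"),
        ("Vitamin D", "bone health", "POS_EFFECT"),  # overlaps the first relation
    ],
}

# Repeat the sentence once per relation so each copy carries exactly one label.
flattened = [
    {"text": sentence["text"], "relation": rel}
    for rel in sentence["relations"]
]

for record in flattened:
    print(record["text"], "->", record["relation"])
```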
Regarding subtask 3 evaluation: Just to confirm what we have said in the forums: you will NOT be evaluated on "supplements" relations that come from unevaluated labels in subtask 2 (e.g., Secondary-definition). These labels will still appear in the test data in case any of you are using them as input to your models, but you will not be evaluated on their relations.
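As an illustration of what this means for scoring, here is a hedged sketch; the label names and record fields below are assumptions for illustration, not the official scorer.

```python
# Hypothetical sketch of the evaluation filter described above; the label
# names and record fields are illustrative assumptions, not the real scorer.

UNEVALUATED_LABELS = {"Secondary-definition"}  # example unevaluated subtask 2 label

predicted_relations = [
    {"type": "supplements", "source_label": "Primary-entity"},
    {"type": "supplements", "source_label": "Secondary-definition"},
]

# Relations whose source label is unevaluated are skipped during scoring,
# even though the labels themselves still appear in the test data.
evaluated = [r for r in predicted_relations
             if r["source_label"] not in UNEVALUATED_LABELS]

print(f"{len(evaluated)} of {len(predicted_relations)} relations will be scored")
```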
Regarding subtask 3 test data: We will release the test data on Sunday. I will try to post it first thing in the morning (EST).
Again, as always, if you have any questions, please reach out.
Best,
Sasha