Hi everyone,
I just pushed an update to the SQuAD 2.0 leaderboard. Behind the scenes, we patched some vulnerabilities in our testing procedure, which required re-running all previously submitted models. As a result, some models that use randomness (such as ELMo-based models) show slight changes in performance: usually within 0.1 F1, with a maximum difference of 0.4 F1. Please let me know if you have any questions!
Robin