Hi Dimitrios,
There are fewer than 200 countries in the world, so we assigned 100 to the training set, 50 to the test set, and the remainder to the dev set. Fixing the test set at 50 countries keeps the number of entities in the test set consistent, which allows a fair evaluation. Since the dev set is typically used only for parameter tuning, the smaller number of countries left for it shouldn't be an issue.
Best,
Sneha