The same ROUGE metrics will be used to evaluate all languages, since ROUGE is language-independent. The only language-dependent component is the stemmer, which we will not be using.
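To illustrate what scoring without a stemmer means in practice, here is a minimal sketch of ROUGE-N computed on raw tokens. It is not the official evaluation script: the function name `rouge_n`, the whitespace tokenization, and the lack of any lowercasing or other normalization are assumptions made only for illustration, and the official scores may differ.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Illustrative ROUGE-N on raw tokens (no stemming, no normalization)."""

    def ngrams(text: str) -> Counter:
        # Assumption: whitespace tokenization. Languages written without
        # spaces would need a different tokenizer.
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Without a stemmer, "cat" and "cats" are different tokens and do not match.
scores = rouge_n("the cats sat on the mat", "the cat sat on the mat")
print(scores)
```

Because nothing in this computation depends on the language, the same procedure applies to every language in the task, provided the text is tokenized consistently.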
We will release the evaluation script after test submissions close. In the meantime, we encourage all teams to evaluate using the submission platform and leaderboard.
And yes, you are free to use any new models during the test phase, irrespective of what you submitted in the validation phase. However, the test phase will be limited to a maximum of 3 submissions, so we suggest choosing your models wisely.
Let us know if you have any other questions.