Hi Ana-maria,
Thanks for your question.
We are currently preparing the online evaluation platform. It will be announced in this Google group soon, together with a description of the desired format and the possibility to try submitting predictions on the dev data. (The data should be prepared as one folder per language, with each folder containing a file in the same format as the input files, i.e. 2 columns: raw data and predicted normalization.) Note that the dev prediction phase does not include all languages, because in my opinion some datasets are too small to have dev data.
For the intrinsic evaluation (main metric), there will be one file per language, but for the extrinsic evaluation, we will use additional files (3*en, 2*it, 1*de, and 1*tr). You are required to upload both for a submission (if you do not have a system for a certain language, you can use one of the baselines provided in the repo).
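As a rough sketch of the folder-per-language layout described above: the helper name write_predictions, the file name predictions.norm, the tab separator, and the blank line between sentences are all assumptions for illustration, not the official format, which will come with the platform description.

```python
from pathlib import Path


def write_predictions(root, lang, sentences, filename="predictions.norm"):
    """Write one prediction file for `lang` under `root`.

    `sentences` is a list of sentences, each a list of
    (raw_token, predicted_normalization) pairs.
    The file name, tab separator, and blank line between
    sentences are assumptions, not the official specification.
    """
    lang_dir = Path(root) / lang          # one folder per language
    lang_dir.mkdir(parents=True, exist_ok=True)
    out_path = lang_dir / filename
    with out_path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            for raw, norm in sentence:
                # 2 columns: raw data and predicted normalization
                f.write(f"{raw}\t{norm}\n")
            f.write("\n")                 # assumed sentence separator
    return out_path
```

Calling write_predictions("submission", "en", sentences) once per language would then give one folder per language, each holding a single two-column file.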
Best,
Rob