Hello,
Could you please check whether the data processing [1] for Track 2 is correct?
L110: dataset = dataset.dropna(subset=["gloss", "example", "word"])
According to the description, an instance is removed when any of the three columns is empty. L110 works on the dev data but fails on the test data. I think we should keep the instances with empty glosses in the test data, since filling in the glosses is the goal of Track 2.
Below is the correction:
L110: dataset = dataset.dropna(subset=["example", "word"])
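To illustrate the effect of including "gloss" in the subset, here is a minimal sketch on a made-up DataFrame. The rows and the `drop_incomplete` helper are hypothetical and are not taken from the baseline or the shared-task data; I am only assuming that the test file leaves the gloss column empty.

```python
import pandas as pd

# Hypothetical toy rows that mimic test data: every gloss is empty.
test_like = pd.DataFrame(
    {
        "word": ["bank", "bank", None],
        "example": [
            "He sat on the bank.",
            "She went to the bank.",
            "A row without a target word.",
        ],
        "gloss": [None, None, None],
    }
)


def drop_incomplete(dataset, required):
    """Drop rows that have a missing value in any of the `required` columns."""
    return dataset.dropna(subset=required)


# Requiring "gloss" removes every row, because all glosses are empty.
with_gloss = drop_incomplete(test_like, ["gloss", "example", "word"])

# Requiring only "example" and "word" keeps the rows that still need a gloss.
without_gloss = drop_incomplete(test_like, ["example", "word"])

print(len(with_gloss))     # 0
print(len(without_gloss))  # 2
```

On dev data, where the glosses are filled in, both calls would keep the same rows as long as no gloss is missing.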
Please confirm this issue. Thanks.
[1] https://github.com/ltgoslo/axolotl24_shared_task/blob/main/code/baselines/baseline_track2.py#L110
Best,
Wei