Hi Rob,
> -----Original message-----
> From: robvanderg <robva...@live.nl>
> Sent: 8 Oct 2021, 00:31
>
> Thanks for noticing! During the setup of CodaLab, we noticed that there
> are minor differences between Python 2 and Python 3, so I assumed that
> this caused the discrepancies. After taking a closer look, it seems like our
> CodaLab setup used a slightly older version of the data (due to different
> people being in charge of data-updates and the CodaLab). The differences
> are minor, and are only in the use of interjections, where some were
> accidentally still normalized (hahahaha->haha is no longer
> normalized). I don't think this should change any conclusions, as most
> probably all systems make the same mistakes with the old version of the
> data.
>
> Apologies for overlooking this! I just pushed the script that's used for
> generating the table to the repo:
> https://bitbucket.org/robvanderg/multilexnorm/src/master/scripts/mainTable.py
> so to get the correct numbers, you can just clone the repo, put your
> submissions in the submission folder, and run this script.
Thanks a lot for the script, it made our work easy :-) We recomputed the
ablation experiments (obtaining the same results for the two runs as in
the official results) and updated the ablation discussion (we
originally saw a ~0.1 percentage point improvement from beam search, which
was caused purely by inconsistent evaluation across settings).
We uploaded the final camera-ready version to SoftConf -- if it is
still possible to use it, that would be great.
Thanks & cheers,
Milan Straka
> On Thursday, 7 October 2021 at 23:10:20 UTC+2, str...@ufal.mff.cuni.cz wrote:
>
> > Hi Rob,
> >
> > we found out that the results of the davda54 (ÚFAL) submissions in
> > CodaLab and in the overview paper (and in the results sent to the
> > multilexnorm mailing list) are a bit different -- specifically:
> > - in CodaLab, davda54-1 avg ERR is 66.34, davda54-2 avg ERR is 67.42
> > - in the overview paper, davda54-1 is 66.21, davda54-2 is 67.30
> > Regarding the individual treebank results, some are the same, but
> > several are different.
> >
> > Is this a known issue (i.e., the final evaluation is supposed to be
> > different to CodaLab)? In our paper, we present ablation experiments
> > based on the CodaLab evaluation, which are then inconsistent with the
> > official results.
> >
> > Thanks,
> > cheers,
> > Milan Straka
> >
>