Hi Rob,
> -----Original message-----
> From: robvanderg <robva...@live.nl>
> Sent: 8 Sep 2021, 03:58
>
> Updated results including the extrinsic evaluation are attached. The ranking of
> the teams seems similar, but interestingly MFR ranks much higher in
> comparison to ERR.
I have a question regarding the extrinsic evaluation, because I am not
sure what the reported LAS numbers are.
The MultiLexNorm web page states that
As secondary evaluation, we will include an evaluation of the
downstream effect of normalization. We will focus on Dependency
parsing, and include the raw input data with the distributed test data
for some of the languages. Then, we train a dependency parser for each
available language on canonical data, and evaluate the effect of
having normalization versus the original data.
My expectation was that using the extrinsic evaluation treebanks:
- we pass the forms through the submitted systems (including LAI)
- we then run MaChAmp (trained on canonical data) to get the parses
- we then compute LAS using the predicted parses
If that were the case, the numbers seem way too low. For example:
- on it_postwita, LAI gives 66.49, but the MaChAmp paper
(https://arxiv.org/pdf/2005.14672.pdf) reports 74.9; the former is UD 2.8
and the latter UD 2.6, but I still would not expect that much of a
difference. Also, UDPipe 2 trained only on the UD 2.6 it_postwita
training data gives us 83.6 LAS
- on it_twittiro, LAI gives 70.06, the MaChAmp paper reports 77.3, and
UDPipe 2 trained solely on UD 2.6 it_twittiro gives 80.28. Note that
it_twittiro is nearly identical in UD 2.6 and UD 2.8.
Also, lexical normalization can change the number of words. How is the
LAS score then computed? I could imagine some kind of LCS alignment and
then reporting an F1 LAS score; but at least in the case of the above two
treebanks, the gold trees are annotated on the original data (i.e.,
before normalization), while we would need gold trees on the normalized
data.
(Ah -- there is some kind of merging in 1.machamp.pred.py; maybe that
handles it? But I am unsure how a consistent tokenization would be
obtained from the normalized text.)
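To be concrete about the LCS idea above (this is purely my speculation about how differing word counts could be handled, not a claim about the organizers' script): align gold and predicted word sequences by longest common subsequence, count an arc as matched only if the dependent, its head, and the label all map across the alignment, and report F1 over arcs:

```python
# Hypothetical F1-LAS over an LCS word alignment -- my speculation only,
# not the organizers' method. Arcs are (head, deprel) with a 0-based
# head index into the word list, or head == -1 for the artificial root.
from difflib import SequenceMatcher

def f1_las(gold_words, gold_arcs, pred_words, pred_arcs):
    # LCS-style alignment between the two word sequences
    sm = SequenceMatcher(a=gold_words, b=pred_words, autojunk=False)
    g2p = {}
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            g2p[block.a + k] = block.b + k
    # an arc matches if dependent and head both align and the label agrees
    matched = 0
    for dep, (head, rel) in enumerate(gold_arcs):
        if dep not in g2p:
            continue
        mapped_head = -1 if head == -1 else g2p.get(head)
        if mapped_head is None:
            continue
        if pred_arcs[g2p[dep]] == (mapped_head, rel):
            matched += 1
    precision = matched / len(pred_arcs) if pred_arcs else 0.0
    recall = matched / len(gold_arcs) if gold_arcs else 0.0
    return 2 * precision * recall / (precision + recall) if matched else 0.0
```

Even with such an alignment, the underlying problem remains: the gold trees would have to be annotated on the normalized words for the matched arcs to be meaningful.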
One last nitpick: how are multiword tokens handled (are they passed on
the input to the lexical normalization, or are the syntax trees remapped
to tokens instead of words)?
Thanks very much,
cheers,
Milan Straka
> On Saturday, 4 September 2021 at 12:03:38 UTC+2, ro...@itu.dk wrote:
>
> > Please send the information as soon as possible, and at the latest 3 days
> > before the paper deadline (September 19).
> >
> > On Saturday, 4 September 2021 at 11:58:34 UTC+2 Rob v wrote:
> >
> >> Dear participants,
> >>
> >> Attached is the pdf with the results of the shared task (I have included
> >> MoNoise scores for reference). Some interesting results were obtained and I
> >> am very interested in learning about your approaches!
> >> Thanks to all of you for participating!
> >>
> >> I would like to ask all participants to e-mail the following information
> >> to multil...@gmail.com:
> >>
> >> team-name:
> >> members + affiliation:
> >> short description of system for overview paper (1 paragraph):
> >> Will publish code: yes/no
> >> Used additional annotated normalization data: yes/no (and which?)
> >> Other external resources used:
> >>
> >> We are still running the external evaluation, and will post the results
> >> here when available.
> >>
> >
>