Hi Ozan
Thanks, that should be fixed now. I think the tr-en corpus was also misaligned, but that should be fixed as well.
cheers - Barry
--
You received this message because you are subscribed to the Google Groups "Workshop on Statistical Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
It was misaligned in the preprocessed data.
| Test data released | May 2, 2017 |
| Translation submission deadline | May 8, 2017 |
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Hi Maksym
Thanks for pointing that out. I have identified the problem and I'm regenerating the ru-en data.
Yes, there will be noise in the original data, and the only filtering I do is length-based. I tried to keep the pre-processing fairly 'light touch'.
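
For readers unfamiliar with the term, here is a minimal sketch of what length-based filtering of a parallel corpus can look like. It is not the script used for the released data: the function name `length_filter`, the whitespace tokenisation, and the thresholds (`min_len`, `max_len`, `max_ratio`) are all assumptions chosen for illustration.

```python
def length_filter(pairs, min_len=1, max_len=100, max_ratio=3.0):
    """Keep (source, target) pairs whose token counts look plausible.

    All thresholds are illustrative assumptions, not the values used
    for the released pre-processed data.
    """
    kept = []
    for src, tgt in pairs:
        # Count whitespace-separated tokens on each side.
        n_src = len(src.split())
        n_tgt = len(tgt.split())

        # Drop pairs where either side is too short or too long.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue

        # Drop pairs whose lengths differ by more than max_ratio.
        # max(1, ...) guards against division by zero if min_len is set to 0.
        shorter = max(1, min(n_src, n_tgt))
        if max(n_src, n_tgt) / shorter > max_ratio:
            continue

        kept.append((src, tgt))
    return kept
```

A filter like this only looks at sentence lengths, so noisy pairs of similar length (wrong language, boilerplate, mistranslations) pass straight through, which is consistent with some noise remaining in the data.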