Dear all,
Unfortunately, I found some remaining issues in the data. I fixed these a minute ago and pushed the fixes to the repo. Two things have changed:
- The Danish dataset became much larger
- Some previously uncaught interjections (of the following 4 types: hmm, haha, eeh, xxx) had been normalized. We decided to keep those in their original form throughout the data.
I would strongly suggest at least re-training your Danish model.
Sorry for the inconvenience,
Rob