Hello George,
We have a dataset that contains a few sentences in English (describing some aspects of the game) and their translated versions in the target language.
For example:
English: "Track damaged. Risk of breakage increased."
Japanese: "履帯損傷、耐久性低下"
As you can see, the English version has two sentences, while the Japanese version has only one. We are reluctant to split such cases into sentences automatically.
Can we use cases like this in the training dataset, or do we have to avoid them?