Clarification on some languages' training data lacking English translation


Murad Rasulov

Mar 7, 2024, 11:21:41 AM
to Ideology and Power in Parliamentary Speeches
Hello!
As we are working on building a model, we have three clarifying questions, which would help us greatly at this stage in the process:

- Will the test data also have text_en fields for the same languages that the training data does now?
- Will the test data contain more, initially not shown, languages?
- May a language that has text_en fields in the train data not have those in its test data version?


Thank you!

Best,
Murad

cagri coltekin

Mar 7, 2024, 6:07:22 PM
to Murad Rasulov, Ideology and Power in Parliamentary Speeches
Hi,

On Thu, Mar 07, 2024 at 08:21:41AM -0800, Murad Rasulov wrote:
> Hello!
> As we are working on building a model, we have 3 clarifying questions,
> which would help us greatly at this stage in the process:
>
> - Will the test data also have text_en fields for the same languages that
> the data does now?

Yes. Test sets will also have automatic translations.

> - Will the test data contain more, initially not shown, languages?

No. No surprise languages.

> - May a language that has text_en fields in the train data not have those
> in its test data version?

All test data will have the English translations. Please note,
however, that all translations are automatic and may contain
errors.

Best,
Cagri