Retrain a non factor pretrained model with factors


Kai Piontek

Sep 6, 2023, 4:44:40 AM
to marian-nmt
Hello there,

I am wondering whether it is possible to continue training a pretrained, non-factored model with factored inputs. I would like to improve how well the OPUS-MT models handle terminology inflection.

My idea is to use an OPUS-MT model as the base and continue training it on a dataset of about 5 million factorized sentences.
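
As an illustration of what a factorized sentence could look like, here is a minimal sketch that assumes the `word|factor` notation from the Marian factors documentation. The `factorize` helper and the factor names `t0` (ordinary token) and `t1` (terminology token) are made up for this example; real factor names would have to match whatever the factored vocabulary (.fsv) declares.

```python
from typing import Iterable, List


def factorize(tokens: List[str], terms: Iterable[str]) -> str:
    """Append a hypothetical terminology factor to every token.

    Tokens found in `terms` (case-insensitive) get "|t1", all others "|t0".
    """
    term_set = {t.lower() for t in terms}
    out = []
    for tok in tokens:
        # The pipe is the factor separator, so a literal "|" inside a token
        # would be ambiguous; this sketch simply refuses such input.
        if "|" in tok:
            raise ValueError(f"token contains the factor separator: {tok!r}")
        factor = "t1" if tok.lower() in term_set else "t0"
        out.append(f"{tok}|{factor}")
    return " ".join(out)


if __name__ == "__main__":
    sentence = "the pump housing is sealed".split()
    print(factorize(sentence, {"pump", "housing"}))
```

This only covers data preparation. Whether a non-factored checkpoint can be loaded and then trained further with a factored vocabulary is exactly the open question here.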

I would be happy to hear your opinion on this. Maybe someone tried it already, or maybe there is a fundamental problem that I do not see right now.

Happy to hear your thoughts :)

Best regards,
Kai

Adrian Yalda

Mar 29, 2024, 1:28:53 AM
to marian-nmt
Hello,

You may have already seen this, but the only documentation I found on using Marian with factors is here: https://marian-nmt.github.io/docs/api/factors.html. It only covers training with factors, not fine-tuning. I couldn't find anything on fine-tuning with factors (but it would be great if this were possible)!

I have a similar situation: I have a set of core terminology and anywhere between 1 million and 2.5 million words of content across multiple languages. I would like to see if I can fine-tune with an .fsv file containing the terminology as a custom factored vocabulary. Currently, for my model to learn these terms, I have to get the text translated and loaded in. This can be done by loading the datasets as one batch, but it seems easier for linguists to adjust the terminology as they work through the text they need to translate.

Did you ever have any success with this?

Best, 
Adrian