Model behaviour changes drastically with change in Marian Version


Sunil Kumar

Aug 15, 2022, 11:59:10 PM
to marian-nmt
Hi all, 
We upgraded our Marian version from 1.9.58 to 1.11, but the decoding outputs from the new version are almost unintelligible or blank when our old models are used. We also tried loading a model in the new framework and then saving a new checkpoint without any additional training, in case that would fix the issue, but it didn't.
Here are a few examples:
English source:
1. I am going to school.
2. What about you?
3. No, I am not. But Johanes may be going to school.
4. In the meantime, let's check LDAP Access.
5. Famed author Salman Rushdie is recovering at a hospital after he was repeatedly stabbed on stage Friday in front of a New York audience in an attack that left him with severe injuries.

Old model Japanese output:
1. 私は学校に行きます。
2. あなたは?
3. いいえ、違います。でも、ジョハネスは学校に行くかもしれません。
4. その間、LDAP Accessをチェックしておこう。
5. 著名な作家サルマン・ラシュディは、ニューヨークの観客の前で金曜日に舞台上で何度も刺されて重傷を負った攻撃を受け、病院で回復している。
Old model with the newer framework:
1. BLANK
2. BLANK
3.、、、、、、、、、、、、、、、、、。
4.その間,LDAPアクセスをチェックする。
5. 彼が重度の傷害を負って、ニューヨークの聴衆の前で、繰り返し、彼を刺した後に、有名なサルマン・ラシュディは、病院で回復しています。

Is there any way to migrate to the new version without retraining models from scratch? Also, what is the fundamental reason for this behaviour, from a science/engineering perspective?

Thanks.
--
Sunil 

Marcin Junczys-Dowmunt

Aug 16, 2022, 12:03:33 AM
to maria...@googlegroups.com

Hi,

What models are these? How were they trained?

 

We have a lot of regression tests that should make sure your situation doesn’t happen, but maybe we missed some exotic configuration.

 


--
You received this message because you are subscribed to the Google Groups "marian-nmt" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marian-nmt+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/marian-nmt/3e3968fd-898d-48d2-97a9-d99fdf681581n%40googlegroups.com.

 

Sunil Kumar

Aug 16, 2022, 12:08:09 AM
to maria...@googlegroups.com
Hi Marcin, 
These are standard transformer 6-6 models (6 encoder / 6 decoder layers) trained on V100s, non-quantised. Below is the training script.

```
$MARIAN_BINDIR/marian --model $MODELDIR/model.npz \
    --type transformer \
    --train-sets $DATADIR/final_train.${BT_DIR}.$SRC $DATADIR/final_train.${BT_DIR}.$TRG \
    --valid-sets $DATADIR/valid.$SRC $DATADIR/valid.$TRG \
    --sentencepiece-max-lines $V1 \
    --sentencepiece-options "--character_coverage=1.0 --user_defined_symbols=$SPL_TOKENS" \
    --vocabs $MODELDIR/vocab.$SRC-$TRG.spm $MODELDIR/vocab.$SRC-$TRG.spm \
    --dim-vocabs $V2 $V2 \
    --workspace $WORKSPACE \
    --mini-batch-fit --maxi-batch 1000 \
    --optimizer-delay $OPT_DELAY --sync-sgd \
    --valid-freq 1000 --save-freq 10000 --disp-freq 500 --keep-best \
    --valid-mini-batch 64 --valid-metrics ce-mean-words bleu-detok --quiet-translation \
    --beam-size 3 --normalize=0.6 --early-stopping 50 --cost-type=ce-mean-words \
    --log $MODELDIR/train.log --valid-log $MODELDIR/valid.log \
    --enc-depth 6 --dec-depth 6 \
    --transformer-preprocess n --transformer-postprocess da \
    --tied-embeddings-all --dim-emb 1024 --transformer-dim-ffn 4096 \
    --transformer-dropout 0.1 --transformer-dropout-attention 0.1 \
    --transformer-dropout-ffn 0.1 --label-smoothing 0.1 --exponential-smoothing \
    --learn-rate 0.0001 --lr-warmup 8000 --lr-decay-inv-sqrt 8000 --lr-report \
    --optimizer-params 0.9 0.98 1e-09 --clip-norm 5 \
    --fp16 --overwrite \
    --max-length 200 \
    --shuffle-in-ram -T tmp \
    --after-epochs 20 \
    --devices $GPUS --seed $SEED \
    --no-restore-corpus --valid-reset-stalled
```

Kumar S.




Marcin Junczys-Dowmunt

Aug 16, 2022, 12:10:11 AM
to maria...@googlegroups.com

That should work. What is your exact decoding command?

Sunil Kumar

Aug 16, 2022, 12:18:54 AM
to maria...@googlegroups.com
```
./marian-decoder -m model.npz -v vocab.src.spm vocab.trg.spm -b 1
```



Marcin Junczys-Dowmunt

Aug 16, 2022, 12:22:00 AM
to maria...@googlegroups.com

Are you using the correct vocabs? According to your command you have vocab.src.spm and vocab.trg.spm, while in the training command your embeddings are tied and you use the same vocab twice. Are those two files identical?

Sunil Kumar

Aug 16, 2022, 12:33:38 AM
to maria...@googlegroups.com
Yes, it’s just the same file renamed.
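(A hypothetical sanity check, not from the thread: since the model was trained with one shared vocab and tied embeddings, the two files passed to `-v` must be byte-identical. The file names below are illustrative.)

```python
# Hypothetical check that two SentencePiece vocab files are byte-identical,
# e.g. vocab.src.spm and vocab.trg.spm for a tied-embeddings model.
import hashlib

def file_digest(path):
    # hash the file in chunks so large vocabs don't load into memory at once
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_vocab(a, b):
    return file_digest(a) == file_digest(b)
```

If `same_vocab("vocab.src.spm", "vocab.trg.spm")` is False, the decoder is mapping IDs through a different vocabulary than the one the model was trained with, which by itself produces garbage output.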


Marcin Junczys-Dowmunt

Aug 16, 2022, 12:38:12 AM
to maria...@googlegroups.com

Oh, I see the issue. You have pre-norm models: --transformer-preprocess n --transformer-postprocess da

The implementation of pre-norm was buggy: the skip connection was incorrectly routed. We fixed the bug, but that means pre-norm models trained before the fix no longer work. Sorry about that.
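To make the routing difference concrete, here is a toy Python sketch of pre-norm residual connections. The exact form of the Marian bug is an assumption here; the point is only that a skip connection carrying norm(x) instead of x computes a different function, so weights trained under one routing misbehave under the other. (In Marian's flag notation, n = layer normalization, d = dropout, a = residual add, so "preprocess n, postprocess da" is the pre-norm layer order.)

```python
# Toy sketch of transformer residual routing (not Marian source).
# 'norm' and 'sublayer' are simple stand-ins for layer norm and attention/FFN.

def norm(x):
    # toy layer-norm stand-in: just subtract the mean
    m = sum(x) / len(x)
    return [v - m for v in x]

def sublayer(x):
    # toy sublayer stand-in (attention / feed-forward)
    return [2.0 * v for v in x]

def pre_norm_correct(x):
    # pre-norm (--transformer-preprocess n --transformer-postprocess da):
    # the sublayer sees norm(x), but the skip carries the raw x
    return [xi + si for xi, si in zip(x, sublayer(norm(x)))]

def pre_norm_misrouted(x):
    # hypothetical mis-routed skip: the residual also carries norm(x),
    # a different function of the same weights
    nx = norm(x)
    return [ni + si for ni, si in zip(nx, sublayer(nx))]

x = [1.0, 2.0, 3.0]
print(pre_norm_correct(x))    # [-1.0, 2.0, 5.0]
print(pre_norm_misrouted(x))  # [-3.0, 0.0, 3.0]
```

Both variants run the same sublayer, but they disagree on what the residual adds back at every layer, and the discrepancy compounds across a 6-6 stack, which is consistent with old checkpoints decoding to blanks or garbage after the fix.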

Sunil Kumar

Aug 16, 2022, 12:48:21 AM
to maria...@googlegroups.com
Hmm, thanks for the clarification, Marcin. At least we know the issue :)


Marcin Junczys-Dowmunt

Aug 16, 2022, 12:52:43 AM
to maria...@googlegroups.com

I can look tomorrow for the commit that fixes the routing; maybe that’s helpful?

Sunil Kumar

Aug 16, 2022, 12:58:17 AM
to maria...@googlegroups.com
Sure, we will dig on our end today as well and let you know if we find it. Thanks for the prompt response.

Marcin Junczys-Dowmunt

Aug 16, 2022, 1:03:10 AM
to maria...@googlegroups.com
Message has been deleted