Marian keeps failing for large vocabulary size

241 views

Skip to first unread message

charan n.a

unread,

Jul 2, 2020, 3:03:26 PM7/2/20

to marian-nmt

I have a vocabulary size of around 388597 and 304290, I went through them and most of it seems fine. Even if I use a subword segmentation it shouldn't reduce by a large number. This is for a translation model and the training seems to crash multiple times after lines like below appear in the log.

tcmalloc: large alloc 1555718144 bytes == 0x56440fc24000 @

The command I am running is

nohup marian --model /home/ubuntu/model.s2s/model.npz --type s2s --train-sets ~/lang1.txt ~/lang2.txt --max-length 150 --mini-batch-fit -w 7000 --maxi-batch 1000 --save-freq 1000 --disp-freq 1000 --overwrite --keep-best --early-stopping 5 --after-epochs 10 --cost-type=ce-mean-words --log /home/ubuntu/model.s2s/train.log --tied-embeddings --layer-normalization --seed 0 --exponential-smoothing --devices 0

and the log I end up with is:

[2020-07-02 18:29:07] [marian] Marian v1.9.0 ba94c5b9 2020-05-17 10:42:17 +0100

[2020-07-02 18:29:07] [marian] Running on ip-172-31-28-255 as process 5525 with command line:

[2020-07-02 18:29:07] [marian] /home/ubuntu/marian/build/marian --model /home/ubuntu/model.s2s/model.npz --type s2s --train-sets /home/ubuntu/lang1.txt /home/ubuntu/lang2.txt --max-length 150 --mini-batch-fit -w 5000 --maxi-batch 1000 --save-freq 1000 --disp-freq 1000 --overwrite --keep-best --early-stopping 5 --after-epochs 10 --cost-type=ce-mean-words --log /home/ubuntu/model.s2s/train.log --tied-embeddings --layer-normalization --seed 0 --exponential-smoothing --devices 0

[2020-07-02 18:29:07] [config] after-batches: 0

[2020-07-02 18:29:07] [config] after-epochs: 10

[2020-07-02 18:29:07] [config] all-caps-every: 0

[2020-07-02 18:29:07] [config] allow-unk: false

[2020-07-02 18:29:07] [config] authors: false

[2020-07-02 18:29:07] [config] beam-size: 12

[2020-07-02 18:29:07] [config] bert-class-symbol: "[CLS]"

[2020-07-02 18:29:07] [config] bert-mask-symbol: "[MASK]"

[2020-07-02 18:29:07] [config] bert-masking-fraction: 0.15

[2020-07-02 18:29:07] [config] bert-sep-symbol: "[SEP]"

[2020-07-02 18:29:07] [config] bert-train-type-embeddings: true

[2020-07-02 18:29:07] [config] bert-type-vocab-size: 2

[2020-07-02 18:29:07] [config] build-info: ""

[2020-07-02 18:29:07] [config] cite: false

[2020-07-02 18:29:07] [config] clip-gemm: 0

[2020-07-02 18:29:07] [config] clip-norm: 1

[2020-07-02 18:29:07] [config] cost-scaling:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] cost-type: ce-mean-words

[2020-07-02 18:29:07] [config] cpu-threads: 0

[2020-07-02 18:29:07] [config] data-weighting: ""

[2020-07-02 18:29:07] [config] data-weighting-type: sentence

[2020-07-02 18:29:07] [config] dec-cell: gru

[2020-07-02 18:29:07] [config] dec-cell-base-depth: 2

[2020-07-02 18:29:07] [config] dec-cell-high-depth: 1

[2020-07-02 18:29:07] [config] dec-depth: 1

[2020-07-02 18:29:07] [config] devices:

[2020-07-02 18:29:07] [config] - 0

[2020-07-02 18:29:07] [config] dim-emb: 512

[2020-07-02 18:29:07] [config] dim-rnn: 1024

[2020-07-02 18:29:07] [config] dim-vocabs:

[2020-07-02 18:29:07] [config] - 0

[2020-07-02 18:29:07] [config] disp-first: 0

[2020-07-02 18:29:07] [config] disp-freq: 1000

[2020-07-02 18:29:07] [config] disp-label-counts: false

[2020-07-02 18:29:07] [config] dropout-rnn: 0

[2020-07-02 18:29:07] [config] dropout-src: 0

[2020-07-02 18:29:07] [config] dropout-trg: 0

[2020-07-02 18:29:07] [config] dump-config: ""

[2020-07-02 18:29:07] [config] early-stopping: 5

[2020-07-02 18:29:07] [config] embedding-fix-src: false

[2020-07-02 18:29:07] [config] embedding-fix-trg: false

[2020-07-02 18:29:07] [config] embedding-normalization: false

[2020-07-02 18:29:07] [config] embedding-vectors:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] enc-cell: gru

[2020-07-02 18:29:07] [config] enc-cell-depth: 1

[2020-07-02 18:29:07] [config] enc-depth: 1

[2020-07-02 18:29:07] [config] enc-type: bidirectional

[2020-07-02 18:29:07] [config] english-title-case-every: 0

[2020-07-02 18:29:07] [config] exponential-smoothing: 0.0001

[2020-07-02 18:29:07] [config] factor-weight: 1

[2020-07-02 18:29:07] [config] grad-dropping-momentum: 0

[2020-07-02 18:29:07] [config] grad-dropping-rate: 0

[2020-07-02 18:29:07] [config] grad-dropping-warmup: 100

[2020-07-02 18:29:07] [config] gradient-checkpointing: false

[2020-07-02 18:29:07] [config] guided-alignment: none

[2020-07-02 18:29:07] [config] guided-alignment-cost: mse

[2020-07-02 18:29:07] [config] guided-alignment-weight: 0.1

[2020-07-02 18:29:07] [config] ignore-model-config: false

[2020-07-02 18:29:07] [config] input-types:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] interpolate-env-vars: false

[2020-07-02 18:29:07] [config] keep-best: true

[2020-07-02 18:29:07] [config] label-smoothing: 0

[2020-07-02 18:29:07] [config] layer-normalization: true

[2020-07-02 18:29:07] [config] learn-rate: 0.0001

[2020-07-02 18:29:07] [config] lemma-dim-emb: 0

[2020-07-02 18:29:07] [config] log: /home/ubuntu/model.s2s/train.log

[2020-07-02 18:29:07] [config] log-level: info

[2020-07-02 18:29:07] [config] log-time-zone: ""

[2020-07-02 18:29:07] [config] lr-decay: 0

[2020-07-02 18:29:07] [config] lr-decay-freq: 50000

[2020-07-02 18:29:07] [config] lr-decay-inv-sqrt:

[2020-07-02 18:29:07] [config] - 0

[2020-07-02 18:29:07] [config] lr-decay-repeat-warmup: false

[2020-07-02 18:29:07] [config] lr-decay-reset-optimizer: false

[2020-07-02 18:29:07] [config] lr-decay-start:

[2020-07-02 18:29:07] [config] - 10

[2020-07-02 18:29:07] [config] - 1

[2020-07-02 18:29:07] [config] lr-decay-strategy: epoch+stalled

[2020-07-02 18:29:07] [config] lr-report: false

[2020-07-02 18:29:07] [config] lr-warmup: 0

[2020-07-02 18:29:07] [config] lr-warmup-at-reload: false

[2020-07-02 18:29:07] [config] lr-warmup-cycle: false

[2020-07-02 18:29:07] [config] lr-warmup-start-rate: 0

[2020-07-02 18:29:07] [config] max-length: 150

[2020-07-02 18:29:07] [config] max-length-crop: false

[2020-07-02 18:29:07] [config] max-length-factor: 3

[2020-07-02 18:29:07] [config] maxi-batch: 1000

[2020-07-02 18:29:07] [config] maxi-batch-sort: trg

[2020-07-02 18:29:07] [config] mini-batch: 64

[2020-07-02 18:29:07] [config] mini-batch-fit: true

[2020-07-02 18:29:07] [config] mini-batch-fit-step: 10

[2020-07-02 18:29:07] [config] mini-batch-track-lr: false

[2020-07-02 18:29:07] [config] mini-batch-warmup: 0

[2020-07-02 18:29:07] [config] mini-batch-words: 0

[2020-07-02 18:29:07] [config] mini-batch-words-ref: 0

[2020-07-02 18:29:07] [config] model: /home/ubuntu/model.s2s/model.npz

[2020-07-02 18:29:07] [config] multi-loss-type: sum

[2020-07-02 18:29:07] [config] multi-node: false

[2020-07-02 18:29:07] [config] multi-node-overlap: true

[2020-07-02 18:29:07] [config] n-best: false

[2020-07-02 18:29:07] [config] no-nccl: false

[2020-07-02 18:29:07] [config] no-reload: false

[2020-07-02 18:29:07] [config] no-restore-corpus: false

[2020-07-02 18:29:07] [config] normalize: 0

[2020-07-02 18:29:07] [config] normalize-gradient: false

[2020-07-02 18:29:07] [config] num-devices: 0

[2020-07-02 18:29:07] [config] optimizer: adam

[2020-07-02 18:29:07] [config] optimizer-delay: 1

[2020-07-02 18:29:07] [config] optimizer-params:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] overwrite: true

[2020-07-02 18:29:07] [config] precision:

[2020-07-02 18:29:07] [config] - float32

[2020-07-02 18:29:07] [config] pretrained-model: ""

[2020-07-02 18:29:07] [config] quiet: false

[2020-07-02 18:29:07] [config] quiet-translation: false

[2020-07-02 18:29:07] [config] relative-paths: false

[2020-07-02 18:29:07] [config] right-left: false

[2020-07-02 18:29:07] [config] save-freq: 1000

[2020-07-02 18:29:07] [config] seed: 0

[2020-07-02 18:29:07] [config] shuffle: data

[2020-07-02 18:29:07] [config] shuffle-in-ram: false

[2020-07-02 18:29:07] [config] skip: false

[2020-07-02 18:29:07] [config] sqlite: ""

[2020-07-02 18:29:07] [config] sqlite-drop: false

[2020-07-02 18:29:07] [config] sync-sgd: false

[2020-07-02 18:29:07] [config] tempdir: /tmp

[2020-07-02 18:29:07] [config] tied-embeddings: true

[2020-07-02 18:29:07] [config] tied-embeddings-all: false

[2020-07-02 18:29:07] [config] tied-embeddings-src: false

[2020-07-02 18:29:07] [config] train-sets:

[2020-07-02 18:29:07] [config] - /home/ubuntu/hin_all.txt

[2020-07-02 18:29:07] [config] - /home/ubuntu/eng_all.txt

[2020-07-02 18:29:07] [config] transformer-aan-activation: swish

[2020-07-02 18:29:07] [config] transformer-aan-depth: 2

[2020-07-02 18:29:07] [config] transformer-aan-nogate: false

[2020-07-02 18:29:07] [config] transformer-decoder-autoreg: self-attention

[2020-07-02 18:29:07] [config] transformer-depth-scaling: false

[2020-07-02 18:29:07] [config] transformer-dim-aan: 2048

[2020-07-02 18:29:07] [config] transformer-dim-ffn: 2048

[2020-07-02 18:29:07] [config] transformer-dropout: 0

[2020-07-02 18:29:07] [config] transformer-dropout-attention: 0

[2020-07-02 18:29:07] [config] transformer-dropout-ffn: 0

[2020-07-02 18:29:07] [config] transformer-ffn-activation: swish

[2020-07-02 18:29:07] [config] transformer-ffn-depth: 2

[2020-07-02 18:29:07] [config] transformer-guided-alignment-layer: last

[2020-07-02 18:29:07] [config] transformer-heads: 8

[2020-07-02 18:29:07] [config] transformer-no-projection: false

[2020-07-02 18:29:07] [config] transformer-postprocess: dan

[2020-07-02 18:29:07] [config] transformer-postprocess-emb: d

[2020-07-02 18:29:07] [config] transformer-preprocess: ""

[2020-07-02 18:29:07] [config] transformer-tied-layers:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] transformer-train-position-embeddings: false

[2020-07-02 18:29:07] [config] type: s2s

[2020-07-02 18:29:07] [config] ulr: false

[2020-07-02 18:29:07] [config] ulr-dim-emb: 0

[2020-07-02 18:29:07] [config] ulr-dropout: 0

[2020-07-02 18:29:07] [config] ulr-keys-vectors: ""

[2020-07-02 18:29:07] [config] ulr-query-vectors: ""

[2020-07-02 18:29:07] [config] ulr-softmax-temperature: 1

[2020-07-02 18:29:07] [config] ulr-trainable-transformation: false

[2020-07-02 18:29:07] [config] unlikelihood-loss: false

[2020-07-02 18:29:07] [config] valid-freq: 10000u

[2020-07-02 18:29:07] [config] valid-log: ""

[2020-07-02 18:29:07] [config] valid-max-length: 1000

[2020-07-02 18:29:07] [config] valid-metrics:

[2020-07-02 18:29:07] [config] - cross-entropy

[2020-07-02 18:29:07] [config] valid-mini-batch: 32

[2020-07-02 18:29:07] [config] valid-reset-stalled: false

[2020-07-02 18:29:07] [config] valid-script-args:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] valid-script-path: ""

[2020-07-02 18:29:07] [config] valid-sets:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] valid-translation-output: ""

[2020-07-02 18:29:07] [config] vocabs:

[2020-07-02 18:29:07] [config] []

[2020-07-02 18:29:07] [config] word-penalty: 0

[2020-07-02 18:29:07] [config] word-scores: false

[2020-07-02 18:29:07] [config] workspace: 5000

[2020-07-02 18:29:07] [config] Model is being created with Marian v1.9.0 ba94c5b9 2020-05-17 10:42:17 +0100

[2020-07-02 18:29:07] Using single-device training

[2020-07-02 18:29:07] No vocabulary files given, trying to find or build based on training data. Vocabularies will be built separately for each file.

[2020-07-02 18:29:07] No vocabulary path given; trying to find default vocabulary based on data path /home/ubuntu/hin_all.txt

[2020-07-02 18:29:07] [data] Loading vocabulary from JSON/Yaml file /home/ubuntu/hin_all.txt.yml

[2020-07-02 18:29:11] [data] Setting vocabulary size for input 0 to 388597

[2020-07-02 18:29:11] No vocabulary path given; trying to find default vocabulary based on data path /home/ubuntu/eng_all.txt

[2020-07-02 18:29:11] [data] Loading vocabulary from JSON/Yaml file /home/ubuntu/eng_all.txt.yml

[2020-07-02 18:29:13] [data] Setting vocabulary size for input 1 to 304290

[2020-07-02 18:29:13] Compiled without MPI support. Falling back to FakeMPIWrapper

[2020-07-02 18:29:13] [batching] Collecting statistics for batch fitting with step size 10

[2020-07-02 18:29:13] [memory] Extending reserved space to 5120 MB (device gpu0)

[2020-07-02 18:29:14] [logits] applyLossFunction() for 1 factors