My training of the LibriSpeech chain model fails at stage 15, while running steps/nnet3/chain/train.py.
Here are the error messages:
2018-02-07 18:45:17,751 [steps/nnet3/chain/train.py:404 - train - INFO ] Copying the properties from exp/chain_cleaned/tdnn_1b_sp/egs to exp/chain_cleaned/tdnn_1b_sp
2018-02-07 18:45:17,779 [steps/nnet3/chain/train.py:409 - train - INFO ] Computing the preconditioning matrix for input features
2018-02-07 18:45:46,620 [steps/nnet3/chain/train.py:417 - train - INFO ] Preparing the initial acoustic model.
2018-02-07 18:45:48,047 [steps/nnet3/chain/train.py:451 - train - INFO ] Training will run for 4.0 epochs = 735 iterations
2018-02-07 18:45:48,097 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 0/734 Epoch: 0.00/4.0 (0.0% complete) lr: 0.003000
2018-02-07 18:51:41,891 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1/734 Epoch: 0.00/4.0 (0.0% complete) lr: 0.002997
run.pl: job failed, log is in exp/chain_cleaned/tdnn_1b_sp/log/train.1.2.log
2018-02-07 18:51:48,389 [steps/libs/common.py:231 - background_command_waiter - ERROR ] Command exited with status 1:
run.pl --gpu 1 exp/chain_cleaned/tdnn_1b_sp/log/train.1.2.log nnet3-chain-train --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=exp/chain_cleaned/tdnn_1b_sp/cache.1 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=0.333333333333 --srand=1 "nnet3-am-copy --raw=true --learning-rate=0.00299703421811 --scale=1.0 exp/chain_cleaned/tdnn_1b_sp/1.mdl - |" exp/chain_cleaned/tdnn_1b_sp/den.fst "ark,bg:nnet3-chain-copy-egs --frame-shift=2 ark:exp/chain_cleaned/tdnn_1b_sp/egs/cegs.5.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=1 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=128 ark:- ark:- |" exp/chain_cleaned/tdnn_1b_sp/2.2.raw
The log file (train.1.2.log) is attached.
I am running the training on a single machine with an 8-core CPU and one GPU. It looks to me like the job is running out of memory. If that is the case, which parameter settings can I change to reduce memory consumption?
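In case it helps narrow this down, a quick way to check whether the failed job's log mentions memory problems might look like the sketch below. The function name check_log_for_oom is hypothetical, and the search patterns are only guesses at typical wording, not messages confirmed to appear in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print memory-related lines from a training log.
# The patterns below are assumptions about typical error wording,
# not messages confirmed to appear in this particular run.
check_log_for_oom() {
  local log="$1"
  if [ ! -f "$log" ]; then
    echo "log not found: $log" >&2
    return 2
  fi
  # -i: ignore case, -n: show line numbers, -E: extended regex.
  # grep exits with status 1 when nothing matches.
  grep -inE 'out of memory|cannot allocate|failed to allocate' "$log"
}

# Usage, with the log path reported by run.pl above:
# check_log_for_oom exp/chain_cleaned/tdnn_1b_sp/log/train.1.2.log
```

If this prints nothing, the failure may have a different cause, and the last lines of the log would be the next place to look.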
Thanks