I have restarted the training and the previous error is gone, but it now fails with a CURAND_STATUS_ALLOCATION_FAILED error:
# nnet3-chain-train --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=exp/chain/tdnn1g_sp/cache.22 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=0.2 --srand=22 "nnet3-am-copy --raw=true --learning-rate=0.000812982346941 --scale=1.0 exp/chain/tdnn1g_sp/22.mdl - |nnet3-copy --edits='set-dropout-proportion name=* proportion=0.126666666667' - - |" exp/chain/tdnn1g_sp/den.fst "ark,bg:nnet3-chain-copy-egs --frame-shift=0 ark:exp/chain/tdnn1g_sp/egs/cegs.2.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=22 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=256,128,64 ark:- ark:- |" exp/chain/tdnn1g_sp/23.5.raw
# Started at Fri May 18 11:32:53 UTC 2018
#
nnet3-chain-train --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=exp/chain/tdnn1g_sp/cache.22 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=0.2 --srand=22 "nnet3-am-copy --raw=true --learning-rate=0.000812982346941 --scale=1.0 exp/chain/tdnn1g_sp/22.mdl - |nnet3-copy --edits='set-dropout-proportion name=* proportion=0.126666666667' - - |" exp/chain/tdnn1g_sp/den.fst 'ark,bg:nnet3-chain-copy-egs --frame-shift=0 ark:exp/chain/tdnn1g_sp/egs/cegs.2.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=22 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=256,128,64 ark:- ark:- |' exp/chain/tdnn1g_sp/23.5.raw
WARNING (nnet3-chain-train[5.4.100~1-1331a]:SelectGpuId():cu-device.cc:196) Not in compute-exclusive mode. Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (nnet3-chain-train[5.4.100~1-1331a]:SelectGpuIdAuto():cu-device.cc:315) Selecting from 1 GPUs
LOG (nnet3-chain-train[5.4.100~1-1331a]:SelectGpuIdAuto():cu-device.cc:330) cudaSetDevice(0): Tesla K80 free:11185M, used:254M, total:11439M, free/total:0.977797
LOG (nnet3-chain-train[5.4.100~1-1331a]:SelectGpuIdAuto():cu-device.cc:379) Trying to select device: 0 (automatically), mem_ratio: 0.977797
LOG (nnet3-chain-train[5.4.100~1-1331a]:SelectGpuIdAuto():cu-device.cc:398) Success selecting device 0 free mem ratio: 0.977797
LOG (nnet3-chain-train[5.4.100~1-1331a]:FinalizeActiveGpu():cu-device.cc:247) The active GPU is [0]: Tesla K80 free:10890M, used:549M, total:11439M, free/total:0.951988 version 3.7
nnet3-copy '--edits=set-dropout-proportion name=* proportion=0.126666666667' - -
nnet3-am-copy --raw=true --learning-rate=0.000812982346941 --scale=1.0 exp/chain/tdnn1g_sp/22.mdl -
LOG (nnet3-am-copy[5.4.100~1-1331a]:main():nnet3-am-copy.cc:151) Copied neural net from exp/chain/tdnn1g_sp/22.mdl to raw format as -
LOG (nnet3-copy[5.4.100~1-1331a]:ReadEditConfig():nnet-utils.cc:1247) Set dropout proportions for 8 components.
LOG (nnet3-copy[5.4.100~1-1331a]:main():nnet3-copy.cc:114) Copied raw neural net from - to -
LOG (nnet3-chain-train[5.4.100~1-1331a]:NnetChainTrainer():nnet-chain-training.cc:53) Read computation cache from exp/chain/tdnn1g_sp/cache.22
nnet3-chain-merge-egs --minibatch-size=256,128,64 ark:- ark:-
nnet3-chain-copy-egs --frame-shift=0 ark:exp/chain/tdnn1g_sp/egs/cegs.2.ark ark:-
nnet3-chain-shuffle-egs --buffer-size=5000 --srand=22 ark:- ark:-
ERROR (nnet3-chain-train[5.4.100~1-1331a]:RandUniform():cu-rand.cc:72) curandStatus_t 102 : "CURAND_STATUS_ALLOCATION_FAILED" returned from 'curandGenerateUniformWrap(gen_, tmp.Data(), s)'
[ Stack-Trace: ]
nnet3-chain-train() [0x124bf7a]