steps/nnet3/chain/train_tdnn.sh: Will train for 4 epochs = 1069 iterations
On iteration 0, learning rate is 0.003.
Training neural net (pass 0)
On iteration 1, learning rate is 0.00299796180388041.
Training neural net (pass 1)
On iteration 2, learning rate is 0.00299592499250862.
Training neural net (pass 2)
On iteration 3, learning rate is 0.00299388956494385.
Training neural net (pass 3)
On iteration 4, learning rate is 0.00299185552024593.
Training neural net (pass 4)
On iteration 5, learning rate is 0.00298982285747534.
Training neural net (pass 5)
queue.pl: job failed with status 255, log is in exp/chain/tdnn_2o_sp/log/train.5.1.log
KALDI_ASSERT: at nnet3-chain-train:PreconditionDirectionsInternal:natural-gradient-online.cc:348, failed: tr_Xhat_XhatT == tr_Xhat_XhatT
Stack trace is:
kaldi::KaldiGetStackTrace()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirectionsInternal(int, float, kaldi::Vector<float> const&, kaldi::CuMatrixBase<float>*, kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, kaldi::CuVectorBase<float>*, float*)
kaldi::nnet3::NaturalGradientAffineComponent::Update(std::string const&, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&)
kaldi::nnet3::AffineComponent::Backprop(std::string const&, kaldi::nnet3::ComponentPrecomputedIndexes const*, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrixBase<float> const&, kaldi::nnet3::Component*, kaldi::CuMatrixBase<float>*) const
kaldi::nnet3::NnetComputer::ExecuteCommand(int)
kaldi::nnet3::NnetComputer::Backward()
kaldi::nnet3::NnetChainTrainer::Train(kaldi::nnet3::NnetChainExample const&)
nnet3-chain-train(main+0x3f6) [0x820093]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2ac01ae9aec5]
nnet3-chain-train() [0x81fbd9]
WARNING (nnet3-chain-train:ExecuteCommand():nnet-compute.cc:279) Printing some background info since error was detected
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 9.08328 > --max-param-change=1, scaling by 0.110092
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 23-23 is -0.545507 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 8.26246 > --max-param-change=1, scaling by 0.121029
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 24-24 is -0.53988 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 7.5673 > --max-param-change=1, scaling by 0.132148
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 25-25 is -0.493549 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 11.1135 > --max-param-change=1, scaling by 0.0899806
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 26-26 is -0.498687 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 18.0382 > --max-param-change=1, scaling by 0.055438
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 27-27 is -0.524272 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 10.5323 > --max-param-change=1, scaling by 0.0949463
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 28-28 is -0.536255 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 14.1433 > --max-param-change=1, scaling by 0.0707047
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 29-29 is -0.519953 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 9.60148 > --max-param-change=1, scaling by 0.104151
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 30-30 is -0.523895 over 6400 frames.
LOG (nnet3-chain-train:Train():nnet-chain-training.cc:80) Parameter change too big: 8.13516 > --max-param-change=1, scaling by 0.122923
LOG (nnet3-chain-train:PrintStatsForThisPhase():nnet-training.cc:147) Average objective function for 'output' for minibatches 31-31 is -0.518401 over 6400 frames.
KALDI_ASSERT: at nnet3-chain-train:PreconditionDirectionsInternal:natural-gradient-online.cc:348, failed: tr_Xhat_XhatT == tr_Xhat_XhatT

if (!locked) {
  // We're not updating the parameters, either because another thread is
  // working on updating them, or because another thread already did so from
  // the same or later starting point (making our update stale), or because
  // update_period_ > 1.  We just apply the preconditioning and return.
  // note: we don't bother with any locks before incrementing
  // num_updates_skipped_ below, because the worst that could happen is that,
  // on very rare occasions, we could skip one or two more updates than we
  // intended.
  num_updates_skipped_++;
  BaseFloat tr_Xt_XtT = TraceMatMat(*X_t, *X_t, kTrans);
  KALDI_ASSERT(tr_Xt_XtT == tr_Xt_XtT);  // Check for NaN.
  // X_hat_t = X_t - H_t W_t
  X_t->AddMatMat(-1.0, H_t, kNoTrans, W_t, kNoTrans, 1.0);
  BaseFloat tr_Xt_XtT_bis = TraceMatMat(*X_t, *X_t, kTrans);
  KALDI_ASSERT(tr_Xt_XtT_bis == tr_Xt_XtT_bis);  // Check for NaN.
  // each element i of row_prod will be inner product of row i of X_hat_t with
  // itself.
  BaseFloat row_prod_sum = row_prod->Sum();
  KALDI_ASSERT(row_prod_sum == row_prod_sum);  // Check for NaN.
  row_prod->AddDiagMat2(1.0, *X_t, kNoTrans, 0.0);
  BaseFloat tr_Xhat_XhatT = row_prod->Sum();
  KALDI_ASSERT(tr_Xhat_XhatT == tr_Xhat_XhatT);  // Check for NaN.
  BaseFloat gamma_t = (tr_Xhat_XhatT == 0.0 ? 1.0 :
                       sqrt(tr_Xt_XtT / tr_Xhat_XhatT));
  *scale = gamma_t;
  return;
}

nnet3-chain-train --verbose=5 --apply-deriv-weights=false --max-param-change=1.0 --print-interval=1 "nnet3-am-copy --raw=true --learning-rate=0.00322530991369705 exp/chain/tdnn_2o_sp/790.mdl -|" exp/chain/tdnn_2o_sp/den.fst "ark:nnet3-chain-copy-egs --truncate-deriv-weights=0 --frame-shift=1 ark:exp/chain/tdnn_2o_sp/egs/cegs.225.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=790 ark:- ark:-| nnet3-chain-merge-egs --minibatch-size=128 ark:- ark:- |" exp/chain/tdnn_2o_sp/791.1.raw