==> I couldn't find anything in the logs that suggests anything went wrong at the earlier stage.
I tried aligning with the LDA+MLLT model instead of the LDA+MLLT+SAT model, and the alignment worked perfectly. Due to limited compute, I haven't yet tried to isolate the problematic utterances by recursively re-aligning the failed jobs with the LDA+MLLT+SAT model.
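For isolating the problematic utterances later, a simple bisection over a failed job's utterance list would avoid re-running alignment on the whole set. A minimal sketch; `align_ok` here is a hypothetical callable that would wrap an actual Kaldi alignment run (e.g. steps/align_fmllr.sh on a subset made with utils/subset_data_dir.sh) and report whether it succeeded:

```python
def find_bad_utterances(utts, align_ok):
    """Recursively bisect a list of utterance IDs to find the ones that
    make alignment fail. `align_ok(subset)` is assumed to run alignment
    on just that subset and return True when it succeeds."""
    if align_ok(utts):
        return []            # this whole subset aligns fine
    if len(utts) == 1:
        return utts          # narrowed down to a single failing utterance
    mid = len(utts) // 2
    return (find_bad_utterances(utts[:mid], align_ok)
            + find_bad_utterances(utts[mid:], align_ok))
```

Finding k bad utterances among n this way costs roughly O(k log n) alignment runs, which may be feasible even with limited compute.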
So, with the successful LDA+MLLT alignment, I went ahead and trained a TDNN chain model on it. I assume the alignment quality is not very sensitive to the WER of the model used for alignment.
During TDNN training I get warnings at many iterations that look like:
[steps/libs/nnet3/train/common.py:134 - get_successful_models - WARNING ] Only 4/5 of the models have been accepted for averaging, based on log files exp/chain/tdnn_sp_v3/log/train.1559.%.log
The corresponding values of objective function are:
exp/chain/tdnn_sp_v3/log/train.1559.1.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:349) Overall average objective function for 'output' is -0.235595 + -0.0127855 = -0.248381 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.1.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:346) Overall average objective function for 'output-xent' is -1.38552 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.2.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:349) Overall average objective function for 'output' is -0.106124 + -0.0127876 = -0.118911 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.2.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:346) Overall average objective function for 'output-xent' is -1.39787 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.3.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:349) Overall average objective function for 'output' is -0.106789 + -0.0127651 = -0.119554 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.3.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:346) Overall average objective function for 'output-xent' is -1.39634 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.4.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:349) Overall average objective function for 'output' is -3.48244 + -0.0124167 = -3.49486 over 491008 frames.
exp/chain/tdnn_sp_v3/log/train.1559.4.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:346) Overall average objective function for 'output-xent' is -0.920236 over 491008 frames.
exp/chain/tdnn_sp_v3/log/train.1559.5.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:349) Overall average objective function for 'output' is -0.106752 + -0.0127227 = -0.119475 over 497408 frames.
exp/chain/tdnn_sp_v3/log/train.1559.5.log:LOG (nnet3-chain-train[5.4.54~1-22fb]:PrintTotalStats():nnet-training.cc:346) Overall average objective function for 'output-xent' is -1.40838 over 497408 frames.
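Job 4 is clearly the outlier here (-3.49 vs. roughly -0.12 for the others), which is presumably why only 4/5 models were accepted for averaging. As an illustration (not the actual logic in steps/libs/nnet3/train/common.py, whose threshold differs), one can parse the 'output' objective per job from these log lines and flag jobs far below the best:

```python
import re

def flag_outlier_jobs(log_lines, margin=1.0):
    """Parse the 'output' objective per job from nnet3-chain-train log
    lines and return the job numbers whose objective is more than
    `margin` below the best job's objective. Illustrative only; the
    real acceptance test in Kaldi's train/common.py is different."""
    objs = {}
    pat = re.compile(r"train\.\d+\.(\d+)\.log.*'output' is .* = (-?[\d.]+) over")
    for line in log_lines:
        m = pat.search(line)
        if m:
            objs[int(m.group(1))] = float(m.group(2))
    best = max(objs.values())
    return sorted(j for j, o in objs.items() if best - o > margin)
```

On the five lines above this would flag job 4 only.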
It seems the model is diverging: the objective on job 4 (-3.49) is far worse than on the other jobs (around -0.12).
--------------------------------------------------
In the training logs of many iterations, I see warnings like:
WARNING (nnet3-chain-train[5.4.54~1-22fb]:BetaGeneralFrameDebug():chain-denominator.cc:412) On time 0, alpha-beta product nan != 128 alpha-dash-sum = 140.8, beta-dash-sum = nan
WARNING (nnet3-chain-train[5.4.54~1-22fb]:BetaGeneralFrameDebug():chain-denominator.cc:425) On time 0, log-prob-deriv sum 124.994 != 128
WARNING (nnet3-chain-train[5.4.54~1-22fb]:BetaGeneralFrameDebug():chain-denominator.cc:428) Excessive error detected, will abandon this minibatch
WARNING (nnet3-chain-train[5.4.54~1-22fb]:ComputeChainObjfAndDeriv():chain-training.cc:214) Objective function is nan and denominator computation (if done) returned false, setting objective function to -10 per frame.
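To see whether these failures are occasional or growing over iterations, one could count the abandoned minibatches per log file (the warning text below is taken verbatim from the log above; the glob in the usage note is just my experiment directory):

```python
def count_abandoned(log_text):
    """Count minibatches abandoned due to excessive error in a
    chain-training log, using the warning string that
    chain-denominator.cc emits."""
    return log_text.count("Excessive error detected, will abandon this minibatch")
```

Running this over each file matching exp/chain/tdnn_sp_v3/log/train.*.log would show whether the NaNs start at a particular iteration or are spread throughout.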
Could you please guide me in understanding why my model is diverging? Is it due to some issue in the alignment?