Stuck at "Accumulating LDA statistics" when training a new language

Alexandra Zhuravlyova

Jul 11, 2021, 7:59:03 PM
to kaldi-help
Hello, I'm training a model for a new language with my own labeled data and I'm facing the following issue (traceback attached as well):

I reach this point when training the tri2b model with the following parameters (I have a relatively small dataset):

steps/train_lda_mllt.sh --cmd "$train_cmd" \
--splice-opts "--left-context=3 --right-context=3" 500 1000 \
data/valid_train data/lang exp/tri1_ali_valid_train exp/tri2b 


The process gets stuck after echoing the following:

steps/train_lda_mllt.sh: Accumulating LDA statistics

Logs are attached.

$cat s5/exp/tri2b/log/lda_est.log 
est-lda --write-full-matrix=exp/tri2b/full.mat --dim=40 exp/tri2b/0.mat exp/tri2b/lda.1.acc exp/tri2b/lda.2.acc exp/tri2b/lda.3.acc exp/tri2b/lda.4.acc exp/tri2b/lda.5.acc
ERROR (est-lda[5.5.951~1-579c9]:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite.

[ Stack-Trace: ]
est-lda(kaldi::MessageLogger::LogMessage() const+0xb1a) [0x557fe4e9ce1a]
est-lda(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x557fe4e27efb]
est-lda(kaldi::TpMatrix<double>::Cholesky(kaldi::SpMatrix<double> const&)+0x1b1) [0x557fe4e839a7]
est-lda(kaldi::LdaEstimate::Estimate(kaldi::LdaEstimateOptions const&, kaldi::Matrix<float>*, kaldi::Matrix<float>*) const+0x141) [0x557fe4e266a9]
est-lda(main+0x338) [0x557fe4e24ed2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ff02b141bf7]
est-lda(_start+0x2a) [0x557fe4e24aba]

$cat lda.1.acc
B<LDAACCS> <VECSIZE> [<NUMCLASSES> 9<ZERO_ACCS> FV 9<FIRST_ACCS> FM 9[<SECOND_ACCS> FP [</LDAACCS>

I have also checked that everything compiled properly (it did) and that OpenFST is installed (it is).

CUDA is configured, the data passes the checks, and G.fst was created manually using arpa2fst.

The only issue is some warnings regarding empty phones, but that is probably not the cause.

The run script doesn't proceed to tri3b, and it seems like I'm missing something important. Maybe the splice-opts parameters are set wrong?

steps/train_lda_mllt.sh --cmd "$train_cmd" \
--splice-opts "--left-context=3 --right-context=3" 500 1000 \
data/valid_train data/lang exp/tri1_ali_valid_train exp/tri2b

Thanks in advance!


AK Project

Dec 7, 2021, 3:35:11 AM
to kaldi-help
Hello, I have the same problem as you. Have you already solved this? Thank you.

Daniel Povey

Dec 7, 2021, 4:42:11 AM
to kaldi-help
Likely a very tiny amount of data, too small to estimate the LDA, e.g. just a handful of frames, or zero frames.
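
One way to check this is to count how many utterances and feature frames the training directory actually holds. The snippet below is only a rough sketch, not part of the recipe: it assumes the Kaldi binaries are on your PATH and that feats.scp lives in data/valid_train as in the command above, and sum_frames is a helper name made up for this example. The total is an upper bound on what LDA sees, since utterances that failed alignment in exp/tri1_ali_valid_train contribute no frames.

```shell
#!/usr/bin/env bash
# Sketch: count utterances and total frames in a Kaldi data directory.
# data_dir defaults to the directory used in the original command (assumption).
data_dir="${1:-data/valid_train}"

# Hypothetical helper: reads "utt-id num-frames" lines from stdin
# (the text output format of feat-to-len) and prints the totals.
sum_frames() {
  awk '{ n += $2 } END { printf "%d utterances, %d frames\n", NR, n }'
}

# Only call Kaldi if the binary and the feature list are actually present.
if command -v feat-to-len >/dev/null 2>&1 && [ -f "$data_dir/feats.scp" ]; then
  feat-to-len "scp:$data_dir/feats.scp" ark,t:- | sum_frames
else
  echo "feat-to-len or $data_dir/feats.scp not found; nothing counted" >&2
fi
```

If the frame total is not much larger than the spliced feature dimension (7 times the per-frame feature dimension with the splice options above), the within-class covariance cannot be positive definite, which would be consistent with the Cholesky error in lda_est.log.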
