Regarding Cholesky decomposition failed in Kaldi


Achintya Sarkar

Mar 4, 2016, 7:40:43 AM
to kaldi...@googlegroups.com
Dear Kaldi users,

I am trying to build an i-vector system with Kaldi. It gives me the following error when calling the "sid/train_ivector_extractor.sh" script. However, there was no issue during GMM training.


sid/train_ivector_extractor.sh --cmd run.pl --mem 4G -l mem_free=8G,ram_free=8G --num-iters 5 exp/full_ubm_64/final.ubm data/train exp/extractor_64
sid/train_ivector_extractor.sh: doing Gaussian selection and posterior computation
Accumulating stats (pass 0)
Summing accs (pass 0)
Updating model (pass 0)
run.pl: job failed, log is in exp/extractor_64/log/update.0.log


exp/extractor_64/log/update.0.log:
===============================

# ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie
# Started at Fri Mar  4 12:25:42 CET 2016
#
ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie
LOG (ivector-extractor-est:main():ivector-extractor-est.cc:55) Reading model
LOG (ivector-extractor-est:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (ivector-extractor-est:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (ivector-extractor-est:main():ivector-extractor-est.cc:59) Reading statistics
LOG (ivector-extractor-est:Update():ivector-extractor.cc:1176) Overall auxf/frame on training data was -10.9148 per frame over 7.92257e+06 frames.
LOG (ivector-extractor-est:UpdateProjections():ivector-extractor.cc:1330) Overall objective function improvement for M (mean projections) was 0.139391 per frame over 7.92257e+06 frames.
WARNING (ivector-extractor-est:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite. Throwing error
Cholesky decomposition failed.
# Accounting: time=0 threads=1
# Ended (code 255) at Fri Mar  4 12:25:42 CET 2016, elapsed time 0 seconds


Could you please help with this issue?

Best,
Achintya

Daniel Povey

Mar 4, 2016, 4:14:00 PM
to kaldi-help
Can you get the ivector-extractor-est program in GDB and show us the backtrace of where it failed?
You can do
gdb --args ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie 
(gdb) catch throw
(gdb) r
and when it breaks, type
(gdb) bt
and show us the backtrace.
Are you using the same features that our scripts use, or some other kind of features?
Dan

 


mag...@naver.com

Mar 30, 2016, 6:03:11 AM
to kaldi-help, sarkar....@gmail.com
I have the same problem when running the "sid/train_ivector_extractor.sh" script.
Did you solve it?

On Friday, March 4, 2016 at 9:40:43 PM UTC+9, Achintya Sarkar wrote:

Daniel Povey

Mar 30, 2016, 12:56:58 PM
to kaldi-help, sarkar....@gmail.com
I didn't hear more from this person.
I need a backtrace-- run it in gdb using
gdb --args (program) (args)
(gdb) catch throw
(gdb) r
and when you get an error, do
(gdb) bt




sarkar....@gmail.com

Mar 31, 2016, 5:32:11 AM
to kaldi-help, dpo...@gmail.com
Hi Dan,

Thanks for your reply.
Sorry for the late response, as I was away. Here is the output of ivector-extractor-est run under gdb.


I am using external features converted from HTK format to ark format.

gdb --args ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie

GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ivector-extractor-est...done.
(gdb) catch throw
Catchpoint 1 (throw)


(gdb) r
Starting program: /work2/kaldi-trunk/src/ivectorbin/ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/work2/kaldi-trunk/src/ivectorbin/ivector-extractor-est --num-threads=4 exp/extractor_64/0.ie exp/extractor_64/acc.0 exp/extractor_64/1.ie
LOG (ivector-extractor-est:main():ivector-extractor-est.cc:55) Reading model
LOG (ivector-extractor-est:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
[New Thread 0x7ffff5e52700 (LWP 28412)]
[New Thread 0x7ffff5651700 (LWP 28413)]
[Thread 0x7ffff5e52700 (LWP 28412) exited]
[New Thread 0x7ffff4e50700 (LWP 28415)]
[New Thread 0x7fffeffff700 (LWP 28416)]
[Thread 0x7ffff5651700 (LWP 28413) exited]
[New Thread 0x7ffff5651700 (LWP 28417)]
[New Thread 0x7ffff5e52700 (LWP 28418)]
[New Thread 0x7fffef7fe700 (LWP 28419)]
[New Thread 0x7fffeeffd700 (LWP 28420)]
[New Thread 0x7fffee7fc700 (LWP 28421)]
[Thread 0x7ffff4e50700 (LWP 28415) exited]
[New Thread 0x7fffedffb700 (LWP 28422)]
[Thread 0x7fffeffff700 (LWP 28416) exited]

--- ... (many more lines like these are displayed)


LOG (ivector-extractor-est:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (ivector-extractor-est:main():ivector-extractor-est.cc:59) Reading statistics
LOG (ivector-extractor-est:Update():ivector-extractor.cc:1176) Overall auxf/frame on training data was -8.97978 per frame over 7.92257e+06 frames.

(gdb) bt
#0  0x00007ffff749b8b0 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00000000004dcd42 in kaldi::TpMatrix<double>::Cholesky (this=0x7fffffffc690, orig=...) at tp-matrix.cc:112
#2  0x00000000004d74b8 in kaldi::SpMatrix<double>::ApplyFloor (this=0x7fffffffc920, C=..., alpha=1, verbose=false) at sp-matrix.cc:567
#3  0x000000000046c9c7 in kaldi::IvectorExtractorStats::UpdateVariances (this=0x7fffffffd7d0, opts=..., extractor=0x7fffffffd160) at ivector-extractor.cc:1398
#4  0x000000000046b114 in kaldi::IvectorExtractorStats::Update (this=0x7fffffffd7d0, opts=..., extractor=0x7fffffffd160) at ivector-extractor.cc:1186
#5  0x000000000045fc74 in main (argc=5, argv=0x7fffffffdac8) at ivector-extractor-est.cc:63


Best,
Achintya



Daniel Povey

Mar 31, 2016, 2:45:54 PM
to Achintya Sarkar, kaldi-help
OK.  What was happening is that the code floors the variances to a matrix-valued floor (it's done via eigenvalues; I'll skip the math), and the code crashes if that flooring matrix is not positive definite.  That matrix should always be positive definite because it's computed as the average of all the variances (pre-flooring).  However, if you use features that are linearly dependent, the matrix might not be positive definite.  We generally assume that features would never be linearly dependent, but I don't know what features you're using.  Also if the feature variance has some eigenvalues which are much larger than others (e.g. 10^5 times larger or more) it might cause sufficient loss of precision to cause this error.
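
To illustrate the failure mode described above, here is a minimal sketch in plain Python (not Kaldi's code). It builds the covariance matrix of a tiny, made-up feature set and attempts a textbook Cholesky factorisation. The helper names `covariance`, `cholesky`, and `is_positive_definite`, the toy data, and the pivot tolerance `tol` are all assumptions made for this illustration. When the third feature is the sum of the first two, the last pivot collapses to (numerically) zero and the factorisation is rejected.

```python
import math


def covariance(rows):
    """Covariance matrix (normalised by N) of a list of feature vectors."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(dim)] for i in range(dim)]


def cholesky(a, tol=1e-12):
    """Textbook Cholesky factorisation of a symmetric matrix.

    Raises ValueError when a pivot is not clearly positive. The tolerance
    is an assumption of this sketch, not a value taken from Kaldi.
    """
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("Cholesky decomposition failed at row %d" % i)
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def is_positive_definite(a):
    """True if the Cholesky factorisation succeeds."""
    try:
        cholesky(a)
        return True
    except ValueError:
        return False


# Toy data: two independent features per frame.
independent = [[1.0, 2.0], [2.0, 0.0], [3.0, 5.0], [6.0, 1.0]]
# Same frames with a third feature equal to the sum of the first two.
dependent = [r + [r[0] + r[1]] for r in independent]

print(is_positive_definite(covariance(independent)))
print(is_positive_definite(covariance(dependent)))
```

The first covariance matrix factorises, while the second is singular because one feature is an exact linear combination of the others.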

Anyway, I just pushed a change to the code so that it will be tolerant of this and won't crash, but it indicates that something is wrong with your setup.

Other possible causes are:
  - you had NaNs or infs in your data.
  - it was a compilation problem that could have been fixed by `make depend' and make.


Dan

sarkar....@gmail.com

Apr 1, 2016, 4:01:12 AM
to kaldi-help, sarkar....@gmail.com, dpo...@gmail.com
Hi Dan,

Thanks again. I found that a few features were linearly dependent on others.
The code now runs fine after excluding the redundant features.


Best,
Achintya

Shuai Wang

Apr 5, 2017, 4:06:11 AM
to kaldi-help, sarkar....@gmail.com, dpo...@gmail.com

So what do you mean by "found a few features linear dependent to others"? Duplicate features? Thanks.
On Friday, April 1, 2016 at 4:01:12 PM UTC+8, sarkar....@gmail.com wrote:

Daniel Povey

Apr 5, 2017, 1:31:17 PM
to Shuai Wang, kaldi-help, Achintya Sarkar
Linearly dependent features could mean duplicate or always-zero features, but there are other possibilities too, such as when one feature is a linear combination of a bunch of other features.
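
The cases listed here can be detected with a rough check like the following sketch in plain Python. It mean-centres each feature dimension and runs modified Gram-Schmidt over the columns, flagging any column whose remainder vanishes once the earlier columns are projected out. The function name `dependent_columns`, the relative tolerance `tol`, and the toy data are assumptions made for this illustration and are not part of Kaldi.

```python
import math


def dependent_columns(rows, tol=1e-8):
    """Indices of feature dimensions that add no new direction.

    A column is flagged when it is all zeros, or when its mean-centred
    version is (numerically) a linear combination of earlier columns.
    """
    n, dim = len(rows), len(rows[0])
    basis = []    # orthonormal vectors spanning the columns kept so far
    flagged = []
    for j in range(dim):
        col = [r[j] for r in rows]
        raw_norm = math.sqrt(sum(v * v for v in col))
        mean = sum(col) / n
        resid = [v - mean for v in col]
        # Remove the components along every direction already in the basis.
        for b in basis:
            coef = sum(x * y for x, y in zip(resid, b))
            resid = [x - coef * y for x, y in zip(resid, b)]
        resid_norm = math.sqrt(sum(v * v for v in resid))
        if raw_norm == 0.0 or resid_norm <= tol * raw_norm:
            flagged.append(j)
        else:
            basis.append([v / resid_norm for v in resid])
    return flagged


# Made-up frames. Columns: x, y, a copy of x, always zero, 2*x - y.
xy = [(1.0, 2.0), (2.0, 0.0), (3.0, 5.0), (6.0, 1.0)]
frames = [[x, y, x, 0.0, 2.0 * x - y] for x, y in xy]
print(dependent_columns(frames))
```

On this toy data the duplicate, the always-zero column, and the linear combination are flagged, while the first two columns are kept. Because the columns are centred first, a constant non-zero feature would be flagged as well.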

Shuai Wang

Apr 5, 2017, 9:05:44 PM
to dpo...@gmail.com, kaldi-help, Achintya Sarkar
Thanks a lot!
--
Best Wishes
Shuai Wang
Ph.D. student
SpeechLab, Shanghai Jiao Tong University

seiten kaku

Aug 18, 2020, 6:11:09 AM
to kaldi-help
I encountered this error when training a cnn-tdnnf model.
However, the same features were also used to train another tdnnf model, and this error did not happen there.


On Friday, March 4, 2016 at 8:40:43 PM UTC+8, Achintya Sarkar wrote: