Re: [kaldi-help] About "bad perplexity" problem


Daniel Povey

Dec 21, 2016, 1:37:07 AM
to kaldi-help
It's trying to train the LM on some data, and I suspect the input data
file (probably data/local/lm/text.no_oov) is empty or does not exist.
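
A quick way to test that guess is sketched below. The helper name `check_lm_input` is made up for illustration and is not part of Kaldi; the path is the one suspected above and may differ in your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): report whether an LM training
# text exists and contains data.
check_lm_input() {
  local f="$1"
  if [ ! -e "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  if [ ! -s "$f" ]; then
    echo "empty: $f"
    return 2
  fi
  # wc pads its output on some systems, so strip the spaces.
  echo "ok: $f ($(wc -l < "$f" | tr -d ' ') lines)"
}

# Path taken from the guess above; adjust it to your setup.
check_lm_input data/local/lm/text.no_oov || true
```

A "missing" or "empty" result points at the step that writes this file rather than at the LM training itself.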

Dan



On Tue, Dec 20, 2016 at 10:31 PM, Yeonjong Choi <cyj...@gmail.com> wrote:
> Dear all
>
> Hello, I'm new to the Kaldi toolkit
> and have a question about a perplexity problem.
>
> I'm trying to build an ASR system for the "RSR2015" dataset using
> the fisher_english example scripts.
> I prepared the data files (wav.scp, spk2gender, utt2spk, spk2utt, text) in each
> of s5/data/test and s5/data/train_all,
> and I commented out lines 11 and 12 of run.sh (so I don't use
> local/fisher_data_prep.sh).
>
> When I run line 27 of run.sh (local/fisher_train_lms.sh),
> I get the following output:
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Not installing the kaldi_lm toolkit since it is already there.
> Not creating raw N-gram counts ngrams.gz and heldout_ngrams.gz since they
> already exist in data/local/lm/3gram-mincount
> (remove them if you want them regenerated)
> Iteration 1/6 of optimizing discounting parameters
> discount_ngrams: for n-gram order 1, D=0.600000, tau=0.675000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.800000, tau=0.675000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.000000, tau=0.825000 phi=2.000000
> interpolate_ngrams: 148 words in wordslist
> Perplexity over 0.000000 words is -nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan
>
> real 0m0.012s
> user 0m0.008s
> sys 0m0.060s
> interpolate_ngrams: 148 words in wordslist
> discount_ngrams: for n-gram order 1, D=0.600000, tau=0.900000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.800000, tau=0.900000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.000000, tau=1.100000 phi=2.000000
> Perplexity over 0.000000 words is -nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan
>
> real 0m0.013s
> user 0m0.000s
> sys 0m0.080s
> discount_ngrams: for n-gram order 1, D=0.600000, tau=1.215000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.800000, tau=1.215000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.000000, tau=1.485000 phi=2.000000
> interpolate_ngrams: 148 words in wordslist
> Perplexity over 0.000000 words is -nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan
>
> real 0m0.012s
> user 0m0.000s
> sys 0m0.060s
> Bad perplexities . at
> /work1/t2g-shinoda2011/16M31343/kaldi-trunk/egs/fisher_english/rsr/../../../tools/kaldi_lm/optimize_alpha.pl
> line 30.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Can anyone help me solve this problem?
> Thank you for your help!
>
> Yeonjong Choi

Yeonjong Choi

Dec 21, 2016, 1:51:59 AM
to kaldi-help, dpo...@gmail.com
Dear Dan

Thank you for your reply.
I've just checked the text.no_oov file,
and it is not empty.
I've attached it here.

Yeonjong

On Wednesday, December 21, 2016 at 3:37:07 PM UTC+9, Dan Povey wrote:
Attachment: text.no_oov

Daniel Povey

Dec 21, 2016, 2:07:33 AM
to Yeonjong Choi, kaldi-help
Maybe at some previous point you ran that when the input was empty; try doing
rm -r data/local/lm/3gram-mincount
and rerun.
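
The log earlier in the thread says the raw counts (ngrams.gz and heldout_ngrams.gz) are not recreated when they already exist, which is why stale files from an earlier run can linger. A minimal sketch of the cleanup, assuming you are in the recipe directory; the helper name `reset_lm_dir` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the cached kaldi_lm output directory so the
# next run regenerates ngrams.gz and heldout_ngrams.gz from the current text.
reset_lm_dir() {
  local dir="${1:-data/local/lm/3gram-mincount}"
  if [ -d "$dir" ]; then
    # List the cached count files (if any) before deleting them.
    ls -l "$dir"/*.gz 2>/dev/null || true
    rm -r "$dir"
    echo "removed $dir"
  else
    echo "nothing to remove at $dir"
  fi
}

reset_lm_dir data/local/lm/3gram-mincount
# Then rerun the LM step from the recipe directory, e.g.:
# local/fisher_train_lms.sh
```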

Yeonjong Choi

Dec 21, 2016, 2:13:03 AM
to kaldi-help, cyj...@gmail.com, dpo...@gmail.com
Dear Dan

You were right, it works fine now!
Thank you so much!

Yeonjong

On Wednesday, December 21, 2016 at 4:07:33 PM UTC+9, Dan Povey wrote:

Amani Jameel

Jan 6, 2017, 8:07:10 AM
to kaldi-help, cyj...@gmail.com, dpo...@gmail.com

I have the same bad perplexity problem. My text.no_oov is empty, and I tried rm -r data/local/lm/3gram-mincount, but this didn't solve the problem. Any ideas how to solve this? The following is the error in my terminal:
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int
discount_ngrams: for n-gram order 1, D=0.000000, tau=0.000000 phi=1.000000
discount_ngrams: for n-gram order 2, D=0.000000, tau=0.000000 phi=1.000000
discount_ngrams: for n-gram order 3, D=1.000000, tau=0.000000 phi=1.000000

discount_ngrams: for n-gram order 1, D=0.600000, tau=0.675000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.675000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=0.825000 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=0.900000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=0.900000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.100000 phi=2.000000
discount_ngrams: for n-gram order 1, D=0.600000, tau=1.215000 phi=2.000000
discount_ngrams: for n-gram order 2, D=0.800000, tau=1.215000 phi=2.000000
discount_ngrams: for n-gram order 3, D=0.000000, tau=1.485000 phi=2.000000
interpolate_ngrams: 526120 words in wordslist
interpolate_ngrams: 526120 words in wordslist
interpolate_ngrams: 526120 words in wordslist

Perplexity over 0.000000 words is -nan
Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan

real    0m0.141s
user    0m0.120s
sys    0m0.008s

Perplexity over 0.000000 words is -nan
Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan

real    0m0.139s
user    0m0.120s
sys    0m0.012s

Perplexity over 0.000000 words is -nan
Perplexity over 0.000000 words (excluding 0.000000 OOVs) is -nan

real    0m0.143s
user    0m0.120s
sys    0m0.012s
Bad perplexities   . at ./../../../tools/kaldi_lm/optimize_alpha.pl line 30.
cat: data/train/feats.scp: No such file or directory

Daniel Povey

Jan 6, 2017, 1:49:53 PM
to Amani Jameel, kaldi-help, Yeonjong Choi
The script that uses kaldi_lm to train has some parts after "exit 0"
that document how to use SRILM. SRILM is easier to use and I
recommend you use that instead; install first with
tools/extras/install_srilm.sh.
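
For reference, a rough sketch of what a trigram SRILM run can look like. This is not the code from the script itself: the function name and file arguments are hypothetical, and it assumes plain text with one sentence per line (utterance IDs already removed) and SRILM's `ngram-count` and `ngram` on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around SRILM; arguments are
#   $1 = training text, $2 = held-out text, $3 = output LM file (e.g. lm.gz)
train_srilm_3gram() {
  local train="$1" heldout="$2" lm="$3"
  if ! command -v ngram-count >/dev/null 2>&1; then
    echo "ngram-count not found; install SRILM and add it to PATH first"
    return 127
  fi
  # Interpolated Kneser-Ney trigram with an open vocabulary (<unk>).
  ngram-count -text "$train" -order 3 -unk -map-unk "<unk>" \
    -kndiscount -interpolate -lm "$lm" || return 1
  # Report perplexity on the held-out text.
  ngram -lm "$lm" -unk -ppl "$heldout"
}

# Example call (file names are placeholders):
# train_srilm_3gram train.txt heldout.txt srilm.o3g.kn.gz
```

On very small training sets Kneser-Ney discounting may fail to estimate its parameters, so a different discounting option may be needed there.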