Strange results for pron and silence prob after code resync


Alex Gurianov

Oct 30, 2018, 9:17:42 AM
to kaldi-help
Hello

I've updated my code to the latest Kaldi version and am getting strange results after rebuilding the lang directory with pronunciation and silence probabilities.

My old Kaldi branch:
commit 64025aa7294dc62edefa3af8f4a4aaf92bf5c25a
Author: Nickolay V. Shmyrev ....
Date:   Wed May 2 05:58:11 2018 +0300
    [scripts] rnnlm scripts: ignore first iteration while looking for the best model (#2399)

The results with it:
%WER 31.69 [ 631 / 1991, 119 ins, 116 del, 396 sub ] exp/mono/decode/wer_14
%WER 17.08 [ 340 / 1991, 79 ins, 71 del, 190 sub ] exp/tri1/decode/wer_16
%WER 15.87 [ 316 / 1991, 66 ins, 67 del, 183 sub ] exp/tri2/decode/wer_17
%WER 14.06 [ 280 / 1991, 58 ins, 60 del, 162 sub ] exp/tri3/decode/wer_17
+ silence prob
%WER 9.34 [ 186 / 1991, 54 ins, 35 del, 97 sub ] exp/tri3/decode_sp/wer_15


new Kaldi branch:
commit 8e30fddb300a87e7c79ef2c0b9c731a8a9fd23f0
Author: Hossein Hadian ...
Date:   Sat Oct 20 07:35:35 2018 +0330
    [src] Add support for context independent phones in gmm-init-biphone (for e2e) (#2779)

The results with it:
%WER 32.70 [ 651 / 1991, 97 ins, 148 del, 406 sub ] exp/mono/decode/wer_16
%WER 16.83 [ 335 / 1991, 78 ins, 70 del, 187 sub ] exp/tri1/decode/wer_16
%WER 15.12 [ 301 / 1991, 74 ins, 62 del, 165 sub ] exp/tri2/decode/wer_16
%WER 13.81 [ 275 / 1991, 61 ins, 53 del, 161 sub ] exp/tri3/decode/wer_16
+ silence prob
%WER 21.09 [ 420 / 1991, 157 ins, 37 del, 226 sub ] exp/tri3/decode_sp/wer_13

Here is the relevant part of my recipe.
#############################
# align tri2
steps/align_si.sh --boost-silence $boost_sil --nj $njobs --cmd "$train_cmd" \
  data/train data/lang exp/tri2 exp/tri2_ali || exit 1;

steps/train_lda_mllt.sh --cmd "$train_cmd" \
    --splice-opts "--left-context=3 --right-context=3" 2500 15000 \
    data/train data/lang exp/tri2_ali exp/tri3 || exit;

utils/mkgraph.sh data/lang_test exp/tri3 exp/tri3/graph

steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph data/test exp/tri3/decode

# Now we compute the pronunciation and silence probabilities from training data,
# and re-create the lang directory.

steps/get_prons.sh --cmd "$train_cmd" \
     data/train data/lang exp/tri3

utils/dict_dir_add_pronprobs.sh --max-normalize true \
  data/local/dict \
  exp/tri3/pron_counts_nowb.txt exp/tri3/sil_counts_nowb.txt \
  exp/tri3/pron_bigram_counts_nowb.txt data/local/dict_sp

utils/prepare_lang.sh data/local/dict_sp \
"<UNK>" data/local/lang_tmp_sp data/lang_sp

# Prepare G.fst and data/{train,test} directories
local/ag_format_lms.sh --src-dir data/lang_sp data/local || exit

utils/mkgraph.sh data/lang_sp_test_short \
   exp/tri3 exp/tri3/graph_sp

steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph_sp data/test exp/tri3/decode_sp

##################

Best regards
Alex

Daniel Povey

Oct 30, 2018, 11:59:57 AM
to kaldi...@googlegroups.com
I doubt very much that code differences are to blame here, but I'm not sure what might have caused your problem. If you still have the lang directories, try to compare their contents, e.g. file sizes, phones.txt, words.txt.

I suggest trying to run from scratch, but back up or move your old stuff first.
It's also suspicious that the silence probs gave you so much improvement originally; normally the changes are very small.
Check your scoring script too (e.g. did you change it?), and look at the output in the new and old setups to see how it differs. Look for a directory called wer_details, where you can see details of the recognition errors that were made.
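For example, something along these lines (just a sketch; old/lang_sp and data/lang_sp are illustrative paths for the old and new lang directories):

# file sizes in the two lang directories (timestamps will differ, so just compare sizes)
du -ab old/lang_sp | sort -k2
du -ab data/lang_sp | sort -k2
# the symbol tables should normally be byte-identical between the two setups
diff old/lang_sp/phones.txt data/lang_sp/phones.txt
diff old/lang_sp/words.txt data/lang_sp/words.txt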


Dan


Alex Gurianov

Oct 31, 2018, 12:55:18 PM
to kaldi...@googlegroups.com
Dear Dan

Thank you for your answer.

Concerning the abnormally large improvement: I should describe my task and my idea.
I need to recognize numbers (0-10000) and keywords (like "buy", "help", "delivery", and so on) from short utterances like "I want to buy" or "I need tech assistance".

I have two sets of data:
1. Target dataset: 30 speakers, short utterances of 3-10 seconds, 120 words, 6 hours.
The recordings are pronunciations of three-number sequences like "one two three" or "eleven twenty forty one", and one- or two-word phrases like "support department" or "delivery".
So I have many pronunciation variants of every word, even from a single speaker.
I also extended the lexicon to 590 words. Russian has many endings for nouns, adjectives, and verbs, so I mostly added ending variations.
2. External dataset: 400 speakers, short utterances of 10-30 seconds, 800 words, 50 hours. Short sentences from fiction books, I think.

I mix these sets and build graphs with different LMs (CMU, built on training sentences, built on target sentences).
I evaluate WER on recordings from the target dataset (4 speakers not used in training).

The idea is to find a balance: relatively good recognition quality on the target dataset while minimizing false recognition of target words when something else was pronounced.
I believe the external dataset gives more pronunciation alternatives for the phones (and possibly pronunciations of phones not seen in the target dataset, which are needed for the lexicon extension) and helps with false recognition.
I would greatly appreciate any feedback on this conclusion and on the idea in general.

Returning to the issue.
I've rerun from scratch and got almost the same results.
The new and old lang_sp directories (the lang directory with silence and pronunciation probabilities in my recipe) differ only in L.fst and L_disambig.fst.

I use a symlink to steps/score_kaldi.sh as the scoring script.
Here are some results from wer_details.
old version: Set1: %WER 9.34 95% Conf Interval [ 7.90, 10.78 ]
new version: Set1: %WER 19.04 95% Conf Interval [ 17.10, 20.]
In both wer_details files I see an empty speaker name, and there are warnings in stat.log: "Use of uninitialized value $SPK in hash element at utils/scoring/wer_per_spk_details.pl line 163, <STDIN> line 3316" (I don't think this is the key issue).

per_spk (old)
SPEAKER     id    #SENT  #WORD   Corr    Sub    Ins    Del    Err  S.Err
            raw     410    789    716     44     19     29     92     71
            sys     410    789  90.75   5.58   2.41   3.68  11.66  17.32
74952002    raw     122    344    319     25     12      0     37     32
74952002    sys     122    344  92.73   7.27   3.49   0.00  10.76  26.23
74952004    raw      97    275    267      6      1      2      9      9
74952004    sys      97    275  97.09   2.18   0.36   0.73   3.27   9.28
74952015    raw     163    480    454     24     15      2     41     33
74952015    sys     163    480  94.58   5.00   3.12   0.42   8.54  20.25
74952023    raw      37    103    101      2      5      0      7      7
74952023    sys      37    103  98.06   1.94   4.85   0.00   6.80  18.92
SUM         raw     829   1991   1857    101     52     33    186    152
SUM         sys     829   1991  93.27   5.07   2.61   1.66   9.34  18.34

per_spk (new)
SPEAKER     id    #SENT  #WORD   Corr    Sub    Ins    Del    Err  S.Err
            raw     410    789    582    177     58     30    265    210
            sys     410    789  73.76  22.43   7.35   3.80  33.59  51.22
74952002    raw     122    344    320     24     18      0     42     33
74952002    sys     122    344  93.02   6.98   5.23   0.00  12.21  27.05
74952004    raw      97    275    264      9      2      2     13     13
74952004    sys      97    275  96.00   3.27   0.73   0.73   4.73  13.40
74952015    raw     163    480    447     32     20      1     53     45
74952015    sys     163    480  93.12   6.67   4.17   0.21  11.04  27.61
74952023    raw      37    103     99      4      2      0      6      5
74952023    sys      37    103  96.12   3.88   1.94   0.00   5.83  13.51
SUM         raw     829   1991   1712    246    100     33    379    306
SUM         sys     829   1991  85.99  12.36   5.02   1.66  19.04  36.91

Unfortunately, I don't see obvious setup errors.

Best regards
Alex


Tue, 30 Oct 2018 at 18:59, Daniel Povey <dpo...@gmail.com>:


--
Best regards,
Alex Gurianov

Daniel Povey

Oct 31, 2018, 1:11:45 PM
to kaldi...@googlegroups.com



It's generally better to put only the canonical pronunciations of words in your dictionary, and let the acoustic model handle any variation.

It would be good if you could figure out why that perl script is dying.  It might point to either a bug in that script, or a problem with your setup.

Regarding why your WER is degrading when you update the code: the way I recommend debugging this is to first check out the older Kaldi version and see if you can replicate the old results, and then look at the outputs and log files step by step to see at what stage they start to differ significantly. I suspect you changed something else in your scripts.
You may be able to narrow it down by doing something like decoding with the current code and the lang_sp directory created by the old code. That would tell you whether it's something in decoding+scoring, or in lang directory creation.
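For example, something like this (a rough sketch only; the graph and decode directory names here are illustrative and would need to match your own recipe):

# build the graph in the new code tree from the lang directory produced by the old code,
# then decode the same test set with it
utils/mkgraph.sh old/lang_sp_test exp/tri3 exp/tri3/graph_sp_oldlang
steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph_sp_oldlang data/test exp/tri3/decode_sp_oldlang

If the result with the old lang directory is good, the difference is coming from lang directory creation; if it is still bad, the problem is somewhere in decoding or scoring.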

Regarding the differences in L.fst and L_disambig.fst: I think at some point about a month ago I changed the script that generates them slightly, to simplify the structure of the FSTs that they output. But I think they will be equivalent. You could try to verify this with fstequivalent --random=true A.fst B.fst.
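For example (a sketch; paths are illustrative, and fstequivalent exits with a nonzero status when the FSTs are not judged equivalent, or on error):

# randomized equivalence test between the old and new lexicon FSTs
fstequivalent --random=true old/lang_sp/L.fst data/lang_sp/L.fst \
  && echo "L.fst: equivalent" || echo "L.fst: not equivalent (or error)"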

Dan
 

Alex Gurianov

Nov 1, 2018, 6:33:33 AM
to kaldi...@googlegroups.com
>  It would be good if you could figure out why that perl script is dying.  It might point to either a bug in that script, or a problem with your setup.

I have several wav files with Russian characters in the filenames. That was the reason for the warnings in stat.log: "Use of uninitialized value $SPK in hash element at utils/scoring/wer_per_spk_details.pl line 163". I'm not familiar with Perl, but sometimes I can use any language Google knows :) https://github.com/kaldi-asr/kaldi/pull/2811
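I spotted them with something like this (a rough check; the data directory paths are illustrative):

# list lines in the data files that contain non-ASCII bytes
grep -nP '[^\x00-\x7F]' data/test/wav.scp data/test/utt2spk data/test/text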



Wed, 31 Oct 2018 at 20:11, Daniel Povey <dpo...@gmail.com>:


Daniel Povey

Nov 1, 2018, 11:50:43 AM
to kaldi...@googlegroups.com
Hm.
I notice that wer_per_spk_details.pl tries to figure out whether its input is UTF-8 or some ASCII-compatible encoding. Your filenames wouldn't be UTF-8; they'd be in a special ASCII-compatible codepage for Russian. But I'm concerned that the script could get confused if the actual Russian text in the files is UTF-8; it would then have to treat the entire file as being an ASCII-compatible encoding, which wouldn't be right.
It shouldn't make it crash, though; that would still be a bug. Please show me the full output of wer_per_spk_details.pl. The separate debugging steps I suggested for your overall setup still hold.
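One quick way to check (just a sketch; the path is illustrative) is to ask file(1) what encoding it detects for the text files that go into scoring:

# reports e.g. us-ascii, utf-8 or iso-8859-* depending on the bytes in the file
file --mime-encoding data/test/text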
Dan

Alex Gurianov

Nov 2, 2018, 5:13:59 AM
to kaldi-help
I see the proposed changes; let me answer on GitHub (https://github.com/kaldi-asr/kaldi/pull/2811).

Concerning the main issue.
I made some changes to the dataset (removed duplicated and useless recordings), so the WERs now differ from those I gave in the first post, but I still have the issue.

Test 1. Comparison of L.fst from the new and the old code trunks (not equivalent).
openfst-1.6.7/bin/fstequivalent --random=true data/lang_sp/L.fst old/lang_sp/L.fst; echo $?
This outputs 2. Please take into account that old/lang_sp/L.fst was created using openfst-1.6.5.

Test 2. Comparison of L.fst from the new and the old code trunks, but here lang_sp in the new trunk was built from the data/local/dict_sp folder created in the old trunk (not equivalent).
openfst-1.6.7/bin/fstequivalent --random=true data/lang_sp/L.fst old/lang_sp/L.fst; echo $?
This outputs 2.
NEW:
%WER 34.15 [ 680 / 1991, 128 ins, 104 del, 448 sub ] exp/mono/decode/wer_13_0.5
%WER 19.74 [ 393 / 1991, 116 ins, 66 del, 211 sub ] exp/tri1/decode/wer_15_1.0
%WER 19.34 [ 385 / 1991, 113 ins, 81 del, 191 sub ] exp/tri2/decode/wer_17_1.0
%WER 19.49 [ 388 / 1991, 157 ins, 49 del, 182 sub ] exp/tri3/decode/wer_15_1.0
+SP
%WER 24.66 [ 491 / 1991, 203 ins, 37 del, 251 sub ] exp/tri3/decode_sp/wer_13_1.0

OLD:
%WER 32.95 [ 656 / 1991, 112 ins, 119 del, 425 sub ] exp/mono/decode/wer_16_0.0
%WER 20.29 [ 404 / 1991, 109 ins, 71 del, 224 sub ] exp/tri1/decode/wer_16_1.0
%WER 19.29 [ 384 / 1991, 117 ins, 72 del, 195 sub ] exp/tri2/decode/wer_17_1.0
%WER 17.28 [ 344 / 1991, 99 ins, 53 del, 192 sub ] exp/tri3/decode/wer_17_1.0
+SP
%WER 12.41 [ 247 / 1991, 87 ins, 42 del, 118 sub ] exp/tri3/decode_sp/wer_17_1.0

Test 3. Decoding in the new trunk with lang_sp from the old trunk (this helps).

NEW:
%WER 14.82 [ 295 / 1991, 120 ins, 47 del, 128 sub ] exp/tri3/decode_sp/wer_17_1.0

OLD:
%WER 12.41 [ 247 / 1991, 87 ins, 42 del, 118 sub ] exp/tri3/decode_sp/wer_17_1.0


Further steps:
1) Testing different Kaldi commits. Maybe you can suggest the best starting point.
2) The old Kaldi uses openfst-1.6.5 and the new one uses openfst-1.6.7, so the OpenFst version difference may also be worth checking.

Alex

On Thursday, 1 November 2018 at 18:50:43 UTC+3, Dan Povey wrote:

Daniel Povey

Nov 2, 2018, 12:35:53 PM
to kaldi...@googlegroups.com

If the data/local/dict_sp folders differ, then you wouldn't expect the L.fst files to be equivalent. (Actually, I'm not 100% sure that fstequivalent will even tell you that they are equivalent after the script changes, although they should be functionally equivalent.) In your Test 1, are the data/local/dict_sp folders that you created the L.fst files from the same? If not, you need to trace back and figure out where the original difference was.
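For example (a sketch; paths are illustrative):

# recursive diff of the two dict directories; lexiconp_silprob.txt is usually where
# pronunciation/silence-probability differences show up
diff -r old/local/dict_sp data/local/dict_sp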
 

Alex Gurianov

Nov 5, 2018, 10:20:46 AM
to kaldi-help
In Test 1 they are not the same.
In Test 2 I copied dict_sp from the old branch to check if it helps (it didn't).

On Friday, 2 November 2018 at 19:35:53 UTC+3, Dan Povey wrote:

Daniel Povey

Nov 5, 2018, 10:49:21 AM
to kaldi...@googlegroups.com
You're going to have to do things like looking at the pattern of errors in the decoding and how it changed, to see if you can narrow it down.
I very much doubt that it's any kind of script bug or code change that is responsible for this, because it would have affected others, but never say never.


Daniel Povey

Nov 5, 2018, 11:28:21 AM
to kaldi...@googlegroups.com

Actually, if you could send me the two lang dirs, as .tar.gz files or in one .tar.gz file, that were created from exactly the same inputs with the two different code+script versions, I may be able to debug a bit more. If you send the input dict dir, that would be helpful too.

Dan



Alex Gurianov

Nov 6, 2018, 10:22:03 AM
to kaldi...@googlegroups.com
Dear Dan

I've sent you the data. Thank you for your help.

Alex

Mon, 5 Nov 2018 at 19:28, Daniel Povey <dpo...@gmail.com>:


Daniel Povey

Nov 6, 2018, 5:13:43 PM
to kaldi...@googlegroups.com
OK, I found a possible cause of the problem.
The new version of the script that creates the lexicon FST mixed up the initial probabilities of silence and non-silence. In your case the initial silence probability was 0.99, so mixing up 0.99 vs. 0.01 might have made a substantial difference.
The lexicons still wouldn't have been equivalent even with this fix, since the lexiconp_silprob.txt files you were using in the new and old setups were different, but this may possibly have been the main cause of the WER differences.
See the PR, which may possibly fix this issue.
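If you want to sanity-check this on your own directories, a rough way (assuming the usual silprob lexicon structure; paths are illustrative) is to print the arcs leaving the start state of L.fst in the old and new lang directories and compare their costs:

# arcs out of state 0; with an initial-silence probability of 0.99, one of them should have
# cost roughly -log(0.99), about 0.01, and another roughly -log(0.01), about 4.6.
# If these costs look swapped relative to the old lang directory, that matches the bug above.
fstprint data/lang_sp/L.fst | awk '$1 == 0'
fstprint old/lang_sp/L.fst | awk '$1 == 0'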


Alex Gurianov

Nov 7, 2018, 5:06:30 AM
to kaldi-help
It works for me, thank you.

Starting point:
%WER 15.77 [ 314 / 1991, 94 ins, 57 del, 163 sub ] exp/tri3/decode/wer_16_1.0

+sp, before merging the fix:
%WER 19.44 [ 387 / 1991, 129 ins, 48 del, 210 sub ] exp/tri3/decode_sp/wer_15_1.

+sp, after merging the fix:
%WER 11.00 [ 219 / 1991, 79 ins, 46 del, 94 sub ] exp/tri3/decode_sp/wer_17_1.0

Alex

On Wednesday, 7 November 2018 at 1:13:43 UTC+3, Dan Povey wrote: