Strange results for pron and silence prob after code resync


Alex Gurianov

Oct 30, 2018, 9:17:42 AM
to kaldi-help
Hello

I've updated my code to the latest Kaldi version and am getting strange results after rebuilding the lang directory with pronunciation and silence probabilities.

My old Kaldi branch:
commit 64025aa7294dc62edefa3af8f4a4aaf92bf5c25a
Author: Nickolay V. Shmyrev ....
Date:   Wed May 2 05:58:11 2018 +0300
    [scripts] rnnlm scripts: ignore first iteration while looking for the best model (#2399)

The results with it:
%WER 31.69 [ 631 / 1991, 119 ins, 116 del, 396 sub ] exp/mono/decode/wer_14
%WER 17.08 [ 340 / 1991, 79 ins, 71 del, 190 sub ] exp/tri1/decode/wer_16
%WER 15.87 [ 316 / 1991, 66 ins, 67 del, 183 sub ] exp/tri2/decode/wer_17
%WER 14.06 [ 280 / 1991, 58 ins, 60 del, 162 sub ] exp/tri3/decode/wer_17
+ silence prob
%WER 9.34 [ 186 / 1991, 54 ins, 35 del, 97 sub ] exp/tri3/decode_sp/wer_15


new Kaldi branch:
commit 8e30fddb300a87e7c79ef2c0b9c731a8a9fd23f0
Author: Hossein Hadian ...
Date:   Sat Oct 20 07:35:35 2018 +0330
    [src] Add support for context independent phones in gmm-init-biphone (for e2e) (#2779)

The results with it:
%WER 32.70 [ 651 / 1991, 97 ins, 148 del, 406 sub ] exp/mono/decode/wer_16
%WER 16.83 [ 335 / 1991, 78 ins, 70 del, 187 sub ] exp/tri1/decode/wer_16
%WER 15.12 [ 301 / 1991, 74 ins, 62 del, 165 sub ] exp/tri2/decode/wer_16
%WER 13.81 [ 275 / 1991, 61 ins, 53 del, 161 sub ] exp/tri3/decode/wer_16
+ silence prob
%WER 21.09 [ 420 / 1991, 157 ins, 37 del, 226 sub ] exp/tri3/decode_sp/wer_13

Here is the relevant part of my recipe.
#############################
# align tri2
steps/align_si.sh --boost-silence $boost_sil --nj $njobs --cmd "$train_cmd" \
  data/train data/lang exp/tri2 exp/tri2_ali || exit 1;

steps/train_lda_mllt.sh --cmd "$train_cmd" \
    --splice-opts "--left-context=3 --right-context=3" 2500 15000 \
    data/train data/lang exp/tri2_ali exp/tri3 || exit;

utils/mkgraph.sh data/lang_test exp/tri3 exp/tri3/graph

steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph data/test exp/tri3/decode

# Now we compute the pronunciation and silence probabilities from training data,
# and re-create the lang directory.

steps/get_prons.sh --cmd "$train_cmd" \
     data/train data/lang exp/tri3

utils/dict_dir_add_pronprobs.sh --max-normalize true \
  data/local/dict \
  exp/tri3/pron_counts_nowb.txt exp/tri3/sil_counts_nowb.txt \
  exp/tri3/pron_bigram_counts_nowb.txt data/local/dict_sp

utils/prepare_lang.sh data/local/dict_sp \
"<UNK>" data/local/lang_tmp_sp data/lang_sp

# Prepare G.fst and data/{train,test} directories
local/ag_format_lms.sh --src-dir data/lang_sp data/local || exit

utils/mkgraph.sh data/lang_sp_test_short \
   exp/tri3 exp/tri3/graph_sp

steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph_sp data/test exp/tri3/decode_sp

##################

Best regards
Alex

Daniel Povey

Oct 30, 2018, 11:59:57 AM
to kaldi...@googlegroups.com
I doubt very much that code differences are to blame here, but I'm not sure what might have caused your problem. If you still have the lang directories, try to compare their contents, e.g. file sizes, phones.txt, words.txt.

I suggest trying to run from scratch, but back up or move your old stuff first.
It's also suspicious that the silence probs gave you so much improvement originally; normally the changes are very small.
Check your scoring script too (e.g. did you change it?), and look at the output in the new and old setups to see how it differs. Look for a directory called wer_details, where you can see details of the recognition errors that were made.
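For example, something along these lines (just a sketch; old/lang_sp and data/lang_sp are illustrative paths for the old and new lang directories):

# file sizes in the two lang directories (timestamps will differ, so just compare sizes)
du -ab old/lang_sp | sort -k2
du -ab data/lang_sp | sort -k2
# the symbol tables should normally be byte-identical between the two setups
diff old/lang_sp/phones.txt data/lang_sp/phones.txt
diff old/lang_sp/words.txt data/lang_sp/words.txt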


Dan


Alex Gurianov

Oct 31, 2018, 12:55:18 PM
to kaldi...@googlegroups.com
Dear Dan

Thank you for your answer.

Concerning the abnormally large improvement: I should describe my task and my idea.
I need to recognize numbers (0-10000) and keywords (like "buy", "help", "delivery", and so on) from short utterances like "I want to buy" or "I need tech assistance".

I have two sets of data:
1. Target dataset: 30 speakers, short utterances of 3-10 seconds, 120 words, 6 hours.
The recordings are pronunciations of three-number sequences like "one two three" or "eleven twenty forty one", and one- or two-word phrases like "support department" or "delivery".
So I have many pronunciation variants of every word, even from a single speaker.
I also extended the lexicon to 590 words. Russian has many endings for nouns, adjectives, and verbs, so I mostly added ending variations.
2. External dataset: 400 speakers, short utterances of 10-30 seconds, 800 words, 50 hours. Short sentences from fiction books, I think.

I mix these sets and build graphs with different LMs (CMU, built on training sentences, built on target sentences).
I evaluate WER on recordings from the target dataset (4 speakers not used in training).

The idea is to find a balance: relatively good recognition quality on the target dataset while minimizing false recognition of target words when something else was pronounced.
I believe the external dataset gives more pronunciation alternatives for the phones (and possibly pronunciations of phones not seen in the target dataset, which are needed for the lexicon extension) and helps with false recognition.
I would greatly appreciate any feedback on this conclusion and on the idea in general.

Returning to the issue.
I've rerun from scratch and got almost the same results.
The new and old lang_sp directories (the lang directory with silence and pronunciation probabilities in my recipe) differ only in L.fst and L_disambig.fst.

I use a symlink to steps/score_kaldi.sh as the scoring script.
Here are some results from wer_details.
old version: Set1: %WER 9.34 95% Conf Interval [ 7.90, 10.78 ]
new version: Set1: %WER 19.04 95% Conf Interval [ 17.10, 20.]
In both wer_details files I see an empty speaker name, and there are warnings in stat.log: "Use of uninitialized value $SPK in hash element at utils/scoring/wer_per_spk_details.pl line 163, <STDIN> line 3316" (I don't think this is the key issue).

per_spk (old)
SPEAKER     id    #SENT  #WORD   Corr    Sub    Ins    Del    Err  S.Err
            raw     410    789    716     44     19     29     92     71
            sys     410    789  90.75   5.58   2.41   3.68  11.66  17.32
74952002    raw     122    344    319     25     12      0     37     32
74952002    sys     122    344  92.73   7.27   3.49   0.00  10.76  26.23
74952004    raw      97    275    267      6      1      2      9      9
74952004    sys      97    275  97.09   2.18   0.36   0.73   3.27   9.28
74952015    raw     163    480    454     24     15      2     41     33
74952015    sys     163    480  94.58   5.00   3.12   0.42   8.54  20.25
74952023    raw      37    103    101      2      5      0      7      7
74952023    sys      37    103  98.06   1.94   4.85   0.00   6.80  18.92
SUM         raw     829   1991   1857    101     52     33    186    152
SUM         sys     829   1991  93.27   5.07   2.61   1.66   9.34  18.34

per_spk (new)
SPEAKER     id    #SENT  #WORD   Corr    Sub    Ins    Del    Err  S.Err
            raw     410    789    582    177     58     30    265    210
            sys     410    789  73.76  22.43   7.35   3.80  33.59  51.22
74952002    raw     122    344    320     24     18      0     42     33
74952002    sys     122    344  93.02   6.98   5.23   0.00  12.21  27.05
74952004    raw      97    275    264      9      2      2     13     13
74952004    sys      97    275  96.00   3.27   0.73   0.73   4.73  13.40
74952015    raw     163    480    447     32     20      1     53     45
74952015    sys     163    480  93.12   6.67   4.17   0.21  11.04  27.61
74952023    raw      37    103     99      4      2      0      6      5
74952023    sys      37    103  96.12   3.88   1.94   0.00   5.83  13.51
SUM         raw     829   1991   1712    246    100     33    379    306
SUM         sys     829   1991  85.99  12.36   5.02   1.66  19.04  36.91

Unfortunately, I don't see obvious setup errors.

Best regards
Alex


Tue, 30 Oct 2018 at 18:59, Daniel Povey <dpo...@gmail.com>:


--
Best regards,
Alex Gurianov

Daniel Povey

Oct 31, 2018, 1:11:45 PM
to kaldi...@googlegroups.com



It's generally better to put only the canonical pronunciations of words in your dictionary, and let the acoustic model handle any variation.

It would be good if you could figure out why that perl script is dying.  It might point to either a bug in that script, or a problem with your setup.

Regarding why your WER is degrading when you update the code: the way I recommend debugging this is to first check out the older Kaldi version and see if you can replicate the old results, and then look at the outputs and log files step by step to see at what stage they start to differ significantly. I suspect you changed something else in your scripts.
You may be able to narrow it down by doing something like decoding with the current code and the lang_sp directory created by the old code. That would tell you whether it's something in decoding+scoring, or in lang directory creation.
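For example, something like this (a rough sketch only; the graph and decode directory names here are illustrative and would need to match your own recipe):

# build the graph in the new code tree from the lang directory produced by the old code,
# then decode the same test set with it
utils/mkgraph.sh old/lang_sp_test exp/tri3 exp/tri3/graph_sp_oldlang
steps/decode.sh --config conf/decode.config --nj $njobs --cmd "$decode_cmd" \
  exp/tri3/graph_sp_oldlang data/test exp/tri3/decode_sp_oldlang

If the result with the old lang directory is good, the difference is coming from lang directory creation; if it is still bad, the problem is somewhere in decoding or scoring.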

Regarding the differences in L.fst and L_disambig.fst: I think at some point about a month ago I changed the script that generates them slightly, to simplify the structure of the FSTs that they output. But I think they will be equivalent. You could try to verify this with fstequivalent --random=true A.fst B.fst.
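For example (a sketch; paths are illustrative, and fstequivalent exits with a nonzero status when the FSTs are not judged equivalent, or on error):

# randomized equivalence test between the old and new lexicon FSTs
fstequivalent --random=true old/lang_sp/L.fst data/lang_sp/L.fst \
  && echo "L.fst: equivalent" || echo "L.fst: not equivalent (or error)"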

Dan
 

Alex Gurianov

Nov 1, 2018, 6:33:33 AM
to kaldi...@googlegroups.com
>  It would be good if you could figure out why that perl script is dying.  It might point to either a bug in that script, or a problem with your setup.

I have several wav files with Russian characters in the filenames. That was the reason for the warnings in stat.log: "Use of uninitialized value $SPK in hash element at utils/scoring/wer_per_spk_details.pl line 163". I'm not familiar with Perl, but sometimes I can use any language Google knows :) https://github.com/kaldi-asr/kaldi/pull/2811
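I spotted them with something like this (a rough check; the data directory paths are illustrative):

# list lines in the data files that contain non-ASCII bytes
grep -nP '[^\x00-\x7F]' data/test/wav.scp data/test/utt2spk data/test/text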



Wed, 31 Oct 2018 at 20:11, Daniel Povey <dpo...@gmail.com>:


Daniel Povey

Nov 1, 2018, 11:50:43 AM
to kaldi...@googlegroups.com
Hm.
I notice that wer_per_spk_details.pl tries to figure out whether its input is UTF-8 or some ASCII-compatible encoding. Your filenames wouldn't be UTF-8; they'd be in a special ASCII-compatible codepage for Russian. But I'm concerned that the script could get confused if the actual Russian text in the files is UTF-8; it would then have to treat the entire file as being an ASCII-compatible encoding, which wouldn't be right.
It shouldn't make it crash, though; that would still be a bug. Please show me the full output of wer_per_spk_details.pl. The separate debugging steps I suggested for your overall setup still hold.
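One quick way to check (just a sketch; the path is illustrative) is to ask file(1) what encoding it detects for the text files that go into scoring:

# reports e.g. us-ascii, utf-8 or iso-8859-* depending on the bytes in the file
file --mime-encoding data/test/text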
Dan

Alex Gurianov

Nov 2, 2018, 5:13:59 AM
to kaldi-help
I see the proposed changes; let me answer on GitHub (https://github.com/kaldi-asr/kaldi/pull/2811).

Concerning the main issue.
I made some changes to the dataset (removed duplicated and useless recordings), so the WERs now differ from those I gave in the first post, but I still have the issue.

Test 1. Comparison of L.fst from the new and the old code trunks (not equivalent).
openfst-1.6.7/bin/fstequivalent --random=true data/lang_sp/L.fst old/lang_sp/L.fst; echo $?
This outputs 2. Please take into account that old/lang_sp/L.fst was created using openfst-1.6.5.

Test 2. Comparison of L.fst from the new and the old code trunks, but here lang_sp in the new trunk was built from the data/local/dict_sp folder created in the old trunk (not equivalent).
openfst-1.6.7/bin/fstequivalent --random=true data/lang_sp/L.fst old/lang_sp/L.fst; echo $?
This outputs 2.
NEW:
%WER 34.15 [ 680 / 1991, 128 ins, 104 del, 448 sub ] exp/mono/decode/wer_13_0.5
%WER 19.74 [ 393 / 1991, 116 ins, 66 del, 211 sub ] exp/tri1/decode/wer_15_1.0
%WER 19.34 [ 385 / 1991, 113 ins, 81 del, 191 sub ] exp/tri2/decode/wer_17_1.0
%WER 19.49 [ 388 / 1991, 157 ins, 49 del, 182 sub ] exp/tri3/decode/wer_15_1.0
+SP
%WER 24.66 [ 491 / 1991, 203 ins, 37 del, 251 sub ] exp/tri3/decode_sp/wer_13_1.0

OLD:
%WER 32.95 [ 656 / 1991, 112 ins, 119 del, 425 sub ] exp/mono/decode/wer_16_0.0
%WER 20.29 [ 404 / 1991, 109 ins, 71 del, 224 sub ] exp/tri1/decode/wer_16_1.0
%WER 19.29 [ 384 / 1991, 117 ins, 72 del, 195 sub ] exp/tri2/decode/wer_17_1.0
%WER 17.28 [ 344 / 1991, 99 ins, 53 del, 192 sub ] exp/tri3/decode/wer_17_1.0
+SP
%WER 12.41 [ 247 / 1991, 87 ins, 42 del, 118 sub ] exp/tri3/decode_sp/wer_17_1.0

Test 3. Decoding in the new trunk with lang_sp from the old trunk (this helps).

NEW:
%WER 14.82 [ 295 / 1991, 120 ins, 47 del, 128 sub ] exp/tri3/decode_sp/wer_17_1.0

OLD:
%WER 12.41 [ 247 / 1991, 87 ins, 42 del, 118 sub ] exp/tri3/decode_sp/wer_17_1.0


Further steps:
1) Testing different Kaldi commits. Maybe you can suggest the best starting point.
2) The old Kaldi uses openfst-1.6.5 and the new one uses openfst-1.6.7, so the OpenFst version difference may also be worth checking.

Alex

On Thursday, 1 November 2018 at 18:50:43 UTC+3, Dan Povey wrote:

Daniel Povey

Nov 2, 2018, 12:35:53 PM
to kaldi...@googlegroups.com

If the data/local/dict_sp folders differ, then you wouldn't expect the L.fst files to be equivalent. (Actually, I'm not 100% sure that fstequivalent will even tell you that they are equivalent after the script changes, although they should be functionally equivalent.) In your Test 1, are the data/local/dict_sp folders that you created the L.fst files from the same? If not, you need to trace back and figure out where the original difference was.
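For example (a sketch; paths are illustrative):

# recursive diff of the two dict directories; lexiconp_silprob.txt is usually where
# pronunciation/silence-probability differences show up
diff -r old/local/dict_sp data/local/dict_sp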
 

Alex Gurianov

Nov 5, 2018, 10:20:46 AM
to kaldi-help
In Test 1 they are not the same.
In Test 2 I copied dict_sp from the old branch to check if it helps (it didn't).

On Friday, 2 November 2018 at 19:35:53 UTC+3, Dan Povey wrote:

Daniel Povey

Nov 5, 2018, 10:49:21 AM
to kaldi...@googlegroups.com
You're going to have to do things like looking at the pattern of errors in the decoding and how it changed, to see if you can narrow it down.
I very much doubt that it's any kind of script bug or code change that is responsible for this, because it would have affected others, but never say never.


Daniel Povey

Nov 5, 2018, 11:28:21 AM
to kaldi...@googlegroups.com

Actually, if you could send me the two lang dirs, as .tar.gz files or in one .tar.gz file, that were created from exactly the same inputs with the two different code+script versions, I may be able to debug a bit more. If you send the input dict dir, that would be helpful too.

Dan



Alex Gurianov

Nov 6, 2018, 10:22:03 AM
to kaldi...@googlegroups.com
Dear Dan

I've sent you the data. Thank you for your help.

Alex

Mon, 5 Nov 2018 at 19:28, Daniel Povey <dpo...@gmail.com>:


Daniel Povey

Nov 6, 2018, 5:13:43 PM
to kaldi...@googlegroups.com
OK, I found a possible cause of the problem.
The new version of the script that creates the lexicon FST mixed up the initial probabilities of silence and non-silence. In your case the initial silence probability was 0.99, so mixing up 0.99 vs. 0.01 might have made a substantial difference.
The lexicons still wouldn't have been equivalent even with this fix, since the lexiconp_silprob.txt files you were using in the new and old setups were different, but this may possibly have been the main cause of the WER differences.
See the PR, which may possibly fix this issue.
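If you want to sanity-check this on your own directories, a rough way (assuming the usual silprob lexicon structure; paths are illustrative) is to print the arcs leaving the start state of L.fst in the old and new lang directories and compare their costs:

# arcs out of state 0; with an initial-silence probability of 0.99, one of them should have
# cost roughly -log(0.99), about 0.01, and another roughly -log(0.01), about 4.6.
# If these costs look swapped relative to the old lang directory, that matches the bug above.
fstprint data/lang_sp/L.fst | awk '$1 == 0'
fstprint old/lang_sp/L.fst | awk '$1 == 0'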


Alex Gurianov

Nov 7, 2018, 5:06:30 AM
to kaldi-help
It works for me, thank you.

Starting point:
%WER 15.77 [ 314 / 1991, 94 ins, 57 del, 163 sub ] exp/tri3/decode/wer_16_1.0

+sp, before merging the fix:
%WER 19.44 [ 387 / 1991, 129 ins, 48 del, 210 sub ] exp/tri3/decode_sp/wer_15_1.

+sp, after merging the fix:
%WER 11.00 [ 219 / 1991, 79 ins, 46 del, 94 sub ] exp/tri3/decode_sp/wer_17_1.0

Alex

On Wednesday, 7 November 2018 at 1:13:43 UTC+3, Dan Povey wrote: