problem about kws in another database


minli

May 29, 2018, 11:16:07 AM
to kaldi-help
I modified the script from wsj/s5/run to do KWS on the thchs30 and tedlium datasets, but it ended up generating an empty kwslist.xml document like this:
<kwslist kwlist_filename="" language="cantonese" system_id="">
</kwslist>
What's wrong with my script? I have been stuck here for a few days, please help me.
The script is attached.

3.sh

minli

May 30, 2018, 11:31:38 AM
to kaldi-help
helpppppp

On Tuesday, May 29, 2018 at 5:16:07 PM UTC+2, minli wrote:

Daniel Povey

May 30, 2018, 1:11:01 PM
to kaldi-help
Yenda is traveling right now, he will likely respond soon.
> --
> Go to http://kaldi-asr.org/forums.html find out how to join
> ---
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> To post to this group, send email to kaldi...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kaldi-help/bf8c9cb8-78f1-4820-bc49-4ca3a2a7aac3%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

Jan Trmal

May 30, 2018, 6:27:36 PM
to kaldi-help
First, look at the result files to see whether they contain any matches. Second, I don't think line 27 makes sense (cat is applied to a directory instead of a file).
y.

minli

May 31, 2018, 3:32:19 PM
to kaldi-help
Aha, thanks! I have fixed the script (cat was applied to a directory), but when I run utils/write_kwslist.pl it reports 'Unmapped utterance 49' and kwslist.xml is empty. I tried several datasets, such as tedlium and thchs30, and it reports the same thing every time. Could someone kindly help me?

On Thursday, May 31, 2018 at 12:27:36 AM UTC+2, Yenda wrote:

Jan Trmal

May 31, 2018, 3:40:13 PM
to kaldi-help
This was already discussed on the kaldi-help list, I think.
It relates to data/kws/utter_map not containing a line that maps numerical id 49 to its textual representation (for example UTT-1-1, or whatever utterance ids you are using).
y.
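
One way to narrow this down is to list the numerical ids that appear in the search results but have no entry in the mapping file. The sketch below is only an illustration in Python: the function name is hypothetical, and it assumes that each result line carries the numerical utterance id in its second column and that each mapping line reads "number utt-name". Adjust the column positions if your files are laid out differently.

```python
def find_unmapped_ids(result_lines, map_lines):
    """Return numerical utterance ids used in the results but absent from the map."""
    mapped = set()
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 2:
            # Assumed mapping layout: "<numerical-id> <utterance-name>"
            mapped.add(fields[0])

    missing = []
    seen = set()
    for line in result_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumed result layout: "<keyword-id> <numerical-utt-id> ..."
        utt = fields[1]
        if utt not in mapped and utt not in seen:
            seen.add(utt)
            missing.append(utt)
    return missing
```

If the returned list is non-empty, the mapping file passed to the script does not cover the searched data, or its columns are in a different order than assumed here.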

minli

May 31, 2018, 6:44:18 PM
to kaldi-help
Thanks for your reply. You are right, id 49 has no textual representation, but I still don't know how to map the utterances correctly. Should I modify the utter_map or utt_id parts of local/kws_data_prep.sh so that it generates a correct data/kws/utter_map and utter_id, and write_kwslist.pl can find the textual representation in utter_map? Or what else should I do? The three related files are attached. Thanks again!
best regards,
minli

On Thursday, May 31, 2018 at 9:40:13 PM UTC+2, Yenda wrote:
kws_data_prep.sh
utter_map
result.txt

minli

Jun 1, 2018, 10:42:37 AM
to kaldi-help
someone helppppppp (T.T)

On Friday, June 1, 2018 at 12:44:18 AM UTC+2, minli wrote:

Jan Trmal

Jun 1, 2018, 10:51:02 AM
to kaldi-help
You didn't send your utt_id, but it might be that you are just using the incorrect file.
The mapping file should contain lines looking like
utt-name number
y.

minli

Jun 1, 2018, 2:52:28 PM
to kaldi-help
Yenda, thank you very much for your reply. The utter_id file is attached; it is the only file with lines of the form "utt-name number", and utter_map does not have that form. The script I used was babel/s5b/local/kws_data_prep.sh, and the script cc.sh shows how I generate the utter_id, utter_map, keywords.int, kwlist_invocab.xml, kwlist_outvocab.xml, and keywords.fsts files for the Tedlium dataset.
I also tried other datasets, such as WSJ and thchs30, but the "Unmapped utterance xx" problem still occurs. (T.T)
best regards,
minli
utter_id
cc.sh
result.txt
utter_map
kws_data_prep.sh

Jan Trmal

Jun 1, 2018, 3:04:34 PM
to kaldi-help
Then just use utter_id instead of utter_map, as I was suggesting.
y.

minli

Jun 1, 2018, 4:12:27 PM
to kaldi-help
Thanks! It works! I changed the layout of the utter_id file to "number utterance" (originally it was "utterance number") and I also replaced "--map-utter=data/kws/utter_map" with "--map-utter=data/kws/utter_id". As the next step I need F4DE scores. I have installed F4DE, but I have no idea how to evaluate the result. Could you point me in the right direction? Can babel/s5b/local/kws_score_f4de.sh do that? I also saw that I have to prepare the ecf file, but where can I get the information for an entry such as " <excerpt audio_filename="YOUR_AUDIO_FILENAME" channel="1" tbeg="0.000" dur="483.825" source_type="splitcts"/>"?
best regards,
minli
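
The column swap described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of Kaldi; it assumes every non-empty line of utter_id has exactly two whitespace-separated fields, "utt-name number".

```python
def swap_columns(lines):
    """Turn "utt-name number" lines into "number utt-name" lines."""
    swapped = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"expected two fields, got: {line!r}")
        name, number = fields
        swapped.append(f"{number} {name}")
    return swapped
```

Raising an error on unexpected lines, rather than skipping them silently, makes it obvious when the input is not the file you thought it was.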

On Friday, June 1, 2018 at 9:04:34 PM UTC+2, Yenda wrote:

aliiire...@gmail.com

Jun 1, 2018, 4:45:55 PM
to kaldi-help
With Yenda's permission,

this post is about your last question.
---------
In this paper, you can find information about the NIST evaluation,

minli

Jun 3, 2018, 10:21:30 AM
to kaldi-help
Thanks. I have set 4 keywords and it finally generated the kwslist.xml file, but strangely each keyword seems to use different audio files for the detection, and I don't know why. For example, 5 audio files were used to detect the keyword "begin", but 15 files were used for "can be". The kwslist.xml is attached.
best regards,
minli

On Friday, June 1, 2018 at 10:45:55 PM UTC+2, aliiire...@gmail.com wrote:
ew.xml

Jan Trmal

Jun 3, 2018, 10:47:09 AM
to kaldi-help
Your question does not really make sense to me. The kwslist does not say anything about which files were searched for the keyword; it says "the given keyword is probably located in these recordings, with the given score".
The ecf file contains the definition of the search collection.
y.

Jan Trmal

Jun 3, 2018, 10:52:48 AM
to kaldi-help
Or, more precisely, the ecf file contains the definition of the search collection for F4DE scoring purposes only.
The search procedure in Kaldi searches the whole dataset (defined by the given data/dataset directory).
y.
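
To see which recordings a kwslist.xml reports for each keyword, the file can be summarized with a short script. The following is a minimal Python sketch; the element and attribute names (detected_kwlist, kwid, kw, file, decision) are assumptions about the kwslist layout and are not shown in this thread, so check them against your own file. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def files_with_hits(kwslist_xml):
    """Map each keyword id to the sorted list of files with a YES decision."""
    root = ET.fromstring(kwslist_xml)
    files = defaultdict(set)
    # Assumed layout: <detected_kwlist kwid="..."> holding <kw file="..." decision="..."/>
    for detected in root.iter("detected_kwlist"):
        kwid = detected.get("kwid")
        for kw in detected.iter("kw"):
            if kw.get("decision") == "YES":
                files[kwid].add(kw.get("file"))
    return {kwid: sorted(names) for kwid, names in files.items()}
```

A different number of files per keyword is then simply a different number of recordings in which that keyword was hypothesized.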

minli

Jun 5, 2018, 4:54:34 AM
to kaldi-help
Thanks for your reply. I have successfully generated the evaluation report using kws_score.sh. The ATWV value in metrics.txt is 0.00, even though the system previously detected several keywords, which are listed in kwslist.xml. I don't know why the values in metrics.txt are so poor. Is there something wrong with my program?
ATWV = 0.0000
OTWV = 0.0000
STWV = 1.0000
MTWV = 0.0000, THRESHOLD = 0
Lattice Recall = 1.0000
The related files are attached. Thanks for your patience.
best regards,
minli
resultdir.rar

Jan Trmal

Jun 5, 2018, 8:04:03 AM
to kaldi-help
I think your issue is with the data files: when you look at the kws dir, the stm contains different utterances/files than the utter_id file.
Plus, when I look at sum.txt and alignment.csv in the result dirs, I see that the reference was empty; otherwise you'd get MISS or CORR entries in the csv and sum.txt wouldn't be empty.
y.

minli

Jun 5, 2018, 9:30:09 AM
to kaldi-help
Sorry, I don't understand what the stm is; I don't see any stm file in the kws dir. By the way, for evaluation, I saw that kws_score.sh needs only 4 files (utter_id and utter_map are not needed):
1. kwslist.xml (it describes the search result; I use the test dataset in Tedlium, and I saw that all "yes" results were correctly detected, so maybe there is no problem in the kws dir?)
2. kwlist.xml (the keyword list)
3. ecf.xml (contains the whole Tedlium test dataset)
4. rttm (I passed the train dataset to the script instead of the test dataset. If I use the test dataset, it reports a mapping problem, e.g. "can't find utterance xxxxx", where utterance xxxxx is in the training dataset, not in the test dataset. I'm not so sure about this, so I used the train dataset to generate the rttm.)
What should I do? (T.T)
best regards,
minli
On Tuesday, June 5, 2018 at 2:04:03 PM UTC+2, Yenda wrote:

Jan Trmal

Jun 5, 2018, 9:49:11 AM
to kaldi-help
Sorry, I meant the rttm file.
That rttm file has to be generated from the same dataset as the one you search on, as it contains the time-aligned word information that is used for scoring. In other words, using this file the scorer works out where the real keywords are located in the audio and whether your putative hit is a correct hit or just a false alarm.
You cannot just mix files from different datasets hoping it will magically work.
y.

minli

Jun 5, 2018, 10:55:18 AM
to kaldi-help
Thank you so much. I thought I was getting close to the solution.
1. I tried to use the test data to generate the rttm (e.g. local/ali_to_rttm.sh data/test data/lang exp/tri3), but the log file says "Not processing utterance 911Mothers_2010W-0001727-0001919 because no word transcription found". The "911Mothers" file is in the training dataset, so I don't know where "911Mothers" in this log file comes from.

2. I just modified the line
to "local/make_L_align.sh data/local/lang/ $lang $lang 2>&1 | tee $dir/log/L_align.log". I'm not sure whether data/local/lang corresponds to the test dataset rather than the train data. So I guess the generated L_align.fst file uses the train dataset, not the test dataset, which is why "911Mothers" occurs in the log file.

3. My log file and directory structure are attached.

Could you help me out? I don't know which lang dictionary actually corresponds to the test data.
best regards,
minli

On Tuesday, June 5, 2018 at 3:49:11 PM UTC+2, Yenda wrote:
align_to_words.log
dirstruct.jpg

Jan Trmal

Jun 5, 2018, 11:38:09 AM
to kaldi-help
The command should be something like
. local/ali_to_rttm.sh data/test data/lang exp/tri3/test_ali
where exp/tri3/test_ali is obtained by a script call
steps/align...
(I'm intentionally not writing out what the command should look like, because I have noticed you have an inclination to try random things and you need to figure this one out exactly. If you look at exp/, there will probably be a directory exp/tri3_ali. You then need to figure out the command that was used to generate that directory, and use that command to align your test directory into exp/tri3/test_ali.)

y.

minli

Jun 5, 2018, 2:59:32 PM
to kaldi-help
Thanks. I used the generate_ail_test.sh script to generate exp/tri3/tri1_ail_test and generated the rttm from data/test/ successfully, but the problem persists. Crying~ (T.T)
The related files are attached. What is causing this?
best regards,
minli

On Tuesday, June 5, 2018 at 5:38:09 PM UTC+2, Yenda wrote:
partialFile.rar

minli

Jun 5, 2018, 3:06:33 PM
to kaldi-help
By the way, I found this in the rttm file:
NON-LEX AimeeMullins_2009P 1 0017.82 0.09 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 17.91 0.14 i'd <NA> <NA> <NA>
NON-LEX AimeeMullins_2009P 1 18.05 0.03 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.08 0.15 like <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.23 0.13 to <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.36 0.9 share <NA> <NA> <NA>
NON-LEX AimeeMullins_2009P 1 19.26 0.22 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.48 0.21 with <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.69 0.14 you <NA> <NA> <NA>

The audio file names are all the same, e.g. "AimeeMullins_2009P", which is not the name "AimeeMullins_2009P-0001782-0002881" used in ecf.xml. Do these names lead to ATWV=0? If yes, how can I fix this?
best regards,
minli

Jan Trmal

Jun 5, 2018, 3:39:16 PM
to kaldi-help
Yes, the ecf file has to contain the same files as the rttm. Don't forget about the right duration, too.
y.
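
A quick consistency check along these lines is to compare the file names used in the rttm with the audio_filename values in ecf.xml. The Python sketch below is illustrative only: the function names are hypothetical, it takes the second column of each rttm line as the file name (as in the rttm excerpt quoted earlier in this thread), and it reads the audio_filename attribute of every excerpt element.

```python
import xml.etree.ElementTree as ET


def rttm_file_names(rttm_lines):
    """Collect the file names found in the second column of the rttm lines."""
    names = set()
    for line in rttm_lines:
        fields = line.split()
        if len(fields) > 1:
            names.add(fields[1])
    return names


def ecf_file_names(ecf_xml):
    """Collect the audio_filename attribute of every <excerpt> element."""
    root = ET.fromstring(ecf_xml)
    return {excerpt.get("audio_filename") for excerpt in root.iter("excerpt")}


def compare_names(rttm_lines, ecf_xml):
    """Return (names only in the rttm, names only in the ecf), both sorted."""
    in_rttm = rttm_file_names(rttm_lines)
    in_ecf = ecf_file_names(ecf_xml)
    return sorted(in_rttm - in_ecf), sorted(in_ecf - in_rttm)
```

Both returned lists should be empty before scoring; this check does not look at durations, which need to agree as well.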

minli

Jun 5, 2018, 4:34:55 PM
to kaldi-help
But if that is the problem, how can I change the audio file names in the rttm to match ecf.xml? I see that the rttm file contains only part of the utterance names in data/test, but it was generated automatically by local/ali_to_rttm.sh data/test data/lang exp/tri1_ali_test.
best regards,
minli

On Tuesday, June 5, 2018 at 9:39:16 PM UTC+2, Yenda wrote:

Jan Trmal

Jun 5, 2018, 5:08:14 PM
to kaldi-help
You will probably have to write your own script.
You can use the wav.scp file and wav-to-duration (have a look at the --read-whole-file switch),
or you can use the rttm file to obtain pretty much the same information.
The two approaches will produce slightly different ecf files and slightly different results, but the difference should be small under normal circumstances. The first (wav.scp and wav-to-duration) is preferable.
y.
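
As a rough illustration of the first approach, the sketch below turns "recording-id duration" lines into an ecf document. It is a Python sketch under assumptions, not the script Yenda had in mind: the "recording-id duration" input layout, the ecf root element and its attributes, and the function names are all assumed here, while the excerpt attributes follow the example line quoted earlier in the thread.

```python
from xml.sax.saxutils import quoteattr


def parse_durations(lines):
    """Parse "recording-id duration" lines into a dict (assumed input layout)."""
    durations = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        durations[fields[0]] = float(fields[1])
    return durations


def build_ecf(durations, source_type="splitcts"):
    """Return an ecf document with one whole-file excerpt per recording."""
    total = sum(durations.values())
    # The root element and its attributes are an assumption about the ecf layout.
    out = [f'<ecf source_signal_duration="{total:.3f}" language="" version="">']
    for name in sorted(durations):
        out.append(
            f'  <excerpt audio_filename={quoteattr(name)} channel="1" '
            f'tbeg="0.000" dur="{durations[name]:.3f}" source_type="{source_type}"/>'
        )
    out.append("</ecf>")
    return "\n".join(out)
```

Each excerpt here covers a whole recording starting at 0.000, so the rttm used for scoring would need to refer to the same whole-file names.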

minli

Jun 5, 2018, 5:31:05 PM
to kaldi-help
1. What does the "--read-whole-file" switch mean? Forgive my ignorance, I still don't understand what the two methods are... (>vv<)
2. Did you mean that I should use utils/data/get_uttdur.sh to obtain ecf.xml? But I have already generated the right ecf.xml, and its audio file names match data/test. I think maybe the audio names in the rttm file are wrong?
best regards,
minli
On Tuesday, June 5, 2018 at 11:08:14 PM UTC+2, Yenda wrote:

minli

Jun 5, 2018, 5:45:42 PM
to kaldi-help
Besides, I looked at the code in local/ali_to_rttm.sh. I can successfully generate align.txt, which contains the full utterance names, but when I run the remaining lines to generate the rttm, the suffix of each utterance name seems to be cut off.

This is part of align.txt:
AimeeMullins_2009P-0001782-0002881 0 9 ; 63654 14 ; 0 3 ; 77881 15 ; 137099 13 ; 122379 90 ; 0 22 ; 148939 21 ; 150939 14 ; 39 4 ; 36871 50 ; 135636 12 ; 63653 8 ; 80734 41 ; 0 58 ; 39 3 ; 47306 16 ; 88850 29 ; 2363 23 ; 147850 17 ; 149794 34 ; 4644 7 ; 7126 34 ; 49288 59 ; 0 14 ; 68032 44 ; 148840 61 ; 0 60 ; 63653 15 ; 4032 22 ; 71608 19 ; 90872 13 ; 135882 52 ; 58171 52 ; 0 19 ; 147806 32 ; 63656 9 ; 149794 17 ; 5766 34 ; 19276 23 ; 0 15 
AimeeMullins_2009P-0002881-0004026 0 25 ; 63654 12 ; 3871 25 ; 47805 64 ; 0 10 ; 40818 32 ; 135669 10 ; 103084 59 ; 0 41 ; 4823 9 ; 63653 5 ; 110968 43 ; 135636 9 ; 63653 8 ; 57378 15 ; 92895 39 ; 96528 43 ; 64977 11 ; 90872 18 ; 77729 61 ; 0 24 ; 79202 26 ; 142786 14 ; 135669 6 ; 149415 48 ; 0 17 ; 36639 71 ; 0 22 ; 137099 8 ; 120878 18 ; 147737 11 ; 63654 17 ; 47744 46 ; 0 80 ; 77184 9 ; 84512 14 ; 110876 18 ; 150939 8 ; 135669 13 ; 43073 48 ; 0 86 
AimeeMullins_2009P-0004141-0004834 0 8 ; 36639 75 ; 0 52 ; 1508 62 ; 0 79 ; 30747 59 ; 0 26 ; 59968 78 ; 0 68 ; 143137 90 ; 0 15 ; 149712 53 ; 0 26 
AimeeMullins_2009P-0004933-0005639 0 6 ; 128637 65 ; 0 31 ; 81205 69 ; 0 25 ; 149629 73 ; 0 75 ; 81886 71 ; 0 59 ; 75236 51 ; 0 48 ; 90771 78 ; 0 53 
AimeeMullins_2009P-0005639-0006581 0 40 ; 117001 88 ; 0 10 ; 149559 35 ; 97744 46 ; 0 62 ; 146928 76 ; 0 53 ; 64855 82 ; 0 60 ; 21708 91 ; 0 55 ; 99697 89 ; 0 40 ; 58101 85 ; 0 28 
AimeeMullins_2009P-0006581-0007519 0 35 ; 121368 78 ; 0 46 ; 33657 77 ; 0 41 ; 75084 26 ; 142786 39 ; 0 24 ; 38227 31 ; 142786 36 ; 0 16 ; 38227 32 ; 49288 45 ; 38227 37 ; 64977 49 ; 0 63 ; 30223 53 ; 142786 41 ; 0 43 ; 29781 44 ; 97744 52 ; 0 28 
AimeeMullins_2009P-0007622-0008535 0 9 ; 120878 22 ; 3884 56 ; 0 10 ; 63194 45 ; 0 28 ; 143137 83 ; 0 23 ; 4823 17 ; 146926 43 ; 0 86 ; 5716 74 ; 0 53 ; 59439 61 ; 0 57 ; 130442 73 ; 0 62 ; 20623 68 ; 0 41 

This is part of rttm:
NON-LEX AimeeMullins_2009P 1 0017.82 0.09 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 17.91 0.14 i'd <NA> <NA> <NA>
NON-LEX AimeeMullins_2009P 1 18.05 0.03 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.08 0.15 like <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.23 0.13 to <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 18.36 0.9 share <NA> <NA> <NA>
NON-LEX AimeeMullins_2009P 1 19.26 0.22 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.48 0.21 with <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.69 0.14 you <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.83 0.04 a <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 19.87 0.5 discovery <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 20.37 0.12 that <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 20.49 0.08 i <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 20.57 0.41 made <NA> <NA> <NA>
NON-LEX AimeeMullins_2009P 1 20.98 0.58 <eps> <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 21.56 0.03 a <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 21.59 0.16 few <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 21.75 0.29 months <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 22.04 0.23 ago <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 22.27 0.17 while <NA> <NA> <NA>
LEXEME AimeeMullins_2009P 1 22.44 0.34 writing <NA> <NA> <NA>

On Tuesday, June 5, 2018 at 11:31:05 PM UTC+2, minli wrote:

minli

Jun 5, 2018, 6:03:26 PM
to kaldi-help
@.@ Maybe the audio name in the rttm is right after all, because I just noticed that the wav.scp file only contains entries like "AimeeMullins_2009P sph2pipe -f wav -p /data/kaldi/egs/tedlium/s5_r2/db/TEDLIUM_release2/test/sph/AimeeMullins_2009P.sph |", which use the short name AimeeMullins_2009P rather than AimeeMullins_2009P-0004141-0004834. That is because the Tedlium recordings are split into segments.
For now I have no idea what to do next; I hope you can help me out. I have run out of ideas. (T.T)
best regards,
minli

On Tuesday, June 5, 2018 at 11:45:42 PM UTC+2, minli wrote:

Jan Trmal

Jun 5, 2018, 6:15:02 PM
to kaldi-help
AimeeMullins_2009P is (probably) the name of a file.
AimeeMullins_2009P-0004141-0004834 is (again, probably) the name of an utterance (a small segment of the file).

You can use either whole files or only utterances for scoring purposes, but you have to make up your mind; you cannot mix the two.

I cannot spare much more time to help you, so you will have to ask your advisor or your friends/colleagues to help you out. My personal impression is that you are lost and confused about even very standard concepts in ASR.
y.
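
The distinction can be made concrete with a small helper that splits an utterance id into the file name and its time span. This Python sketch assumes that ids look like <file>-<start>-<end> with seven-digit offsets in hundredths of a second, which is consistent with the rttm excerpt above (0001782 and 17.82) but is not stated anywhere in the thread; the function name is hypothetical.

```python
import re

# Assumed id layout: "<file>-<start>-<end>", offsets in hundredths of a second.
_UTTERANCE_ID = re.compile(r"^(?P<file>.+)-(?P<beg>\d{7})-(?P<end>\d{7})$")


def split_utterance_id(utt_id):
    """Return (file name, start seconds, end seconds), or None for a plain file name."""
    match = _UTTERANCE_ID.match(utt_id)
    if match is None:
        return None
    begin = int(match.group("beg")) / 100.0
    end = int(match.group("end")) / 100.0
    return match.group("file"), begin, end
```

Whichever level is chosen for scoring, the ecf and the rttm would both have to use names from that same level.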
