Hi all,
I want to run some experiments using CTM files that contain words, timecodes and confidence scores for each of the N-best paths. I can create a CTM file from the lattice with the script steps/get_ctm_conf.sh, but that file contains what I need only for the 1-best path. I'm also interested in the other paths (e.g. the top 100), but I'm not sure how to get them.
So far, I have tried modifying steps/get_ctm_conf.sh in two ways:
1) I inserted lattice-to-nbest into the pipeline, with no other changes:
lattice-prune ... |
lattice-to-nbest --n=100 ... | lattice-align-words ... | lattice-align-words ... | lattice-to-ctm-conf ... | ...
2) I replaced the alignment and confidence stages of the pipeline with nbest-to-ctm:
lattice-prune ... |
lattice-to-nbest --n=100 ... | nbest-to-ctm ... | ...
Approach 1) creates the CTM file in the correct format, but something is wrong because the transcriptions are worse. For instance, the WER of the 1-best transcription from that CTM file is 6 points higher than the WER I get from the original CTM file, created with the unmodified steps/get_ctm_conf.sh.
Approach 2) creates the CTM file without confidence scores, and the transcriptions are worse too.
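To narrow down where the 1-best hypotheses diverge, the word sequence of each utterance can be rebuilt from the two CTM files and compared. Below is a minimal sketch in Python, assuming the usual CTM layout (utterance id, channel, start time, duration, word, optional confidence) and assuming both files use the same utterance ids; if the ids differ, they would need to be mapped first. The function names are hypothetical helpers, not part of Kaldi.

```python
from collections import defaultdict


def read_ctm(lines):
    """Group CTM entries by utterance id, sorted by start time.

    Assumed line layout: <utt> <channel> <start> <dur> <word> [<conf>]
    """
    utts = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank, malformed, or comment lines.
        if len(fields) < 5 or fields[0].startswith(";;"):
            continue
        utt, _channel, start, dur, word = fields[:5]
        # The confidence column is optional.
        conf = float(fields[5]) if len(fields) > 5 else None
        utts[utt].append((float(start), float(dur), word, conf))
    for entries in utts.values():
        entries.sort(key=lambda e: e[0])
    return dict(utts)


def transcripts(ctm):
    """Map each utterance id to its list of words in time order."""
    return {utt: [w for _, _, w, _ in entries] for utt, entries in ctm.items()}


def differing_utterances(ctm_a, ctm_b):
    """Return the sorted utterance ids whose word sequences differ."""
    ta, tb = transcripts(ctm_a), transcripts(ctm_b)
    return sorted(u for u in set(ta) | set(tb) if ta.get(u) != tb.get(u))
```

For example, calling `differing_utterances(read_ctm(open(path_a)), read_ctm(open(path_b)))` with the original and the modified CTM file (`path_a` and `path_b` are placeholders) lists the utterances whose transcriptions changed.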
I searched the forum and found similar questions (https://groups.google.com/forum/#!searchin/kaldi-help/lattice-to-nbest%7Csort:date/kaldi-help/II24CNQYihc/ZcMqCBHJAAAJ), but it is still not clear to me how this should be done.
Thank you for any help or advice.