Hi!
I would like to generate a CTM file from an audio file and its transcript. I have already prepared and tested acoustic models (one GMM model and one NNet model) and language models (n-gram and LSTM models).
I've read that GMM models work better for aligning transcripts with audio. For my task I started from the segment_long_utterances.sh script and changed some code to transform the CTM_edits file into a final CTM with the insertions and silences removed.
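To illustrate the kind of cleanup described above, here is a minimal Python sketch. It is not the actual modified script. It assumes each CTM_edits line has eight whitespace-separated columns (utterance id, channel, start time, duration, hypothesis word, confidence, reference word, edit type) and that the edit types to drop are "ins" and "sil"; the function name ctm_edits_to_ctm is hypothetical, and the column order should be checked against the real file before relying on it.

```python
from typing import Iterable, List, Set

# Edit types to drop; "ins" and "sil" are assumed labels for
# insertions and silences in the CTM_edits file.
DROP_EDITS: Set[str] = {"ins", "sil"}


def ctm_edits_to_ctm(lines: Iterable[str],
                     drop_edits: Set[str] = DROP_EDITS) -> List[str]:
    """Convert CTM_edits rows to plain CTM rows (hypothetical helper).

    Assumed input columns:
      utt channel start duration hyp_word conf ref_word edit_type
    Output columns (standard CTM):
      utt channel start duration word conf
    """
    ctm_rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        utt, chan, start, dur, _hyp, conf, ref, edit = fields
        if edit in drop_edits:
            continue  # remove insertions and silences
        # Keep the reference (transcript) word with the decoded timing.
        ctm_rows.append(f"{utt} {chan} {start} {dur} {ref} {conf}")
    return ctm_rows


if __name__ == "__main__":
    sample = [
        "utt1 1 0.00 0.09 <eps> 1.0 <eps> sil",
        "utt1 1 0.10 0.20 hello 1.0 hello cor",
        "utt1 1 0.30 0.10 uh 1.0 <eps> ins",
        "utt1 1 0.40 0.25 word 1.0 world sub",
    ]
    for row in ctm_edits_to_ctm(sample):
        print(row)
```

The sketch writes the reference word rather than the hypothesis word, on the assumption that the final CTM should follow the transcript; rows for deleted words are kept as they are, and whether they carry usable timing would need to be checked.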
This already works well for some audio files, but for others the alignment runs into problems: some transcripts are not cleaned properly and the decode_segmentation.sh script fails.
Is there some magical Kaldi recipe that performs the alignment and handles all of these possible problems? Or is there a better script to start from than segment_long_utterances.sh?
If not, which values of "beam" and "lattice_beam" are large enough to force an output from decode_segmentation.sh? I've already tried --beam=20.0 --lattice-beam=6.0 and it still fails at some points.
Thanks in advance!!