Scoring is not being generated because you didn't have text data in the preparation stage
Jayenthiran Pukuraj <jaya...@gmail.com>: Dec 01 10:28PM -0800
Hi
I have been following the steps recommended in the forums for setting up
Kaldi with the Mini LibriSpeech dataset. I successfully completed the
monophone and triphone training stages.
Here is the workflow I used for testing with my own audio file:
1. *Prepared the test data:*
   - Created wav.scp, utt2spk, and spk2utt files using the unique speaker ID.
2. *Extracted MFCC features:*
   steps/make_mfcc.sh --nj 1 --cmd run.pl $data_dir exp/make_mfcc/$file_name mfcc
   steps/compute_cmvn_stats.sh $data_dir exp/make_mfcc/$file_name mfcc
3. *Created the decoding graph (if not already created):*
   utils/mkgraph.sh data/lang_nosp_test_tgsmall exp/tri1 $graph_dir
4. *Decoded the test audio using the triphone model:*
However, I am encountering the following issues:
- There is no scoring_kaldi folder generated inside the exp/tri1/decode
  directory. Instead, I only see the following:
  - lat.1.gz
  - log
  - num_jobs
- The transcription output is incorrect, even when using a US-based audio
  file from the Mini LibriSpeech dataset.
*Questions:*
1. Why is the scoring_kaldi folder not being generated?
2. What could be causing the transcription errors, even for test audio
that matches the training dataset?
DECODE1.log:
# gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
--acoustic-scale=0.083333 --allow-partial=true
--word-symbol-table=exp/tri1/graph_nosp_tgsmall/words.txt
exp/tri1/final.mdl exp/tri1/graph_nosp_tgsmall/HCLG.fst
"ark,s,cs:apply-cmvn
--utt2spk=ark:data/US-female-conv_test_audio/split1/1/utt2spk
scp:data/US-female-conv_test_audio/split1/1/cmvn.scp
scp:data/US-female-conv_test_audio/split1/1/feats.scp ark:- | add-deltas
ark:- ark:- |" "ark:|gzip -c >
exp/tri1/decode_nosp_tgsmall_US-female-conv/lat.1.gz"
# Started at Mon Dec 2 05:06:35 UTC 2024
#
gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
--acoustic-scale=0.083333 --allow-partial=true
--word-symbol-table=exp/tri1/graph_nosp_tgsmall/words.txt
exp/tri1/final.mdl exp/tri1/graph_nosp_tgsmall/HCLG.fst
'ark,s,cs:apply-cmvn
--utt2spk=ark:data/US-female-conv_test_audio/split1/1/utt2spk
scp:data/US-female-conv_test_audio/split1/1/cmvn.scp
scp:data/US-female-conv_test_audio/split1/1/feats.scp ark:- | add-deltas
ark:- ark:- |' 'ark:|gzip -c >
exp/tri1/decode_nosp_tgsmall_US-female-conv/lat.1.gz'
add-deltas ark:- ark:-
apply-cmvn --utt2spk=ark:data/US-female-conv_test_audio/split1/1/utt2spk
scp:data/US-female-conv_test_audio/split1/1/cmvn.scp
scp:data/US-female-conv_test_audio/split1/1/feats.scp ark:-
LOG (apply-cmvn[5.5.1148~1-122a3]:main():apply-cmvn.cc:162) Applied
cepstral mean normalization to 1 utterances, errors on 0
US-female-conv I TELL YOU BY THE
LOG
(gmm-latgen-faster[5.5.1148~1-122a3]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:375)
Log-like per frame for utterance US-female-conv is -9.03805 over 123 frames.
LOG (gmm-latgen-faster[5.5.1148~1-122a3]:main():gmm-latgen-faster.cc:176)
Time taken 0.334557s: real-time factor assuming 100 frames/sec is 0.271997
LOG (gmm-latgen-faster[5.5.1148~1-122a3]:main():gmm-latgen-faster.cc:179)
Done 1 utterances, failed for 0
LOG (gmm-latgen-faster[5.5.1148~1-122a3]:main():gmm-latgen-faster.cc:181)
Overall log-likelihood per frame is -9.03805 over 123 frames.
# Accounting: time=4 threads=1
# Ended (code 0) at Mon Dec 2 05:06:39 UTC 2024, elapsed time 4 seconds
Create a text file in your test data directory. That's the reference transcription, and scoring needs it.
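A minimal sketch of adding that file, assuming the data directory and utterance ID shown in the decode log (data/US-female-conv_test_audio and US-female-conv). The transcript string is a hypothetical placeholder, not the real reference, and the commented Kaldi commands at the end assume the usual Mini LibriSpeech recipe layout, where local/score.sh is expected to take the data directory, the graph directory, and the decode directory:

```shell
#!/usr/bin/env bash
# Sketch only: add a reference transcript so scoring has something to compare against.
set -eu

# Directory and utterance ID as they appear in the decode log.
data_dir="data/US-female-conv_test_audio"
utt_id="US-female-conv"

# Hypothetical placeholder, NOT the real transcript: replace it with the words
# actually spoken in the audio (upper case, as in the LibriSpeech transcripts).
transcript="REPLACE WITH THE WORDS SPOKEN IN THE AUDIO"

mkdir -p "$data_dir"

# Kaldi's text file has one line per utterance: <utterance-id> <word1> <word2> ...
printf '%s %s\n' "$utt_id" "$transcript" > "$data_dir/text"

# If utt2spk is already there, list utterances that still lack a transcript.
if [ -f "$data_dir/utt2spk" ]; then
  awk 'NR == FNR { seen[$1] = 1; next } !($1 in seen) { print "no transcript for " $1 }' \
    "$data_dir/text" "$data_dir/utt2spk"
fi

# Next steps inside the recipe directory (left as comments; they need a Kaldi setup):
#   utils/validate_data_dir.sh data/US-female-conv_test_audio
#   local/score.sh data/US-female-conv_test_audio exp/tri1/graph_nosp_tgsmall \
#     exp/tri1/decode_nosp_tgsmall_US-female-conv
```

With text in place, rerunning the decode step (or just the scoring script on the existing lattices) should be able to compare the hypothesis against the reference. The utterance ID in text has to match the one in wav.scp and utt2spk exactly.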