How to set location for Wav file transcription and where is txt file of transcription of the wav file?


Sage Khan

Jul 1, 2022, 4:38:15 AM
to kaldi-help
I've trained a model for Urdu using the Urdu CLE corpus, and I have a language model ready. I'm unable to understand the online decoding portion. How do I set the location of the WAV file I want to transcribe? How can I do batch processing of audio? And where is the output transcript saved as a text file?

I need this to figure out a script that takes a WAV file as input and prints the transcript on the shell.

I found some useful links, but they don't fully answer the questions above:


 
I also wanted to ask: once the model is trained in Kaldi, how do we link it to recording software or a live mic feed for speech-to-text?


Regards

nick....@avinium.com

Jul 2, 2022, 1:01:20 AM
to kaldi-help
If you want to do batch processing manually, you should create a folder with the same structure as the training data, i.e.
- wav.scp
- utt2spk
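To make the layout concrete, here is a minimal sketch of such a data directory. The directory name and audio paths are placeholders; `wav.scp` maps an utterance ID to its audio file, and `utt2spk` maps each utterance to a speaker ID (using the utterance ID itself works if speakers are unknown):

```shell
# Hypothetical data directory; substitute your own paths.
mkdir -p data/my_test

# wav.scp: "<utt-id> <path-to-wav>" (one utterance per line)
cat > data/my_test/wav.scp <<'EOF'
utt001 /path/to/audio/utt001.wav
utt002 /path/to/audio/utt002.wav
EOF

# utt2spk: "<utt-id> <speaker-id>"; fall back to utt-id when the
# speaker is unknown
cat > data/my_test/utt2spk <<'EOF'
utt001 utt001
utt002 utt002
EOF

# spk2utt can then be generated with Kaldi's helper script:
# utils/utt2spk_to_spk2utt.pl data/my_test/utt2spk > data/my_test/spk2utt
```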

then, off the top of my head, you will need to run:
./utils/mkgraph.sh (if you don't have HCLG.fst)
./steps/make_mfcc.sh (or whatever your features are, make sure you use the right config)
./utils/validate_data_dir.sh
./steps/online/nnet2/extract_ivectors_online.sh
./steps/nnet3/decode.sh
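The steps above can be sketched as one script. Every path below (model directory, i-vector extractor, feature config, job count) is an assumption about a typical chain-model recipe layout, not something from this thread; substitute your own directories:

```shell
# Write a batch-decoding sketch to a file; paths are assumed, adjust freely.
cat > decode_wavs.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

data=data/my_test         # directory containing wav.scp and utt2spk
model=exp/chain/tdnn      # trained nnet3 model directory (assumed path)
graph=$model/graph        # graph directory holding HCLG.fst
ivec=exp/nnet3/extractor  # online i-vector extractor (assumed path)

utils/fix_data_dir.sh "$data"
steps/make_mfcc.sh --mfcc-config conf/mfcc_hires.conf --nj 4 "$data"
steps/compute_cmvn_stats.sh "$data"
utils/validate_data_dir.sh "$data"

steps/online/nnet2/extract_ivectors_online.sh --nj 4 \
  "$data" "$ivec" "${data}_ivectors"

steps/nnet3/decode.sh --nj 4 --online-ivector-dir "${data}_ivectors" \
  "$graph" "$data" "$model/decode_my_test"
EOF
chmod +x decode_wavs.sh
```

This only writes the script; run it from the recipe's top-level directory (where `utils/` and `steps/` live) once the paths match your setup.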

The transcript will then be in a log file under the decode directory you pass to decode.sh (under log/, I think), with lines that look like:
UTT_ID hello world
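One way to pull those lines out of the logs is to match against the utterance IDs you already have in `wav.scp`. The log filename pattern below is an assumption (check your own decode directory), and a tiny fake log is fabricated here just to show the idea:

```shell
# Fabricate a miniature decode directory to demonstrate the extraction.
mkdir -p demo/log
cat > demo/wav.scp <<'EOF'
utt001 /path/to/utt001.wav
EOF
cat > demo/log/decode.1.log <<'EOF'
LOG (nnet3-latgen-faster[...]) Decoding done.
utt001 hello world
EOF

# Collect the utterance IDs, then keep only log lines containing one of
# them as a whole word -- i.e. the "UTT_ID hello world" transcript lines.
cut -d' ' -f1 demo/wav.scp > demo/uttids
grep -h -w -f demo/uttids demo/log/decode.*.log > demo/transcripts.txt

cat demo/transcripts.txt   # -> utt001 hello world
```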

For "streaming" there is a binary that lets you feed in audio via TCP, see online2-tcp-nnet3-decode-faster.cc.
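A hedged sketch of how that server might be started and fed audio. The flags and model paths below are assumptions (verify against the binary's `--help` output); the client side simply streams raw 16 kHz 16-bit mono samples over the socket and reads text back:

```shell
# Server side: start the TCP decoder (paths/flags assumed, check --help).
cat > run_tcp_server.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
model=exp/chain/tdnn   # assumed model directory
online2-tcp-nnet3-decode-faster \
  --samp-freq=16000 --port-num=5050 \
  "$model/final.mdl" "$model/graph/HCLG.fst" "$model/graph/words.txt"
EOF

# Client side: convert a WAV to raw 16 kHz mono PCM and pipe it to the
# server with netcat; the transcript comes back on the same connection.
cat > send_audio.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
sox audio.wav -t raw -r 16000 -e signed -b 16 -c 1 - | nc localhost 5050
EOF
chmod +x run_tcp_server.sh send_audio.sh
```

The same client pattern works for a live mic feed: replace the `sox` file conversion with a recording command that emits raw PCM on stdout.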

Both of these require some knowledge of Kaldi, though, so they're not very beginner-friendly. Some more accessible options are:


These are basically wrappers around Kaldi to expose a friendlier API and deal with the complicated "wiring". You just need to put the right files in the right directories and then start the server.