How do I set the location of the WAV file to transcribe, and where is the transcript text file saved?

Sage Khan

Jul 1, 2022, 4:38:15 AM

to kaldi-help

I've trained a model for Urdu using the Urdu CLE corpus, and I have a language model (LM) ready. I can't work out the online decoding part. How do I set the location of the WAV file I want to transcribe? How can I batch-process audio? Where is the output transcript saved as a text file?

I need this to work out a script that takes a WAV file as input and prints the transcript to the shell.

I found some useful links, but they don't fully answer the questions above:

https://kaldi-asr.org/doc/online_programs.html

https://kaldi-asr.org/doc/online_decoding.html

https://www.assemblyai.com/blog/kaldi-speech-recognition-for-beginners-a-simple-tutorial/

https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/#

I also wanted to ask: once the model is trained in Kaldi, how do we connect it to recording software or a live mic feed for speech-to-text?

Regards

nick....@avinium.com

Jul 2, 2022, 1:01:20 AM

to kaldi-help

If you want to do batch processing manually, you should create a folder with the same structure as the training data, i.e.

- wav.scp

- utt2spk

Then, off the top of my head, you will need to run:

./utils/mkgraph.sh (if you don't have HCLG.fst)

./steps/make_mfcc.sh (or whatever your features are, make sure you use the right config)

./utils/validate_data_dir.sh

./steps/nnet2/extract_ivectors_online.sh

./steps/nnet3/decode.sh
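As a rough sketch of the first step, the data folder with wav.scp and utt2spk could be generated from a directory of WAV files along these lines. The function name make_data_dir and the example paths are made up for illustration, and the sketch assumes each utterance ID is the file name without its extension, that no speaker information is available (so every utterance is its own speaker), and that file names contain no spaces.

```shell
# make_data_dir WAV_DIR DATA_DIR   (hypothetical helper, not part of Kaldi)
# Writes DATA_DIR/wav.scp and DATA_DIR/utt2spk for every *.wav in WAV_DIR.
make_data_dir() (
  wav_dir=$1
  data_dir=$2
  mkdir -p "$data_dir"
  : > "$data_dir/wav.scp"
  : > "$data_dir/utt2spk"
  for wav in "$wav_dir"/*.wav; do
    [ -e "$wav" ] || continue                 # no match: skip the literal glob
    utt=$(basename "$wav" .wav)               # utterance ID = file name stem
    abs=$(cd "$(dirname "$wav")" && pwd)/$(basename "$wav")
    printf '%s %s\n' "$utt" "$abs" >> "$data_dir/wav.scp"
    # Assumption: speaker unknown, so the utterance ID doubles as speaker ID.
    printf '%s %s\n' "$utt" "$utt" >> "$data_dir/utt2spk"
  done
  # Kaldi expects these files sorted in the C locale.
  LC_ALL=C sort -o "$data_dir/wav.scp" "$data_dir/wav.scp"
  LC_ALL=C sort -o "$data_dir/utt2spk" "$data_dir/utt2spk"
)

# Example call (paths are placeholders):
#   make_data_dir /path/to/wavs data/my_test
```

validate_data_dir.sh will also want a spk2utt file, which can be derived from utt2spk with utils/utt2spk_to_spk2utt.pl.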

The transcript will then be in a log file inside the decode directory you pass to decode.sh, under log/decode (I think), on lines that look like:

UTT_ID hello world
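One way to pull those lines out of the logs and into a text file is to keep only the lines whose first field is a known utterance ID. This is a minimal sketch: print_transcripts is a made-up helper name, and it assumes the logs are named log/decode.*.log inside the decode directory, which you should check against your own setup.

```shell
# print_transcripts DATA_DIR DECODE_DIR   (hypothetical helper, not part of Kaldi)
# Prints "UTT_ID word word ..." lines found in the decode logs.
print_transcripts() (
  data_dir=$1
  decode_dir=$2
  # First file (wav.scp): remember every utterance ID.
  # Remaining files (logs): print lines whose first field is one of those IDs.
  awk 'NR == FNR { ids[$1]; next } ($1 in ids)' \
    "$data_dir/wav.scp" "$decode_dir"/log/decode.*.log
)

# Example call (paths are placeholders):
#   print_transcripts data/my_test exp/my_model/decode_my_test > transcript.txt
```

Without the redirect, the transcript is simply printed to the shell.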

For "streaming", there is a binary that lets you feed in audio over TCP; see online2-tcp-nnet3-decode-faster.cc.
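On the client side, feeding a file to that server could look roughly like the sketch below. It assumes the server is already running and accepts raw 16-bit signed mono PCM, that sox and an nc supporting -N (close the connection at end of input, as in OpenBSD netcat) are installed, and that the host, port 5050, and 16000 Hz sample rate match how the server was started. stream_wav is a made-up helper name.

```shell
# stream_wav WAV [HOST] [PORT] [RATE]   (hypothetical helper)
# Converts WAV to raw 16-bit mono PCM and sends it to the TCP decoder;
# whatever the server sends back is printed to the shell.
stream_wav() (
  wav=$1
  host=${2:-localhost}
  port=${3:-5050}     # assumption: must match the port the server listens on
  rate=${4:-16000}    # assumption: must match the model's sample rate
  sox "$wav" -t raw -c 1 -b 16 -r "$rate" -e signed-integer - |
    nc -N "$host" "$port"
)

# Example call (file name is a placeholder):
#   stream_wav recording.wav localhost 5050 16000
```

For a live microphone, the same pipe should work with a recording tool that writes raw PCM to standard output in place of the sox file conversion.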

Both of these require a bit of knowledge of Kaldi, though, so they're not very beginner-friendly. Some more accessible options are:

- GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

- GitHub - alumae/kaldi-gstreamer-server: Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

These are basically wrappers around Kaldi to expose a friendlier API and deal with the complicated "wiring". You just need to put the right files in the right directories and then start the server.
