If you want to do batch processing manually, create a data directory with the same structure as your training data, i.e.
- wav.scp
- utt2spk
- spk2utt (you can generate it from utt2spk with utils/utt2spk_to_spk2utt.pl)
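A minimal Python sketch of writing such a directory. It assumes you already have a list of (utterance ID, speaker ID, WAV path) tuples, and `write_data_dir` is a hypothetical helper name. It also writes spk2utt, which utils/validate_data_dir.sh checks for. Kaldi expects these files sorted by their first field, and prefixing utterance IDs with the speaker ID keeps the utt2spk and spk2utt sort orders consistent. If you don't know the speakers, using each utterance ID as its own speaker ID is a common workaround.

```python
from pathlib import Path


def write_data_dir(out_dir, utterances):
    """Write wav.scp, utt2spk and spk2utt for decoding (hypothetical helper).

    utterances: iterable of (utt_id, speaker_id, wav_path) tuples.
    All files are sorted by their first field, as Kaldi expects.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = sorted(utterances)  # sorts by utt_id first

    # Group utterances by speaker; rows are already sorted, so each list is too.
    spk2utt = {}
    for utt, spk, _ in rows:
        spk2utt.setdefault(spk, []).append(utt)

    with open(out / "wav.scp", "w") as f:
        for utt, _, wav in rows:
            f.write(f"{utt} {wav}\n")
    with open(out / "utt2spk", "w") as f:
        for utt, spk, _ in rows:
            f.write(f"{utt} {spk}\n")
    with open(out / "spk2utt", "w") as f:
        for spk in sorted(spk2utt):
            f.write(f"{spk} {' '.join(spk2utt[spk])}\n")
```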
Then, off the top of my head, you will need to run:
- ./utils/mkgraph.sh (only if you don't already have an HCLG.fst)
- ./steps/make_mfcc.sh (or whatever your features are; make sure you use the same config as in training)
- ./utils/validate_data_dir.sh
- ./steps/online/nnet2/extract_ivectors_online.sh
- ./steps/nnet3/decode.sh
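The same sequence as a Python sketch that builds the command lines and, optionally, runs them from the recipe directory. The positional arguments are as I remember each script's usage message; the directory names, the `conf/mfcc_hires.conf` default and the helper names are placeholders, so check each script's usage output first. The i-vector script lives under steps/online/nnet2/ in the standard recipes. Chain models usually also pass `--acwt 1.0 --post-decode-acwt 10.0` to decode.sh, and `--nj` cannot exceed the number of speakers in the data directory.

```python
import shlex
import subprocess


def decode_commands(data, lang, model, graph, extractor, ivectors, decode_dir,
                    nj=4, mfcc_config="conf/mfcc_hires.conf"):
    """Return the decoding steps, in order, as argv lists (hypothetical helper)."""
    return [
        # Only needed if graph/HCLG.fst does not exist yet.
        # --self-loop-scale 1.0 is what chain models use; drop it otherwise.
        ["utils/mkgraph.sh", "--self-loop-scale", "1.0", lang, model, graph],
        # Must use the same feature config as the model was trained with.
        ["steps/make_mfcc.sh", "--nj", str(nj), "--mfcc-config", mfcc_config, data],
        # No reference transcripts when decoding new audio, hence --no-text.
        ["utils/validate_data_dir.sh", "--no-text", data],
        ["steps/online/nnet2/extract_ivectors_online.sh", "--nj", str(nj),
         data, extractor, ivectors],
        ["steps/nnet3/decode.sh", "--nj", str(nj),
         "--online-ivector-dir", ivectors, graph, data, decode_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is True."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Calling `run_all(cmds)` only prints the commands; pass `dry_run=False` to actually run them.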
The transcript will then be in the log files under the decode directory you pass to decode.sh (its last argument), in log/decode.*.log (I think), on lines that look like:
UTT_ID hello world
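If you'd rather not dig through the logs by hand, here is a small Python sketch that pulls those lines out. It assumes the logs match log/decode.*.log and that the utterance IDs are the ones from your utt2spk; `read_transcripts` is a hypothetical helper. Matching on known utterance IDs skips the command echo and LOG/WARNING lines that share the file.

```python
import glob
import os


def read_transcripts(decode_dir, utt_ids):
    """Collect 'UTT_ID word word ...' lines from decode_dir/log/decode.*.log."""
    wanted = set(utt_ids)
    hyps = {}
    for path in sorted(glob.glob(os.path.join(decode_dir, "log", "decode.*.log"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.split()
                # Keep only lines whose first token is a known utterance ID;
                # '#' command echoes and LOG/WARNING lines are skipped.
                if fields and fields[0] in wanted:
                    hyps[fields[0]] = " ".join(fields[1:])
    return hyps
```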
For "streaming", there is a binary that lets you feed in audio over TCP; see online2-tcp-nnet3-decode-faster.cc (in src/online2bin).
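A minimal Python client sketch for that server. It assumes the server takes raw 16-bit mono PCM at the sample rate it was started with (its `--samp-freq` option), listens on port 5050 (what I recall as the default of `--port-num`), and closes the connection after writing the final transcript; check the binary's `--help` output to confirm. `stream_wav` is a hypothetical helper name.

```python
import socket
import wave


def stream_wav(path, host="localhost", port=5050, chunk_frames=1600):
    """Send a WAV file's raw PCM samples over TCP and return the server's reply."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        with socket.create_connection((host, port)) as sock:
            while True:
                frames = w.readframes(chunk_frames)
                if not frames:
                    break
                sock.sendall(frames)  # raw samples only, no WAV header
            sock.shutdown(socket.SHUT_WR)  # tell the server the audio has ended
            chunks = []
            while True:  # read until the server closes the connection
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```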
Both of these require a fair bit of Kaldi knowledge, though, so they're not very beginner-friendly. Some more accessible options are:
These are basically wrappers around Kaldi that expose a friendlier API and handle the complicated "wiring" for you. You just need to put the right files in the right directories and then start the server.