Hi Users (or probably Dan),
I'm a "medium-level" user: I've used Kaldi for data generation at the word/phone level and have converted lattices to sausages, etc., but I don't intimately understand how to build my own models. I spent an hour going through past papers, the Kaldi models page, and past forum questions before posting this.
Is there anything as straightforward to use for ASR as Aspire (i.e. download and run)? Ideally something trained on a larger vocabulary (and hence with a lower WER).
By "straightforward" I mean that I can run cmd.sh and path.sh and then run the command below with the appropriate exp files, perhaps changing parameters slightly.
online2-wav-nnet3-latgen-faster \
  --online=false \
  --do-endpointing=false \
  --frame-subsampling-factor=3 \
  --config=exp/tdnn_7b_chain_online/conf/online.conf \
  --max-active=7000 \
  --beam=15.0 \
  --lattice-beam=6.0 \
  --acoustic-scale=1.0 \
  --word-symbol-table=exp/tdnn_7b_chain_online/graph_pp/words.txt \
  exp/tdnn_7b_chain_online/final.mdl \
  exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
  'ark:echo utterance-id1 utterance-id1|' \
  'scp:echo utterance-id1 {file_name_wav}|' \
  'ark,t:{file_name_lat}'
(Alternatively, are there hyperparameters in the above that I could tweak to get notably different transcription results, better or worse?)
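For what it's worth, the decoder options that usually matter most for the accuracy/speed trade-off are --beam, --lattice-beam, and --max-active (wider/larger is slower but reduces search errors; --acoustic-scale=1.0 is the usual setting for chain models). A minimal sketch of a dry-run sweep over those options, reusing the Aspire paths from the command above (the echo just prints each command; remove it to actually decode, and note test.wav and the output names are hypothetical placeholders):

```shell
#!/bin/sh
# Dry-run sweep over the decoder options that most affect accuracy vs speed.
# Assumes the Aspire model layout from the command above; remove the leading
# 'echo' to actually run the decoder.
n=0
for beam in 11.0 15.0; do
  for max_active in 5000 7000; do
    echo online2-wav-nnet3-latgen-faster \
      --online=false --do-endpointing=false --frame-subsampling-factor=3 \
      --config=exp/tdnn_7b_chain_online/conf/online.conf \
      --max-active=$max_active --beam=$beam --lattice-beam=6.0 \
      --acoustic-scale=1.0 \
      --word-symbol-table=exp/tdnn_7b_chain_online/graph_pp/words.txt \
      exp/tdnn_7b_chain_online/final.mdl \
      exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
      "ark:echo utterance-id1 utterance-id1|" \
      "scp:echo utterance-id1 test.wav|" \
      "ark,t:lat_beam${beam}_ma${max_active}"   # hypothetical output name
    n=$((n + 1))
  done
done
```

In my experience, once the beams are already generous (as they are here), tuning them further changes WER only marginally; bigger gains come from a better acoustic model or language model, which is why I'm asking about other pretrained models.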
Logical candidates seem to be LibriSpeech and TED-LIUM, but:
- The LibriSpeech model downloads as a .dms file, which doesn't seem to be the same kind of package.
- The TED-LIUM download is missing the exp folder.
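On the .dms file: as far as I know, Safari on macOS sometimes labels a download it doesn't recognize with a .dms extension even though the bytes are actually a gzipped tarball, so it may be worth inspecting and renaming it before concluding it's the wrong artifact. A self-contained demo of that check (the filenames here are hypothetical; substitute the actual downloaded name):

```shell
#!/bin/sh
set -e
# Simulate a ".dms" download that is really a gzipped tarball
# (hypothetical filenames throughout).
mkdir -p model_demo
echo "placeholder" > model_demo/final.mdl
tar czf download.dms model_demo   # the mislabeled "download"
rm -rf model_demo                 # pretend all we have is the .dms file

tar tzf download.dms              # if this lists contents, it's a tar.gz
mv download.dms model.tar.gz      # rename to the real format...
tar xzf model.tar.gz              # ...and extract normally
ls model_demo
```

If `tar tzf` errors out instead of listing contents, then the file really is something else and the rename trick won't help.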
Thank you!