Speech to Text from KALDI FOR DUMMIES


Max Lay

Aug 29, 2016, 11:10:43 PM
to kaldi-help
I have completed the Kaldi for dummies tutorial and would now like to convert speech to text.
Despite many attempts to merge other examples into this one to build such a system, I have not been able to get anything working.
I see many people have asked similar questions, without any success.
Is anyone able to help me?

Thanks

Daniel Povey

Aug 29, 2016, 11:14:05 PM
to kaldi-help
At this point Kaldi is mainly aimed at people who are to some extent
in the speech recognition industry or at least technically very
competent. From the vague way you're phrasing the question I suspect
you would have a hard time using Kaldi. In the past when I've tried
to answer questions like yours I've regretted it because it leads to
too many follow-up questions.

Dan

Danijel Korzinek

Aug 31, 2016, 4:23:34 AM
to kaldi-help
What do you actually need specifically?

There seems to be a problem with people who would like ASR as a turnkey solution. Kaldi in itself is not that. There are such solutions online that use Kaldi, e.g. http://speechkitchen. Also, many people want something like https://cloud.google.com/speech/. There are at least a few such commercial-grade solutions out there, and they have recently become quite reasonably priced.

If you want to learn about speech recognition, on the other hand, I find what is available in Kaldi not to be that difficult anyway: you can download the whole system and get a working solution by running a single script! Maybe you are trying to do too much at once? Maybe you should start with something smaller and build up from there?

Hamza Ahjam

Aug 31, 2016, 6:24:44 AM
to kaldi-help
If you can program, you can do it; it's not that hard.

You need to write a program that generates the data files needed for decoding (utt2spk, wav.scp). You don't need the file named text, since you don't know in advance what is in the audio file. Then run a decoding script like the one at the bottom of this post, and read the results from the log files in exp/mono/decode/log and exp/tri1/decode/log.
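
The data files mentioned above could be generated with a short shell function along these lines. This is only a sketch under stated assumptions: the function name make_data_dir is hypothetical, every .wav file is treated as its own speaker (so the speaker ID equals the utterance ID), and file names are assumed to contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): write wav.scp and utt2spk
# for every .wav file in a directory.
# Assumptions: one speaker per file, no spaces in file names.
make_data_dir() {
  local wav_dir=$1 data_dir=$2
  local abs_dir wav utt
  abs_dir=$(cd "$wav_dir" && pwd) || return 1
  mkdir -p "$data_dir"
  : > "$data_dir/wav.scp"
  : > "$data_dir/utt2spk"
  for wav in "$abs_dir"/*.wav; do
    [ -e "$wav" ] || continue              # directory had no .wav files
    utt=$(basename "$wav" .wav)            # utterance ID = file name without .wav
    echo "$utt $wav" >> "$data_dir/wav.scp"
    echo "$utt $utt" >> "$data_dir/utt2spk"  # speaker ID = utterance ID
  done
  # Kaldi expects these files to be sorted in the C locale.
  LC_ALL=C sort -o "$data_dir/wav.scp" "$data_dir/wav.scp"
  LC_ALL=C sort -o "$data_dir/utt2spk" "$data_dir/utt2spk"
}
```

A call such as make_data_dir /path/to/wavs data/test would then populate data/test before the decoding script is run.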

#!/bin/bash

. ./path.sh || exit 1
. ./cmd.sh || exit 1

nj=1         # number of parallel jobs - 1 is perfect for such a small data set
lm_order=1     # language model order (n-gram quantity) - 1 is enough for digits grammar

utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo
date
# Making feats.scp files
mfccdir=mfcc
# --no-text: there is no transcript file (text) when decoding unknown audio
utils/validate_data_dir.sh --no-feats --no-text data/test
utils/fix_data_dir.sh data/test 
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir

# Making cmvn.scp files
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir

echo
echo "===== MONO DECODING ====="
echo
date
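# --skip-scoring true: scoring needs reference transcripts, which we do not have
steps/decode.sh --skip-scoring true --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/mono/graph data/test exp/mono/decode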

echo
echo "===== TRI1 (first triphone pass) DECODING ====="
echo
date
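# --skip-scoring true: scoring needs reference transcripts, which we do not have
steps/decode.sh --skip-scoring true --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/tri1/graph data/test exp/tri1/decode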
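
To pull the recognized text out of the decode logs, something like the following might work. It is a sketch that assumes the decoder writes each hypothesis to the log as a line starting with the utterance ID followed by the recognized words, and that the logs are named decode.*.log; the function name print_hyps is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): print the log lines whose first
# field is an utterance ID listed in utt2spk.
# Assumption: hypotheses appear in the log as "<utt-id> word1 word2 ...".
print_hyps() {
  local data_dir=$1 log_dir=$2
  # First file (utt2spk): remember the utterance IDs.
  # Remaining files (logs): print lines that start with a known ID.
  awk 'NR == FNR { utts[$1] = 1; next } ($1 in utts)' \
    "$data_dir/utt2spk" "$log_dir"/decode.*.log
}
```

For example, print_hyps data/test exp/tri1/decode/log would list one hypothesis line per decoded utterance, if the log format matches the assumption above.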

Max Lay

Aug 31, 2016, 9:06:51 PM
to kaldi-help
Thanks, this is exactly what I needed. Works perfectly.