Speech to Text from KALDI FOR DUMMIES

Max Lay

Aug 29, 2016, 11:10:43 PM

to kaldi-help

I have completed the Kaldi for Dummies tutorial and would now like to convert speech to text.

Despite many attempts to merge other examples into this one to build such a system, I have not been able to get anything working.

I see many people have asked similar questions, without any success.

Is anyone able to help me?

Thanks

Daniel Povey

Aug 29, 2016, 11:14:05 PM

At this point Kaldi is mainly aimed at people who are to some extent
in the speech recognition industry or at least technically very
competent. From the vague way you're phrasing the question I suspect
you would have a hard time using Kaldi. In the past when I've tried
to answer questions like yours I've regretted it because it leads to
too many follow-up questions.

Dan

Danijel Korzinek

Aug 31, 2016, 4:23:34 AM

What do you actually need specifically?

There seems to be a recurring problem with people who want ASR as a turnkey solution. Kaldi by itself is not that. There are turnkey solutions online that use Kaldi, e.g. http://speechkitchen. Many people also want something like https://cloud.google.com/speech/; there are at least a few commercial-grade services of that kind, and they have recently become quite reasonably priced.

If, on the other hand, you want to learn about speech recognition, I don't find what is available in Kaldi that difficult: you can download the whole system and get a working solution by running a single script! Maybe you are trying to do too much at once? Maybe you should start with something smaller and work your way up from there?

Hamza Ahjam

Aug 31, 2016, 6:24:44 AM

If you can program, you can do it; it's not that hard.

You need to write a program that generates the text files needed for decoding (utt2spk and wav.scp). You don't need the file named text, since you don't know in advance what is in the audio file. Then run a decoding script like the one below, and read the result from the log files in exp/mono/decode/log and exp/tri1/decode/log.
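A minimal shell sketch of that first step (producing wav.scp and utt2spk before the decoding script is run), assuming one utterance per .wav file, file names without spaces, recordings in the same format as the tutorial's training audio, and no speaker information, so each utterance is used as its own speaker. The function name make_test_data is hypothetical; it is not a Kaldi script.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of Kaldi): create wav.scp and utt2spk for a
# directory of recordings to be transcribed.
# Assumptions: one utterance per .wav file, file names without spaces, and
# unknown speakers, so each utterance is treated as its own speaker.
make_test_data() {
  local wav_dir=$1    # directory with the .wav files to recognize
  local data_dir=$2   # Kaldi data directory to create, e.g. data/test
  local abs_dir wav utt
  abs_dir=$(cd "$wav_dir" && pwd)   # absolute path, so wav.scp works from any directory
  mkdir -p "$data_dir"

  # wav.scp: "<utterance-id> <path-to-wav>", sorted in C-locale order as Kaldi expects
  for wav in "$abs_dir"/*.wav; do
    [ -e "$wav" ] || continue       # skip the literal pattern when nothing matches
    utt=$(basename "$wav" .wav)     # utterance ID = file name without .wav
    printf '%s %s\n' "$utt" "$wav"
  done | LC_ALL=C sort > "$data_dir/wav.scp"

  # utt2spk: "<utterance-id> <speaker-id>", reusing the utterance ID as the speaker ID
  awk '{ print $1, $1 }' "$data_dir/wav.scp" > "$data_dir/utt2spk"
}
```

For example, run make_test_data /path/to/recordings data/test from the recipe directory, then run the decoding script, which builds spk2utt from utt2spk. Reusing the utterance ID as the speaker ID keeps both files consistently sorted, at the cost of CMVN statistics being computed per utterance rather than per real speaker.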

#!/bin/bash
. ./path.sh || exit 1
. ./cmd.sh || exit 1

nj=1        # number of parallel jobs - 1 is perfect for such a small data set
lm_order=1  # language model order (n-gram quantity) - 1 is enough for a digits grammar (not used by the decoding steps below)

# spk2utt is derived from the utt2spk file you generated
utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo
date

# Making feats.scp files
mfccdir=mfcc
# --no-text: there is no data/test/text, because the transcript is what we want to find out
utils/validate_data_dir.sh --no-feats --no-text data/test
utils/fix_data_dir.sh data/test
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir

# Making cmvn.scp files
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir

echo
echo "===== MONO DECODING ====="
echo
date

# --skip-scoring true: scoring compares against data/test/text, which does not exist here
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" --skip-scoring true exp/mono/graph data/test exp/mono/decode

echo
echo "===== TRI1 (first triphone pass) DECODING ====="
echo
date

steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" --skip-scoring true exp/tri1/graph data/test exp/tri1/decode
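Reading the result out of the logs can be scripted as well. The sketch below is hypothetical (print_transcripts is not a Kaldi script) and assumes each decode.*.log contains one line per utterance that begins with the utterance ID followed by the recognized words, which is how the decoding logs are expected to look when the decoder is given a word symbol table; check one of your own logs before relying on it.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of Kaldi): print "<utterance-id> <words...>"
# for each utterance in utt2spk, taken from the decoding logs.
# Assumption: each decode.*.log has one line per decoded utterance that starts
# with the utterance ID followed by the recognized words; other lines (command
# header, LOG/WARNING messages) start with something else.
print_transcripts() {
  local utt2spk=$1   # e.g. data/test/utt2spk
  local log_dir=$2   # e.g. exp/tri1/decode/log
  [ -s "$utt2spk" ] || { echo "empty or missing $utt2spk" >&2; return 1; }

  # First file (utt2spk): remember the utterance IDs.
  # Log files: print the first line whose first field is one of those IDs.
  awk 'FNR == NR { want[$1] = 1; next }
       ($1 in want) && !seen[$1]++ { print }' \
      "$utt2spk" "$log_dir"/decode.*.log
}
```

For example, print_transcripts data/test/utt2spk exp/tri1/decode/log prints the triphone system's output. Keying on the IDs from utt2spk means the LOG and WARNING lines are skipped without having to know their exact format.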

Max Lay

Aug 31, 2016, 9:06:51 PM

Thanks, this is exactly what I needed. Works perfectly.
