Tutorial on how to create a simple ASR system in Kaldi toolkit from scratch using digits corpora (Kaldi for dummies)


Wit Zieliński

Feb 22, 2016, 9:25:38 AM
to kaldi-help
CREATING A SIMPLE ASR SYSTEM IN KALDI TOOLKIT FROM SCRATCH USING SMALL DIGITS CORPORA
IN OTHER WORDS: KALDI FOR DUMMIES

Author: W. Zieliński

--------------
AGENDA
--------------


1.   INTRODUCTION
2.   ENVIRONMENT
3.   DOWNLOAD KALDI
4.   KALDI DIRECTORIES STRUCTURE
5.   YOUR EXEMPLARY PROJECT
6.   DATA PREPARATION
7.   PROJECT FINALIZATION
8.   RUNNING SCRIPTS CREATION
9.   GETTING RESULTS
10.  SUMMARY



----------------------------
1. INTRODUCTION
----------------------------


This is a step-by-step tutorial for absolute beginners on how to create a simple ASR (Automatic Speech Recognition) system in the Kaldi toolkit using your own set of data. I really would have liked to read something like this when I was starting to work with Kaldi. It is all based on my experience as an amateur in both speech recognition and script programming. If you have ever delved into the Kaldi tutorial on the official project site and felt a little bit lost, well, my piece of art might be the choice for you. You will learn how to install Kaldi, how to make it work and how to run an ASR system using your own audio data. As a result you will get your first speech decoding results.

First of all - get to know what Kaldi actually is and why you should use it instead of something else. In my opinion Kaldi requires solid knowledge about speech recognition and ASR systems in general. It is also good to know the basics of scripting languages (bash, perl, python). C++ might be useful in the future (you will probably want to make some modifications in the source code).

To read: http://kaldi.sourceforge.net/about.html & http://kaldi.sourceforge.net/tutorial_prereqs.html



---------------------------
2. ENVIRONMENT
---------------------------


Rule number 1 - use Linux. Although it is possible to use Kaldi on Windows, most people I find trustworthy convinced me that Linux will do the job with the fewest problems. I chose Ubuntu 14.10. This was (in 2014/15) a rich and stable Linux distribution which I honestly recommend. When you finally have your Linux running properly, please open a terminal and install some necessary tools (if you do not already have them):

(has to be installed)
atlas – automatically tuned linear algebra library (a BLAS/LAPACK implementation),
autoconf – automatic software compilation on different operating systems,
automake – creating portable Makefile files,
git – distributed revision control system,
libtool – creating static and dynamic libraries,
svn – revision control system (Subversion), necessary for Kaldi download and installation,
wget – data transfer using HTTP, HTTPS and FTP protocols,
zlib – data compression,

(probably has to be installed)
awk – programming language, used for searching and processing patterns in files and data streams,
bash – Unix shell and script programming language,
grep – command-line utility for searching plain-text data sets for lines matching a regular expression,
make – automatically builds executable programs and libraries from source code,
perl – dynamic programming language, perfect for text files processing.
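On Ubuntu the "has to be installed" set usually boils down to a single package-manager call. The package names below are a best guess for a 14.x-era release and may differ on yours (ATLAS and zlib in particular ship as development packages):

```shell
# Best-guess Ubuntu package names - check them against your release before running.
sudo apt-get install autoconf automake git libatlas-base-dev libtool \
    make perl subversion wget zlib1g-dev
```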

Done. Operating system and all the necessary Linux tools are ready to go.



---------------------------------
3. DOWNLOAD KALDI
---------------------------------


Just follow the instruction: http://kaldi.sourceforge.net/install.html
If you do not have much idea about how to use GIT, please read about it: http://kaldi.sourceforge.net/tutorial_git.html

I installed Kaldi in this directory (called 'Kaldi root path'): /home/{user}/kaldi-trunk



---------------------------------------------------------
4. KALDI DIRECTORIES STRUCTURE
---------------------------------------------------------


Try to get familiar with where particular Kaldi components are placed. It would also be good to read any "README" files you find.

'kaldi-trunk' - main Kaldi directory which contains:
'egs' – example scripts allowing you to quickly build ASR systems for over 30 popular speech corpora (documentation is attached for each project),
'misc' – additional tools and supplies, not needed for proper Kaldi functionality,
'src' – Kaldi source code,
'tools' – useful components and external tools,
'windows' – tools for running Kaldi using Windows.

The most important directory for you is obviously 'egs'. Here you will create your own ASR system.



--------------------------------------------------
5. YOUR EXEMPLARY PROJECT
--------------------------------------------------


For the purpose of this tutorial, imagine that you have the same simple set of data as me (described below, in the 6.1. AUDIO DATA section). Then try to 'transpose' every action I do straight into your own project. If you do not have any audio data at all, or if you want to follow my tutorial in an identical way, feel free to record your own tracks - it will be an even better experience to play with ASR. Here we go.

YOUR PRECONDITION:
You have some amount of audio data that contain only spoken digits (zero, one, two...) by at least several different speakers. Each audio file is an entire spoken sentence (e.g. 'one, nine, five').

YOUR PURPOSE:
You want to divide your data into train and test sets, set up an ASR system, train it, test it and get some decoding results.

YOUR FIRST TASK:
Something to begin with - create a folder 'digits' in the kaldi-trunk/egs/ directory. This is the place where you will put all the stuff related to your project.



------------------------------------
6. DATA PREPARATION
------------------------------------

6.1. AUDIO DATA
--------------------------

I assume that you want to set up an ASR system based on your own audio data. For example - let it be a set of 100 files. The file format is WAV. Each file contains 3 spoken digits recorded in English, one after another. Each of these audio files is named in a recognizable way (e.g. 1_5_6.wav, which in my pattern means that the spoken sentence is 'one, five, six') and placed in a recognizable folder representing a particular speaker during a particular recording session (there may be a situation where you have recordings of the same person in two different quality/noise environments - put these in separate folders). So to sum up, my exemplary data set looks like this:
- 10 different speakers (ASR systems must be trained and tested on different speakers; the more speakers you have the better),
- each speaker says 10 sentences,
- 100 sentences/utterances (in 100 *.wav files placed in 10 folders related to particular speakers - 10 *.wav files in each folder),
- 300 words (digits from zero to nine),
- each sentence/utterance consists of 3 words.

Whatever your first data set is, adjust my example to your particular case. Be careful with big data sets and complex grammars - start with something simple. Sentences that contain only digits are perfect in this case.

TASK:
Go to the kaldi-trunk/egs/digits directory and create a 'digits_audio' folder. In kaldi-trunk/egs/digits/digits_audio create two folders: 'train' and 'test'. Select one speaker of your choice to represent the testing data set. Use this speaker's 'speakerID' as the name for another new folder in the kaldi-trunk/egs/digits/digits_audio/test directory. Then put all the audio files related to that person there. Put the rest (9 speakers) into the 'train' folder - this will be your training data set. Also create subfolders for each speaker.
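The layout above can be sketched with a few mkdir calls. The speaker names here are made up ('june' stands for whichever speaker you held out for testing), and the scratch folder name is illustrative; in your real tree you would run the equivalent commands inside kaldi-trunk/egs/digits:

```shell
# Scratch sketch of the digits_audio layout; adjust names for your own speakers.
root=digits_audio_demo
mkdir -p "$root/test/june"                # the one held-out test speaker
for spk in july cristine dad josh; do     # ...and so on, 9 training speakers
  mkdir -p "$root/train/$spk"
done
find "$root" -type d | LC_ALL=C sort      # show the resulting tree
```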



6.2. ACOUSTIC DATA
--------------------------------

Now you have to create some text files that will allow Kaldi to communicate with your audio data. Consider these files as 'must be done'.

TASK:
In the kaldi-trunk/egs/digits directory, create a folder 'data'. Then create 'test' and 'train' subfolders inside. In each subfolder create the following files (so you have files named in THE SAME WAY IN 'test' AND 'train' SUBFOLDERS, BUT RELATING TO THE TWO DIFFERENT DATA SETS that you created before):

a.) spk2gender
This file informs about the speakers' gender. As we assumed, 'speakerID' is a unique name for each speaker (in this case it is also a 'recordingID' - every speaker has only one audio data folder from one recording session). In my example there are 5 female and 5 male speakers (f = female, m = male).

PATTERN: <speakerID> <gender>
----- exemplary spk2gender starts -----
july f
cristine f
dad m
josh m
# and so on...
----- exemplary spk2gender ends -----

b.) wav.scp
This file connects every utterance (sentence said by one person during a particular recording session) with the audio file related to that utterance. If you stick to my naming approach, 'utteranceID' is nothing more than 'speakerID' (the speaker's folder name) glued to the *.wav file name without the '.wav' ending (see the examples below).

PATTERN: <utteranceID> <full_path_to_audio_file>
----- exemplary wav.scp starts -----
july_1_2_5 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/1_2_5.wav
july_6_8_3 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/6_8_3.wav
dad_4_4_2 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/dad/4_4_2.wav
# and so on...
----- exemplary wav.scp ends -----

c.) text
This file contains every utterance matched with its text transcription.

PATTERN: <utteranceID> <text_transcription>
----- exemplary text starts -----
july_1_2_5 one two five
july_6_8_3 six eight three
dad_4_4_2 four four two
# and so on...
----- exemplary text ends -----

d.) utt2spk
This file tells the ASR system which utterance belongs to which speaker.

PATTERN: <utteranceID> <speakerID>
----- exemplary utt2spk starts -----
july_1_2_5 july
july_6_8_3 july
dad_4_4_2 dad
# and so on...
----- exemplary utt2spk ends -----


e.) corpus.txt
This file lives in a slightly different location. In kaldi-trunk/egs/digits/data create another folder, 'local'. In kaldi-trunk/egs/digits/data/local create a file corpus.txt which should contain every single utterance transcription that can occur in your ASR system (in our case, 100 lines from 100 audio files).

PATTERN: <text_transcription>
----- exemplary corpus.txt starts -----
one two five
six eight three
four four two
# and so on...
----- exemplary corpus.txt ends -----
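Four of these files (wav.scp, text, utt2spk, corpus.txt) follow mechanically from the folder/file naming convention of section 6.1, so they can be generated rather than typed; only spk2gender has to be written by hand, since gender is not encoded in the names. A hedged bash sketch - the 'demo_' names and dummy files below are illustrative stand-ins, so point the loop at your real digits_audio tree instead:

```shell
#!/bin/bash
# Digit-to-word table for the 1_5_6.wav naming convention:
words=(zero one two three four five six seven eight nine)

# Stand-in audio tree (replace with your real recordings under digits_audio/):
mkdir -p demo_audio/train/july demo_audio/train/dad
touch demo_audio/train/july/1_2_5.wav demo_audio/train/dad/4_4_2.wav

mkdir -p demo_data/train demo_data/local
: > demo_data/train/wav.scp; : > demo_data/train/text; : > demo_data/train/utt2spk
for wav in demo_audio/train/*/*.wav; do
  spk=$(basename "$(dirname "$wav")")   # speaker folder name, e.g. 'july'
  base=$(basename "$wav" .wav)          # e.g. '1_2_5'
  utt="${spk}_${base}"                  # utteranceID, e.g. 'july_1_2_5'
  trans=""
  for d in ${base//_/ }; do trans="$trans ${words[$d]}"; done
  echo "$utt $PWD/$wav" >> demo_data/train/wav.scp
  echo "$utt$trans"     >> demo_data/train/text
  echo "$utt $spk"      >> demo_data/train/utt2spk
done
# Kaldi expects these files C-sorted by their first field:
for f in wav.scp text utt2spk; do
  LC_ALL=C sort -o demo_data/train/$f demo_data/train/$f
done
# corpus.txt is just the unique transcriptions without the utteranceIDs:
cut -d' ' -f2- demo_data/train/text | LC_ALL=C sort -u > demo_data/local/corpus.txt
cat demo_data/train/text
```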



6.3. LANGUAGE DATA
----------------------------------

This section relates to language modelling files that also need to be considered as 'must be done'. Look for the syntax details here: http://kaldi.sourceforge.net/data_prep.html (each file is precisely described). Also feel free to read some examples in other 'egs' scripts. Now it is the perfect time.

TASK:
In the kaldi-trunk/egs/digits/data/local directory, create a folder 'dict'. In kaldi-trunk/egs/digits/data/local/dict create the following files:

a.) lexicon.txt
This file contains every word from your dictionary with its 'phone transcriptions' (taken from /egs/voxforge).

PATTERN: <word> <phone 1> <phone 2> ...
----- exemplary lexicon.txt starts -----
!SIL sil
<UNK> spn
eight ey t
five f ay v
four f ao r
nine n ay n
one hh w ah n
one w ah n
seven s eh v ah n
six s ih k s
three th r iy
two t uw
zero z ih r ow
zero z iy r ow
----- exemplary lexicon.txt ends -----

b.) nonsilence_phones.txt
This file lists nonsilence phones that are present in your project.

PATTERN: <phone>
----- exemplary nonsilence_phones.txt starts -----
ah
ao
ay
eh
ey
f
hh
ih
iy
k
n
ow
r
s
t
th
uw
w
v
z
----- exemplary nonsilence_phones.txt ends -----

c.) silence_phones.txt
This file lists silence phones.

PATTERN: <phone>
----- exemplary silence_phones.txt starts -----
sil
spn
----- exemplary silence_phones.txt ends -----

d.) optional_silence.txt
This file lists optional silence phones.

PATTERN: <phone>
----- exemplary optional_silence.txt starts -----
sil
----- exemplary optional_silence.txt ends -----
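A quick consistency check between these files can save a confusing prepare_lang.sh failure later: every phone used in lexicon.txt should appear in either silence_phones.txt or nonsilence_phones.txt. A self-contained sketch ('demo_dict' is a stand-in for kaldi-trunk/egs/digits/data/local/dict, with an abbreviated lexicon):

```shell
# Stand-in dict folder just to make the check runnable here:
mkdir -p demo_dict
printf '!SIL sil\ntwo t uw\nzero z ih r ow\n' > demo_dict/lexicon.txt
printf 'sil\nspn\n' > demo_dict/silence_phones.txt
printf 'ih\now\nr\nt\nuw\nz\n' > demo_dict/nonsilence_phones.txt

# Phones used in the lexicon vs. phones declared in the phone lists:
cut -d' ' -f2- demo_dict/lexicon.txt | tr ' ' '\n' | LC_ALL=C sort -u > demo_dict/used
LC_ALL=C sort -u demo_dict/silence_phones.txt demo_dict/nonsilence_phones.txt > demo_dict/listed
LC_ALL=C comm -23 demo_dict/used demo_dict/listed > demo_dict/missing
wc -l < demo_dict/missing   # 0 means every phone is accounted for
```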



------------------------------------------
7. PROJECT FINALIZATION
------------------------------------------

The last chapter before creating the running scripts. Your project structure will become complete.

7.1. TOOLS ATTACHMENT
----------------------------------------

You need to add necessary Kaldi tools that are widely used in exemplary scripts.

TASK:
From kaldi-trunk/egs/wsj/s5 copy two folders (with the whole content) - 'utils' and 'steps' - and put them in your kaldi-trunk/egs/digits directory. You can also create links to these directories. You may find such links in, for example, kaldi-trunk/egs/voxforge/s5.
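If you prefer links over copies, a sketch of the linking step (the scratch tree below stands in for kaldi-trunk/egs; in the real tree you would run the two ln commands from kaldi-trunk/egs/digits):

```shell
# Stand-in for the kaldi-trunk/egs tree:
mkdir -p demo_egs/wsj/s5/utils demo_egs/wsj/s5/steps demo_egs/digits
cd demo_egs/digits
# The actual linking step - relative targets keep the links valid
# even if the whole kaldi-trunk tree is later moved:
ln -s ../wsj/s5/utils utils
ln -s ../wsj/s5/steps steps
cd ../..
ls -l demo_egs/digits
```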



7.2. SCORING SCRIPT
----------------------------------

This script will help you to get decoding results.

TASK:
From kaldi-trunk/egs/voxforge/local copy the script score.sh into exactly the same location in your project (kaldi-trunk/egs/digits/local).



7.3. CONFIGURATION FILES
-------------------------------------------

It is not necessary to create configuration files, but it can be a good habit for the future.

TASK:
In kaldi-trunk/egs/digits create a folder 'conf'. Inside kaldi-trunk/egs/digits/conf create two files (for some configuration modifications in the decoding and MFCC feature extraction processes - taken from /egs/voxforge):

a.) decode.config

----- exemplary decode.config starts -----
first_beam=10.0
beam=13.0
lattice_beam=6.0
----- exemplary decode.config ends -----

b.) mfcc.conf

----- exemplary mfcc.conf starts -----
--use-energy=false
----- exemplary mfcc.conf ends -----



---------------------------------------------------
8. RUNNING SCRIPTS CREATION
---------------------------------------------------

Your first ASR system written in the Kaldi environment is almost ready. Your last job is to prepare the running scripts that create the ASR system of your choice. I put some comments in the prepared scripts for ease of understanding.

These scripts are based on the solution used in the /egs/voxforge directory. I decided to use two different training methods:
- MONO - monophone training,
- TRI1 - simple triphone training (first triphone pass).
These two methods are enough to show noticeable differences in decoding results using only a digits lexicon and a small training data set.

TASK:
In kaldi-trunk/egs/digits directory create 3 scripts:

a.) cmd.sh

----- cmd.sh script starts here --------------------------------------------------
# Setting local system jobs (local CPU - no external clusters)
export train_cmd=run.pl
export decode_cmd=run.pl
----- cmd.sh script ends here --------------------------------------------------

b.) path.sh

----- path.sh script starts here --------------------------------------------------
# Defining Kaldi root directory
export KALDI_ROOT=`pwd`/../..

# Setting paths to useful tools
export PATH=$PWD/utils/:$KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$PWD:$PATH

# Defining audio data directory (modify it for your installation directory!)
export DATA_ROOT="/home/{user}/kaldi-trunk/egs/digits/digits_audio"

# Variable that stores path to MITLM library
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/tools/mitlm-svn/lib

# Variable needed for proper data sorting
export LC_ALL=C
----- path.sh script ends here --------------------------------------------------

c.) run.sh

----- run.sh script starts here --------------------------------------------------
#!/bin/bash

. ./path.sh || exit 1
. ./cmd.sh || exit 1

nj=1         # number of parallel jobs - 1 is perfect for such a small data set
lm_order=1     # language model order (n-gram quantity) - 1 is enough for digits grammar

# Safety mechanism (possible running this script with modified arguments)
. utils/parse_options.sh || exit 1
[[ $# -ge 1 ]] && { echo "Wrong arguments!"; exit 1; }

# Removing previously created data (from last run.sh execution)
rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/test/spk2utt data/test/cmvn.scp data/test/feats.scp data/test/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt

echo
echo "===== PREPARING ACOUSTIC DATA ====="
echo

# Needs to be prepared by hand (or using self-written scripts):
#
# spk2gender  [<speakerID> <gender>]
# wav.scp     [<utteranceID> <full_path_to_audio_file>]
# text        [<utteranceID> <text_transcription>]
# utt2spk     [<utteranceID> <speakerID>]
# corpus.txt  [<text_transcription>]

# Making spk2utt files
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/utt2spk_to_spk2utt.pl data/test/utt2spk > data/test/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo

# Making feats.scp files
mfccdir=mfcc
# utils/validate_data_dir.sh data/train     # script for checking if prepared data is all right
# utils/fix_data_dir.sh data/train          # tool for data sorting if something goes wrong above
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir

# Making cmvn.scp files
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir

echo
echo "===== PREPARING LANGUAGE DATA ====="
echo

# Needs to be prepared by hand (or using self-written scripts):
#
# lexicon.txt            [<word> <phone 1> <phone 2> ...]
# nonsilence_phones.txt  [<phone>]
# silence_phones.txt     [<phone>]
# optional_silence.txt   [<phone>]

# Preparing language data
utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang

echo
echo "===== LANGUAGE MODEL CREATION ====="
echo "===== MAKING lm.arpa ====="
echo

loc=`which ngram-count`
if [ -z "$loc" ]; then
    if uname -a | grep 64 >/dev/null; then
        sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
    else
        sdir=$KALDI_ROOT/tools/srilm/bin/i686
    fi
    if [ -f $sdir/ngram-count ]; then
        echo "Using SRILM language modelling tool from $sdir"
        export PATH=$PATH:$sdir
    else
        echo "SRILM toolkit is probably not installed. Instructions: tools/install_srilm.sh"
        exit 1
    fi
fi

local=data/local
ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa

echo
echo "===== MAKING G.fst ====="
echo

lang=data/lang
cat $local/tmp/lm.arpa | arpa2fst - | fstprint | utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=$lang/words.txt --osymbols=$lang/words.txt --keep_isymbols=false --keep_osymbols=false | fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst

echo
echo "===== MONO TRAINING ====="
echo

steps/train_mono.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono  || exit 1

echo
echo "===== MONO DECODING ====="
echo

utils/mkgraph.sh --mono data/lang exp/mono exp/mono/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/mono/graph data/test exp/mono/decode

echo
echo "===== MONO ALIGNMENT ====="
echo

steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali || exit 1

echo
echo "===== TRI1 (first triphone pass) TRAINING ====="
echo

steps/train_deltas.sh --cmd "$train_cmd" 2000 11000 data/train data/lang exp/mono_ali exp/tri1 || exit 1

echo
echo "===== TRI1 (first triphone pass) DECODING ====="
echo

utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/tri1/graph data/test exp/tri1/decode

echo
echo "===== run.sh script is finished ====="
echo
----- run.sh script ends here --------------------------------------------------



----------------------------------
9. GETTING RESULTS
----------------------------------

TASK:
Now all you have to do is run the run.sh script. If I have made any mistakes in this tutorial, the logs in the terminal should guide you in dealing with them.

Besides the decoding results you will notice in the terminal window, go to the newly made 'kaldi-trunk/egs/digits/exp' directory. You will find folders with the 'mono' and 'tri1' results there - the directory structure is the same for both. Go to the 'mono/decode' directory. Here you will find the result files (named in a 'wer_{number}' way). Logs for the decoding process can be found in the 'log' folder (same directory).
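Each wer_* file holds the word error rate for one decoding parameter setting, so a common trick is to grep out the best one per system. A runnable sketch with fabricated demo files (the numbers are made up; in your tree, point the loop at exp/mono/decode and exp/tri1/decode instead):

```shell
# Fabricated wer_* files just to make the sketch runnable here; real ones
# are produced by the scoring script inside exp/{mono,tri1}/decode.
mkdir -p demo_exp/mono/decode
echo "%WER 20.00 [ 12 / 60 ]" > demo_exp/mono/decode/wer_10
echo "%WER 15.00 [ 9 / 60 ]"  > demo_exp/mono/decode/wer_14
# Print the lowest WER per decode directory (field 2 is the WER percentage):
for d in demo_exp/*/decode; do
  grep -H WER "$d"/wer_* | sort -t' ' -k2 -n | head -1
done
```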

----------------------
10. SUMMARY
----------------------


This is just an example. The point of this short tutorial is to show you how to create 'anything' in Kaldi and to get a better understanding of how to think while using this toolkit. Personally I started by looking for tutorials made by the Kaldi authors/developers. After a successful Kaldi installation I launched some example scripts (Yesno, Voxforge, LibriSpeech - they are relatively easy and have free acoustic/language data to download - I used these three as a base for my own scripts).

Make sure you follow http://kaldi.sourceforge.net/index.html (now moving to http://kaldi-asr.org/) - official project website. There are two very useful sections for beginners inside:
a.) http://kaldi.sourceforge.net/tutorial.html - an almost 'step by step' tutorial on how to set up an ASR system; up to some point this can be done without the RM dataset. It is good to read it,
b.) http://kaldi.sourceforge.net/data_prep.html - a very detailed explanation of how to use your own data in Kaldi.

More useful links about Kaldi that I found:
https://sites.google.com/site/dpovey/kaldi-lectures - Kaldi lectures created by the main author
http://www.superlectures.com/icassp2011/category.php?lang=en&id=131 - similar; video version
http://www.diplomovaprace.cz/133/thesis_oplatek.pdf - a master's thesis about speech recognition using Kaldi


---
This is all from my side. Good luck!

Olumide

Feb 22, 2016, 1:35:15 PM
to kaldi...@googlegroups.com
Brilliant!!!

Thanks Wit!


Eric Riebling

Feb 23, 2016, 10:37:06 AM
to kaldi-help
I can't stress strongly enough how helpful something like this would have been for me as well.

The first-timer is presented with 43 topics of Kaldi documentation at the same level, with no particular order, hierarchy, or optimum path to introduce first concepts and then build upon them. (Maybe an unavoidable consequence of being more of a bottom-up reference for a toolkit than a textbook.)

Daniel Povey

Feb 23, 2016, 4:14:27 PM
to kaldi-help
If someone can come up with a suggestion for how to incorporate this into the Kaldi Doxygen documentation, and eventually turn this into a pull request, that would be great.  I don't have time right now to give much attention to this.
Dan



Jan Trmal

Feb 24, 2016, 1:13:25 PM
to kaldi-help
Guys, please help me understand what is not right with the tutorial we have.
Is it that it is "hidden" in the docs? Or that it's based on a non-free corpus? Or is it not elementary enough?
If we manage to figure this out and perhaps come up with a single tutorial, I think I could create a direct link on the kaldi-asr.org web page, so it would be more easily accessible.
I welcome and appreciate any effort on this, so I don't want to sound discouraging -- I'm just trying to understand why, for example, Wit spent the time on it. Wit, did you feel that the "original" Kaldi tutorial is hard to follow?
I welcome thoughts on this.
Y.

Daniel Galvez

Feb 24, 2016, 10:39:03 PM
to kaldi-help
Hi everyone,

To answer some of Yenda's questions: when I began learning Kaldi (long before I had met any of the developers), my largest problem was that Resource Management was non-free. This left me unable to follow along very effectively, as the only other small "free" corpus I knew of at the time was the yesno corpus, which is... rather simple. Meanwhile, I wouldn't be surprised if free datasets like TEDLIUM are simply too large for a beginner (it takes too many resources to train a model, not to mention that I believe the run.sh by default trains a DNN, which a newcomer may not necessarily mean to do). Much of the process in Kaldi has to do with the data; the directory structure of the example scripts is strongly coupled with the project, but I didn't get to see that from just reading along. You can't just read the source code and understand the project like you might for something less data-driven.

(By the way I later found the Berkeley Meeting Room Digits corpus, which is available for free and a little less trivial than yesno. I did hack together my own script for it, but never thought about merging it. Maybe I could free some bandwidth for that.)

Otherwise, I found the documentation and tutorial fine, coming from things like HTK and HTS which had respectively no sample scripts and no documentation :P

Daniel

Wit Zieliński

Feb 25, 2016, 7:16:22 PM
to kaldi-help
Hello,

the main reason I created this tutorial is that my master's thesis supervisor (the thesis was made with Kaldi's help) asked me if I could prepare something like this. And in my opinion it was a reasonable request, at least for a specific group of users. I talked to many students who started working with Kaldi from scratch. For almost every one of them it was more difficult than easy. For me it was difficult too. Not every university has access to the RM data set, so the official tutorial can only be done up to some point. I started my work with Kaldi by trying to run the example scripts on free corpora which I could download immediately. Then, reading the DATA PREPARATION part intensively, I figured out how to set up my own ASR system. But it took me a large amount of time just to run anything (not to mention that it left me less time to actually develop my project). Of course, maybe this was caused by my own laziness and lack of experience in programming/IT/ASR (at that time, a year ago). But still, when I learn something completely new I prefer to first do it with a 'step by step' instruction and then try to understand and expand it.

On the other hand, in the project introduction you warn newcomers that Kaldi was made rather for developers and researchers (people who have experience with ASR and different speech toolkits). So everyone who doesn't define himself as one deals with it at his own risk. I think that the current tutorial is not bad. It is just not necessarily adequate for absolute newcomers (usually quite ambitious students without proper knowledge and experience).

Best regards,
Witek

Jan Trmal

Mar 3, 2016, 2:44:49 PM
to kaldi-help
Guys, to continue this discussion: how important is it that the corpus is English?
If the feeling is that it should be English speech, we could probably do a small version (10 hrs) of TED or, perhaps even better, LibriSpeech. If the feeling is that it does not have to be English, I'm working on a recipe for Iban, which is 7+1 hrs (train+test) of speech.
y.

李健

Mar 3, 2016, 9:15:44 PM
to Kaldi-help
Hi,
In my opinion, there is not much difference whether it is English or not. 

What I am concerned about is whether this tutorial will involve more details about how to choose hyperparameters according to different corpus sizes. In my personal opinion, it is easy enough for newcomers to get a corpus of a proper size, for example thchs30 (Mandarin corpus, 30 hours), vystadial_en (English corpus, 41 hours), Czech (15 hours) and so on. They all have corresponding examples in Kaldi, and I think that is enough to help newcomers get started and become familiar with the procedure of building an ASR system using Kaldi. So I think it may not be very meaningful to write a new example script which is just similar to the above examples.
  
As a newbie in the ASR domain, I have been learning Kaldi for about one year. For me, the most difficult part is how to adjust the parameters to get a makeshift acoustic model. I have been doing some experiments on my own data. When I trained the conventional GMM part, the preliminary result seemed very bad (%WER 54 for triphone using the default parameters of the 'rm' example). Then I tried increasing the training and realigning iterations for monophone and used different configurations of num-leaves and tot-gauss, but it didn't work. There are nearly hundreds of options that I could try, but I can't even see a rough direction.

I know it takes a lot of experience to train a proper model for your own data, and it is rude to ask a non-profit team to teach newbies how to train acoustic models from scratch and share experienced skills and tricks. But considering that this tutorial is customized for newcomers, I think it would be better to include some advice about how to adjust some key parameters when one trains a model on one's own data, because many beginners have been stuck there and can't move further.

If I have bothered someone, I am sorry; just ignore this letter.

Thanks,
Jian 

Olumide

Mar 4, 2016, 8:12:10 AM
to kaldi...@googlegroups.com
Excellent feedback Jian.

Please consider contributing your ideas on parameter configuration to this or any other tutorial. It will be useful to newbies like me in the future. (I plan to use Kaldi a lot in the coming months.)

- Olumide
rea...@gmail.com

unread,
Mar 4, 2016, 1:52:38 PM3/4/16
to kaldi-help
Since I am working with language models a lot, I think one of the hardest things in Kaldi is how to handle this dict->lang->graph building process. I spent a lot of time figuring out how all these scripts work (and they are old).

There should be some simpler way to do it, like a script where you give it a language model (as ARPA or an FST graph) and a lexicon file (and probably a list of phones, since you have a prebuilt acoustic model) and you get the lang dir as output. It would be nice to handle all the disambiguation symbols automatically as well.

Just something to consider.

arjun malhotra

unread,
Mar 9, 2016, 3:08:54 AM3/9/16
to kaldi-help
Hello Sir
When we started running the script, we first got stuck with the error '"SRILM" toolkit not installed'. But after successfully installing it, run.sh generates the error shown in the screenshot below.

I have set up my system according to the instructions from your end, but I got stuck when running "run.sh", as shown in the attached snapshot. Kindly help me with this matter; I badly need your support.

Olumide

unread,
Mar 9, 2016, 5:22:49 AM3/9/16
to kaldi...@googlegroups.com
Arjun,

You don't have to take a photo of your screen to report an error.
Simply copy your screen output, or run the command with the tee command.
The tee command makes a copy of everything that's written to standard
output.
http://linux.101hacks.com/unix/tee-command-examples/

Please use this command in the future to capture program output.

- Olumide


On 09/03/2016 08:08, arjun malhotra wrote:
> ...

arjun malhotra

unread,
Mar 9, 2016, 12:05:09 PM3/9/16
to kaldi...@googlegroups.com

Thanks for your reply sir, I would surely appreciate your advice.

--
You received this message because you are subscribed to a topic in the Google Groups "kaldi-help" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kaldi-help/tzyCwt7zgMQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kaldi-help+...@googlegroups.com.

Wit Zieliński

unread,
Mar 9, 2016, 6:35:27 PM3/9/16
to kaldi-help
Hi,

you're right, I forgot to put in a point about installing the SRILM toolkit. But the run.sh script instructs you how to install it, so maybe it is not that bad. And talking about your photo - please give some more information about your system. What exactly does your utt2spk file (or each of the prepared files) look like? What is the structure of your audio data (directory tree, wav file names and content)? Isn't this "empty line/only 1 phone..." error simply caused by an empty line at the end of a file?

BR,
Witek

Vardaan

unread,
Mar 12, 2016, 8:18:02 AM3/12/16
to kaldi-help
Hi,

I followed your advice and some of the errors were corrected.
There are some errors I am not able to figure out.
The output and the related files are attached above.

Regards,
Vardaan
filter_scp.pl
output.txt
text
utt2spk
wav.scp

Daniel Povey

unread,
Mar 12, 2016, 12:30:58 PM3/12/16
to kaldi-help, Wit Zieliński
The first error seems to be :
utils/validate_data_dir.sh: file data/train/wav.scp is not in sorted order or has duplicates
Try doing 
utils/fix_data_dir.sh data/train
before the make_mfcc stage.
Wit: you might want to update the text to say something about this, and/or the sorted-order requirement.

Dan
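Dan's sorted-order point can be checked directly; here is a minimal sketch (the `check_sorted` helper is hypothetical, not an actual Kaldi script):

```shell
# Hypothetical helper: check the sorted-order property that
# utils/validate_data_dir.sh enforces. Kaldi data files must be
# sorted in the C locale on their first field, with no duplicates.
export LC_ALL=C
check_sorted() {
  if sort -c -u "$1" 2>/dev/null; then
    echo "$1: OK"
  else
    echo "$1: unsorted or has duplicates - run utils/fix_data_dir.sh"
  fi
}
# Usage (tutorial layout assumed):
#   check_sorted data/train/wav.scp
```

Setting `LC_ALL=C` matters: Kaldi compares strings byte-by-byte, so a file sorted under another locale can still fail validation.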




Vardaan

unread,
Mar 14, 2016, 1:13:38 PM3/14/16
to kaldi-help, zie...@gmail.com, dpo...@gmail.com
I tried the command
utils/fix_data_dir.sh data/train
But there was no change in the errors. Do you have any other suggestions regarding these errors?
output1.txt

Daniel Povey

unread,
Mar 14, 2016, 3:57:22 PM3/14/16
to Vardaan, kaldi-help, Wit Zieliński
I find that hard to believe.  Perhaps you are just getting different errors.
Anyway, it looks like your wav.scp file has lines like
a3_4_6_3/home/vardaan/kaldi-trunk/egs/digits/digits_audio/train/4_6_3.wav
.. there should be a space, like
a3_4_6_3 /home/vardaan/kaldi-trunk/egs/digits/digits_audio/train/4_6_3.wav
Dan
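The missing-space problem Dan diagnoses can be repaired mechanically; a hedged one-liner, assuming utterance IDs never contain a slash and paths are absolute, so the first "/" on each line is where the path begins:

```shell
# Without the /g flag, sed replaces only the FIRST "/" on each line,
# which is exactly where the absolute path starts:
echo 'a3_4_6_3/home/vardaan/kaldi-trunk/egs/digits/digits_audio/train/4_6_3.wav' \
  | sed 's|/| /|'
# To fix a whole file in place: sed -i 's|/| /|' data/train/wav.scp
```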


Wit Zieliński

unread,
Mar 14, 2016, 8:21:19 PM3/14/16
to kaldi-help, varda...@gmail.com, zie...@gmail.com, dpo...@gmail.com
Exactly, missing spaces. Do you see any different errors, Vardaan? I've just set up Kaldi again and ensured that this digits solution works properly (the only difference is that I use a Polish-language digits corpus). The only thing that I had to add (in my current environment - it worked without this before) is a "mkdir data/local/tmp" line in the run.sh script (before this line: ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa).

Dan: yes, I would like to make some corrections (about sorting and some other little things that might be helpful). Can you allow me to edit the tutorial (first post)?

BR,
Witek 
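The fix Wit describes can be sketched as the following run.sh fragment ($lm_order and $local are the tutorial's own variables; the SRILM call is guarded here so the snippet degrades gracefully when ngram-count is not installed):

```shell
# Create the tmp directory that the SRILM step assumes exists,
# then build the language model (sketch of the tutorial's run.sh step).
lm_order=1
local=data/local
mkdir -p $local/tmp   # run.sh assumes this directory already exists
if command -v ngram-count >/dev/null 2>&1 && [ -f $local/corpus.txt ]; then
  ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt \
    -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa
else
  echo "SRILM's ngram-count not found (or no corpus.txt) - skipping LM build"
fi
```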

Daniel Povey

unread,
Mar 14, 2016, 9:16:49 PM3/14/16
to Wit Zieliński, kaldi-help, Vardaan
This is a mailing list - I don't think you can edit past posts.
You could edit and re-mail with an easy-to-find subject.

Ruoho Ruotsi

unread,
Mar 15, 2016, 2:58:08 PM3/15/16
to kaldi-help
To chime in here to answer some of Yenda's questions and echo Jian, Eric Riebling & Daniel Galvez's sentiments:

  1. The non-free LDC corpora are *definitely* a limitation of the current tutorial and a general barrier to learning. Many interested engineers/scientists cannot afford that kind of investment to test-drive Kaldi. Librispeech & Musan have been *crucial* for me in being able to really learn Kaldi.
  2. re: Yenda's last question: English gets my vote for corpus language, especially to start. Other languages can always be added later.
  3. re: Dan Povey's comment about Doxygen: I see Wit's tutorial as a basic README for the non-ASR engineer, scientist or technical expert interested in learning ASR and using Kaldi as a toolkit. These users are finding Kaldi via github, and therefore I think Wit's tutorial would go best in Markdown format (Tutorial.md) on github, under docs. I like it so much I might put it in my Kaldi fork. I'll argue strongly against putting it in Doxygen. Ideally, it should live where its audience can find it (on github in markdown); for example, isn't this (markdown) a warmer welcome: https://github.com/jameslyons/python_speech_features
  4. re: Yenda's questions about shortcomings of the current Kaldi tutorial: I think the current Kaldi tutorial (aside from the LDC corpora issue) is good and deep, but targeted towards the ASR researcher, engineer or grad student. That makes sense; for example, the initial steps are exploring OpenFST. On the other hand, if you're new to ASR, have heard of pocketsphinx or Kaldi, seen it on github, cloned it and want to try to build something small, OpenFST is not an ideal entry point to learning. I'll argue an overview would be more useful before zooming into the details; this is where Wit's tutorial improves things. The main question for any tutorial is: who's the audience?
Okay, I hope this feedback was helpful and thanks to Wit for the great writeup.

Daniel Povey

unread,
Mar 15, 2016, 4:38:45 PM3/15/16
to kaldi-help

Thanks for the feedback -
I agree with most of it, but I'm not sure about the github markdown thing. It sets an odd precedent, given that all the other information about Kaldi is on kaldi-asr.org. If we were to have a markdown file in github it would probably be a clearer pointer to kaldi-asr.org, and if the doxygen information at kaldi-asr.org is not well organized, that should be fixed directly, IMO, rather than splitting the documentation up.
Dan

 

Ruoho Ruotsi

unread,
Mar 15, 2016, 6:37:29 PM3/15/16
to kaldi-help, dpo...@gmail.com
Hi Dan,
I agree that documentation is less useful split up. Reading the doxygen docs, I see that it now supports markdown: 

So I took the liberty, on my kaldi fork to add Wit's tutorial as markdown: https://github.com/ruohoruotsi/kaldi/blob/master/egs/Simple_ASR_System.md

If Wit and the community are okay with it, I could make this a doxygen markdown document and submit a pull request from my fork. Then Wit can also submit changes/updates and won't have to repost to this forum, and we'll have all documentation in one spot. The only question is where in the repo the document should live - src/doc?

Daniel Povey

unread,
Mar 15, 2016, 6:40:36 PM3/15/16
to Ruoho Ruotsi, kaldi-help
We are currently compiling the doxygen documentation with MARKDOWN_SUPPORT = NO.
There are, IIRC, some good reasons for this (i.e. certain things break if we enable it).

What really needs to be done is to format it using Doxygen and put it somewhere in doc/, e.g. doc/tutorial_digits.dox, and to modify other dox files such as mainpage.dox and tutorial.dox to refer to it as appropriate.

Dan

Wit Zieliński

unread,
Mar 16, 2016, 12:53:19 PM3/16/16
to kaldi-help, ruoho....@gmail.com, dpo...@gmail.com
I agree this is a good idea to put it online in a place where it might be easily modified (by a simple EDIT option or a new commit). I don't have experience with Doxygen and I also don't have access to my personal computer right now, so unfortunately I can only 'research' it next week. If anyone is planning to do it earlier or to come up with some different idea, I attach a txt file with a fixed version of this 'Kaldi for dummies' tutorial (removed some small mistakes and took your comments into account). I can also post it here in this topic (in a nice-looking way - like the first post). I don't know if creating a new topic for a fixed version that might change again soon is ok. Let me know if I am wrong.

BR,
Witek
Kaldi for dummies - FIXED

Daniel Povey

unread,
Mar 16, 2016, 4:18:54 PM3/16/16
to Wit Zieliński, kaldi-help, Ruoho Ruotsi
Guys, about the location--

I am sympathetic to the idea that it's better to have Kaldi documentation in a place where it's easier for people to edit- maybe a wiki, markdown, etc.  But we have to think about the big picture and where this is going, and not do it piecemeal.  Is there going to be a space somewhere for user-contributed Kaldi documentation?  What will be the mechanism for quality control and making sure it all hangs together [if any]?  Will it be in addition to the doxygen documentation, or is there a plan to migrate the doxygen documentation in doc/ to a new location and format?  I am open to hearing suggestions on these topics, but please don't think about it just from the point of view of this tutorial.

Dan

sauce...@gmail.com

unread,
Mar 24, 2016, 5:21:26 PM3/24/16
to kaldi-help
I've compiled an audio + language model + acoustic data set that is nearly exactly as described in the Exemplary Project. I've managed to successfully run the tutorial, getting word error rates of ~30%.

In case anyone else finds this helpful, I've added my egs/digits directory to a fork of Kaldi, available here: https://github.com/johnaegard/kaldi. Check out egs/digits/README for information and caveats.

I am a software engineer with zero academic exposure to speech recognition. Going through this exercise has helped me immensely. Thanks Wit!

* This corpus was compiled by cutting spoken digits out of English-language recitations of pi that I found on YouTube. This isn't really compatible with the YouTube Standard License, which permits streaming only, so if you think about republishing it somewhere, consider that.

Charlie S

unread,
May 17, 2016, 3:38:40 PM5/17/16
to kaldi-help, sauce...@gmail.com
I'm approaching Kaldi in the same way as sauce, a software engineer just starting in machine learning with no academic exposure.

I'm used to tutorials in the format of "You need to do X? Let's build a runnable solution."

Which leads me to my question about the Exemplary Project. Now that I've trained it, how would I try to classify new wav files with something like "./classify my_file.wav"?

Thanks all!

Ilya Platonov

unread,
May 17, 2016, 3:47:13 PM5/17/16
to kaldi-help, sauce...@gmail.com
Look into egs/apiai_decode/ it uses pretrained model + decode scripts.

Charlie S

unread,
May 17, 2016, 4:56:06 PM5/17/16
to kaldi-help, sauce...@gmail.com
Thanks I'll check it out!

How funny, I'm currently trying to get better results than the api.ai site, that's what led me here!

Charlie S

unread,
May 18, 2016, 2:50:38 PM5/18/16
to kaldi-help, sauce...@gmail.com
Oops, I meant the wit.ai site, not api.ai. The results I'm getting from apiai_decode are _exactly_ what I was looking for, thanks!

Wit Zieliński

unread,
May 29, 2016, 6:41:34 PM5/29/16
to kaldi-help
Here goes a github version, I hope this will be helpful ;-) The running scripts will also be updated if necessary. I don't share the audio files because I don't have the owner's rights.

https://github.com/witko0/kaldifordummies

arjun malhotra

unread,
Jun 2, 2016, 9:01:38 AM6/2/16
to kaldi...@googlegroups.com
Thanks a lot Wit, you are a life saver. One last thing: can you please also explain how to install the SRILM toolkit? I am not installing it correctly, I guess. Thanks again for your efforts and support.

gerar...@gmail.com

unread,
Jun 2, 2016, 7:11:58 PM6/2/16
to kaldi-help
Wit.
This is great, especially for those of us who are just starting.
If you want, I can record some utterances with my voice and share them. And if more people do the same, your example can have freely licensed audio.
Just let me know.

Best regards.

Wit Zieliński

unread,
Jun 3, 2016, 9:07:04 AM6/3/16
to kaldi...@googlegroups.com
Sure, we might create some kind of "digits library" in English ;-) Then the github project could be 100% executable. I can also prepare some recordings. If anyone would like to contribute, just record it and share it here - I'll incorporate it and push it into the project. I suggest making it the way it was described in the tutorial: 3 digits in one utterance/recording, named logically (e.g. <name/tag>_154.wav - the recording contains the sentence 'one, five, four').

--

Jan Trmal

unread,
Jun 3, 2016, 9:15:38 AM6/3/16
to kaldi-help
Guys, I don't think it's worth your effort and time. There are already some free corpora.
If you feel there should be a corpus accompanying the howto, you can use the yesno data or a subset of librispeech (or something else).
I appreciate the effort greatly, but I don't think you should spend the time on recording new audio -- you would realize that it's not as straightforward as it seems (assuming you would like to have something that performs nicely) and get annoyed or bored. That being said, you are of course free to do what you want :)
y.
 


Wit Zieliński

unread,
Jun 3, 2016, 10:03:55 AM6/3/16
to kaldi...@googlegroups.com
You're right, it is not necessary, and some other free corpora might be used. Anyway, it would be nice to have a complete github project, fully adjusted to this tutorial. It is not about creating a useful corpus, but an exemplary one (someone sees the wavs and directories and may compare them with the txt files). If 10 people record 10 sentences each, it will be enough and won't consume much time. It is not a problem.

Gerardo Roa

unread,
Jun 3, 2016, 10:19:28 AM6/3/16
to kaldi...@googlegroups.com

Wit
I agree with you.
Just for your example, it would be nice to help you by adding data.
Then anyone can try to create their own sentences as test data and compare.

Best regards
Gerardo Roa Dabike

arjun malhotra

unread,
Jun 3, 2016, 10:20:48 AM6/3/16
to kaldi...@googlegroups.com

I also agree. I can provide some wav files, but they would be in English.

Zilong Zhao

unread,
Jun 3, 2016, 7:10:47 PM6/3/16
to kaldi-help
Hey Wit, your tutorial is really, really helpful. Now I have some questions: is the mitlm toolkit obligatory for using Kaldi? Also, I do not understand why, when we create an ASR system, we also provide a text transcript for the test audio. The aim of the system is to recognize the audio, but if we provide the text, what is the point of the system? If I want to put in an audio file and get text out, can I skip adding the text of the test audio?

Wit Zieliński

unread,
Jun 6, 2016, 6:46:11 PM6/6/16
to kaldi-help
Hi,

it is not obligatory (correct me if I am wrong!). To answer your second question - how would you verify recognition in a test session if you don't have any data about the test audio files? :) (I mean - the system will recognize something and you still need to compare it with something else - a reference - to get any kind of percentage score.) This example is not a system that will 'speech-to-text' your digits audio data - it's a scientific tool that allows you to check how well your speech data is recognized (in this particular case).

BR,
Witek
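Wit's point about needing a reference transcription is exactly how word error rate is scored; a minimal sketch of the idea (a plain word-level edit distance, not Kaldi's actual compute-wer tool):

```python
# Minimal WER sketch: edit distance between the reference transcript
# and the decoded hypothesis, normalized by reference length.
def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edits needed to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution / match
                dp[i - 1][j] + 1,                           # deletion
                dp[i][j - 1] + 1,                           # insertion
            )
    return dp[len(r)][len(h)] / len(r)

print(wer("one five four", "one nine four"))  # one substitution out of three words
```

Kaldi's scoring scripts do the same comparison between the decoded hypotheses and the test set's `text` file, which is why that file is required.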

gerar...@gmail.com

unread,
Jun 7, 2016, 12:59:46 PM6/7/16
to kaldi-help
Dear Wit
I have already created my audio files.
Please check them. If you need any changes, just let me know.
https://dl.dropboxusercontent.com/u/91736775/groad.tar.gz

tanzia.ha...@gmail.com

unread,
Jun 17, 2016, 10:56:07 AM6/17/16
to kaldi-help
-----------Need Help-----------
I have 423 files from two speakers in .wav format, and I have the transcriptions in .txt format. How can I automatically generate the "wav.scp" and "utt2spk" files? Doing it manually is a huge task. Please help me regarding this...

thanks in advance
Tanzia Haque

Wit Zieliński

unread,
Jun 17, 2016, 11:04:43 AM6/17/16
to kaldi...@googlegroups.com
You can write scripts (bash/awk), adjusted to your naming convention, that will do it for you. You will find some ideas in the Kaldi example projects that download audio data and prepare the txt files automatically. Look for 'preparation' scripts.

BR,
Witek

--

Gerardo Roa

unread,
Jun 17, 2016, 11:07:51 AM6/17/16
to kaldi...@googlegroups.com

Tanzia
You do not do that manually; you should use a tool (Python, bash, perl, java, ..., whichever you wish).
It depends on how you organized your data.
What I'm doing is creating a Python tool for that.
If the file names follow the order of the utterances in the transcription, you could create something that says
transcription[0] = speaker1_audio0
transcription[1] = speaker1_audio1

and write that data out in the wav.scp and utt2spk formats easily.
But, again, it depends on how you have your data.

If you can add more detail about how you named the files and how you organized the transcriptions, it would be easier to help you in more detail.

Best regards
Gerardo Roa Dabike


tanzia.ha...@gmail.com

unread,
Jun 18, 2016, 2:51:46 AM6/18/16
to kaldi-help
Thanks a lot Witek for your quick help... :)

BR,
Tanzi

tanzia.ha...@gmail.com

unread,
Jun 18, 2016, 2:54:20 AM6/18/16
to kaldi-help, gerar...@gmail.com
Thanks a lot Gerardo Roa Dabike for giving the idea to write the python script... it works :)

BR,
Tanzi

Max Lay

unread,
Aug 29, 2016, 10:18:21 PM8/29/16
to kaldi-help, sauce...@gmail.com
Did you manage to get apiai_decode to work with what was built in this tutorial?
I've completed the tutorial, but now I wish to build an "audio in, text out" speech recognition system from it and have no idea how.

Ilya Platonov

unread,
Aug 31, 2016, 1:55:28 PM8/31/16
to kaldi-help, sauce...@gmail.com
apiai_decode uses nnet3-based decoding, and this tutorial builds a GMM-based model.

So if you want to decode with a model built with this tutorial, you have to use the steps/decode.sh script.

Something like: steps/decode.sh --cmd run.pl --nj 1 exp/tri1/graph data/test-corpus/ exp/tri1/decode

Alper Boray

unread,
Sep 30, 2016, 9:11:33 AM9/30/16
to kaldi-help
Hello, I'm trying to prepare my own ASR system by following your tutorial. However, I got some errors that I couldn't solve on my own. I would be so glad if you could help me solve the problem. I prepared the required text files and installed SRILM.
The error part of the output of run.sh is attached.
output.txt

Jan Trmal

unread,
Sep 30, 2016, 10:40:52 AM9/30/16
to kaldi-help
You should set DYLD_FALLBACK_LIBRARY_PATH to reflect the location of the liblbfgs-1.10.dylib library.
The command should look similar to this:

export DYLD_FALLBACK_LIBRARY_PATH=$DYLD_FALLBACK_LIBRARY_PATH:/Users/mac/kaldi/tools/liblbfgs-1.10/lib/.libs/

you can put it in the path.sh file.
 
y.


Daniel Povey

unread,
Dec 8, 2016, 3:05:02 PM12/8/16
to kaldi-help
Almost certainly the problem was in an earlier stage, but you are not
showing enough of the output to make it obvious.

On Thu, Dec 8, 2016 at 1:50 AM, <ssmal...@sheffield.ac.uk> wrote:
> Thank you for the tutorial. this is really helpful.
> I am having this error on the output:
>
>
> ===== MONO TRAINING =====
>
>
> steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
>
> cat: data/lang/oov.int: No such file or directory
>
>
>
> What do you think is causing this problem?
> Thanks in advance

Olabosipo Shoroye

unread,
Jan 15, 2017, 7:33:12 AM1/15/17
to kaldi-help
@sauce....gmail.com I used your Github link to kickstart mine and got 40% WER when using the same numbering for 'o' and '0' in the 'f0' train folder. This was strange, because the female voice clearly said 'o', not 'zero'. Then I changed it all to 'o' and changed all the other appropriate language files to match, and then got 46.67% WER, when it should have been better. Why did you label 'o' as '0' for f0?

joel starukar

unread,
Jun 21, 2017, 10:19:06 AM6/21/17
to kaldi-help
I am unable to generate the MFCC files. Can you help me with it?

Ola Shoroye

unread,
Jun 21, 2017, 10:21:20 AM6/21/17
to kaldi...@googlegroups.com
What exactly is your error message? 

--
Go to http://kaldi-asr.org/forums.html find out how to join

joel starukar

unread,
Jun 21, 2017, 10:30:31 AM6/21/17
to kaldi-help
I get an error when getting the feature dimension, and the feats.scp file is not created.

Ola Shoroye

unread,
Jun 27, 2017, 2:56:49 PM6/27/17
to kaldi-help
Sorry for the delayed response. I've forgotten exactly what I did to solve that error, and whether I even got that error (I got many). However, one thing I noticed was that the code wasn't perfect, and when errors occurred it was because something broke down before that error. So you need to read the logs on the console and spot an error before the one you mentioned; then you'll need to address that error, and when you do, the one you described will probably be resolved too.

Harsh Dhaliwal

unread,
Nov 30, 2017, 5:18:27 AM11/30/17
to kaldi-help
I had the same error. It has to do with the sampling rate of your wav files. Make sure it is 16 kHz, mono.

Sidrah Azhar

unread,
Feb 3, 2018, 2:39:29 AM2/3/18
to kaldi-help
Hi. I am following your steps to build an ASR system for deaf children. Is it appropriate, or will some changes be needed to build this system for the deaf?
Kindly guide me!
Regards,
Sidrah

Daniel Povey

unread,
Feb 3, 2018, 2:42:23 PM2/3/18
to kaldi-help
To make a system that was useful for deaf children would probably need a lot of work, and you'd need prior experience in speech recognition. Maybe just focus on learning the basics for now. Try reading this
and running the "kaldi for dummies" tutorial.

Dan



Sidrah Azhar

unread,
Feb 3, 2018, 2:55:55 PM2/3/18
to kaldi...@googlegroups.com
Thank you Dan!


Lâm Nguyễn

unread,
Jun 2, 2018, 10:06:10 PM6/2/18
to kaldi-help
Brilliant! Thank you so much!