'Chain' models

Daniel Povey

Dec 14, 2015, 4:52:38 PM
to kaldi-developers

Everyone,

I have decided the 'chain' models are ready to be publicized a bit more widely.
Rather than doing this all by email, I prepared a documentation page:
(note: this is in a 'doc2/' version of the docs, not the normal 'doc/' location).

This is the outcome of all my experimentation with CTC; in the end I couldn't get an improvement with CTC versus our best models (BTW, I hear Microsoft Research has had a similar experience), but with these 'chain' models I was able to use a similar sequence-level objective function and actually get some improvements, plus the speed advantages of the 3-fold frame subsampling.

I would appreciate some help from others in testing this stuff out, developing and tuning recipes for other corpora, and improving the GPU implementation; the documentation page says what the TODOs are.

Dan

Ilya Platonov

Dec 15, 2015, 3:54:52 PM
to kaldi-developers, dpo...@gmail.com
I see from the doc that the online decoder is aimed to be finished in April. Is this because everyone is busy with other stuff, or because it is hard to implement? Let's say we find some outside developer who will focus on implementing it; how long would it take?

Thank you.

Daniel Povey

Dec 15, 2015, 4:07:11 PM
to Ilya Platonov, kaldi-developers
Actually it's not hard, it's more that I'm busy with other things, and I wanted to do some experiments to determine exactly what's needed.  For short-term stuff, the nnet2 online setup could be adapted to work with nnet3.  But you would have a hard time finding an external developer who would be able to do this, because it requires deep familiarity with ASR and with the Kaldi codebase.  However, I may be able to make a quick and dirty version much sooner.
Dan

Rémi Francis

Dec 18, 2015, 5:59:43 AM
to kaldi-developers, dpo...@gmail.com
Could the 'chain' models be improved with sMBR training after the chain training?
They are both sequence-level objectives, so if with the chain models we get the same accuracy as CE-trained models but then can't improve them further with sMBR, there is still a gap to fill.

Daniel Povey

Dec 18, 2015, 4:30:24 PM
to Rémi Francis, kaldi-developers
That's true, but the gap is already almost as large as the gain we would normally get from discriminative training in nnet2 models, so it's very unlikely that this would cancel out all the improvement.  Currently we haven't written the nnet3 sequence-training code, so we can't test that out.
Of course, whether the criterion is sMBR or boosted MMI, we could still get an improvement from training the 'chain' models that way, as sequence training uses a word-level lattice while the 'chain' models were trained with a phone-level language model.  Who knows; we'd have to test it.
I'm also hoping that the improvement we'll get from LSTMs/BLSTMs with chain models is more than we'd get from regular models, because the frame-independence assumption is broken very badly by infinite-context models, but it's not an assumption that we make in the 'chain' models.  Vijay will test this.

Dan

Ilya Platonov

Jan 9, 2016, 12:28:46 PM
to kaldi-developers, dpo...@gmail.com
The nnet3 scripts use the "g.q" queue instead of "all.q".

I had to configure it in the cluster before running local/nnet3/run_lstm.sh.


Daniel Povey

Jan 9, 2016, 2:41:34 PM
to Ilya Platonov, kaldi-developers
More recent scripts use the standard '--gpu 1' option, which is then interpreted by queue.pl.  By default it maps to '-q g.q -l gpu=1', I think, but you can configure it by creating and editing conf/queue.conf; see http://kaldi-asr.org/doc/queue.html.
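For reference, the behavior documented there is roughly equivalent to a conf/queue.conf along these lines (a sketch based on the queue.html page, not copied from a real install; check the docs for the authoritative version):

  command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
  option mem=* -l mem_free=$0,ram_free=$0
  option num_threads=* -pe smp $0
  default gpu=0
  option gpu=0
  option gpu=* -l gpu=$0 -q g.q

So '--gpu 1' expands to '-l gpu=1 -q g.q', and you can point GPU jobs at a different queue by editing the 'option gpu=*' line.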

Dan

Xingyu Na

Feb 13, 2016, 9:18:59 PM
to kaldi-de...@googlegroups.com
I noticed the chain commits are merged into master. Does this mean the chain branch is deprecated and future work on chain will be done on master?
X.

Daniel Povey

Feb 13, 2016, 9:22:02 PM
to kaldi-developers
No, the chain branch is still used for ongoing work; it's less stable than master.

Ilya Platonov

Feb 22, 2016, 4:42:48 PM
to kaldi-developers, dpo...@gmail.com
Chain models are said to be faster than the nnet2 setup (I read about 3 times faster).

Still, when I do "time steps/nnet3/decode.sh", it works slightly slower than my similar nnet2 setup.

So how is speed measured in this case? What part of the decoding process is faster? How much improvement should I expect on the full decoding process?

Thank you.

Daniel Povey

Feb 22, 2016, 4:45:06 PM
to Ilya Platonov, kaldi-developers

> Chain models are said to be faster than the nnet2 setup (I read about 3 times faster).
> Still, when I do "time steps/nnet3/decode.sh", it works slightly slower than my similar nnet2 setup.

Are you doing this on a chain model, or on a different nnet3 model?

> So how is speed measured in this case? What part of the decoding process is faster? How much improvement should I expect on the full decoding process?

It's the real-time factor.  Sometimes to actually see the improvement you have to reduce the beams slightly.  In practice the speedup seems to be more like a factor of 2 than 3, but getting that factor-of-2 improvement by reducing the beam is pretty easy; you won't see a substantial change in WER.

Dan

Daniel Povey

Feb 22, 2016, 4:50:07 PM
to Ilya Platonov, kaldi-developers
BTW, the improvement in speed is in both the neural net (since most of it is evaluated on about 3 times fewer frames, plus it's smaller), and in the decoder search (since the frame rate is 3 times lower than the baseline's).  However, the beams used in the baseline tend to leave a lot more states active in the chain models, so in order to see this speedup you need to reduce the beams a bit (based on Remi Francis's experiments, I suggest subtracting 2 from the baseline --beam and --lattice-beam).
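For concreteness: if your baseline used the common defaults of --beam 15.0 and --lattice-beam 8.0 (an assumption; check your own decode script), that suggestion amounts to something like:

  # Sketch only: --acwt 1.0 --post-decode-acwt 10.0 are the usual settings
  # for chain decoding, and the directory arguments are placeholders.
  steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
    --beam 13.0 --lattice-beam 6.0 \
    $graph_dir $data_dir $decode_dir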

Dan


Ilya Platonov

Feb 22, 2016, 5:02:57 PM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
I have been running vanilla nnet3 scripts; sorry for the confusion.

Rémi Francis

Feb 24, 2016, 10:17:04 AM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
This was done on the model from the tdnn_2o script; I haven't tried the newer ones, but since they are better the beams could probably be reduced.
However, this was trying to match my cross-entropy baseline; for my sMBR one, with the newer scripts I get a similar WER with the same beams. I haven't really measured the speed/accuracy trade-off with them.

Ilya Platonov

Feb 29, 2016, 2:28:36 PM
to kaldi-developers, dpo...@gmail.com
I am trying to compile the latest Kaldi master on my StarCluster instance (which has a pretty old Ubuntu, raring) and am getting this:
 
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib::/srv/train/kaldi/egs/speaktoit/s5/tools/mitlm-svn/lib:/srv/train/kaldi/egs/speaktoit/s5/../../../tools/openfst-1.3.4/lib
#$ PATH=/usr/local/cuda/bin/../open64/bin:/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/srv/train/kaldi/egs/speaktoit/s5/utils/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/bin:/srv/train/kaldi/egs/speaktoit/s5/../../../tools/openfst/bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/fstbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/gmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/featbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/lm/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/sgmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/sgmm2bin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/fgmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/latbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnetbin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnet2bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnet3bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/online2bin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/ivectorbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/lmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/chainbin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/kwsbin:/srv/train/kaldi/egs/speaktoit/s5:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda/bin:/opt/sge6/bin/linux-x64
#$ INCLUDES="-I/usr/local/cuda/bin/..//include"  
#$ LIBRARIES=  "-L/usr/local/cuda/bin/..//lib64"
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=100 -E -x c++    -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS  -D__CUDACC__ -D__NVCC__  -fPIC -I"/usr/local/cuda/include" -I"../" "-I/usr/local/cuda/bin/..//include"   -D"HAVE_CUDA" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii" "chain-kernels.cu
#$ cudafe --m64 --gnu_version=40703 -tused --no_remove_unneeded_entities  --gen_c_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.c" --stub_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.gpu" --nv_arch "compute_10" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00001255_00000000-2_chain-kernels.module_id" --include_file_name "tmpxft_00001255_00000000-1_chain-kernels.fatbin.c" "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii" 
chain-kernels.cu(28): error: identifier "atomicExch" is undefined
          detected during:
            instantiation of "void atomic_add(Real *, Real) [with Real=BaseFloat]" 
(47): here
            instantiation of "void atomic_add_thresholded(Real *, Real) [with Real=BaseFloat]" 
(211): here

1 error detected in the compilation of "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii".
# --error 0x2 --
make[1]: *** [chain-kernels.o] Error 2
make[1]: *** Waiting for unfinished jobs....g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/srv/train/kaldi/tools/ATLAS/include -I/srv/train/kaldi/tools/openfst/include  -g  -DHAVE_CUDA -I/usr/local/cuda/include   -c -o mixup-nnet.o mixup-nnet.cc


Should I use newer cuda?

Jan Trmal

Feb 29, 2016, 2:38:33 PM
to kaldi-de...@googlegroups.com, Dan Povey
I believe atomicExch is supported from compute_13 or something like that -- compute_10 might be too old. It's not particularly an issue of the CUDA version; it's more an issue of the target device (compute capability).
y.


Ilya Platonov

Feb 29, 2016, 2:46:05 PM
to kaldi-developers, dpo...@gmail.com
So does that mean I cannot compile Kaldi with CUDA on this device? This is an Amazon EC2 g2.2xlarge.

I am all new to this CUDA-related stuff.

Vijayaditya Peddinti

Feb 29, 2016, 2:47:46 PM
to kaldi-developers, Daniel Povey
We have used Kaldi with this particular EC2 instance type before.

--Vijay

Jan Trmal

Feb 29, 2016, 2:50:46 PM
to kaldi-de...@googlegroups.com, Dan Povey
My guess from looking at the makefile would be to update CUDA to version 6.5 or higher.
That Amazon machine has a fairly modern architecture, i.e. it should work.
y.

Jan Trmal

Feb 29, 2016, 2:52:37 PM
to kaldi-de...@googlegroups.com, Dan Povey
Or just modify cumatrix/Makefile so that the lines

  #For toolkit older than 6.5, add the compute capability 1.0
  CUDA_VER_GT_6_5 := $(shell [ $(CUDA_VERSION) -ge 65 ] && echo true)
  ifneq ($(CUDA_VER_GT_6_5), true)
    CUDA_ARCH += -gencode arch=compute_13,code=sm_13 \
                 -gencode arch=compute_10,code=sm_10
  endif

look like

  #For toolkit older than 6.5, add the compute capability 1.0
  CUDA_VER_GT_6_5 := $(shell [ $(CUDA_VERSION) -ge 65 ] && echo true)
  ifneq ($(CUDA_VER_GT_6_5), true)
    CUDA_ARCH += -gencode arch=compute_13,code=sm_13
  endif

Daniel Povey

Feb 29, 2016, 2:52:41 PM
to Jan Trmal, kaldi-developers
If you don't need the chain code, then you can just remove 'chain' and 'chainbin' from the targets in the Makefile.
Dan


Ilya Platonov

Feb 29, 2016, 2:54:21 PM
to kaldi-developers, dpo...@gmail.com

I have been using EC2 instances with CUDA for quite some time now on the same setup. Compilation fails on the new chain-model code, which I wanted to try out here.

It does compile fine on my local machine.

Here is some version info:
root@master:/srv/train/kaldi/src/chain# cudafe++ -v
cudafe: NVIDIA (R) Cuda Language Front End
Portions Copyright (c) 2005-2013 NVIDIA Corporation
Portions Copyright (c) 1988-2013 Edison Design Group Inc.
Based on Edison Design Group C/C++ Front End, version 4.5 (Jul 17 2013 18:38:05)
Cuda compilation tools, release 5.5, V5.5.0

Daniel Povey

Feb 29, 2016, 2:56:20 PM
to Ilya Platonov, kaldi-developers
I am assuming that the 5.5 version is on the EC2 instance.  That's too old; I think atomicExch is only supported from 6.5.  You have to update your CUDA, or else there is no chance of it working.
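To confirm which toolkit a machine actually has, the standard checks are nothing Kaldi-specific:

  nvcc --version   # prints e.g. "Cuda compilation tools, release 5.5"
  nvidia-smi       # shows the driver version and visible GPUs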
Dan

Ilya Platonov

Feb 29, 2016, 2:56:29 PM
to kaldi-developers, dpo...@gmail.com
I mean it compiles fine on my local machine.

Jan Trmal

Feb 29, 2016, 2:59:21 PM
to kaldi-de...@googlegroups.com, Dan Povey
Upgrading the cuda should solve your issue.
y.


Ilya Platonov

Feb 29, 2016, 5:32:58 PM
to kaldi-developers, dpo...@gmail.com
Yes, upgrading to 6.5 solved this. Thank you.

Ilya Platonov

Mar 2, 2016, 5:10:41 PM
to kaldi-developers, dpo...@gmail.com
I am trying to do chain training and I ran into a bunch of issues.
I used this script as a base, https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/run_tdnn_2o.sh, and tweaked it a bit to fit my training data.
1) touch: cannot touch 'exp/chain/tdnn_2o/egs/.nodelete': No such file or directory -- pretty straightforward, the egs folder does not exist on line 211.
2)
steps/nnet3/chain/train_tdnn.sh: line 310: [: 1: unary operator expected
steps/nnet3/chain/train_tdnn.sh: line 311: [: 1: unary operator expected

Not sure why there is an error like this, but I assume I can just ignore it.
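Aside: this particular bash error usually comes from an unquoted variable expanding to nothing inside a test; whether that is what happens inside train_tdnn.sh here is a guess. For example:

  x=
  [ 1 -gt $x ]          # expands to: [ 1 -gt ]  ->  "[: 1: unary operator expected"
  [ 1 -gt "${x:-0}" ]   # a quoted expansion with a default value avoids it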

3) Then it fails here:
steps/nnet3/chain/train_tdnn.sh: getting preconditioning matrix for input features.
queue.pl: 20 / 20 failed, log is in exp/chain/tdnn_2o/log/get_lda_stats.*.log


Then in log I get this

# Running on node004
# Started at Wed Mar 2 21:58:54 UTC 2016
# Accounting: time=0 threads=1
# Finished at Wed Mar 2 21:58:56 UTC 2016 with status 255
# nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/chain/tdnn_2o/init.raw ark:exp/chain/tdnn_2o/egs/cegs.11.ark exp/chain/tdnn_2o/11.lda_stats 
nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/chain/tdnn_2o/init.raw ark:exp/chain/tdnn_2o/egs/cegs.11.ark exp/chain/tdnn_2o/11.lda_stats 
WARNING (nnet3-chain-acc-lda-stats:Open():util/kaldi-table-inl.h:353) TableReader: failed to open stream exp/chain/tdnn_2o/egs/cegs.11.ark
ERROR (nnet3-chain-acc-lda-stats:SequentialTableReader():util/kaldi-table-inl.h:534) Error constructing TableReader: rspecifier is ark:exp/chain/tdnn_2o/egs/cegs.11.ark
ERROR (nnet3-chain-acc-lda-stats:SequentialTableReader():util/kaldi-table-inl.h:534) Error constructing TableReader: rspecifier is ark:exp/chain/tdnn_2o/egs/cegs.11.ark

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::SequentialTableReader(std::string const&)
nnet3-chain-acc-lda-stats(main+0x2e2) [0x82aace]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2ae76f29bea5]
nnet3-chain-acc-lda-stats() [0x82a729]


So there is no exp/chain/tdnn_2o/egs/cegs.11.ark file in the egs dir, though it has a lot of cegs_orig files.

Here is a full log:

root@master:/srv/train/kaldi/egs/speaktoit/s5# local/run_tdnn_2o.sh  --stage 12 2>&1 | tee train.log
local/run_tdnn_2o.sh --stage 12
steps/nnet3/chain/train_tdnn.sh --stage -10 --apply-deriv-weights false --lm-opts --num-extra-lm-states=2000 --get-egs-stage -10 --minibatch-size 128 --egs-opts --frames-overlap-per-eg 0 --frames-per-eg 150 --num-epochs 8 --num-jobs-initial 3 --num-jobs-final 8 --splice-indexes -2,-1,0,1,2 -1,2 -3,3 -6,3 -6,3 --feat-type raw --cmvn-opts --norm-means=false --norm-vars=false --initial-effective-lrate 0.001 --final-effective-lrate 0.0001 --max-param-change 1.0 --final-layer-normalize-target 0.5 --relu-dim 850 --cmd queue.pl --remove-egs false data/train_hires exp/chain/tri5_2o_tree exp/tri3b_lats_nodup exp/chain/tdnn_2o
steps/nnet3/chain/train_tdnn.sh: creating phone language-model
steps/nnet3/chain/train_tdnn.sh: creating denominator FST
copy-transition-model exp/chain/tri5_2o_tree/final.mdl exp/chain/tdnn_2o/0.trans_mdl
LOG (copy-transition-model:main():copy-transition-model.cc:62) Copied transition model.
am-info exp/chain/tdnn_2o/0.trans_mdl
steps/nnet3/chain/train_tdnn.sh: creating neural net configs
steps/nnet3/tdnn/make_configs.py --pool-type none --include-log-softmax=false --final-layer-normalize-target 0.5 --splice-indexes -2,-1,0,1,2 -1,2 -3,3 -6,3 -6,3 --feat-dim 40 --ivector-dim 0 --relu-dim 850 --num-targets 7343 --use-presoftmax-prior-scale false exp/chain/tdnn_2o/configs
Append(Offset(input, -2), Offset(input, -1), input, Offset(input, 1), Offset(input, 2))
steps/nnet3/chain/train_tdnn.sh: calling get_egs.sh
steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --transform-dir exp/tri3b_lats_nodup --left-context 1 --right-context 1 --frames-per-iter 800000 --stage -10 --cmd queue.pl --right-tolerance 10 --left-tolerance 5 --frames-per-eg 150 --frame-subsampling-factor 3 data/train_hires exp/chain/tdnn_2o exp/tri3b_lats_nodup exp/chain/tdnn_2o/egs
File data/train_hires/utt2uniq exists, so augmenting valid_uttlist to
include all perturbed versions of the same 'real' utterances.
steps/nnet3/chain/get_egs.sh: feature type is raw
steps/nnet3/chain/get_egs.sh: working out number of frames of training data
steps/nnet3/chain/get_egs.sh: working out feature dim
feat-to-dim 'ark,s,cs:utils/filter_scp.pl --exclude exp/chain/tdnn_2o/egs/valid_uttlist data/train_hires/split15/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:- |' -
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:-
WARNING (feat-to-dim:Close():kaldi-io.cc:496) Pipe utils/filter_scp.pl --exclude exp/chain/tdnn_2o/egs/valid_uttlist data/train_hires/split15/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:- | had nonzero return status 36096
steps/nnet3/chain/get_egs.sh: creating 1184 archives, each with 5329 egs, with
steps/nnet3/chain/get_egs.sh:   150 labels per example, and (left,right) context = (1,1)
steps/nnet3/chain/get_egs.sh: copying training lattices
steps/nnet3/chain/get_egs.sh: Getting validation and training subset examples.
steps/nnet3/chain/get_egs.sh: ... extracting validation and training-subset alignments.
... Getting subsets of validation examples for diagnostics and combination.
steps/nnet3/chain/get_egs.sh: Generating training examples on disk


steps/nnet3/chain/get_egs.sh: recombining and shuffling order of archives on disk
steps/nnet3/chain/get_egs.sh: removing temporary archives
steps/nnet3/chain/get_egs.sh: removing temporary lattices
steps/nnet3/chain/get_egs.sh: removing temporary alignments and transforms
steps/nnet3/chain/get_egs.sh: Finished preparing training examples
steps/nnet3/chain/train_tdnn.sh: getting preconditioning matrix for input features.
queue.pl: 20 / 20 failed, log is in exp/chain/tdnn_2o/log/get_lda_stats.*.log


Maybe I should try to use the WSJ version of the script instead of run_tdnn_2o.sh, but I'm not sure if it will fix this.

Daniel Povey

Mar 2, 2016, 5:20:51 PM
to Ilya Platonov, kaldi-developers
I think you may have hit a code path in the scripts that is buggy and was not tested, where $archives_multiple != 1.
I'm looking at it right now and will try to commit a fix.
Dan

Daniel Povey

Mar 2, 2016, 5:44:42 PM
to Ilya Platonov, kaldi-developers
OK, I committed a couple of fixes.
To save re-dumping egs, you can go into the egs directory and do

 for x in egs.*.ark; do mv $x c$x; done

and then rerun.
However, I recommend that you run a more recent script; for example, check out the chain branch and run the 6h script.  2o is a very old script and will not give the best results.

Dan

Daniel Povey

Mar 2, 2016, 5:47:03 PM
to Ilya Platonov, kaldi-developers
... oh, and  you should rerun with --train-stage -3 to avoid re-dumping egs.
Dan

Ilya Platonov

Mar 6, 2016, 1:18:31 PM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
So I finally got my first chain-model results on tdnn_2o.

It is both faster (more than 2 times) and has a significant WER improvement compared to my nnet2/TDNN results. And I have not finished training yet, so the WER will probably improve.

So Dan, and everyone else who is working on this: good job.

I do not use iVectors.

Now, I would love to have online decoder and put this into production :).

Daniel Povey

Mar 7, 2016, 1:15:40 AM
to Ilya Platonov, kaldi-developers
That's great news!
There have been some improvements since then.  After spending a while improving results with the so-called 'jesus layer', we discovered that a regular ReLU-based TDNN could do even better (it's not clear why the jesus layer initially helped; perhaps something was going wrong in training at that time).
Anyway, 6z (present in branch 'chain') is the current recommended setup.
I hope to have an online-decoding setup for these models checked in within 2 months' time.  It's not very hard, but we have to first decide what types of features to support (especially as regards iVectors).

Dan

Ilya Platonov

Mar 17, 2016, 4:54:28 PM
to kaldi-developers, dpo...@gmail.com
What do I do if I want to reduce the computation time for chain models by a factor of 1.5, 2, or 4? What parameters should I tune in my training?

Daniel Povey

Mar 17, 2016, 5:14:52 PM
to Ilya Platonov, kaldi-developers
I assume you are talking about the computation at test time.  Let's assume we're talking about the nnet computation, not the graph search (which can be controlled by max-active, beam, etc.).

The main thing you can do to reduce this is to reduce the model size: principally the relu-dim, but you'd probably also want to reduce the --num-leaves a bit.

You could also set all the --frame-subsampling-factor and --alignment-subsampling-factor options to a number more than 3 (e.g., 4), but to get a speedup you'd have to modify all the splicing indexes like -3,0,3 to -4,0,4.  I tried this in one of the tuning scripts that's checked in; the WER difference was quite small.  That would give you a 33% speedup.
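As a sketch, using the option names from the train_tdnn.sh invocations earlier in this thread (the relu-dim, the widened splice indexes, and the directories below are illustrative guesses, not tuned or tested values):

  steps/nnet3/chain/train_tdnn.sh \
    --frame-subsampling-factor 4 --alignment-subsampling-factor 4 \
    --splice-indexes "-2,-1,0,1,2 -1,2 -4,4 -8,4 -8,4" \
    --relu-dim 650 \
    ... data/train_hires exp/chain/tree exp/tri3b_lats exp/chain/tdnn_smaller

Here the -3,3/-6,3 indexes from the 2o script are widened by analogy with the -3,0,3 to -4,0,4 change above, to match the larger subsampling factor.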
 


Dan

Vijayaditya Peddinti

Mar 17, 2016, 5:16:42 PM
to kaldi-developers, Ilya Platonov
If you reduce the relu-dim you might also want to reduce the regularization constants (leaky-hmm-coefficient and xent-regularize), which were introduced to avoid over-fitting.
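For example, assuming your run script passes these options through to the chain training script the way the checked-in recipes do (the halved values here are purely illustrative):

  steps/nnet3/chain/train_tdnn.sh ... \
    --leaky-hmm-coefficient 0.05 --xent-regularize 0.05 ...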

--Vijay


Ilya Platonov

Mar 17, 2016, 5:26:45 PM
to kaldi-developers, rea...@gmail.com
Thanks for the tips.

Ilya Platonov

Apr 5, 2016, 1:53:29 PM
to kaldi-developers, dpo...@gmail.com
We successfully used chain models for this demo on an Artik 10 device:

It is almost the same model as used here, https://github.com/kaldi-asr/kaldi/tree/master/egs/apiai_decode/s5, but a slightly tweaked version to achieve real-time recognition (it was almost real-time without tweaks).
I changed the relu-dim to 650 and the frame-subsampling-factor to 4 during training.

Ilya Platonov

Apr 15, 2016, 2:16:08 PM
to kaldi-developers, dpo...@gmail.com
What is the latest and best script to use for training (in the master branch)? I'm using tdnn_2o right now.


Daniel Povey

Apr 15, 2016, 3:39:34 PM
to Ilya Platonov, kaldi-developers
I'd recommend the 6z script in Switchboard.  It should be in master by now.
Dan
