'Chain' models

Daniel Povey

Dec 14, 2015, 4:52:38 PM
to kaldi-developers

Everyone,

I have decided the 'chain' models are ready to be publicized a bit more widely.
Rather than doing this all by email, I prepared a documentation page:
(note: this is in a 'doc2/' version of the docs, not the normal 'doc/' location).

This is the outcome of all my experimentation with CTC; in the end I couldn't get an improvement with CTC versus our best models (BTW, I hear Microsoft Research has had a similar experience), but with these 'chain' models I was able to use a similar sequence-level objective function and actually get some improvements, plus the speed advantages of the 3-fold frame subsampling.

I would appreciate some help from others in testing this stuff out, developing and tuning recipes for other corpora, and improving the GPU implementation; the documentation page says what the TODOs are.

Dan

Ilya Platonov

Dec 15, 2015, 3:54:52 PM
to kaldi-developers, dpo...@gmail.com
I see from the doc that the online decoder is aimed to be finished in April. Is this because everyone is busy with other stuff, or because it is hard to implement? Let's say we find some outside developer who will focus on implementing it; how long would it take?

Thank you.

Daniel Povey

Dec 15, 2015, 4:07:11 PM
to Ilya Platonov, kaldi-developers
Actually it's not hard, it's more that I'm busy with other things, and I wanted to do some experiments to determine exactly what's needed.  For short-term stuff, the nnet2 online setup could be adapted to work with nnet3.  But you would have a hard time finding an external developer who would be able to do this, because it requires deep familiarity with ASR and with the Kaldi codebase.  However, I may be able to make a quick and dirty version much sooner.
Dan

Rémi Francis

Dec 18, 2015, 5:59:43 AM
to kaldi-developers, dpo...@gmail.com
Could the 'chain' models be improved with sMBR training after the chain training?
They are both sequence-level objectives, so if with the chain models we get the same accuracy as CE-trained models but then can't improve them further with sMBR, there is still a gap to fill.

Daniel Povey

Dec 18, 2015, 4:30:24 PM
to Rémi Francis, kaldi-developers
That's true, but the gap is already almost as large as the gain we would normally get from discriminative training in nnet2 models, so it's very unlikely that this would cancel out all the improvement.  Currently we haven't written the nnet3 sequence-training code, so we can't test that out.
Of course, whether the criterion is sMBR or boosted MMI, we could still get an improvement from training the 'chain' models that way, as sequence training uses a word-level lattice while the 'chain' models were trained with a phone-level language model.  Who knows; we'd have to test it.
I'm also hoping that the improvement we'll get from LSTMs/BLSTMs with chain models is more than we'd get from regular models, because the frame-independence assumption is broken very badly by infinite-context models, but it's not an assumption that we make in the 'chain' models.  Vijay will test this.

Dan

Ilya Platonov

Jan 9, 2016, 12:28:46 PM
to kaldi-developers, dpo...@gmail.com
The nnet3 scripts use the "g.q" queue instead of "all.q".

I had to configure it in the cluster before running local/nnet3/run_lstm.sh.


Daniel Povey

Jan 9, 2016, 2:41:34 PM
to Ilya Platonov, kaldi-developers
More recent scripts use the standard '--gpu 1' option, which is then interpreted by queue.pl.  By default it maps to '-q g.q -l gpu=1', I think, but you can configure it by creating and editing conf/queue.conf; see http://kaldi-asr.org/doc/queue.html.
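For reference, the behavior documented there is roughly equivalent to a conf/queue.conf along these lines (a sketch based on the queue.html page, not copied from a real install; check the docs for the authoritative version):

  command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
  option mem=* -l mem_free=$0,ram_free=$0
  option num_threads=* -pe smp $0
  default gpu=0
  option gpu=0
  option gpu=* -l gpu=$0 -q g.q

So '--gpu 1' expands to '-l gpu=1 -q g.q', and you can point GPU jobs at a different queue by editing the 'option gpu=*' line.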

Dan

Xingyu Na

Feb 13, 2016, 9:18:59 PM
to kaldi-de...@googlegroups.com
I noticed the chain commits are merged into master. Does this mean the chain branch is deprecated and future work on chain will be done on master?
X.

Daniel Povey

Feb 13, 2016, 9:22:02 PM
to kaldi-developers
No, the chain branch is still used for ongoing work; it's less stable than master.

Ilya Platonov

Feb 22, 2016, 4:42:48 PM
to kaldi-developers, dpo...@gmail.com
Chain models are said to be faster than the nnet2 setup (I read about 3 times faster).

Still, when I do "time steps/nnet3/decode.sh", it works slightly slower than my similar nnet2 setup.

So how is speed measured in this case? What part of the decoding process is faster? How much improvement should I expect on the full decoding process?

Thank you.

Daniel Povey

Feb 22, 2016, 4:45:06 PM
to Ilya Platonov, kaldi-developers

> Chain models are said to be faster than the nnet2 setup (I read about 3 times faster).
> Still, when I do "time steps/nnet3/decode.sh", it works slightly slower than my similar nnet2 setup.

Are you doing this on a chain model, or on a different nnet3 model?

> So how is speed measured in this case? What part of the decoding process is faster? How much improvement should I expect on the full decoding process?

It's the real-time factor.  Sometimes to actually see the improvement you have to reduce the beams slightly.  In practice the speedup seems to be more like a factor of 2 than 3, but getting that factor-of-2 improvement by reducing the beam is pretty easy; you won't see a substantial change in WER.

Dan

Daniel Povey

Feb 22, 2016, 4:50:07 PM
to Ilya Platonov, kaldi-developers
BTW, the improvement in speed is in both the neural net (since most of it is evaluated on about 3 times fewer frames, plus it's smaller), and in the decoder search (since the frame rate is 3 times lower than the baseline's).  However, the beams used in the baseline tend to leave a lot more states active in the chain models, so in order to see this speedup you need to reduce the beams a bit (based on Remi Francis's experiments, I suggest subtracting 2 from the baseline --beam and --lattice-beam).
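For concreteness: if your baseline used the common defaults of --beam 15.0 and --lattice-beam 8.0 (an assumption; check your own decode script), that suggestion amounts to something like:

  # Sketch only: --acwt 1.0 --post-decode-acwt 10.0 are the usual settings
  # for chain decoding, and the directory arguments are placeholders.
  steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
    --beam 13.0 --lattice-beam 6.0 \
    $graph_dir $data_dir $decode_dir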

Dan


Ilya Platonov

Feb 22, 2016, 5:02:57 PM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
I have been running vanilla nnet3 scripts; sorry for the confusion.

Rémi Francis

Feb 24, 2016, 10:17:04 AM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
This was done on the model from the tdnn_2o script; I haven't tried the newer ones, but since they are better the beams could probably be reduced.
However, this was trying to match my cross-entropy baseline; for my sMBR one, with the newer scripts I get a similar WER with the same beams. I haven't really measured the speed/accuracy trade-off with them.

Ilya Platonov

Feb 29, 2016, 2:28:36 PM
to kaldi-developers, dpo...@gmail.com
I am trying to compile the latest Kaldi master on my StarCluster instance (which has a pretty old Ubuntu, raring) and am getting this:
 
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib::/srv/train/kaldi/egs/speaktoit/s5/tools/mitlm-svn/lib:/srv/train/kaldi/egs/speaktoit/s5/../../../tools/openfst-1.3.4/lib
#$ PATH=/usr/local/cuda/bin/../open64/bin:/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/srv/train/kaldi/egs/speaktoit/s5/utils/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/bin:/srv/train/kaldi/egs/speaktoit/s5/../../../tools/openfst/bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/fstbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/gmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/featbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/lm/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/sgmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/sgmm2bin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/fgmmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/latbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnetbin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnet2bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/nnet3bin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/online2bin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/ivectorbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/lmbin/:/srv/train/kaldi/egs/speaktoit/s5/../../../src/chainbin:/srv/train/kaldi/egs/speaktoit/s5/../../../src/kwsbin:/srv/train/kaldi/egs/speaktoit/s5:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda/bin:/opt/sge6/bin/linux-x64
#$ INCLUDES="-I/usr/local/cuda/bin/..//include"  
#$ LIBRARIES=  "-L/usr/local/cuda/bin/..//lib64"
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=100 -E -x c++    -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS  -D__CUDACC__ -D__NVCC__  -fPIC -I"/usr/local/cuda/include" -I"../" "-I/usr/local/cuda/bin/..//include"   -D"HAVE_CUDA" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii" "chain-kernels.cu
#$ cudafe --m64 --gnu_version=40703 -tused --no_remove_unneeded_entities  --gen_c_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.c" --stub_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00001255_00000000-3_chain-kernels.compute_10.cudafe1.gpu" --nv_arch "compute_10" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00001255_00000000-2_chain-kernels.module_id" --include_file_name "tmpxft_00001255_00000000-1_chain-kernels.fatbin.c" "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii" 
chain-kernels.cu(28): error: identifier "atomicExch" is undefined
          detected during:
            instantiation of "void atomic_add(Real *, Real) [with Real=BaseFloat]" 
(47): here
            instantiation of "void atomic_add_thresholded(Real *, Real) [with Real=BaseFloat]" 
(211): here

1 error detected in the compilation of "/tmp/tmpxft_00001255_00000000-18_chain-kernels.compute_10.cpp1.ii".
# --error 0x2 --
make[1]: *** [chain-kernels.o] Error 2
make[1]: *** Waiting for unfinished jobs....g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/srv/train/kaldi/tools/ATLAS/include -I/srv/train/kaldi/tools/openfst/include  -g  -DHAVE_CUDA -I/usr/local/cuda/include   -c -o mixup-nnet.o mixup-nnet.cc


Should I use newer cuda?

Jan Trmal

Feb 29, 2016, 2:38:33 PM
to kaldi-de...@googlegroups.com, Dan Povey
I believe atomicExch is supported from compute_13 or something like that -- compute_10 might be too old. It's not particularly an issue of the CUDA version; it's more an issue of the target device (compute capability).
y.


Ilya Platonov

Feb 29, 2016, 2:46:05 PM
to kaldi-developers, dpo...@gmail.com
So does that mean I cannot compile Kaldi with CUDA on this device? This is an Amazon EC2 g2.2xlarge.

I am all new to this CUDA-related stuff.

Vijayaditya Peddinti

Feb 29, 2016, 2:47:46 PM
to kaldi-developers, Daniel Povey
We have used Kaldi with this particular EC2 instance type before.

--Vijay

Jan Trmal

Feb 29, 2016, 2:50:46 PM
to kaldi-de...@googlegroups.com, Dan Povey
My guess from looking at the makefile would be to update CUDA to version 6.5 or higher.
That Amazon machine has a fairly modern architecture, i.e. it should work.
y.

Jan Trmal

Feb 29, 2016, 2:52:37 PM
to kaldi-de...@googlegroups.com, Dan Povey
Or just modify cumatrix/Makefile so that the lines

  #For toolkit older than 6.5, add the compute capability 1.0
  CUDA_VER_GT_6_5 := $(shell [ $(CUDA_VERSION) -ge 65 ] && echo true)
  ifneq ($(CUDA_VER_GT_6_5), true)
    CUDA_ARCH += -gencode arch=compute_13,code=sm_13 \
                 -gencode arch=compute_10,code=sm_10
  endif

look like

  #For toolkit older than 6.5, add the compute capability 1.0
  CUDA_VER_GT_6_5 := $(shell [ $(CUDA_VERSION) -ge 65 ] && echo true)
  ifneq ($(CUDA_VER_GT_6_5), true)
    CUDA_ARCH += -gencode arch=compute_13,code=sm_13
  endif

Daniel Povey

Feb 29, 2016, 2:52:41 PM
to Jan Trmal, kaldi-developers
If you don't need the chain code, then you can just remove 'chain' and 'chainbin' from the targets in the Makefile.
Dan


Ilya Platonov

Feb 29, 2016, 2:54:21 PM
to kaldi-developers, dpo...@gmail.com

I have been using EC2 instances with CUDA for quite some time now on the same setup. Compilation fails on the new chain-model code, which I wanted to try out here.

It does compile fine on my local machine.

Here is some version info:
root@master:/srv/train/kaldi/src/chain# cudafe++ -v
cudafe: NVIDIA (R) Cuda Language Front End
Portions Copyright (c) 2005-2013 NVIDIA Corporation
Portions Copyright (c) 1988-2013 Edison Design Group Inc.
Based on Edison Design Group C/C++ Front End, version 4.5 (Jul 17 2013 18:38:05)
Cuda compilation tools, release 5.5, V5.5.0

Daniel Povey

Feb 29, 2016, 2:56:20 PM
to Ilya Platonov, kaldi-developers
I am assuming that the 5.5 version is on the EC2 instance.  That's too old; I think atomicExch is only supported from 6.5.  You have to update your CUDA, or else there is no chance of it working.
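To confirm which toolkit a machine actually has, the standard checks are nothing Kaldi-specific:

  nvcc --version   # prints e.g. "Cuda compilation tools, release 5.5"
  nvidia-smi       # shows the driver version and visible GPUs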
Dan

Ilya Platonov

Feb 29, 2016, 2:56:29 PM
to kaldi-developers, dpo...@gmail.com
I mean it compiles fine on my local machine.

Jan Trmal

Feb 29, 2016, 2:59:21 PM
to kaldi-de...@googlegroups.com, Dan Povey
Upgrading the cuda should solve your issue.
y.


Ilya Platonov

Feb 29, 2016, 5:32:58 PM
to kaldi-developers, dpo...@gmail.com
Yes, upgrading to 6.5 solved this. Thank you.

Ilya Platonov

Mar 2, 2016, 5:10:41 PM
to kaldi-developers, dpo...@gmail.com
I am trying to do chain training and I ran into a bunch of issues.
I used this script as a base, https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/run_tdnn_2o.sh, and tweaked it a bit to fit my training data.
1) touch: cannot touch 'exp/chain/tdnn_2o/egs/.nodelete': No such file or directory -- pretty straightforward, the egs folder does not exist on line 211.
2)
steps/nnet3/chain/train_tdnn.sh: line 310: [: 1: unary operator expected
steps/nnet3/chain/train_tdnn.sh: line 311: [: 1: unary operator expected

Not sure why there is an error like this, but I assume I can just ignore it.
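Aside: this particular bash error usually comes from an unquoted variable expanding to nothing inside a test; whether that is what happens inside train_tdnn.sh here is a guess. For example:

  x=
  [ 1 -gt $x ]          # expands to: [ 1 -gt ]  ->  "[: 1: unary operator expected"
  [ 1 -gt "${x:-0}" ]   # a quoted expansion with a default value avoids it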

3) Then it fails here:
steps/nnet3/chain/train_tdnn.sh: getting preconditioning matrix for input features.
queue.pl: 20 / 20 failed, log is in exp/chain/tdnn_2o/log/get_lda_stats.*.log


Then in log I get this

# Running on node004
# Started at Wed Mar 2 21:58:54 UTC 2016
# Accounting: time=0 threads=1
# Finished at Wed Mar 2 21:58:56 UTC 2016 with status 255
# nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/chain/tdnn_2o/init.raw ark:exp/chain/tdnn_2o/egs/cegs.11.ark exp/chain/tdnn_2o/11.lda_stats 
nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/chain/tdnn_2o/init.raw ark:exp/chain/tdnn_2o/egs/cegs.11.ark exp/chain/tdnn_2o/11.lda_stats 
WARNING (nnet3-chain-acc-lda-stats:Open():util/kaldi-table-inl.h:353) TableReader: failed to open stream exp/chain/tdnn_2o/egs/cegs.11.ark
ERROR (nnet3-chain-acc-lda-stats:SequentialTableReader():util/kaldi-table-inl.h:534) Error constructing TableReader: rspecifier is ark:exp/chain/tdnn_2o/egs/cegs.11.ark
ERROR (nnet3-chain-acc-lda-stats:SequentialTableReader():util/kaldi-table-inl.h:534) Error constructing TableReader: rspecifier is ark:exp/chain/tdnn_2o/egs/cegs.11.ark

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::SequentialTableReader(std::string const&)
nnet3-chain-acc-lda-stats(main+0x2e2) [0x82aace]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2ae76f29bea5]
nnet3-chain-acc-lda-stats() [0x82a729]


So there is no exp/chain/tdnn_2o/egs/cegs.11.ark file in the egs dir, though it has a lot of cegs_orig files.

Here is a full log:

root@master:/srv/train/kaldi/egs/speaktoit/s5# local/run_tdnn_2o.sh  --stage 12 2>&1 | tee train.log
local/run_tdnn_2o.sh --stage 12
steps/nnet3/chain/train_tdnn.sh --stage -10 --apply-deriv-weights false --lm-opts --num-extra-lm-states=2000 --get-egs-stage -10 --minibatch-size 128 --egs-opts --frames-overlap-per-eg 0 --frames-per-eg 150 --num-epochs 8 --num-jobs-initial 3 --num-jobs-final 8 --splice-indexes -2,-1,0,1,2 -1,2 -3,3 -6,3 -6,3 --feat-type raw --cmvn-opts --norm-means=false --norm-vars=false --initial-effective-lrate 0.001 --final-effective-lrate 0.0001 --max-param-change 1.0 --final-layer-normalize-target 0.5 --relu-dim 850 --cmd queue.pl --remove-egs false data/train_hires exp/chain/tri5_2o_tree exp/tri3b_lats_nodup exp/chain/tdnn_2o
steps/nnet3/chain/train_tdnn.sh: creating phone language-model
steps/nnet3/chain/train_tdnn.sh: creating denominator FST
copy-transition-model exp/chain/tri5_2o_tree/final.mdl exp/chain/tdnn_2o/0.trans_mdl
LOG (copy-transition-model:main():copy-transition-model.cc:62) Copied transition model.
am-info exp/chain/tdnn_2o/0.trans_mdl
steps/nnet3/chain/train_tdnn.sh: creating neural net configs
steps/nnet3/tdnn/make_configs.py --pool-type none --include-log-softmax=false --final-layer-normalize-target 0.5 --splice-indexes -2,-1,0,1,2 -1,2 -3,3 -6,3 -6,3 --feat-dim 40 --ivector-dim 0 --relu-dim 850 --num-targets 7343 --use-presoftmax-prior-scale false exp/chain/tdnn_2o/configs
Append(Offset(input, -2), Offset(input, -1), input, Offset(input, 1), Offset(input, 2))
steps/nnet3/chain/train_tdnn.sh: calling get_egs.sh
steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --transform-dir exp/tri3b_lats_nodup --left-context 1 --right-context 1 --frames-per-iter 800000 --stage -10 --cmd queue.pl --right-tolerance 10 --left-tolerance 5 --frames-per-eg 150 --frame-subsampling-factor 3 data/train_hires exp/chain/tdnn_2o exp/tri3b_lats_nodup exp/chain/tdnn_2o/egs
File data/train_hires/utt2uniq exists, so augmenting valid_uttlist to
include all perturbed versions of the same 'real' utterances.
steps/nnet3/chain/get_egs.sh: feature type is raw
steps/nnet3/chain/get_egs.sh: working out number of frames of training data
steps/nnet3/chain/get_egs.sh: working out feature dim
feat-to-dim 'ark,s,cs:utils/filter_scp.pl --exclude exp/chain/tdnn_2o/egs/valid_uttlist data/train_hires/split15/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:- |' -
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:-
WARNING (feat-to-dim:Close():kaldi-io.cc:496) Pipe utils/filter_scp.pl --exclude exp/chain/tdnn_2o/egs/valid_uttlist data/train_hires/split15/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_hires/split15/1/utt2spk scp:data/train_hires/split15/1/cmvn.scp scp:- ark:- | had nonzero return status 36096
steps/nnet3/chain/get_egs.sh: creating 1184 archives, each with 5329 egs, with
steps/nnet3/chain/get_egs.sh:   150 labels per example, and (left,right) context = (1,1)
steps/nnet3/chain/get_egs.sh: copying training lattices
steps/nnet3/chain/get_egs.sh: Getting validation and training subset examples.
steps/nnet3/chain/get_egs.sh: ... extracting validation and training-subset alignments.
... Getting subsets of validation examples for diagnostics and combination.
steps/nnet3/chain/get_egs.sh: Generating training examples on disk


steps/nnet3/chain/get_egs.sh: recombining and shuffling order of archives on disk
steps/nnet3/chain/get_egs.sh: removing temporary archives
steps/nnet3/chain/get_egs.sh: removing temporary lattices
steps/nnet3/chain/get_egs.sh: removing temporary alignments and transforms
steps/nnet3/chain/get_egs.sh: Finished preparing training examples
steps/nnet3/chain/train_tdnn.sh: getting preconditioning matrix for input features.
queue.pl: 20 / 20 failed, log is in exp/chain/tdnn_2o/log/get_lda_stats.*.log


Maybe I should try to use the WSJ version of the script instead of run_tdnn_2o.sh, but I'm not sure if it will fix this.

Daniel Povey

Mar 2, 2016, 5:20:51 PM
to Ilya Platonov, kaldi-developers
I think you may have hit a code path in the scripts that is buggy and was not tested, where $archives_multiple != 1.
I'm looking at it right now and will try to commit a fix.
Dan

Daniel Povey

Mar 2, 2016, 5:44:42 PM
to Ilya Platonov, kaldi-developers
OK, I committed a couple of fixes.
To save re-dumping egs, you can go into the egs directory and do

 for x in egs.*.ark; do mv $x c$x; done

and then rerun.
However, I recommend that you run a more recent script; for example, check out the chain branch and run the 6h script.  2o is a very old script and will not give the best results.

Dan

Daniel Povey

Mar 2, 2016, 5:47:03 PM
to Ilya Platonov, kaldi-developers
... oh, and  you should rerun with --train-stage -3 to avoid re-dumping egs.
Dan

Ilya Platonov

Mar 6, 2016, 1:18:31 PM
to kaldi-developers, rea...@gmail.com, dpo...@gmail.com
So I finally got my first chain-model results on tdnn_2o.

It is both faster (more than 2 times) and has a significant WER improvement compared to my nnet2/TDNN results. And I have not finished training yet, so the WER will probably improve.

So Dan, and everyone else who is working on this: good job.

I do not use iVectors.

Now, I would love to have online decoder and put this into production :).

Daniel Povey

Mar 7, 2016, 1:15:40 AM
to Ilya Platonov, kaldi-developers
That's great news!
There have been some improvements since then.  After spending a while improving results with the so-called 'jesus layer', we discovered that a regular ReLU-based TDNN could do even better (it's not clear why the jesus layer initially helped; perhaps something was going wrong in training at that time).
Anyway, 6z (present in branch 'chain') is the current recommended setup.
I hope to have an online-decoding setup for these models checked in within 2 months' time.  It's not very hard, but we have to first decide what types of features to support (especially as regards iVectors).

Dan

Ilya Platonov

Mar 17, 2016, 4:54:28 PM
to kaldi-developers, dpo...@gmail.com
What do I do if I want to reduce the computation time for chain models by a factor of 1.5, 2, or 4? What parameters should I tune in my training?

Daniel Povey

Mar 17, 2016, 5:14:52 PM
to Ilya Platonov, kaldi-developers
I assume you are talking about the computation at test time.  Let's assume we're talking about the nnet computation, not the graph search (which can be controlled by max-active, beam, etc.).

The main thing you can do to reduce this is to reduce the model size: principally the relu-dim, but you'd probably also want to reduce the --num-leaves a bit.

You could also set all the --frame-subsampling-factor and --alignment-subsampling-factor options to a number more than 3 (e.g., 4), but to get a speedup you'd have to modify all the splicing indexes like -3,0,3 to -4,0,4.  I tried this in one of the tuning scripts that's checked in; the WER difference was quite small.  That would give you a 33% speedup.
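As a sketch, using the option names from the train_tdnn.sh invocations earlier in this thread (the relu-dim, the widened splice indexes, and the directories below are illustrative guesses, not tuned or tested values):

  steps/nnet3/chain/train_tdnn.sh \
    --frame-subsampling-factor 4 --alignment-subsampling-factor 4 \
    --splice-indexes "-2,-1,0,1,2 -1,2 -4,4 -8,4 -8,4" \
    --relu-dim 650 \
    ... data/train_hires exp/chain/tree exp/tri3b_lats exp/chain/tdnn_smaller

Here the -3,3/-6,3 indexes from the 2o script are widened by analogy with the -3,0,3 to -4,0,4 change above, to match the larger subsampling factor.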
 


Dan

Vijayaditya Peddinti

Mar 17, 2016, 5:16:42 PM
to kaldi-developers, Ilya Platonov
If you reduce the relu-dim you might also want to reduce the regularization constants (leaky-hmm-coefficient and xent-regularize), which were introduced to avoid over-fitting.
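For example, assuming your run script passes these options through to the chain training script the way the checked-in recipes do (the halved values here are purely illustrative):

  steps/nnet3/chain/train_tdnn.sh ... \
    --leaky-hmm-coefficient 0.05 --xent-regularize 0.05 ...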

--Vijay


Ilya Platonov

Mar 17, 2016, 5:26:45 PM
to kaldi-developers, rea...@gmail.com
Thanks for the tips.

Ilya Platonov

Apr 5, 2016, 1:53:29 PM
to kaldi-developers, dpo...@gmail.com
We successfully used chain models for this demo on an Artik 10 device:

It is almost the same model as used here, https://github.com/kaldi-asr/kaldi/tree/master/egs/apiai_decode/s5, but a slightly tweaked version to achieve real-time recognition (it was almost real-time without tweaks).
I changed the relu-dim to 650 and the frame-subsampling-factor to 4 during training.

Ilya Platonov

Apr 15, 2016, 2:16:08 PM
to kaldi-developers, dpo...@gmail.com
What is the latest and best script to use for training (in the master branch)? I'm using tdnn_2o right now.


Daniel Povey

Apr 15, 2016, 3:39:34 PM
to Ilya Platonov, kaldi-developers
I'd recommend the 6z script in Switchboard.  It should be in master by now.
Dan
