i-vector error while using train_ivector_extractor.sh


永裕高

Mar 27, 2018, 9:08:10 AM
to kaldi-help

Hi,
I got the error shown in the attached figure; can somebody help me fix it?

I was running egs/aishell/v1/run.sh with the official AISHELL dataset, and it always fails with this problem.

I have used a large-memory machine and reduced the number of jobs, but still cannot finish the i-vector training.

Daniel Povey

Mar 27, 2018, 1:42:43 PM
to kaldi-help
That's too little of the output for me to diagnose what went wrong.
You need to learn to paste as text.

Dan



永裕高

Mar 27, 2018, 9:48:08 PM
to kaldi-help
Sorry Dan,
While running train_ivector_extractor.sh, the log contains:

LOG (apply-cmvn-sliding[5.4.64~1-73527]:main():apply-cmvn-sliding.cc:75) Applied sliding-window cepstral mean normalization to 3247 utterances, 0 had errors.
LOG (select-voiced-frames[5.4.64~1-73527]:main():select-voiced-frames.cc:105) Done selecting voiced frames; processed 3247 utterances, 0 had errors.
LOG (ivector-extractor-acc-stats[5.4.64~1-73527]:main():ivector-extractor-acc-stats.cc:151) Done 3247 files, 0 with errors.  Total frames 1232807
LOG (ivector-extractor-acc-stats[5.4.64~1-73527]:main():ivector-extractor-acc-stats.cc:159) Wrote stats to -
ERROR (ivector-extractor-sum-accs[5.4.64~1-73527]:ExpectToken():io-funcs.cc:203) Failed to read token [started at file position -1], expected <IvectorExtractorStats>

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::ExpectToken(std::istream&, bool, char const*)
kaldi::IvectorExtractorStats::Read(std::istream&, bool, bool)
main
__libc_start_main
ivector-extractor-sum-accs() [0x40b6a9]


I have tried reducing the number of jobs and using a large-memory machine, but it still produces this output.

Yongyu



Daniel Povey

Mar 27, 2018, 10:02:16 PM
to kaldi-help
The real error was probably a 'std::bad_alloc' elsewhere in the log.
Try reducing the num-jobs all the way to 1.

Dan
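
A quick way to confirm is to grep the accumulation logs for the out-of-memory error, then re-run the extractor stage with a single job. A minimal sketch, assuming the aishell v1 directory layout (the exp/ paths here are assumptions; adjust them to your experiment):

# Look for the real error in the per-job logs:
grep -i 'bad_alloc' exp/extractor_1024/log/acc.*.log

# Memory use scales roughly with nj * num-processes * num-threads,
# so reduce them; start from nj=1:
sid/train_ivector_extractor.sh --nj 1 --num-processes 1 --num-threads 4 \
  exp/full_ubm_1024/final.ubm data/train exp/extractor_1024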



永裕高

Mar 27, 2018, 10:31:58 PM
to kaldi-help
Thank you Dan. I am going to set nj=1 and num-threads and num-processes to 4 and see what happens.



Gao

1 2

Feb 24, 2019, 6:57:58 AM
to kaldi-help
Running ‘local/nnet3/run_tdnn.sh’ in ‘aishell/s5’: after some stages of creating the neural-net configs, it raises this exception:
2019-02-24 08:21:01,386 [steps/nnet3/train_dnn.py:227 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
/bin/sh: 1: None: not found
Traceback (most recent call last):
  File "steps/nnet3/train_dnn.py", line 456, in main
    train(args, run_opts)
  File "steps/nnet3/train_dnn.py", line 233, in train
    dir=args.dir))
  File "steps/libs/common.py", line 158, in execute_command
    p.returncode, command))
Exception: Command exited with status 127: None exp/nnet3/tdnn_sp/log/nnet_init.log                     nnet3-init --srand=-2 exp/nnet3/tdnn_sp/configs/init.config                     exp/nnet3/tdnn_sp/init.raw



Here is the log in context:

creating neural net configs
tree-info exp/tri5a_sp_ali/tree
steps/nnet3/xconfig_to_configs.py --xconfig-file exp/nnet3/tdnn_sp/configs/network.xconfig --config-dir exp/nnet3/tdnn_sp/configs/
nnet3-init exp/nnet3/tdnn_sp/configs//init.config exp/nnet3/tdnn_sp/configs//init.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/nnet3/tdnn_sp/configs//init.raw
nnet3-info exp/nnet3/tdnn_sp/configs//init.raw
nnet3-init exp/nnet3/tdnn_sp/configs//ref.config exp/nnet3/tdnn_sp/configs//ref.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/nnet3/tdnn_sp/configs//ref.raw
nnet3-info exp/nnet3/tdnn_sp/configs//ref.raw
nnet3-init exp/nnet3/tdnn_sp/configs//ref.config exp/nnet3/tdnn_sp/configs//ref.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/nnet3/tdnn_sp/configs//ref.raw
nnet3-info exp/nnet3/tdnn_sp/configs//ref.raw
2019-02-24 08:20:50,437 [steps/nnet3/train_dnn.py:36 - <module> - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd= --feat.online-ivector-dir exp/nnet3/ivectors_train_sp --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.num-epochs 4 --trainer.optimization.num-jobs-initial 2 --trainer.optimization.num-jobs-final 12 --trainer.optimization.initial-effective-lrate 0.0015 --trainer.optimization.final-effective-lrate 0.00015 --egs.dir  --cleanup.remove-egs true --cleanup.preserve-model-interval 500 --use-gpu true --feat-dir=data/train_sp_hires --ali-dir exp/tri5a_sp_ali --lang data/lang --reporting.email= --dir=exp/nnet3/tdnn_sp
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=', '--feat.online-ivector-dir', 'exp/nnet3/ivectors_train_sp', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.num-epochs', '4', '--trainer.optimization.num-jobs-initial', '2', '--trainer.optimization.num-jobs-final', '12', '--trainer.optimization.initial-effective-lrate', '0.0015', '--trainer.optimization.final-effective-lrate', '0.00015', '--egs.dir', '', '--cleanup.remove-egs', 'true', '--cleanup.preserve-model-interval', '500', '--use-gpu', 'true', '--feat-dir=data/train_sp_hires', '--ali-dir', 'exp/tri5a_sp_ali', '--lang', 'data/lang', '--reporting.email=', '--dir=exp/nnet3/tdnn_sp']



2019-02-24 08:20:50,522 [steps/nnet3/train_dnn.py:177 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/tri5a_sp_ali',
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': None,
 'compute_per_dim_accuracy': False,
 'dir': 'exp/nnet3/tdnn_sp',
 'do_final_combination': True,
 'dropout_schedule': None,
 'egs_command': None,
 'egs_dir': None,
 'egs_opts': None,
 'egs_stage': 0,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sp_hires',
 'final_effective_lrate': 0.00015,
 'frames_per_eg': 8,
 'initial_effective_lrate': 0.0015,
 'input_model': None,
 'lang': 'data/lang',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'minibatch_size': '512',
 'momentum': 0.0,
 'num_epochs': 4.0,
 'num_jobs_compute_prior': 10,
 'num_jobs_final': 12,
 'num_jobs_initial': 2,
 'online_ivector_dir': 'exp/nnet3/ivectors_train_sp',
 'preserve_model_interval': 500,
 'presoftmax_prior_scale_power': -0.25,
 'prior_subset_size': 20000,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': True,
 'reporting_interval': 0.1,
 'samples_per_iter': 400000,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -10,
 'train_opts': [],
 'use_gpu': 'yes'}
2019-02-24 08:21:01,386 [steps/nnet3/train_dnn.py:227 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
/bin/sh: 1: None: not found
Traceback (most recent call last):
  File "steps/nnet3/train_dnn.py", line 456, in main
    train(args, run_opts)
  File "steps/nnet3/train_dnn.py", line 233, in train
    dir=args.dir))
  File "steps/libs/common.py", line 158, in execute_command
    p.returncode, command))
Exception: Command exited with status 127: None exp/nnet3/tdnn_sp/log/nnet_init.log                     nnet3-init --srand=-2 exp/nnet3/tdnn_sp/configs/init.config                     exp/nnet3/tdnn_sp/init.raw

I would appreciate your help.



Daniel Povey

Feb 24, 2019, 3:09:52 PM
to kaldi-help
That script is outdated and no longer works. You should use the script local/chain/run_tdnn.sh to train your model; it works and will also give much better performance.
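
For reference, the chain recipe is invoked from the recipe root in the usual way (a sketch; the script also accepts --stage and --train-stage options for resuming):

cd egs/aishell/s5
local/chain/run_tdnn.sh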

gaoxing...@163.com

Feb 25, 2019, 10:08:56 PM
to kaldi-help, dpovey
Hi Dan,
I found that an error occurs when the Dither() function is called in a multi-threaded scenario. How should this code be revised for that situation? Thanks.


static std::mutex _RandMutex;

int Rand(struct RandomState* state) {
#if defined(_MSC_VER) || defined(__CYGWIN__)
  // On Windows and Cygwin, just call rand().
  return rand();
#else
  if (state) {
    // rand_r() is reentrant and only touches the caller-provided seed,
    // so this lock serializes more than strictly necessary.
    std::unique_lock<std::mutex> lock(_RandMutex);
    return rand_r(&(state->seed));
  } else {
    // rand() uses hidden global state, so access to it must be serialized.
    std::lock_guard<std::mutex> lock(_RandMutex);
    return rand();
  }
#endif
}





 

Daniel Povey

Feb 25, 2019, 10:59:48 PM
to kaldi-help
On what platform?


Daniel Povey

Feb 25, 2019, 11:50:34 PM
to gaoxing...@163.com, kaldi-help
Oh.
That's not an error; it is expected behavior. If you want completely deterministic behavior, you should change the MFCC config to set dither=0.0 and energy-floor=0.1 or something like that (I think the units are such that this is a suitable value).
Dan
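
Concretely, that means something like the following in the feature configuration (a sketch; the energy-floor value is only Dan's rough suggestion and may need tuning):

# Append to conf/mfcc.conf to make feature extraction deterministic:
echo '--dither=0.0' >> conf/mfcc.conf
echo '--energy-floor=0.1' >> conf/mfcc.conf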

On Mon, Feb 25, 2019 at 11:46 PM gaoxing...@163.com <gaoxing...@163.com> wrote:
I should describe this error more precisely.

When data are sent to feature_pipeline_:
before Dither() is called, the single-threaded and multi-threaded scenarios produce the same results.
But after Dither() is called, they are no longer equal.
So I think the difference is introduced in the Dither() function.




1 2

Feb 26, 2019, 5:55:12 PM
to kaldi-help

As @Dan suggested, I switched from ‘local/nnet3/run_tdnn.sh’ to ‘local/chain/run_tdnn.sh’ in ‘aishell/s5’. After building a tree, it threw the exception “Exception: Command exited with status 127: None exp/chain/tdnn_1a_sp/log/make_phone_lm.log”. I checked that ‘exp/chain/tdnn_1a_sp’ exists, but ‘exp/chain/tdnn_1a_sp/log/make_phone_lm.log’ does not.

2019-02-26 17:01:21,015 [steps/nnet3/chain/train.py:327 - train - INFO ] Creating phone language-model

/bin/sh: 1: None: not found
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 624, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 329, in train
    lm_opts=args.lm_opts)
  File "steps/libs/nnet3/train/chain_objf/acoustic_model.py", line 50, in create_phone_lm
    tree_dir=tree_dir))

  File "steps/libs/common.py", line 158, in execute_command
    p.returncode, command))
Exception: Command exited with status 127: None exp/chain/tdnn_1a_sp/log/make_phone_lm.log             gunzip -c exp/chain/tri6_7d_tree_sp/ali.1.gz exp/chain/tri6_7d_tree_sp/ali.2.gz exp/chain/tri6_7d_tree_sp/ali.3.gz exp/chain/tri6_7d_tree_sp/ali.4.gz exp/chain/tri6_7d_tree_sp/ali.5.gz exp/chain/tri6_7d_tree_sp/ali.6.gz exp/chain/tri6_7d_tree_sp/ali.7.gz exp/chain/tri6_7d_tree_sp/ali.8.gz exp/chain/tri6_7d_tree_sp/ali.9.gz exp/chain/tri6_7d_tree_sp/ali.10.gz exp/chain/tri6_7d_tree_sp/ali.11.gz exp/chain/tri6_7d_tree_sp/ali.12.gz exp/chain/tri6_7d_tree_sp/ali.13.gz exp/chain/tri6_7d_tree_sp/ali.14.gz exp/chain/tri6_7d_tree_sp/ali.15.gz exp/chain/tri6_7d_tree_sp/ali.16.gz exp/chain/tri6_7d_tree_sp/ali.17.gz exp/chain/tri6_7d_tree_sp/ali.18.gz exp/chain/tri6_7d_tree_sp/ali.19.gz exp/chain/tri6_7d_tree_sp/ali.20.gz exp/chain/tri6_7d_tree_sp/ali.21.gz exp/chain/tri6_7d_tree_sp/ali.22.gz exp/chain/tri6_7d_tree_sp/ali.23.gz exp/chain/tri6_7d_tree_sp/ali.24.gz exp/chain/tri6_7d_tree_sp/ali.25.gz exp/chain/tri6_7d_tree_sp/ali.26.gz exp/chain/tri6_7d_tree_sp/ali.27.gz exp/chain/tri6_7d_tree_sp/ali.28.gz exp/chain/tri6_7d_tree_sp/ali.29.gz exp/chain/tri6_7d_tree_sp/ali.30.gz \|             ali-to-phones exp/chain/tri6_7d_tree_sp/final.mdl ark:- ark:- \|             chain-est-phone-lm --num-extra-lm-states=2000 ark:- exp/chain/tdnn_1a_sp/phone_lm.fst


The previous part of the log:

./local/chain/run_tdnn.sh: creating neural net configs using the xconfig parser
tree-info exp/chain/tri6_7d_tree_sp/tree
steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain/tdnn_1a_sp/configs/network.xconfig --config-dir exp/chain/tdnn_1a_sp/configs/
nnet3-init exp/chain/tdnn_1a_sp/configs//init.config exp/chain/tdnn_1a_sp/configs//init.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//init.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//init.raw
nnet3-init exp/chain/tdnn_1a_sp/configs//ref.config exp/chain/tdnn_1a_sp/configs//ref.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//ref.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//ref.raw
nnet3-init exp/chain/tdnn_1a_sp/configs//ref.config exp/chain/tdnn_1a_sp/configs//ref.raw
LOG (nnet3-init[5.3.24~1-c948]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//ref.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//ref.raw
2019-02-26 17:01:09,970 [steps/nnet3/chain/train.py:35 - <module> - INFO ] Starting chain model trainer (train.py)
steps/nnet3/chain/train.py --stage -10 --cmd  --feat.online-ivector-dir exp/nnet3/ivectors_train_sp --feat.cmvn-opts --norm-means=false --norm-vars=false --chain.xent-regularize 0.1 --chain.leaky-hmm-coefficient 0.1 --chain.l2-regularize 0.00005 --chain.apply-deriv-weights false --chain.lm-opts=--num-extra-lm-states=2000 --egs.dir  --egs.stage -10 --egs.opts --frames-overlap-per-eg 0 --egs.chunk-width 150,110,90 --trainer.num-chunk-per-minibatch 64 --trainer.frames-per-iter 1500000 --trainer.num-epochs 4 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.001 --trainer.optimization.final-effective-lrate 0.0001 --trainer.max-param-change 2.0 --cleanup.remove-egs true --feat-dir data/train_sp_hires --tree-dir exp/chain/tri6_7d_tree_sp --lat-dir exp/tri5a_sp_lats --dir exp/chain/tdnn_1a_sp
['steps/nnet3/chain/train.py', '--stage', '-10', '--cmd', '', '--feat.online-ivector-dir', 'exp/nnet3/ivectors_train_sp', '--feat.cmvn-opts', '--norm-means=false --norm-vars=false', '--chain.xent-regularize', '0.1', '--chain.leaky-hmm-coefficient', '0.1', '--chain.l2-regularize', '0.00005', '--chain.apply-deriv-weights', 'false', '--chain.lm-opts=--num-extra-lm-states=2000', '--egs.dir', '', '--egs.stage', '-10', '--egs.opts', '--frames-overlap-per-eg 0', '--egs.chunk-width', '150,110,90', '--trainer.num-chunk-per-minibatch', '64', '--trainer.frames-per-iter', '1500000', '--trainer.num-epochs', '4', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.001', '--trainer.optimization.final-effective-lrate', '0.0001', '--trainer.max-param-change', '2.0', '--cleanup.remove-egs', 'true', '--feat-dir', 'data/train_sp_hires', '--tree-dir', 'exp/chain/tri6_7d_tree_sp', '--lat-dir', 'exp/tri5a_sp_lats', '--dir', 'exp/chain/tdnn_1a_sp']
2019-02-26 17:01:10,033 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
 'apply_deriv_weights': False,
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'chunk_left_context': 0,
 'chunk_left_context_initial': -1,
 'chunk_right_context': 0,
 'chunk_right_context_final': -1,
 'chunk_width': '150,110,90',
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': None,
 'compute_per_dim_accuracy': False,
 'deriv_truncate_margin': None,
 'dir': 'exp/chain/tdnn_1a_sp',
 'do_final_combination': True,
 'dropout_schedule': None,
 'egs_command': None,
 'egs_dir': None,
 'egs_opts': '--frames-overlap-per-eg 0',
 'egs_stage': -10,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sp_hires',
 'final_effective_lrate': 0.0001,
 'frame_subsampling_factor': 3,
 'frames_per_iter': 1500000,
 'initial_effective_lrate': 0.001,
 'input_model': None,
 'l2_regularize': 5e-05,
 'lat_dir': 'exp/tri5a_sp_lats',
 'leaky_hmm_coefficient': 0.1,
 'left_deriv_truncate': None,
 'left_tolerance': 5,
 'lm_opts': '--num-extra-lm-states=2000',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'momentum': 0.0,
 'num_chunk_per_minibatch': '64',
 'num_epochs': 4.0,
 'num_jobs_final': 1,
 'num_jobs_initial': 1,
 'online_ivector_dir': 'exp/nnet3/ivectors_train_sp',
 'preserve_model_interval': 100,
 'presoftmax_prior_scale_power': -0.25,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': True,
 'reporting_interval': 0.1,
 'right_tolerance': 5,
 'samples_per_iter': 400000,
 'shrink_saturation_threshold': 0.4,
 'shrink_value': 1.0,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -10,
 'train_opts': [],
 'tree_dir': 'exp/chain/tri6_7d_tree_sp',
 'use_gpu': 'yes',
 'xent_regularize': 0.1}

Daniel Povey

Feb 26, 2019, 5:59:48 PM
to kaldi-help
That script has the option

  --cmd "$decode_cmd" \

which could be run.pl or queue.pl; that variable must be unset.
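
If that's the case, defining those variables before running the recipe should fix the "None: not found" failure. A minimal sketch (most recipes source these from cmd.sh):

# cmd.sh -- run all jobs locally; on a grid you would use queue.pl instead:
export train_cmd=run.pl
export decode_cmd=run.pl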


1 2

Feb 27, 2019, 6:16:19 PM
to kaldi-help
Thanks! While that is fixed, at iteration 1189/1319 (Epoch: 3.60/4.0, 90.1% complete) of training the initial acoustic model, a job failed (though training itself continued to run):
run.pl: job failed, log is in exp/chain/tdnn_1a_sp/log/progress.1189.log
2019-02-27 22:49:37,807 [steps/libs/common.py:237 - background_command_waiter - WARNING ] Command exited with status 1: run.pl --max-jobs-run 10 exp/chain/tdnn_1a_sp/log/progress.1189.log                 nnet3-am-info exp/chain/tdnn_1a_sp/1189.mdl '&&'                 nnet3-show-progress --use-gpu=no                     "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1188.mdl - |"                     "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1189.mdl - |"
       
I don’t understand why it manually switched to using the CPU towards the end of training, since one of the arguments in [steps/nnet3/chain/train.py:271 - train - INFO ] Arguments for the experiment is
 'use_gpu': True,

Looking into exp/chain/tdnn_1a_sp/log/progress.1189.log gives “ERROR (nnet3-show-progress[5.3.24~1-c948]:ReadToken():io-funcs.cc:159) ReadToken, failed to read token at file position -1”:


# nnet3-am-info exp/chain/tdnn_1a_sp/1189.mdl && nnet3-show-progress --use-gpu=no "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1188.mdl - |" "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1189.mdl - |"
# Started at Wed Feb 27 22:49:37 CET 2019
#
nnet3-am-info exp/chain/tdnn_1a_sp/1189.mdl
input-dim: 43
ivector-dim: 100
num-pdfs: 4365
prior-dimension: 0
# Nnet info follows.
left-context: 12
right-context: 12
num-parameters: 12253730
modulus: 1
input-node name=ivector dim=100
input-node name=input dim=43
component-node name=lda component=lda input=Append(Offset(input, -1), input, Offset(input, 1), ReplaceIndex(ivector, t, 0)) input-dim=229 output-dim=229
component-node name=tdnn1.affine component=tdnn1.affine input=lda input-dim=229 output-dim=625
component-node name=tdnn1.relu component=tdnn1.relu input=tdnn1.affine input-dim=625 output-dim=625

..

nnet3-show-progress --use-gpu=no 'nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1188.mdl - |' 'nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1189.mdl - |'
LOG (nnet3-show-progress[5.3.24~1-c948]:SelectGpuId():cu-device.cc:110) Manually selected to compute on CPU.
nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1188.mdl -
LOG (nnet3-am-copy[5.3.24~1-c948]:main():nnet3-am-copy.cc:140) Copied neural net from exp/chain/tdnn_1a_sp/1188.mdl to raw format as -
nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1189.mdl -
WARNING (nnet3-show-progress[5.3.24~1-c948]:PeekToken():io-funcs.cc:182) Error ungetting '<' in PeekToken
ERROR (nnet3-show-progress[5.3.24~1-c948]:ReadToken():io-funcs.cc:159) ReadToken, failed to read token at file position -1

[ Stack-Trace: ]
nnet3-show-progress() [0x9ae156]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::ReadToken(std::istream&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)



It started like this:
 
steps/nnet3/chain/get_egs.sh: Finished preparing training examples
2019-02-27 01:57:12,206 [steps/nnet3/chain/train.py:404 - train - INFO ] Copying the properties from exp/chain/tdnn_1a_sp/egs to exp/chain/tdnn_1a_sp
2019-02-27 01:57:12,244 [steps/nnet3/chain/train.py:409 - train - INFO ] Computing the preconditioning matrix for input features
2019-02-27 01:58:02,532 [steps/nnet3/chain/train.py:417 - train - INFO ] Preparing the initial acoustic model.
2019-02-27 01:58:03,880 [steps/nnet3/chain/train.py:451 - train - INFO ] Training will run for 4.0 epochs = 1320 iterations
2019-02-27 01:58:03,880 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 0/1319    Epoch: 0.00/4.0 (0.0% complete)    lr: 0.001000   
2019-02-27 01:59:14,840 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1/1319    Epoch: 0.00/4.0 (0.1% complete)    lr: 0.000998   
2019-02-27 02:00:19,140 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 2/1319    Epoch: 0.01/4.0 (0.2% complete)    lr: 0.000997   
2019-02-27 02:01:22,829 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 3/1319    Epoch: 0.01/4.0 (0.2% complete)    lr: 0.000995   

….
2019-02-27 22:47:31,513 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1187/1319    Epoch: 3.60/4.0 (89.9% complete)    lr: 0.000126   
2019-02-27 22:48:34,643 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1188/1319    Epoch: 3.60/4.0 (90.0% complete)    lr: 0.000126   
2019-02-27 22:49:37,026 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1189/1319    Epoch: 3.60/4.0 (90.1% complete)    lr: 0.000126   
run.pl: job failed, log is in exp/chain/tdnn_1a_sp/log/progress.1189.log
2019-02-27 22:49:37,807 [steps/libs/common.py:237 - background_command_waiter - WARNING ] Command exited with status 1: run.pl --max-jobs-run 10 exp/chain/tdnn_1a_sp/log/progress.1189.log                 nnet3-am-info exp/chain/tdnn_1a_sp/1189.mdl '&&'                 nnet3-show-progress --use-gpu=no                     "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1188.mdl - |"                     "nnet3-am-copy --raw=true exp/chain/tdnn_1a_sp/1189.mdl - |"
       
2019-02-27 22:50:39,234 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1190/1319    Epoch: 3.61/4.0 (90.2% complete)    lr: 0.000125   
2019-02-27 22:51:42,867 [steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 1191/1319    Epoch: 3.61/4.0 (90.2% complete)    lr: 0.000125   

Daniel Povey

Feb 27, 2019, 6:18:58 PM
to kaldi-help
That was running some diagnostics; it is supposed to run on CPU.
I suspect you were out of memory (so a sub-task got killed).
You can just re-start from that iteration; it wouldn't be a problem.
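
For example, restarting at the failed iteration would look something like this (a sketch; --train-stage is the option these run scripts use to pass the starting iteration through to train.py, and you may also need the recipe's --stage option to skip the earlier data-prep stages):

local/chain/run_tdnn.sh --train-stage 1189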


1 2

Feb 28, 2019, 1:06:49 PM
to kaldi-help

As part of transfer learning from ‘aishell’ to my language, I’m running train_lda_mllt.sh with --use-lda-mat to provide the final.mat, --mllt-iters "" and the dimensions (2500 20000).
I checked that in ‘aishell’ the ‘splice_opts’ options are empty, so I did the same for my language. But after aligning the data using the provided matrix, the accumulate-tree-stats stage throws:
bash: line 1:  3951 Aborted                 ( acc-tree-stats --ci-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50 /kaldi/exp/System1/tri2a_ali/final.mdl "ark,s,cs:apply-cmvn  --utt2spk=ark:/kaldi/data/train/split18/1/utt2spk scp:/kaldi/data/train/split18/1/cmvn.scp scp:/kaldi/data/train/split18/1/feats.scp ark:- | splice-feats  ark:- ark:- | transform-feats /kaldi/exp/System1/tri2b/0.mat ark:- ark:- |" "ark:gunzip -c /kaldi/exp/System1/tri2a_ali/ali.1.gz|" /kaldi/exp/System1/tri2b/1.treeacc ) 2>> /kaldi/exp/System1/tri2b/log/acc_tree.1.log >> /kaldi/exp/System1/tri2b/log/acc_tree.1.log
bash: line 1:  3950 Aborted  
….

Looking into /kaldi/exp/System1/tri2b/log/acc_tree.1.log:

ERROR (acc-tree-stats[5.3.24~1-c948]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive 'apply-cmvn  --utt2spk=ark:/kaldi/data/trai$


# acc-tree-stats --ci-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50 /kaldi/exp/System1$
# Started at Thu Feb 28 18:39:05 CET 2019
#
acc-tree-stats --ci-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50 /kaldi/exp/System1/t$
transform-feats /kaldi/exp/System1/tri2b/0.mat ark:- ark:-
apply-cmvn --utt2spk=ark:/kaldi/data/train/split18/1/utt2spk scp:/kaldi/data/train/split18/1/cmvn.scp scp:/kaldi/data/train/split18/1/feats.scp ark:-
splice-feats ark:- ark:-
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10000_tr097082 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10001_tr097083 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10002_tr097084 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10003_tr097085 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10004_tr097086 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10005_tr097087 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10006_tr097088 has bad dimension 40x144 versus feat dim 117
WARNING (transform-feats[5.3.24~1-c948]:main():transform-feats.cc:110) Transform matrix for utterance tr_10007_tr097089 has bad dimension 40x144 versus feat dim 117

……


WARNING (acc-tree-stats[5.3.24~1-c948]:Close():kaldi-io.cc:512) Pipe gunzip -c /kaldi/exp/System1/tri2a_ali/ali.1.gz| had nonzero return status 36096
WARNING (acc-tree-stats[5.3.24~1-c948]:Close():kaldi-io.cc:512) Pipe apply-cmvn  --utt2spk=ark:/kaldi/data/train/split18/1/utt2spk scp:/kaldi/data/train/split18/1/cmvn.scp s$
ERROR (acc-tree-stats[5.3.24~1-c948]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive 'apply-cmvn  --utt2spk=ark:/kaldi/data/trai$

[ Stack-Trace: ]
acc-tree-stats() [0x836b46]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::~SequentialTableReader()
main

The log in context:

/kaldi-trunk/kaldi/egs/wsj/s5/steps/align_si.sh --nj 18 /kaldi/data/train /kaldi/lang /kaldi/exp/System1/tri2a /kaldi/exp/System1/tri2a_ali
/kaldi-trunk/kaldi/egs/wsj/s5/steps/align_si.sh: feature type is delta
/kaldi-trunk/kaldi/egs/wsj/s5/steps/align_si.sh: aligning data in /kaldi/data/train using model from /kaldi/exp/System1/tri2a, putting alignments in /kaldi/exp/System1/tri2a_ali
/root/kaldi-trunk/kaldi/egs/wsj/s5/steps/diagnostic/analyze_alignments.sh --cmd run.pl /kaldi/lang /kaldi/exp/System1/tri2a_ali
/root/kaldi-trunk/kaldi/egs/wsj/s5/steps/diagnostic/analyze_alignments.sh: see stats in /kaldi/exp/System1/tri2a_ali/log/analyze_alignments.log
/kaldi-trunk/kaldi/egs/wsj/s5/steps/align_si.sh: done aligning data.
/kaldi-trunk/kaldi/egs/wsj/s5/steps/train_lda_mllt.sh --use-lda-mat /home/yonasd/1111/kaldi-trunk/kaldi/egs/aishell/s5/exp/tri3a/final.mat --mllt-iters  2500 20000 /kaldi/data/train /kaldi/lang /kaldi/exp/System1/tri2a_ali /kaldi/exp/System1/tri2b
/kaldi-trunk/kaldi/egs/wsj/s5/steps/train_lda_mllt.sh: Using supplied LDA matrix /home/yonasd/1111/kaldi-trunk/kaldi/egs/aishell/s5/exp/tri3a/final.mat
/kaldi-trunk/kaldi/egs/wsj/s5/steps/train_lda_mllt.sh: Accumulating tree stats
bash: line 1:  3951 Aborted                 ( acc-tree-stats --ci-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50 /kaldi/exp/System1/tri2a_ali/final.mdl "ark,s,cs:apply-cmvn  --utt2spk=ark:/kaldi/data/train/split18/1/utt2spk scp:/kaldi/data/train/split18/1/cmvn.scp scp:/kaldi/data/train/split18/1/feats.scp ark:- | splice-feats  ark:- ark:- | transform-feats /kaldi/exp/System1/tri2b/0.mat ark:- ark:- |" "ark:gunzip -c /kaldi/exp/System1/tri2a_ali/ali.1.gz|" /kaldi/exp/System1/tri2b/1.treeacc ) 2>> /kaldi/exp/System1/tri2b/log/acc_tree.1.log >> /kaldi/exp/System1/tri2b/log/acc_tree.1.log
bash: line 1:  3950 Aborted                 ( acc-tree-stats --ci-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50 /kaldi/exp/System1/tri2a_ali/final.mdl "ark,s,cs:apply-cmvn  --utt2spk=ark:/kaldi/data/train/split18/2/utt2spk scp:/kaldi/data/train/split18/2/cmvn.scp scp:/kaldi/data/train/split18/2/feats.scp ark:- | splice-feats  ark:- ark:- | transform-feats /kaldi/exp/System1/tri2b/0.mat ark:- ark:- |" "ark:gunzip -c /kaldi/exp/System1/tri2a_ali/ali.2.gz|" /kaldi/exp/System1/tri2b/2.treeacc ) 2>> /kaldi/exp/System1/tri2b/log/acc_tree.2.log >> /kaldi/exp/System1/tri2b/log/acc_tree.2.log


Finally:
run.pl: 18 / 18 failed, log is in /kaldi/exp/System1/tri2b/log/acc_tree.*.log

It worked well with other recipes like ‘mini-librispeech’; I’m trying to understand why it’s not working for ‘aishell’.

Daniel Povey

Feb 28, 2019, 1:10:12 PM
to kaldi-help
Looks like it's about the splice opts; I think the LDA matrix uses wider splicing than the default. You might need something like --splice-opts "--left-context=5 --right-context=5"
... which is unusually wide.
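
Mirroring the command from the log, that would look something like this (a sketch; the exact context widths depend on how the supplied final.mat was originally trained):

steps/train_lda_mllt.sh --splice-opts "--left-context=5 --right-context=5" \
  --use-lda-mat /path/to/aishell/exp/tri3a/final.mat --mllt-iters "" \
  2500 20000 /kaldi/data/train /kaldi/lang \
  /kaldi/exp/System1/tri2a_ali /kaldi/exp/System1/tri2b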


1 2

Feb 28, 2019, 1:31:53 PM
to kaldi-help
Thanks! Fixed!

1 2

Feb 28, 2019, 7:29:21 PM
to kaldi-help

Running ‘local/chain/tuning/run_tdnn_wsj_rm_1a.sh’ in ‘rm/s5’ reached this stage and failed with the error “ERROR (nnet3-chain-compute-prob[5.3.24~1-c948]:AcceptInput():nnet-compute.cc:493) Num-cols mismatch for input 'input': 43 in computation-request, 40 provided”. I tried to find the file containing “input-node name=input dim=40” so I could change it to 43, but couldn’t find it.

[steps/nnet3/chain/train.py:404 - train - INFO ] Copying the properties from exp/chain/tdnn_wsj_rm_1a/egs to exp/chain/tdnn_wsj_rm_1a

[steps/nnet3/chain/train.py:417 - train - INFO ] Preparing the initial acoustic model.
[steps/nnet3/chain/train.py:451 - train - INFO ] Training will run for 2.0 epochs = 48 iterations
[steps/nnet3/chain/train.py:493 - train - INFO ] Iter: 0/47    Epoch: 0.00/2.0 (0.0% complete)    lr: 0.005000   
run.pl: job failed, log is in exp/chain/tdnn_wsj_rm_1a/log/compute_prob_train.0.log


Looking into exp/chain/tdnn_wsj_rm_1a/log/compute_prob_train.0.log:

# nnet3-chain-compute-prob --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --xent-regularize=0.1 "nnet3-am-copy --raw=true exp/chain/tdnn_wsj_rm_1a/0.mdl - |" exp/chain/tdnn_wsj_rm_1a/den.fst "ark,bg:n$
# Started at Fri Mar  1 00:19:33 CET 2019
#
nnet3-chain-compute-prob --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --xent-regularize=0.1 'nnet3-am-copy --raw=true exp/chain/tdnn_wsj_rm_1a/0.mdl - |' exp/chain/tdnn_wsj_rm_1a/den.fst 'ark,bg:nne$
nnet3-am-copy --raw=true exp/chain/tdnn_wsj_rm_1a/0.mdl -
WARNING (nnet3-am-copy[5.3.24~1-c948]:Check():nnet-nnet.cc:789) Node prefinal-chain.batchnorm is never used to compute any output.
WARNING (nnet3-am-copy[5.3.24~1-c948]:Check():nnet-nnet.cc:789) Node prefinal-xent.batchnorm is never used to compute any output.
LOG (nnet3-am-copy[5.3.24~1-c948]:main():nnet3-am-copy.cc:140) Copied neural net from exp/chain/tdnn_wsj_rm_1a/0.mdl to raw format as -
WARNING (nnet3-chain-compute-prob[5.3.24~1-c948]:Check():nnet-nnet.cc:789) Node prefinal-chain.batchnorm is never used to compute any output.
WARNING (nnet3-chain-compute-prob[5.3.24~1-c948]:Check():nnet-nnet.cc:789) Node prefinal-xent.batchnorm is never used to compute any output.
nnet3-chain-copy-egs ark:exp/chain/tdnn_wsj_rm_1a/egs/train_diagnostic.cegs ark:-
nnet3-chain-merge-egs --minibatch-size=1:64 ark:- ark:-
ERROR (nnet3-chain-compute-prob[5.3.24~1-c948]:AcceptInput():nnet-compute.cc:493) Num-cols mismatch for input 'input': 43 in computation-request, 40 provided.

[ Stack-Trace: ]
nnet3-chain-compute-prob() [0x109f4d6]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::nnet3::NnetComputer::AcceptInput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kaldi::CuMatrix<float>*)
kaldi::nnet3::NnetComputer::AcceptInputs(kaldi::nnet3::Nnet const&, std::vector<kaldi::nnet3::NnetIo, std::allocator<kaldi::nnet3::NnetIo> > const&)
kaldi::nnet3::NnetChainComputeProb::Compute(kaldi::nnet3::NnetChainExample const&)



Daniel Povey

Feb 28, 2019, 7:35:19 PM
to kaldi-help
It likely has something to do with the presence or absence of pitch features.
You have to make sure the features exactly match if you want to do transfer learning.

Dan
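
One quick check is to compare the dimension of your features against what the transferred network expects (hypothetical paths; feat-to-dim is a standard Kaldi binary):

# Dimension of the target-language features:
feat-to-dim scp:data/train_hires/feats.scp -
# Input dimension the source network was built with:
grep 'input-node name=input' exp/chain/tdnn_1a_sp/configs/final.config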

1 2

Mar 1, 2019, 3:13:08 PM
to kaldi-help
Still struggling with the error 'ERROR (nnet3-chain-compute-prob[5.3.24~1-c948]:AcceptInput():nnet-compute.cc:493) Num-cols mismatch for input 'input': 43 in computation-request, 40 provided.'

I checked the scripts that control the ’pitch’ settings, and also passed ‘input dim=43 name=input’ to ‘configs/network.xconfig’. Googling doesn’t seem to help much. Could you give me a hint?
  
input dim=43 name=input
  #relu-renorm-layer name=tdnn-target input=Append(tdnn6.renorm@-3,tdnn6.renorm) dim=450
  ## adding the layers for chain branch
  relu-renorm-layer name=tdnn-target input=Append(tdnn6.batchnorm@-3,tdnn6.batchnorm) dim=450
relu-renorm-layer name=prefinal-chain input=tdnn-target dim=625 target-rms=0.5
  output-layer name=output include-log-softmax=false dim=971 max-change=1.5
  relu-renorm-layer name=prefinal-xent input=tdnn-target dim=625 target-rms=0.5
  output-layer name=output-xent dim=971 learning-rate-factor=5.0 max-change=1.5
...

Desh Raj

Mar 1, 2019, 3:22:24 PM
to kaldi-help
Hi, if you are transferring from a config which takes 43-dim inputs, your new inputs must also be 43-dim, unless you are changing the network config somehow. Look at the configs/final.config file of the source directory you are transferring from to see what input it expects, and check the feat-dim in your data directory to see if there's a mismatch.

Desh

1 2

Mar 1, 2019, 4:34:53 PM
to kaldi-help
Hi Desh!

Thanks for your help! I'm transferring from 'aishell'. I found the final.config file in these two directories, and the input dim is 43:
/kaldi-trunk/kaldi/egs/aishell/s5/exp/chain/tdnn_1a_sp/configs/final.config
/kaldi-trunk/kaldi/egs/aishell/s5/exp/nnet3/tdnn_sp/configs/final.config


input-node name=ivector dim=100
input-node name=input dim=43

My transfer-learning script is 'run_tdnn_wsj_rm_1a.sh' in 'rm/s5/local/chain/tuning/'. But I couldn't find a file containing 'feat-dim' in rm/data.

Desh Raj

Mar 1, 2019, 6:47:28 PM
to kaldi...@googlegroups.com
After running the get_egs.sh step, if you look at exp/chain/tdnn_1a_sp/egs/info/feat_dim, it contains the dimensionality of features in your input.

Desh
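
For example:

cat exp/chain/tdnn_1a_sp/egs/info/feat_dim
# prints the feature dimension the egs were generated with, e.g. 40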


1 2

Mar 1, 2019, 7:19:41 PM
to kaldi-help
It's 40, and I reset it to 43, then ran local/chain/tuning/run_tdnn_wsj_rm_1a.sh --stage 8 --train-stage 0,

but it complains: "Exception: There is mismatch between featdim/ivector_dim of the current experiment and the provided egs directory".

I also reset ivector_dim from 100 to 43, but got the same exception message.
...

Desh Raj

Mar 1, 2019, 7:31:52 PM
to kaldi...@googlegroups.com
You can't just "reset" the values there. The get_egs.sh script figures out those values from the inputs you have provided. If your network is supposed to take 43-dim inputs (which you say is what's in the final.config file), then you need to provide it 43-dim inputs. This is because the model you are transferring was trained with that dimensionality.

Desh


1 2

Mar 1, 2019, 7:46:28 PM
to kaldi-help
OK, thanks! Since I had no such problems when transferring the 'mini-librispeech' model, I'm unsure what the right action is. I see two possibilities now: either retrain the source model, i.e. 'aishell', with feat-dim 40, or provide my target-language data with feat-dim 43. The former seems more doable; could you give me a hint on how to retrain the 'aishell' model with feat-dim 40?

I would appreciate your help.

1 2

Mar 1, 2019, 8:52:38 PM
to kaldi-help
Now I have figured out that I could retrain the source model with local/chain/run_tdnn.sh by setting 'input dim=40 name=input' (changed from 43), while leaving 'input dim=100 name=ivector' as it is.



Desh Raj

Mar 1, 2019, 10:29:09 PM
to kaldi...@googlegroups.com
You are confusing the actual input with the network configuration. When you set 'input dim=40' in the xconfig, this just means that your network expects inputs of dimension 40. You still have to prepare the actual 40-dim MFCC vectors. Look at the lines here where the MFCC preparation happens. I think the mismatch is because aishell has pitch features, which make it 43-dim, whereas your new data just has 40-dim MFCC features. If you want to use the transferred aishell model, you will probably need to obtain similar MFCC+pitch features for your new data, as has been done in the link above.

Desh
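
If you go the MFCC+pitch route, the feature extraction would look roughly like this (a sketch based on the standard Kaldi scripts; the config and directory names are assumptions to be matched against the aishell recipe):

# 40-dim hires MFCCs plus 3-dim pitch gives the 43-dim input the model expects:
steps/make_mfcc_pitch.sh --mfcc-config conf/mfcc_hires.conf --nj 10 --cmd run.pl \
  data/train_hires exp/make_mfcc_pitch mfcc_pitch
steps/compute_cmvn_stats.sh data/train_hires exp/make_mfcc_pitch mfcc_pitch
utils/fix_data_dir.sh data/train_hires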


minnth...@gmail.com

Mar 14, 2019, 4:22:09 AM
to kaldi-help
Do I need to reduce the number of UBM components?
I got the EER result with 200 components, but I have not gotten a result with 1024.
