Hi,
I am using the semi-supervised training setup from fisher_english.
I get the following error at stage 10 of this script:
https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_english/s5/local/semisup/run_50k.sh
Stage 10 of that script calls this script:
https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_50k_semisupervised_1a.sh
The error occurs at stage 15 of the latter script, as shown below:
2018-04-06 12:32:38,243 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
'apply_deriv_weights': True,
'backstitch_training_interval': 1,
'backstitch_training_scale': 0.0,
'chunk_left_context': 0,
'chunk_left_context_initial': -1,
'chunk_right_context': 0,
'chunk_right_context_final': -1,
'chunk_width': '160,140,110,80',
'cleanup': True,
'cmvn_opts': '--norm-means=false --norm-vars=false',
'combine_sum_to_one_penalty': 0.0,
'command': 'slurm.pl --mem 4G --config conf/slurm.conf',
'compute_per_dim_accuracy': False,
'deriv_truncate_margin': None,
'dir': 'exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a',
'do_final_combination': True,
'dropout_schedule': None,
'egs_command': None,
'egs_dir': 'exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs',
'egs_opts': None,
'egs_stage': 0,
'email': None,
'exit_stage': None,
'feat_dir': 'data/train_sup50k_sp_hires',
'final_effective_lrate': 0.0001,
'frame_subsampling_factor': 3,
'frames_per_iter': 1500000,
'initial_effective_lrate': 0.001,
'input_model': None,
'l2_regularize': 5e-05,
'lat_dir': 'exp/semisup_50k/chain_semi50k_250k/tri4a_train_sup50k_sp_unk_lats',
'leaky_hmm_coefficient': 0.1,
'left_deriv_truncate': None,
'left_tolerance': 5,
'lm_opts': '--num-extra-lm-states=2000',
'max_lda_jobs': 10,
'max_models_combine': 20,
'max_objective_evaluations': 30,
'max_param_change': 2.0,
'momentum': 0.0,
'num_chunk_per_minibatch': '128',
'num_epochs': 4.0,
'num_jobs_final': 16,
'num_jobs_initial': 3,
'online_ivector_dir': 'exp/semisup_50k/nnet3_semi50k_250k/ivectors_train_sup50k_sp_hires',
'preserve_model_interval': 100,
'presoftmax_prior_scale_power': -0.25,
'proportional_shrink': 0.0,
'rand_prune': 4.0,
'remove_egs': False,
'reporting_interval': 0.1,
'right_tolerance': 5,
'samples_per_iter': 400000,
'shrink_saturation_threshold': 0.4,
'shrink_value': 1.0,
'shuffle_buffer_size': 5000,
'srand': 0,
'stage': -4,
'train_opts': [],
'transform_dir': 'exp/semisup_50k/chain_semi50k_250k/tri4a_train_sup50k_sp_unk_lats',
'tree_dir': 'exp/semisup_50k/chain_semi50k_250k/tree_bi_a',
'use_gpu': 'yes',
'xent_regularize': 0.1}
2018-04-06 12:32:38,370 [steps/nnet3/chain/train.py:339 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
2018-04-06 12:32:46,456 [steps/nnet3/chain/train.py:411 - train - INFO ] Copying the properties from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs to exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a
2018-04-06 12:32:46,459 [steps/nnet3/chain/train.py:425 - train - INFO ] Computing the preconditioning matrix for input features
/data/sls/qcri/asr/sameer_v1/asr/kaldi-forked/kaldi/egs/mit_qcri/s5/utils//slurm.pl: 10 / 10 failed, log is in exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/log/get_lda_stats.*.log
Traceback (most recent call last):
File "steps/nnet3/chain/train.py", line 625, in main
train(args, run_opts)
File "steps/nnet3/chain/train.py", line 431, in train
use_multitask_egs=use_multitask_egs)
File "steps/libs/nnet3/train/chain_objf/acoustic_model.py", line 417, in compute_preconditioning_matrix
rand_prune=rand_prune))
File "steps/libs/common.py", line 152, in execute_command
p.returncode, command))
Exception: Command exited with status 1: slurm.pl --mem 4G --config conf/slurm.conf JOB=1:10 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/log/get_lda_stats.JOB.log nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/init.raw "ark:nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.JOB.scp ark:- |" exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/JOB.lda_stats
Looking at the suggested log:
CUDA_VISIBLE_DEVICES set to NoDevFiles, unsetting it...
nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/init.raw 'ark:nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- |' exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/1.lda_stats
nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:-
WARNING (nnet3-chain-copy-egs[5.4.76~1-97e61]:EnsureObjectLoaded():util/kaldi-table-inl.h:310) Failed to open file exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75
ERROR (nnet3-chain-copy-egs[5.4.76~1-97e61]:Value():util/kaldi-table-inl.h:164) Failed to load object from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75 (to suppress this error, add the permissive (p, ) option to the rspecifier.
[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderScriptImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::Value()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::Value()
main
__libc_start_main
_start
LOG (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:main():nnet3-chain-acc-lda-stats.cc:195) Processed 128 examples.
LOG (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:WriteStats():nnet3-chain-acc-lda-stats.cc:67) Accumulated stats, soft frame count = 6444. Wrote to exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/1.lda_stats
WARNING (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:Close():kaldi-io.cc:512) Pipe nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- | had nonzero return status 65280
ERROR (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive 'nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- |'
[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReader()
main
__libc_start_main
_start
terminate called after throwing an instance of 'std::runtime_error'
...
Following the error `Failed to load object from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75` above, I looked in the folder `exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp`.
There is no `cegs.1.1.ark` there, but there is a `cegs_original.1.1.ark`.
What could be the problem? The stages before the failing stage all seem to have finished successfully.
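
To see how many of the archives referenced by the combined scp file are actually missing, something like the following rough sketch could be used. It assumes each scp line has the usual `key path:offset` form seen in the log above; the function name `missing_archives` is just a placeholder for illustration, not part of Kaldi.

```python
import os


def missing_archives(scp_path):
    """Return a sorted list of archive paths referenced in a Kaldi-style
    scp file that do not exist on disk.

    Assumes each line looks like "<key> <path>[:<byte-offset>]".
    Relative paths are resolved against the current working directory,
    so run this from the same directory the training script runs in.
    """
    missing = set()
    with open(scp_path) as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            target = parts[1]
            # Drop a trailing ":<offset>" if one is present.
            path, sep, offset = target.rpartition(":")
            if not sep or not offset.isdigit():
                path = target
            if not os.path.exists(path):
                missing.add(path)
    return sorted(missing)
```

Calling `missing_archives("exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp")` from the recipe directory would list every referenced `.ark` file that is absent, which should show whether only `cegs.1.1.ark` is affected or the whole set of unsupervised egs.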