semi-supervised training error


sameer khurana

6 Apr 2018, 12:46:13
to kaldi-help
Hi,

I am using the semi-supervised training setup from fisher_english.

I get the following error at stage 10 of this script: https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_english/s5/local/semisup/run_50k.sh

Stage 10 of that script calls https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_50k_semisupervised_1a.sh, and the error occurs at stage 15 of the latter script. The error is below:

2018-04-06 12:32:38,243 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
 'apply_deriv_weights': True,
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'chunk_left_context': 0,
 'chunk_left_context_initial': -1,
 'chunk_right_context': 0,
 'chunk_right_context_final': -1,
 'chunk_width': '160,140,110,80',
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': 'slurm.pl --mem 4G --config conf/slurm.conf',
 'compute_per_dim_accuracy': False,
 'deriv_truncate_margin': None,
 'dir': 'exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a',
 'do_final_combination': True,
 'dropout_schedule': None,
 'egs_command': None,
 'egs_dir': 'exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs',
 'egs_opts': None,
 'egs_stage': 0,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sup50k_sp_hires',
 'final_effective_lrate': 0.0001,
 'frame_subsampling_factor': 3,
 'frames_per_iter': 1500000,
 'initial_effective_lrate': 0.001,
 'input_model': None,
 'l2_regularize': 5e-05,
 'lat_dir': 'exp/semisup_50k/chain_semi50k_250k/tri4a_train_sup50k_sp_unk_lats',
 'leaky_hmm_coefficient': 0.1,
 'left_deriv_truncate': None,
 'left_tolerance': 5,
 'lm_opts': '--num-extra-lm-states=2000',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'momentum': 0.0,
 'num_chunk_per_minibatch': '128',
 'num_epochs': 4.0,
 'num_jobs_final': 16,
 'num_jobs_initial': 3,
 'online_ivector_dir': 'exp/semisup_50k/nnet3_semi50k_250k/ivectors_train_sup50k_sp_hires',
 'preserve_model_interval': 100,
 'presoftmax_prior_scale_power': -0.25,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': False,
 'reporting_interval': 0.1,
 'right_tolerance': 5,
 'samples_per_iter': 400000,
 'shrink_saturation_threshold': 0.4,
 'shrink_value': 1.0,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -4,
 'train_opts': [],
 'transform_dir': 'exp/semisup_50k/chain_semi50k_250k/tri4a_train_sup50k_sp_unk_lats',
 'tree_dir': 'exp/semisup_50k/chain_semi50k_250k/tree_bi_a',
 'use_gpu': 'yes',
 'xent_regularize': 0.1}
2018-04-06 12:32:38,370 [steps/nnet3/chain/train.py:339 - train - INFO ] Initializing a basic network for estimating preconditioning matrix

2018-04-06 12:32:46,456 [steps/nnet3/chain/train.py:411 - train - INFO ] Copying the properties from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs to exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a
2018-04-06 12:32:46,459 [steps/nnet3/chain/train.py:425 - train - INFO ] Computing the preconditioning matrix for input features
/data/sls/qcri/asr/sameer_v1/asr/kaldi-forked/kaldi/egs/mit_qcri/s5/utils//slurm.pl: 10 / 10 failed, log is in exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/log/get_lda_stats.*.log
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 625, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 431, in train
    use_multitask_egs=use_multitask_egs)
  File "steps/libs/nnet3/train/chain_objf/acoustic_model.py", line 417, in compute_preconditioning_matrix
    rand_prune=rand_prune))
  File "steps/libs/common.py", line 152, in execute_command
    p.returncode, command))
Exception: Command exited with status 1: slurm.pl --mem 4G --config conf/slurm.conf JOB=1:10 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/log/get_lda_stats.JOB.log                 nnet3-chain-acc-lda-stats --rand-prune=4.0                 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/init.raw "ark:nnet3-chain-copy-egs   scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.JOB.scp ark:- |"                 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/JOB.lda_stats


Looking at the suggested log:

CUDA_VISIBLE_DEVICES set to NoDevFiles, unsetting it...
nnet3-chain-acc-lda-stats --rand-prune=4.0 exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/init.raw 'ark:nnet3-chain-copy-egs   scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- |' exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/1.lda_stats
nnet3-chain-copy-egs scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:-
WARNING (nnet3-chain-copy-egs[5.4.76~1-97e61]:EnsureObjectLoaded():util/kaldi-table-inl.h:310) Failed to open file exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75
ERROR (nnet3-chain-copy-egs[5.4.76~1-97e61]:Value():util/kaldi-table-inl.h:164) Failed to load object from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75 (to suppress this error, add the permissive (p, ) option to the rspecifier.

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderScriptImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::Value()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::Value()
main
__libc_start_main
_start


LOG (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:main():nnet3-chain-acc-lda-stats.cc:195) Processed 128 examples.
LOG (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:WriteStats():nnet3-chain-acc-lda-stats.cc:67) Accumulated stats, soft frame count = 6444.  Wrote to exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/1.lda_stats
WARNING (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:Close():kaldi-io.cc:512) Pipe nnet3-chain-copy-egs   scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- | had nonzero return status 65280
ERROR (nnet3-chain-acc-lda-stats[5.4.76~1-97e61]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive 'nnet3-chain-copy-egs   scp:exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp ark:- |'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::nnet3::NnetChainExample> >::~SequentialTableReader()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
...
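
A note on the `nonzero return status 65280` in the log above: this looks like a raw wait status from the pipe rather than a plain exit code, though that reading is an assumption and not something the log states. Under the traditional Unix encoding, the exit code sits in the high byte and the terminating signal in the low seven bits. A minimal sketch of the decoding (the helper name `decode_wait_status` is made up for illustration):

```python
def decode_wait_status(status):
    """Split a traditional Unix wait status into (exit_code, signal).

    Assumes the common 16-bit layout: exit code in bits 8-15,
    terminating signal number in bits 0-6.
    """
    exit_code = (status >> 8) & 0xFF
    signal = status & 0x7F
    return exit_code, signal


if __name__ == "__main__":
    # 65280 is 0xFF00: exit code 255, not killed by a signal.
    print(decode_wait_status(65280))
```

Read this way, the upstream `nnet3-chain-copy-egs` process exited on its own with code 255, which fits the `ERROR` it printed about the missing archive.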

Following the error `Failed to load object from exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/cegs.1.1.ark:75` above, I looked into the folder `exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp`.

There is no `cegs.1.1.ark`, but there is `cegs_original.1.1.ark`.

What could be the problem? The stages before the failing stage seem to have finished successfully.
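
To see how many entries of an scp file point at archives that are not on disk, something like the following could help. It is only a sketch: it assumes each scp line has the form `<key> <archive>:<offset>`, as the `cegs.1.1.ark:75` in the error suggests, and that it is run from the recipe directory so the relative paths resolve. The function name `missing_archives` is made up for illustration.

```python
import os
from collections import Counter


def missing_archives(scp_path):
    """Count scp entries whose target archive file does not exist."""
    missing = Counter()
    with open(scp_path) as scp:
        for line in scp:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            # Assumed target form "<archive>:<byte offset>"; drop the offset.
            archive = parts[1].strip().rsplit(":", 1)[0]
            if not os.path.exists(archive):
                missing[archive] += 1
    return missing


if __name__ == "__main__":
    # Path taken from the failing command in the log above.
    scp = "exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/comb_egs/cegs.1.scp"
    if os.path.exists(scp):
        for archive, count in sorted(missing_archives(scp).items()):
            print(f"{count} entries point at missing file {archive}")
```

If every missing target lives under `egs_train_unsup250k_sp`, the problem is in how that egs directory was produced rather than in the combined scp file.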

Vimal Manohar

6 Apr 2018, 14:33:51
to kaldi...@googlegroups.com
I am guessing some stage before that failed. In particular, check whether everything worked in stage 13, which creates the egs, because cegs_original.*.ark should get deleted once that script runs successfully. Check the log files in that folder to see if there are any errors. If shuffle.*.log has no errors, then cegs.1.1.ark etc. should have been created.

Vimal
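
The check described above can be roughly automated. The sketch below assumes the logs are named `log/shuffle.*.log` inside the egs directory, that each ends with a `# Finished at ... with status N` line, and that leftover archives are named `cegs_orig.*.ark`. The function name `check_egs_dir` and the report keys are made up for illustration.

```python
import glob
import os
import re

STATUS_RE = re.compile(r"^# Finished at .* with status (\d+)")


def check_egs_dir(egs_dir):
    """Summarise shuffle logs and archive files found under egs_dir."""
    report = {"failed_logs": [], "error_logs": []}
    for log in sorted(glob.glob(os.path.join(egs_dir, "log", "shuffle.*.log"))):
        status = None
        has_error = False
        with open(log, errors="replace") as f:
            for line in f:
                if line.startswith("ERROR"):
                    has_error = True
                match = STATUS_RE.match(line)
                if match:
                    status = int(match.group(1))
        # A missing status line (e.g. a killed job) is treated as a failure.
        if status != 0:
            report["failed_logs"].append(log)
        if has_error:
            report["error_logs"].append(log)
    # Leftover intermediate archives versus the final shuffled ones.
    report["orig_archives"] = sorted(glob.glob(os.path.join(egs_dir, "cegs_orig.*.ark")))
    report["final_archives"] = sorted(glob.glob(os.path.join(egs_dir, "cegs.*.ark")))
    return report


if __name__ == "__main__":
    egs_dir = "exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp"
    for key, values in check_egs_dir(egs_dir).items():
        print(f"{key}: {len(values)}")
```

Clean logs combined with leftover `cegs_orig` archives and no final archives would suggest that something removed or never wrote the final files, rather than that the shuffle jobs themselves failed.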

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/5ff836cc-a389-425f-9eb3-0a8f0a247ea4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University

sameer khurana

6 Apr 2018, 15:49:13
to kaldi-help
Thanks for the reply.

In `exp/semisup_50k/chain_semi50k_250k/tdnn_semisup_1a/egs_train_unsup250k_sp/log`, I can see three types of files: get_egs*, lattice_copy* and shuffle*.

There are no errors in lattice_copy* or get_egs*.

In shuffle*, the bottom of the log file looks like this:

WARNING (nnet3-chain-normalize-egs[5.4.76~1-97e61]:PeekToken():io-funcs.cc:182) Error ungetting '<' in PeekToken
LOG (nnet3-chain-normalize-egs[5.4.76~1-97e61]:main():nnet3-chain-normalize-egs.cc:94) Added normalization to 20325 egs; had errors on 0
LOG (nnet3-chain-shuffle-egs[5.4.76~1-97e61]:main():nnet3-chain-shuffle-egs.cc:104) Shuffled order of 20325 neural-network training examples
LOG (nnet3-chain-copy-egs[5.4.76~1-97e61]:main():nnet3-chain-copy-egs.cc:395) Read 20325 neural-network training examples, wrote 20325
# Accounting: begin_time=1523017897
# Accounting: end_time=1523018573
# Accounting: time=676 threads=1
# Finished at Fri Apr 6 08:42:53 EDT 2018 with status 0

Does that mean anything?

Daniel Povey

6 Apr 2018, 16:00:44
to kaldi-help
I'm assuming instead of cegs_original you mean cegs_orig.
This could have happened because you already finished training and it automatically removed the egs, and then you tried to rerun the partially-run setup using the --stage option.




Vimal Manohar

6 Apr 2018, 16:14:28
to kaldi...@googlegroups.com
I created a pull request to fix the bug that deletes some files: https://github.com/kaldi-asr/kaldi/pull/2339. You can try that.


sameer khurana

6 Apr 2018, 16:21:10
to kaldi-help
Hi Dan,

I am using the option `--cleanup.remove-egs false`, and this was the first time I ran training.

sameer khurana

6 Apr 2018, 16:22:05
to kaldi-help
okay, thanks