Broken Spanish run_tdnn_1g.sh?


Jaskaran Singh Puri

Jul 20, 2019, 12:05:52 PM
to kaldi-help
While running this script: fisher_callhome_spanish/s5/local/chain/run_tdnn_1g.sh

I get an error, "egs has missing or malformed files"
It is unable to locate the "egs/info/feat_dim"

There's no "egs" directory getting created by the script nor are any of the files in the "info" dir.
I compared the structure to aspire's egs, there are lot of files that are not getting creating in the info dir

Also, I don't see any "mkdir -p $dir/egs" in this spanish script
Can we use the aspire's run_tdnn_7b.sh for spanish as this seems to be a bug?

Jan Trmal

Jul 20, 2019, 12:29:39 PM
to kaldi-help
Are you sure you didn't modify the script? Run git diff <filename>.
I'd start by looking at train_stage being set to something other than the default -10, or at common_egs_dir being defined.
I'm not sure it's necessary to create the egs dir beforehand if you don't care about the IO balancing.
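A quick way to check both (from egs/fisher_callhome_spanish/s5; the grep pattern is just illustrative) would be something like:

  git diff local/chain/run_tdnn_1g.sh
  grep -nE 'train_stage=|common_egs_dir=' local/chain/run_tdnn_1g.sh

If nothing was modified you should see train_stage=-10 and common_egs_dir left empty.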
y.


Jaskaran Singh Puri

Jul 20, 2019, 12:33:25 PM
to kaldi-help
No changes have been made in the script, especially the initial params; in fact, I pulled the entire Kaldi repo just this month.
Also, I'm training on the original LDC corpus, so the entire process should have run as-is.



Jan Trmal

Jul 20, 2019, 12:35:27 PM
to kaldi-help
In that case I think there will be an error somewhere before this one -- I don't see anything wrong with the script.
y.


Jan Trmal

Jul 20, 2019, 12:37:37 PM
to kaldi-help
I'd re-run the script with --stage 19 and watch for errors/suspicious output w.r.t. egs generation.
If you can, share the full output of this with us.
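For example, something like this would capture the whole run in one place (the tee filename is just a suggestion):

  local/chain/run_tdnn_1g.sh --stage 19 2>&1 | tee stage19.log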
y.

Jaskaran Singh Puri

Jul 21, 2019, 5:01:53 AM
to kaldi-help
Latest error, at stage 19:

The only change I've made is to point the 'dir' variable to a custom path, so all files are created where I have enough disk space; btw, it didn't work with the defaults either.

(attachment: Capture.PNG)

Jan Trmal

Jul 21, 2019, 9:59:22 AM
to kaldi-help
IIRC, if you set common_egs_dir, it is assumed the egs already exist, so in this case it's not a surprising error.
y.


Jaskaran Singh Puri

Jul 21, 2019, 10:31:50 AM
to kaldi-help
So I should remove the common_egs_dir param passed to train.py?

Jan Trmal

Jul 21, 2019, 10:33:30 AM
to kaldi-help
yes
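In the usual recipe layout that just means leaving the variable empty (or dropping the option from the train.py call), roughly:

  common_egs_dir=    # empty, so train.py generates the egs itself
  ...
  steps/nnet3/chain/train.py ... --egs.dir "$common_egs_dir" ...

This assumes the script follows the standard pattern of passing --egs.dir "$common_egs_dir"; an empty value there is treated as "not set".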


Jaskaran Singh Puri

Jul 21, 2019, 10:53:40 AM
to kaldi-help
Still the same error

Jan Trmal

Jul 21, 2019, 10:55:35 AM
to kaldi-help
You have to send the output. Also show how you call it (or show the parameter array from the output).
y.


Jaskaran Singh Puri

Jul 21, 2019, 11:03:52 AM
to kaldi-help
Running directly from stage 19. I completely removed the --egs-dir param to train.py in the tdnn script.

local/chain/run_tdnn_1g.sh
local/chain/run_tdnn_1g.sh: creating neural net configs using the xconfig parser
tree-info exp/chain/tri5a_tree/tree
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
steps/nnet3/xconfig_to_configs.py --xconfig-file /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs/network.xconfig --config-dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs/

steps/nnet3/chain/train.py --stage=-10 --cmd run.pl --mem 4G --feat.online-ivector-dir exp/nnet3/ivectors_train_sp_hires --feat.cmvn-opts --norm-means=false --norm-vars=false --chain.xent-regularize 0.1 --chain.leaky-hmm-coefficient 0.1 --chain.l2-regularize 0.0 --chain.apply-deriv-weights false --chain.lm-opts=--num-extra-lm-states=2000 --trainer.dropout-schedule 0,0@0.20,0.3@0.50,0 --trainer.srand 0 --trainer.max-param-change 2.0 --trainer.num-epochs 4 --trainer.frames-per-iter 5000000 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final=1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00005 --trainer.num-chunk-per-minibatch 128,64 --trainer.optimization.momentum 0.0 --egs.chunk-width 140,100,160 --egs.chunk-left-context 0 --egs.chunk-right-context 0 --egs.opts --frames-overlap-per-eg 0 --cleanup.remove-egs true --use-gpu true --feat-dir data/train_sp_hires --tree-dir exp/chain/tri5a_tree --lat-dir exp/tri5a_lats_nodup_sp --dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn
['steps/nnet3/chain/train.py', '--stage=-10', '--cmd', 'run.pl --mem 4G', '--feat.online-ivector-dir', 'exp/nnet3/ivectors_train_sp_hires', '--feat.cmvn-opts', '--norm-means=false --norm-vars=false', '--chain.xent-regularize', '0.1', '--chain.leaky-hmm-coefficient', '0.1', '--chain.l2-regularize', '0.0', '--chain.apply-deriv-weights', 'false', '--chain.lm-opts=--num-extra-lm-states=2000', '--trainer.dropout-schedule', '0,0@0.20,0.3@0.50,0', '--trainer.srand', '0', '--trainer.max-param-change', '2.0', '--trainer.num-epochs', '4', '--trainer.frames-per-iter', '5000000', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final=1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00005', '--trainer.num-chunk-per-minibatch', '128,64', '--trainer.optimization.momentum', '0.0', '--egs.chunk-width', '140,100,160', '--egs.chunk-left-context', '0', '--egs.chunk-right-context', '0', '--egs.opts', '--frames-overlap-per-eg 0', '--cleanup.remove-egs', 'true', '--use-gpu', 'true', '--feat-dir', 'data/train_sp_hires', '--tree-dir', 'exp/chain/tri5a_tree', '--lat-dir', 'exp/tri5a_lats_nodup_sp', '--dir', '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn']




2019-07-21 15:00:30,662 [steps/nnet3/chain/train.py:35 - <module> - INFO ] Starting chain model trainer (train.py)
2019-07-21 15:00:30,672 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
 'apply_deriv_weights': False,
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'chunk_left_context': 0,
 'chunk_left_context_initial': -1,
 'chunk_right_context': 0,
 'chunk_right_context_final': -1,
 'chunk_width': '140,100,160',
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': 'run.pl --mem 4G',
 'compute_per_dim_accuracy': False,
 'deriv_truncate_margin': None,
 'dir': '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn',
 'do_final_combination': True,
 'dropout_schedule': '0,0@0.20,0.3@0.50,0',
 'egs_command': None,
 'egs_dir': None,
 'egs_opts': '--frames-overlap-per-eg 0',
 'egs_stage': 0,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sp_hires',
 'final_effective_lrate': 5e-05,
 'frame_subsampling_factor': 3,
 'frames_per_iter': 5000000,
 'initial_effective_lrate': 0.0005,
 'input_model': None,
 'l2_regularize': 0.0,
 'lat_dir': 'exp/tri5a_lats_nodup_sp',
 'leaky_hmm_coefficient': 0.1,
 'left_deriv_truncate': None,
 'left_tolerance': 5,
 'lm_opts': '--num-extra-lm-states=2000',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'momentum': 0.0,
 'num_chunk_per_minibatch': '128,64',
 'num_epochs': 4.0,
 'num_jobs_final': 1,
 'num_jobs_initial': 1,
 'online_ivector_dir': 'exp/nnet3/ivectors_train_sp_hires',
 'preserve_model_interval': 100,
 'presoftmax_prior_scale_power': -0.25,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': True,
 'reporting_interval': 0.1,
 'right_tolerance': 5,
 'samples_per_iter': 400000,
 'shrink_saturation_threshold': 0.4,
 'shrink_value': 1.0,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -10,
 'train_opts': [],
 'tree_dir': 'exp/chain/tri5a_tree',
 'use_gpu': 'yes',
 'xent_regularize': 0.1}
2019-07-21 15:00:30,976 [steps/nnet3/chain/train.py:327 - train - INFO ] Creating phone language-model
2019-07-21 15:00:47,141 [steps/nnet3/chain/train.py:332 - train - INFO ] Creating denominator FST
copy-transition-model exp/chain/tri5a_tree/final.mdl /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/0.trans_mdl
LOG (copy-transition-model[5.5]:main():copy-transition-model.cc:62) Copied transition model.
2019-07-21 15:00:48,357 [steps/nnet3/chain/train.py:339 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
2019-07-21 15:00:48,510 [steps/nnet3/chain/train.py:361 - train - INFO ] Generating egs
2019-07-21 15:00:48,511 [steps/libs/nnet3/train/common.py:491 - verify_egs_dir - ERROR ] The egs dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs has missing or malformed files.
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 624, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 400, in train
    egs_right_context_final))
  File "steps/libs/nnet3/train/common.py", line 399, in verify_egs_dir
    egs_dir)).readline())
FileNotFoundError: [Errno 2] No such file or directory: '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/info/feat_dim'





Jan Trmal

Jul 21, 2019, 11:20:04 AM
to kaldi-help
If you say you didn't modify anything, then something is going wrong with the get_egs call.
Can you check whether there are logs in ...egs/log/ and egs/q/*.log?
In the latter case, probably specifically egs/q/get_egs.log.
Make sure there are no errors or suspicious messages...
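If the directory does exist, something along these lines should surface any failures (path taken from your output above):

  grep -i error /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/log/*.log \
    /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/q/*.log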


Jaskaran Singh Puri

Jul 21, 2019, 12:08:38 PM
to kaldi-help
There's no egs dir getting created by the script at the path $dir points to.

Daniel Povey

Jul 21, 2019, 2:25:00 PM
to kaldi-help
I notice a typo, `multipsplice`... perhaps it has to do with that.



Daniel Povey

Jul 21, 2019, 2:33:07 PM
to kaldi-help
Oh no, that typo was there in the original; it doesn't matter.
It looks like get_egs.sh is not getting called. If it were being called, it would echo its arguments.
You could do `git status -uno` at the top level to see if you changed any script code.

Look for the function generate_chain_egs in steps/libs/nnet3/train/chain_objf/acoustic_model.py.
You could add some print statements in there to figure out what is happening.
You can add --stage -3 to train.py so it will start from the get_egs stage, for easier debugging.
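Roughly, that would look like this, assuming the recipe exposes train_stage through parse_options.sh as the standard scripts do (otherwise just edit train_stage near the top of the script):

  git status -uno
  local/chain/run_tdnn_1g.sh --stage 19 --train-stage -3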

Jan Trmal

Jul 24, 2019, 2:59:13 PM
to kaldi-help
I was able to reproduce this by setting train_stage=0 (or anything above -3) instead of, say, train_stage=-10.
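If that's what happened here, putting it back to the default should get past this, e.g.:

  local/chain/run_tdnn_1g.sh --train-stage -10

(or editing train_stage=-10 at the top of the script, if it isn't exposed as an option).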
y.
