Broken Spanish run_tdnn_1g.sh?


Jaskaran Singh Puri

Jul 20, 2019, 12:05:52 PM
to kaldi-help
While running this script: fisher_callhome_spanish/s5/local/chain/run_tdnn_1g.sh

I get an error, "egs has missing or malformed files"
It is unable to locate the "egs/info/feat_dim"

There's no "egs" directory getting created by the script nor are any of the files in the "info" dir.
I compared the structure to aspire's egs, there are lot of files that are not getting creating in the info dir

Also, I don't see any "mkdir -p $dir/egs" in this spanish script
Can we use the aspire's run_tdnn_7b.sh for spanish as this seems to be a bug?

Jan Trmal

Jul 20, 2019, 12:29:39 PM
to kaldi-help
Are you sure you didn't modify the script? Run git diff <filename>.
I'd start by looking at train_stage being set to something other than the default -10, or at common_egs_dir being defined.
I'm not sure it's necessary to create the egs dir beforehand if you don't care about the IO balancing.
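A quick way to check both (from egs/fisher_callhome_spanish/s5; the grep pattern is just illustrative) would be something like:

  git diff local/chain/run_tdnn_1g.sh
  grep -nE 'train_stage=|common_egs_dir=' local/chain/run_tdnn_1g.sh

If nothing was modified you should see train_stage=-10 and common_egs_dir left empty.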
y.


Jaskaran Singh Puri

Jul 20, 2019, 12:33:25 PM
to kaldi-help
No changes have been made in the script, especially the initial params; in fact, I pulled the entire Kaldi repo just this month.
Also, I'm training on the original LDC corpus, so the entire process should have run as-is.



Jan Trmal

Jul 20, 2019, 12:35:27 PM
to kaldi-help
In that case I think there will be an error somewhere before this one -- I don't see anything wrong with the script.
y.


Jan Trmal

Jul 20, 2019, 12:37:37 PM
to kaldi-help
I'd re-run the script with --stage 19 and watch for errors/suspicious output w.r.t. egs generation.
If you can, share the full output of this with us.
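For example, something like this would capture the whole run in one place (the tee filename is just a suggestion):

  local/chain/run_tdnn_1g.sh --stage 19 2>&1 | tee stage19.log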
y.

Jaskaran Singh Puri

Jul 21, 2019, 5:01:53 AM
to kaldi-help
Latest error, at stage 19:

The only change I've made is to point the 'dir' variable to a custom path, so all files are created where I have enough disk space; btw, it didn't work with the defaults either.

(attachment: Capture.PNG)

Jan Trmal

Jul 21, 2019, 9:59:22 AM
to kaldi-help
IIRC, if you set common_egs_dir, it is assumed the egs already exist, so in this case it's not a surprising error.
y.


Jaskaran Singh Puri

Jul 21, 2019, 10:31:50 AM
to kaldi-help
So I should remove the common_egs_dir param passed to train.py?

Jan Trmal

Jul 21, 2019, 10:33:30 AM
to kaldi-help
yes
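In the usual recipe layout that just means leaving the variable empty (or dropping the option from the train.py call), roughly:

  common_egs_dir=    # empty, so train.py generates the egs itself
  ...
  steps/nnet3/chain/train.py ... --egs.dir "$common_egs_dir" ...

This assumes the script follows the standard pattern of passing --egs.dir "$common_egs_dir"; an empty value there is treated as "not set".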


Jaskaran Singh Puri

Jul 21, 2019, 10:53:40 AM
to kaldi-help
Still the same error

Jan Trmal

Jul 21, 2019, 10:55:35 AM
to kaldi-help
You have to send the output. Also show how you call it (or show the parameter array from the output).
y.


Jaskaran Singh Puri

Jul 21, 2019, 11:03:52 AM
to kaldi-help
Running directly from stage 19. I completely removed the --egs-dir param to train.py in the tdnn script.

local/chain/run_tdnn_1g.sh
local/chain/run_tdnn_1g.sh: creating neural net configs using the xconfig parser
tree-info exp/chain/tri5a_tree/tree
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//init.raw
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-init /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.config /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
LOG (nnet3-init[5.5]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
nnet3-info /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs//ref.raw
steps/nnet3/xconfig_to_configs.py --xconfig-file /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs/network.xconfig --config-dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/configs/

steps/nnet3/chain/train.py --stage=-10 --cmd run.pl --mem 4G --feat.online-ivector-dir exp/nnet3/ivectors_train_sp_hires --feat.cmvn-opts --norm-means=false --norm-vars=false --chain.xent-regularize 0.1 --chain.leaky-hmm-coefficient 0.1 --chain.l2-regularize 0.0 --chain.apply-deriv-weights false --chain.lm-opts=--num-extra-lm-states=2000 --trainer.dropout-schedule 0,0@0.20,0.3@0.50,0 --trainer.srand 0 --trainer.max-param-change 2.0 --trainer.num-epochs 4 --trainer.frames-per-iter 5000000 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final=1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00005 --trainer.num-chunk-per-minibatch 128,64 --trainer.optimization.momentum 0.0 --egs.chunk-width 140,100,160 --egs.chunk-left-context 0 --egs.chunk-right-context 0 --egs.opts --frames-overlap-per-eg 0 --cleanup.remove-egs true --use-gpu true --feat-dir data/train_sp_hires --tree-dir exp/chain/tri5a_tree --lat-dir exp/tri5a_lats_nodup_sp --dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn
['steps/nnet3/chain/train.py', '--stage=-10', '--cmd', 'run.pl --mem 4G', '--feat.online-ivector-dir', 'exp/nnet3/ivectors_train_sp_hires', '--feat.cmvn-opts', '--norm-means=false --norm-vars=false', '--chain.xent-regularize', '0.1', '--chain.leaky-hmm-coefficient', '0.1', '--chain.l2-regularize', '0.0', '--chain.apply-deriv-weights', 'false', '--chain.lm-opts=--num-extra-lm-states=2000', '--trainer.dropout-schedule', '0,0@0.20,0.3@0.50,0', '--trainer.srand', '0', '--trainer.max-param-change', '2.0', '--trainer.num-epochs', '4', '--trainer.frames-per-iter', '5000000', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final=1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00005', '--trainer.num-chunk-per-minibatch', '128,64', '--trainer.optimization.momentum', '0.0', '--egs.chunk-width', '140,100,160', '--egs.chunk-left-context', '0', '--egs.chunk-right-context', '0', '--egs.opts', '--frames-overlap-per-eg 0', '--cleanup.remove-egs', 'true', '--use-gpu', 'true', '--feat-dir', 'data/train_sp_hires', '--tree-dir', 'exp/chain/tri5a_tree', '--lat-dir', 'exp/tri5a_lats_nodup_sp', '--dir', '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn']




2019-07-21 15:00:30,662 [steps/nnet3/chain/train.py:35 - <module> - INFO ] Starting chain model trainer (train.py)
2019-07-21 15:00:30,672 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
 'apply_deriv_weights': False,
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'chunk_left_context': 0,
 'chunk_left_context_initial': -1,
 'chunk_right_context': 0,
 'chunk_right_context_final': -1,
 'chunk_width': '140,100,160',
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': 'run.pl --mem 4G',
 'compute_per_dim_accuracy': False,
 'deriv_truncate_margin': None,
 'dir': '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn',
 'do_final_combination': True,
 'dropout_schedule': '0,0@0.20,0.3@0.50,0',
 'egs_command': None,
 'egs_dir': None,
 'egs_opts': '--frames-overlap-per-eg 0',
 'egs_stage': 0,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sp_hires',
 'final_effective_lrate': 5e-05,
 'frame_subsampling_factor': 3,
 'frames_per_iter': 5000000,
 'initial_effective_lrate': 0.0005,
 'input_model': None,
 'l2_regularize': 0.0,
 'lat_dir': 'exp/tri5a_lats_nodup_sp',
 'leaky_hmm_coefficient': 0.1,
 'left_deriv_truncate': None,
 'left_tolerance': 5,
 'lm_opts': '--num-extra-lm-states=2000',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'momentum': 0.0,
 'num_chunk_per_minibatch': '128,64',
 'num_epochs': 4.0,
 'num_jobs_final': 1,
 'num_jobs_initial': 1,
 'online_ivector_dir': 'exp/nnet3/ivectors_train_sp_hires',
 'preserve_model_interval': 100,
 'presoftmax_prior_scale_power': -0.25,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': True,
 'reporting_interval': 0.1,
 'right_tolerance': 5,
 'samples_per_iter': 400000,
 'shrink_saturation_threshold': 0.4,
 'shrink_value': 1.0,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -10,
 'train_opts': [],
 'tree_dir': 'exp/chain/tri5a_tree',
 'use_gpu': 'yes',
 'xent_regularize': 0.1}
2019-07-21 15:00:30,976 [steps/nnet3/chain/train.py:327 - train - INFO ] Creating phone language-model
2019-07-21 15:00:47,141 [steps/nnet3/chain/train.py:332 - train - INFO ] Creating denominator FST
copy-transition-model exp/chain/tri5a_tree/final.mdl /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/0.trans_mdl
LOG (copy-transition-model[5.5]:main():copy-transition-model.cc:62) Copied transition model.
2019-07-21 15:00:48,357 [steps/nnet3/chain/train.py:339 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
2019-07-21 15:00:48,510 [steps/nnet3/chain/train.py:361 - train - INFO ] Generating egs
2019-07-21 15:00:48,511 [steps/libs/nnet3/train/common.py:491 - verify_egs_dir - ERROR ] The egs dir /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs has missing or malformed files.
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 624, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 400, in train
    egs_right_context_final))
  File "steps/libs/nnet3/train/common.py", line 399, in verify_egs_dir
    egs_dir)).readline())
FileNotFoundError: [Errno 2] No such file or directory: '/notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/info/feat_dim'





Jan Trmal

Jul 21, 2019, 11:20:04 AM
to kaldi-help
If you say you didn't modify anything, then something is going wrong with the get_egs call.
Can you check whether there are logs in ...egs/log/ and egs/q/*.log?
In the latter case, probably specifically egs/q/get_egs.log.
Make sure there are no errors or suspicious messages...
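If the directory does exist, something along these lines should surface any failures (path taken from your output above):

  grep -i error /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/log/*.log \
    /notebooks/jpuri/training_v3/spanish/exp/chain/multipsplice_tdnn/egs/q/*.log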


Jaskaran Singh Puri

Jul 21, 2019, 12:08:38 PM
to kaldi-help
There's no egs dir getting created by the script at the path $dir points to.

Daniel Povey

Jul 21, 2019, 2:25:00 PM
to kaldi-help
I notice a typo, `multipsplice`... perhaps it has to do with that.



Daniel Povey

Jul 21, 2019, 2:33:07 PM
to kaldi-help
Oh no, that typo was there in the original; it doesn't matter.
It looks like get_egs.sh is not getting called. If it were being called, it would echo its arguments.
You could do `git status -uno` at the top level to see if you changed any script code.

Look for the function generate_chain_egs in steps/libs/nnet3/train/chain_objf/acoustic_model.py.
You could add some print statements in there to figure out what is happening.
You can add --stage -3 to train.py so it will start from the get_egs stage, for easier debugging.
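Roughly, that would look like this, assuming the recipe exposes train_stage through parse_options.sh as the standard scripts do (otherwise just edit train_stage near the top of the script):

  git status -uno
  local/chain/run_tdnn_1g.sh --stage 19 --train-stage -3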

Jan Trmal

Jul 24, 2019, 2:59:13 PM
to kaldi-help
I was able to reproduce this by setting train_stage=0 (or anything above -3) instead of, say, train_stage=-10.
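If that's what happened here, putting it back to the default should get past this, e.g.:

  local/chain/run_tdnn_1g.sh --train-stage -10

(or editing train_stage=-10 at the top of the script, if it isn't exposed as an option).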
y.
