nn training epochs, overfitting, and visualizing the training process


Armin Oliya

unread,
Feb 13, 2018, 8:18:07 PM2/13/18
to kaldi-help
I have realized that most nn training recipes use 4 for num-epochs; a few questions:

- shouldn't num-epochs be dependent on the train set size and neural net size? 
- how do you decide on the optimal number of epochs/learning rate?
- in general how do you track accuracy/loss on train and validation sets during training? are there any options to export these metrics to viz tools like visdom or tensorboard?


Thanks!

Daniel Povey

unread,
Feb 13, 2018, 11:19:27 PM2/13/18
to kaldi-help

I have realized that most nn training recipes use 4 for num-epochs; a few questions:

- shouldn't num-epochs be dependent on the train set size and neural net size? 

Yes, sometimes when there is less data it makes sense to use more epochs, up to 10 or so.
 
- how do you decide on the optimal number of epochs/learning rate?

You'd normally have to tune them.
 
- in general how do you track accuracy/loss on train and validation sets during training? are there any options to export these metrics to viz tools like visdom or tensorboard?


Personally I rely on grepping in the logs (e.g.: `grep Overall exp/chain/tdnn1b_sp/log/compute_prob_*.100.log`)
but you can also use
steps/nnet3/report/generate_plots.py
(see its usage message; it generates a pdf plot)
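For example (a rough sketch; the experiment directory is just illustrative):

  steps/nnet3/report/generate_plots.py exp/chain/tdnn1b_sp exp/chain/tdnn1b_sp/report

which writes the train/valid objective plots and the pdf report into the output directory.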


Dan
 


Thanks!


Armin Oliya

unread,
Feb 14, 2018, 10:48:16 AM2/14/18
to kaldi-help
Thanks Dan, 

Less data > more epochs.. I thought it should be the other way around to avoid overfitting (less data > fewer epochs). 

generate_plots.py is really handy! So looking at a small experiment (40h), the log chart looks like the one below:

[attached: log-probability plot from generate_plots.py]
I don't know how to interpret the log probability exactly, but it looks like it's overfitting after 100 iterations; that makes me think:

- is there an "early stopping" option to stop training when the valid loss is not improving?  
- if not, would it make sense to look at this chart during training, stop when the validation objective stops improving, take that xx.mdl, and use it as final.mdl?





Daniel Povey

unread,
Feb 14, 2018, 6:04:01 PM2/14/18
to kaldi-help
We don't do early stopping because it turns out the valid loss is not a good guide to WER, and also because of concerns about repeatability (small random changes causing drastic changes in the model due to making it stop early).

Dan



Armin Oliya

unread,
Feb 14, 2018, 11:19:11 PM2/14/18
to kaldi-help
Then can I ask what the takeaway from this chart is?

I just finished two transfer learning experiments, one with 2 training epochs and a chart similar to the one attached, so I cut num_epochs to 1 for the second experiment and got some WER improvement (less overfitting on the target domain). 

About repeatability: assuming the val loss is a good indication of WER, wouldn't that concern be alleviated if the early-stopping window is large enough (e.g. no improvement after 100 iters)?

Daniel Povey

unread,
Feb 14, 2018, 11:23:47 PM2/14/18
to kaldi-help
I'd say your chart shows overfitting, but not because the validation loss eventually gets worse (it's still possible for WER to be improving while the validation loss is degrading).  It's overfitting because there is a factor of 3 between your training and validation loss functions: anything more than about a factor of 1.5 difference is too much.
Normally you'd want to use a model with fewer parameters in that case.
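A quick way to eyeball that ratio from the diagnostics (path and iteration illustrative) is to compare the train and valid logs for the same iteration:

  grep Overall exp/chain/tdnn1b_sp/log/compute_prob_train.100.log
  grep Overall exp/chain/tdnn1b_sp/log/compute_prob_valid.100.log

and compare the per-frame log-probabilities for 'output'; e.g. train -0.06 vs. valid -0.18 would be a factor of 3.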

Dan


Armin Oliya

unread,
Feb 19, 2018, 5:35:35 PM2/19/18
to kaldi-help
Thanks, got your point. 

Armin Oliya

unread,
Mar 8, 2018, 1:41:41 PM3/8/18
to kaldi-help
Hi Dan, 

I just finished an experiment on 700 hours of data (before speed/volume perturbation) and the logs look like the attached. I mostly followed the swbd recipe and used run_tdnn_7n for training the acoustic model, 4 epochs on 1 GPU.
Looking at the log chart I get the feeling that it's underfitting, because it hasn't yet reached a point where the training probs keep improving while the validation probs plateau. Is this observation correct? If so, how would you train the net for a few more epochs starting from the latest .mdl?


Thanks
Armin
log_probability_output_swbd_tdnn_7n.pdf
accuracy.report_7n

Daniel Povey

unread,
Mar 8, 2018, 6:49:25 PM3/8/18
to kaldi-help

I suspect that for that model type, when you use 1 GPU it ends up training too fast, leading to too much parameter noise.  (The interactions between the parallelization and things like learning rates are quite complicated.)  The easiest fix (and this might generate some useful information for me) would be as follows: halve the learning rates and the global max-change (change the max-change from 2.0 to 1.0), and train for another epoch, by increasing the --num-epochs parameter to chain/train.py by one, and training from the last --iter for which you have a model.
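Roughly, and only as a sketch (option names as in steps/nnet3/chain/train.py; the values are placeholders -- halve whatever your recipe currently uses):

  # one more epoch than before, half the max-change (was 2.0), and halved learning rates:
  steps/nnet3/chain/train.py --stage $train_stage \
      --trainer.num-epochs 5 \
      --trainer.max-param-change 1.0 \
      --trainer.optimization.initial-effective-lrate <half the recipe's value> \
      --trainer.optimization.final-effective-lrate <half the recipe's value> \
      ... (all other options as before)

with --train-stage (train_stage in the script) set to the last iteration for which an .mdl exists.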

Also send me, separately, the last-numbered progress log.

Dan



Armin Oliya

unread,
Mar 8, 2018, 9:51:59 PM3/8/18
to kaldi-help
Thanks Dan, I started running it for another epoch; it's going to take about two more days (AWS V100).


I actually opted for one GPU because from related discussions I got the impression that it leads to more accurate results, and running on multiple GPUs is mainly to reduce wall time. In fact, I did two experiments on a smaller set (300 hours, fewer phones, smaller lexicon, one accent only) and the 1-GPU version got up to 1.0% WER improvement compared to 2>4 GPUs across a few test sets. Looking at their charts though, as you suggested there's more jitter for the 1-GPU training, and it also seems to have terminated prematurely (4 epochs for both, charts attached).

Last-numbered logs for the main experiment (700h) also attached. Thanks!

logs_tdnn7n_300h.zip
logs_700h_tdnn7n_1gpu_4ep.zip

Daniel Povey

unread,
Mar 8, 2018, 10:00:37 PM3/8/18
to kaldi-help

You should also halve the l2 regularization.
Right now the tdnnX.affine layers are training too fast relative to the other layers, and reducing the l2 will make them train slower.


LOG (nnet3-show-progress[5.4.16~1-8b500]:main():nnet3-show-progress.cc:157) Relative parameter differences per layer are [ tdnn1.affine:0.0854533 tdnn2l:0.0623831 tdnn2.affine:0.0801025 tdnn3l:0.0422916 tdnn3.affine:0.0859651 tdnn4l:0.0444857 tdnn4.affine:0.0824367 tdnn5l:0.0217604 tdnn5.affine:0.0840922 tdnn6l:0.0542129 tdnn6.affine:0.0771912 tdnn7l:0.0346742 tdnn7.affine:0.0799106 tdnn8l:0.0505169 tdnn8.affine:0.0750518 tdnn9l:0.0335023 tdnn9.affine:0.0811985 tdnn10l:0.0488606 tdnn10.affine:0.0751174 tdnn11l:0.0328729 tdnn11.affine:0.0772726 prefinal-l:0.0307793 prefinal-chain.affine:0.0715717 output.linear:0.0345637 output.affine:0.0304675 prefinal-xent.affine:0.0541264 output-xent.linear:0.00615398 output-xent.affine:0.0211829 ]
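To pull that diagnostic out of your own run, something like this should work (path illustrative):

  grep 'Relative parameter differences' exp/chain/tdnn7n_sp/log/progress.*.log | tail -n 1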


Daniel Povey

unread,
Mar 8, 2018, 10:01:15 PM3/8/18
to kaldi-help
... I specifically mean the l2 regularization for the affine layers; it will be in a variable like $opts or $affine_opts in the xconfig section.
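In the 7n-type scripts that means editing a line like the following (the starting value shown is just illustrative; halve whatever your copy of the script sets):

  opts="l2-regularize=0.002"        # change to: opts="l2-regularize=0.001"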

Armin Oliya

unread,
Mar 8, 2018, 10:15:25 PM3/8/18
to kaldi-help
Got it, updated and running. Thanks!

Armin Oliya

unread,
Mar 9, 2018, 11:07:23 AM3/9/18
to kaldi-help
Hi Dan, 

So I halved the max change (--trainer.max-param-change) and the regularization ($opts) in run_tdnn_7n.sh and re-ran with --stage 12 --train_stage <latest>, but I doubt they are being applied. Their effective values in progress.x.log haven't changed, and the relative param change is still showing higher values for the affine layers. 

configs/final.config shows the halved l2-regularization values but no change in the max-change values.


What did I miss?

Hossein Hadian

unread,
Mar 9, 2018, 11:21:21 AM3/9/18
to kaldi-help
I guess the "xconfig" changes you made have been applied to 0.raw and 0.mdl only. I guess you should update your latest model using "nnet3-am-copy --nnet-config <new-final.config> ..." before resuming the training.
Hossein


Armin Oliya

unread,
Mar 9, 2018, 12:17:15 PM3/9/18
to kaldi-help
Thanks Hossein, 


So is the following the right way to do it (highlights are my changes from the original run_tdnn_7n.sh):




if [ $stage -le 12 ]; then
  echo "$0: creating neural net configs using the xconfig parser";

  num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
  learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
  opts="l2-regularize=0.001"
  linear_opts="orthonormal-constraint=1.0"
  output_opts="l2-regularize=0.0005 bottleneck-dim=256"

  mkdir -p $dir/configs

  cat <<EOF > $dir/configs/network.xconfig
  input dim=100 name=ivector
  input dim=40 name=input
  # please note that it is important to have input layer with the name=input
  # as the layer immediately preceding the fixed-affine-layer to enable
  # the use of short notation for the descriptor

  ....

  output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts
EOF
  steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig --config-dir $dir/configs/
fi



 $train_cmd $dir/log/generate_input_mdl_transfer.log \
      nnet3-am-copy  --raw=true --nnet-config=$dir/configs/final.config \
      $dir/6380.mdl $dir/6380_lower_l2.raw || exit 1;

echo "Copied pretrained neural net weights"



if [ $stage -le 13 ]; then

  steps/nnet3/chain/train.py --stage $train_stage \
    --cmd "$train_cmd" \
    --trainer.input-model $dir/6380_lower_l2.raw \
    --feat.online-ivector-dir ${exp}/nnet3/ivectors_${train_set} \
    --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
    --chain.xent-regularize $xent_regularize \
...



Hossein Hadian

unread,
Mar 9, 2018, 1:18:24 PM3/9/18
to kaldi-help
AFAIK --trainer.input-model  is for creating the initial model so it might not help. I guess it'd be easier to do it manually for the last model, e.g.
nnet3-am-copy --nnet-config=$dir/configs/final.config \
      $dir/6380.mdl $dir/6380.mdl
and then call the script with --stage 13 --train-stage 6380 
Hossein


Armin Oliya

unread,
Mar 9, 2018, 2:30:49 PM3/9/18
to kaldi-help
Thanks Hossein, I think it worked; looking at the logs I see the updated l2 values. 
There's a sudden drop in the output probs though, which seems to be getting back to the previous range (above -2.0).


exp/full_cgn_flnl_swbd/chain/tdnn7n_sp/log/compute_prob_train.6328.log:LOG (nnet3-chain-compute-prob[5.4.16~1-8b500]:PrintTotalStats():nnet-chain-diagnostics.cc:193) Overall log-probability for 'output-xent' is -3.53733 per frame, over 17602 frames.
exp/full_cgn_flnl_swbd/chain/tdnn7n_sp/log/compute_prob_train.6328.log:LOG (nnet3-chain-compute-prob[5.4.16~1-8b500]:PrintTotalStats():nnet-chain-diagnostics.cc:193) Overall log-probability for 'output' is -0.349972 per frame, over 17602 frames.
exp/full_cgn_flnl_swbd/chain/tdnn7n_sp/log/compute_prob_valid.6328.log:LOG (nnet3-chain-compute-prob[5.4.16~1-8b500]:PrintTotalStats():nnet-chain-diagnostics.cc:193) Overall log-probability for 'output-xent' is -3.4315 per frame, over 17294 frames.
exp/full_cgn_flnl_swbd/chain/tdnn7n_sp/log/compute_prob_valid.6328.log:LOG (nnet3-chain-compute-prob[5.4.16~1-8b500]:PrintTotalStats():nnet-chain-diagnostics.cc:193) Overall log-probability for 'output' is -0.323175 per frame, over 17294 frames.

Daniel Povey

unread,
Mar 9, 2018, 8:11:53 PM3/9/18
to kaldi-help
Doing that would have basically re-initialized your model from scratch, so you would have lost all your previous training.  Not much point in that.

I realize that my instruction to change the l2 was pointless in this context as you can't easily change the l2 of a trained model.  (well, you could do it by printing as text with --binary=false and editing the text, if you really wanted).
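For completeness, a sketch of that route (model names illustrative):

  nnet3-am-copy --binary=false $dir/6380.mdl $dir/6380_text.mdl
  # hand-edit the per-component l2 values in the text dump, then convert back:
  nnet3-am-copy $dir/6380_text.mdl $dir/6380_edited.mdl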


Halving the --trainer.max-param-change should have made a difference though.

You can tell in the training logs, train.*.log, whether it's being applied-- grep for 'global'.
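e.g. (path illustrative):

  grep global exp/chain/tdnn7n_sp/log/train.*.log | tail -n 2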

Dan




Armin Oliya

unread,
Mar 9, 2018, 8:42:37 PM3/9/18
to kaldi-help
Thanks Dan,

To be clear, what exactly would re-initialize the model? I ask because I referenced the "--raw=true" option from Pegah's transfer learning experiment 1b, and I thought it copies the weights over and continues training on the new data. 

Yes, I see the max-param-change has taken effect, sorry for the confusion. 

Daniel Povey

unread,
Mar 9, 2018, 8:46:53 PM3/9/18
to kaldi-help
The 
nnet3-am-copy --nnet-config <new-final.config>
would have reinitialized the model.
You can't use that mechanism to just change the l2, it will change the parameters too.
I am working on some code changes that will make it slightly easier to control the learning rates of these types of networks.  I think the reason why we're seeing underfitting for larger networks is definitely that the learning rates are too high.  I tried doing what I told you-- halving the max-change and retraining for a while-- for a model I had, and it made a substantial improvement to the objective functions.  (It didn't improve the WER, likely because the model had too many parameters or already had a suitable learning rate).  You can see the results below.
Dan


#!/bin/bash                                                                                                                                            

# 7m25t3 is a quick experiment to see whether by slowing down training, we can                                                                         
# improve results.  It's like 7m25t but with half the max-change, and one more                                                                         
# epoch; BUT we start from the last iter of 7m25t, so only running the last                                                                            
# epoch of training.  Definitely trains more completely, but WER is worse.                                                                             

# local/chain/compare_wer_general.sh --rt03 tdnn7m25q_sp tdnn7m25t_sp tdnn7m25t3_sp                                                                    
# System                tdnn7m25q_sp tdnn7m25t_sp tdnn7m25t3_sp                                                                                        
# WER on train_dev(tg)      12.05     11.64     11.79                                                                                                  
# WER on train_dev(fg)      11.08     10.74     10.96                                                                                                  
# WER on eval2000(tg)        14.8      14.7      15.1                                                                                                  
# WER on eval2000(fg)        13.2      13.2      13.7                                                                                                  
# WER on rt03(tg)            18.0      17.9      18.4                                                                                                  
# WER on rt03(fg)            15.7      15.8      16.3                                                                                                  
# Final train prob         -0.076    -0.074    -0.059                                                                                                  
# Final valid prob         -0.091    -0.089    -0.080                                                                                                  
# Final train prob (xent)        -0.982    -0.904    -0.789                                                                                            
# Final valid prob (xent)       -1.0077   -0.9288   -0.8406                                                                                            
# Num-parameters               22735140  23259428 23259428                                                                                             




Daniel Povey

unread,
Mar 9, 2018, 9:28:36 PM3/9/18
to kaldi-help
Armin,

If you have time to train a model from scratch, I have a suggestion for a configuration.
First you would have to merge the latest code from master.
I made a code change so that if you set orthonormal-constraint to a negative value, it will constrain the matrix to (any constant) times a semi-orthogonal matrix.  That means that you can control the learning speed of the linear components using l2 regularization, which is more consistent with how we control it for normal affine layers, and makes configuration more straightforward.

Below is what I consider a reasonable setting for Switchboard experiments.  I'm trying this right now, but note that it will tend to optimize more completely than older setups (due to a lower level of gradient noise), which could potentially lead to overfitting.

  opts="l2-regularize=0.001 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true"

  linear_opts="orthonormal-constraint=-1.0 l2-regularize=0.001"

  output_opts="l2-regularize=0.001"


For larger setups, closer to 1000 hours, you could consider further halving the l2-regularize value in $opts and $linear_opts, and also halving the initial and final learning rates.
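That is, something like the following as a starting point (just halving the values above; the learning rates are whatever your recipe passes to --trainer.optimization.initial-effective-lrate and --trainer.optimization.final-effective-lrate, halved):

  opts="l2-regularize=0.0005 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true"

  linear_opts="orthonormal-constraint=-1.0 l2-regularize=0.0005"

  output_opts="l2-regularize=0.001"    # unchanged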

(Note: the l2 for relu-renorm layers should be interpreted as part of the learning rate, while the l2 for the output layers and LSTM layers has more of the traditional function of l2 regularization, to control overfitting and to prevent saturation of neurons).

Dan


Armin Oliya

unread,
Mar 9, 2018, 10:21:37 PM3/9/18
to kaldi-help
Right, still trying to understand the details but I think I'm following. 

Thanks for the tips and for sharing the results; I think I'll let my remaining lower-lr epoch finish anyway and see if the results match yours. 


A few questions on the new setting:

- Is it right to say that the new setting with the orthonormal constraint and your follow-up suggestions will be mostly effective on a single GPU? 
- why would you train more conservatively (smaller lr, l2) with larger datasets? I thought more data has a regularizing effect by itself, which could counter e.g. the possible side effects of too many params in the model?
- by ~1000 hours you mean hours before data augmentation (speed/volume), right?


Thanks!

Daniel Povey

unread,
Mar 9, 2018, 10:29:16 PM3/9/18
to kaldi-help

Thanks for the tips and for sharing the results; I think I'll let my remaining lower-lr epoch finish anyway and see if the results match yours. 

Make sure you start from the current-trained model (i.e. don't start from the one where you used the nnet config).  Otherwise it will be undertrained, as you'll have trained just  for one epoch.
 
A few questions on the new setting:

- Is it right to say that the new setting with the orthonormal constraint and your follow-up suggestions will be mostly effective on a single GPU? 

They will make more of a difference on a single GPU, but they should help in general because they enable the model to train at a speed that's more consistent across different types of layers, and they will also make it train a little more slowly overall (because the l2 is smaller), which will be helpful for large datasets.

These changes might also be helpful for Switchboard-size data.  I have realized now that we were probably training these models too fast, and hence underfitting (due to parameter noise).  I was using the intuition that the relative change per (relu) layer should be in the range 0.03 to 0.06 for Switchboard-size data, but I am now revising that figure a little bit downward for this particular architecture, because each layer is factored into two pieces, so it might make sense to have a smaller change per factor.
 
- why would you train more conservatively (smaller lr, l2) with larger datasets? I thought more data has a regularizing effect by itself, which could counter e.g. the possible side effects of too many params in the model?

l2 has nothing to do with regularization or overfitting when applied to relu+batchnorm layers, because shrinking the parameters gives you an equivalent model.  Higher l2 just makes the model train faster.  For larger datasets you want it to train slower, because by the end of an epoch you don't want it to have completely forgotten the data from the beginning of that same epoch.  (I.e. when you finish training, you want the model to have some memory of all the data you trained it on, not just the most recently seen data).
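(The scale-invariance behind that: for an affine layer followed by ReLU and batchnorm, scaling the weights by any c > 0 scales the pre-activations by c, ReLU(c y) = c ReLU(y), and the batchnorm output is unchanged by a positive rescaling, so the network computes exactly the same function; l2 shrinkage therefore only changes the parameter norm, i.e. how large each update is relative to the parameters, which acts like a learning rate rather than a constraint on what the model can represent.)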
 
- by ~1000 hours you mean hours before data augmentation (speed/volume), right?

Yes.

Dan
 

Armin Oliya

unread,
Mar 10, 2018, 10:37:05 AM3/10/18
to kaldi-help
Thanks for the clear explanation, got your point :)

Armin Oliya

unread,
Mar 11, 2018, 2:05:28 AM3/11/18
to kaldi-help
I also got generally worse results (~1% WER) despite a notable improvement in the objective function. 

With the new setting and 300/500 hours of data, how many epochs would you recommend with 1 GPU and 2>4 GPUs?


Daniel Povey

unread,
Mar 11, 2018, 2:11:12 AM3/11/18
to kaldi-help
For 300 hours of data before augmentation (a swbd-type setup) I'd say about 6-8 epochs if using >1 GPU and maybe 4-6 epochs if using 1 GPU.

It's hard to say more precisely.

It could be that for the newer setup that trains more slowly, a smaller model is needed.

How much data were you using for the experiment where you trained an already-trained model for one more epoch and got a WER degradation?


Dan



Armin Oliya

unread,
Mar 11, 2018, 2:23:07 AM3/11/18
to kaldi-help
Thanks Dan, some more questions:


For 300 hours of data before augmentation (a swbd-type setup) I'd say about 6-8 epochs if using >1 GPU and maybe 4-6 epochs if using 1 GPU.


Will learning rates stay in the same range of 1e-3 > 1e-4?
Would it make sense to run with the upper bounds (8 on multi-GPU, 6 on single GPU) and evaluate WER with the .mdl files corresponding to the end of earlier epochs?
How would you change num epochs when doubling hours (600h)?



It could be that for the newer setup that trains more slowly, a smaller model is needed.

Which of the other chain tdnn versions do you recommend for this?

 

How much data were you using for the experiment where you trained an already-trained model for one more epochs and got a WER degradation?



700 hours, 4 epochs with default setup + 1 epoch with halved learning rates, single gpu



Armin

Daniel Povey

unread,
Mar 11, 2018, 2:26:36 AM3/11/18
to kaldi-help


For 300 hours of data before augmentation (a swbd-type setup) I'd say about 6-8 epochs if using >1 GPU and maybe 4-6 epochs if using 1 GPU.


Will learning rates stay in the same range of 1e-3 > 1e-4?

probably, yes.
 
Would it make sense to run with the upper bounds (8 on multi-GPU, 6 on single GPU) and evaluate WER with the .mdl files corresponding to the end of earlier epochs?


Sure, you can do that.
 
How would you change num epochs when doubling hours (600h)?

reduce it slightly, e.g. subtract 2.



It could be that for the newer setup that trains more slowly, a smaller model is needed.

Which of the other chain tdnn versions do you recommend for this?

I don't have anything ready yet.


Armin Oliya

unread,
Mar 11, 2018, 2:31:18 AM3/11/18
to kaldi-help
Got it, thanks!

Daniel Povey

unread,
Mar 11, 2018, 3:14:10 AM3/11/18
to kaldi-help
Actually there is one setup you could try.
On github find my personal Kaldi repo and look for the branch 'for_gaofeng'.
There is a script there numbered 7m26f; you could try that on the 300h setup.
It's a substantially smaller model and trained for just 4 epochs (but will learn slower and optimize
more completely).
I really don't know how good it will be: it might turn out that the incomplete optimization was part of the reason why it was working so well before.  We'll have to see.

Dan

