About the relationship between dataset size and layers of a chain model?


Branden Stark

Mar 9, 2018, 4:11:11 AM
to kaldi-help
Hello, everyone.
I want to train a TDNN chain model on my large dataset of about 5000 hours; with speed perturbation it becomes about 15000 hours. How should I set the neural network parameters for such a large dataset? Can anyone give me some advice? If the network has too many parameters, decoding may become slow.
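(By "speed perturbation" I mean the usual 3-way trick; something like the standard recipe step below, assuming the source directory is data/train:

  # 0.9x / 1.0x / 1.1x copies roughly triple the amount of data
  utils/data/perturb_data_dir_speed_3way.sh data/train data/train_sp

so 5000 hours becomes roughly 15000 hours.)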
Here is my plan:
  relu-batchnorm-layer name=tdnn1 dim=1200
  relu-batchnorm-layer name=tdnn2 input=Append(-1,1) dim=1200
  relu-batchnorm-layer name=tdnn3 input=Append(-1,1) dim=1200
  relu-batchnorm-layer name=tdnn4 input=Append(-3,3) dim=1200
  relu-batchnorm-layer name=tdnn5 input=Append(-3,3) dim=1200
  relu-batchnorm-layer name=tdnn6 input=Append(-3,3) dim=1200
  relu-batchnorm-layer name=tdnn7 input=Append(-3,3) dim=1200
  relu-batchnorm-layer name=tdnn8 input=Append(-3,3) dim=1200
  attention-relu-renorm-layer name=attention1 num-heads=15 value-dim=80 key-dim=40 num-left-inputs=5 num-right-inputs=2 time-stride=3

  relu-batchnorm-layer name=prefinal-chain input=attention1 dim=1200 target-rms=0.5
  output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5

  relu-batchnorm-layer name=prefinal-xent input=attention1 dim=1200 target-rms=0.5
  output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5

Daniel Povey

Mar 9, 2018, 5:09:56 PM
to kaldi-help

If you want a model that's not too large and performs well for large amounts of training data, I recommend that you take the configuration in local/chain/run_tdnn_7n.sh from egs/swbd/s5c, and change the following section:

  opts="l2-regularize=0.002"
  linear_opts="orthonormal-constraint=1.0"
  output_opts="l2-regularize=0.0005 bottleneck-dim=256"
to:

  opts="l2-regularize=0.0005"
  linear_opts="l2-regularize=0.0005 orthonormal-constraint=-1.0"
  output_opts="l2-regularize=0.00025 bottleneck-dim=320"

and
 --trainer.num-epochs 6 \
to
 --trainer.num-epochs 2 \

and
    --trainer.optimization.initial-effective-lrate 0.001 \
    --trainer.optimization.final-effective-lrate 0.0001 \
to
    --trainer.optimization.initial-effective-lrate 0.0005 \
    --trainer.optimization.final-effective-lrate 0.00005 \
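Concretely, the workflow would be roughly as follows (the copied script name here is just illustrative):

  cd egs/swbd/s5c
  cp local/chain/run_tdnn_7n.sh local/chain/run_tdnn_7n_large.sh
  # edit the opts / num-epochs / lrate lines as above, then:
  local/chain/run_tdnn_7n_large.sh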


Note that I changed orthonormal-constraint from 1.0 to -1.0; don't miss that.  And for this to work you need fully up-to-date code: I merged a change half an hour ago that is required for this to run.
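Updating and recompiling is the usual routine, along these lines (assuming your checkout tracks the main kaldi-asr/kaldi repository):

  cd kaldi
  git pull origin master   # picks up the change merged today
  cd src
  make depend -j 8         # in case dependencies changed
  make -j 8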

I'd appreciate hearing back from you RE how this worked out.

The discussion in this thread may be relevant to understanding some of the issues at stake.


Dan



Branden Stark

Mar 11, 2018, 10:33:15 PM
to kaldi-help
Thanks Dan,
I saw the configuration in run_tdnn_7n.sh; I will try it as soon as possible.

On Saturday, March 10, 2018 at 6:09:56 AM UTC+8, Dan Povey wrote:

lian li

Mar 12, 2018, 4:19:08 AM
to kaldi-help
Thanks Dan.  I have the same question.
I have 10000 hours of data, about 30000 hours after speed perturbation. I used the configuration in egs/swbd/s5c/local/chain/run_tdnn_7n.sh, but the final WER is worse than with egs/swbd/s5c/local/chain/run_tdnn_lstm.sh.
I will change the configs you mentioned above and retrain. Do you have any further suggestions for a large dataset?


On Saturday, March 10, 2018 at 6:09:56 AM UTC+8, Dan Povey wrote:

Daniel Povey

Mar 12, 2018, 9:16:38 PM
to kaldi-help
Did you use the same num-gpus as the checked-in script?

It would be interesting to see a comparison of the train and valid objective functions with the different model types, and of the num-parameters.

Also, grep for "Relative" in one of the later progress.X.log files and show it.
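E.g., assuming an experiment directory like exp/chain/tdnn_7n_sp (substitute your own):

  dir=exp/chain/tdnn_7n_sp
  # train and valid objective functions, per iteration:
  grep Overall $dir/log/compute_prob_train.*.log
  grep Overall $dir/log/compute_prob_valid.*.log
  # number of parameters:
  nnet3-am-info $dir/final.mdl | grep num-parameters
  # relative parameter change per component, from a late iteration:
  grep Relative $dir/log/progress.100.log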


Dan



Daniel Povey

Mar 12, 2018, 9:25:44 PM
to kaldi-help
Also (to: lian li),

It would be nice to know how much worse it was than the TDNN+LSTM baseline.
When Gaofeng tried the tdnn_7n setup on Fisher+Swbd, it was better than the baseline TDNN but not quite as good as the TDNN+LSTM setup.

In this PR, the currently best configuration is 7m25u.  (It's better than the checked-in 7n setup.)

Something else you could try is the 7m26g setup.  That's supposed to be basically the same as 7m25u, but it's intended to be more easily configurable for different amounts of data.  (I haven't finished running it, so I don't know yet.)

For a significantly larger dataset like yours, it would be worthwhile to reduce the l2 (for example, from 0.003 to 0.001 for the TDNN layers, and from 0.001 to 0.0005 for the output layers).
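For illustration, using the same style of opts variables as in the earlier scripts (the exact variable names in the 7m-series scripts may differ):

  opts="l2-regularize=0.001"           # down from 0.003 on the TDNN layers
  output_opts="l2-regularize=0.0005"   # down from 0.001 on the output layers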

You would have to recompile after merging that branch.

Dan



lian li

Mar 12, 2018, 9:50:29 PM
to kaldi-help
Thanks for your kind reply, Dan.
On my test set (about 3000 utts), the TDNN+LSTM baseline WER is 6.55% and the tdnn_7n WER is 6.96%.
After I retrain the model with the changes you suggested (quoted below), I will try 7m25u.
  opts="l2-regularize=0.002"
  linear_opts="orthonormal-constraint=1.0"
  output_opts="l2-regularize=0.0005 bottleneck-dim=256"
to:

  opts="l2-regularize=0.0005"
  linear_opts="l2-regularize=0.0005 orthonormal-constraint=-1.0"
  output_opts="l2-regularize=0.00025 bottleneck-dim=320"

On Tuesday, March 13, 2018 at 9:25:44 AM UTC+8, Dan Povey wrote: