Intuition behind num-jobs-initial and num-jobs-final


Truong Do

Feb 19, 2018, 12:26:37 PM2/19/18
to kaldi-help
Hi,

What is the intuition behind using num-jobs-initial and num-jobs-final? Why not just use a fixed number of jobs?


Daniel Povey

Feb 19, 2018, 2:02:12 PM2/19/18
to kaldi-help
This gets quite complicated due to the interactions between learning rates, max-change values and so on.  Suppose you are dominated by the max-change, so the effective learning rates in the individual jobs are independent of the num-jobs.  In this case, the more jobs you have, the less 'noisy' the parameter values are.  Early in training you are far from the optimum, so a little extra noise makes less difference.
Now, suppose there were no max-change.  The learning rate per job is set to the 'effective learning rate' times the num-jobs.  Early in training the 'effective' learning rate is high, so if you had too many jobs, the learning rate per job would be too high and you'd get instability.
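[The scaling described above can be sketched roughly as follows; this is a simplification of steps/libs/nnet3/train/common.py:get_learning_rate, which takes more arguments in the actual code:]

```python
def get_learning_rate(effective_lrate, num_jobs):
    # Each parallel job trains with the effective learning rate scaled up by
    # the number of jobs; averaging the resulting models across jobs then
    # brings the overall step size back down to the effective rate.
    return effective_lrate * num_jobs
```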

In general (and this is due to the max-change) it would be fine to set num-jobs-initial to be the same as num-jobs-final if you weren't concerned about wasting GPUs (e.g. if you were not on a queue with other jobs running).  But you might have to increase the num-epochs a bit.
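[For reference, the number of jobs is interpolated roughly linearly between the initial and final values over training; a simplified sketch of that schedule, not the exact script logic:]

```python
def get_num_jobs(it, num_iters, num_jobs_initial, num_jobs_final):
    # Linearly interpolate the job count from the initial to the final
    # value as training progresses, rounding to the nearest integer.
    return int(0.5 + num_jobs_initial
               + (num_jobs_final - num_jobs_initial) * it / num_iters)
```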

Dan



Truong Do

Feb 19, 2018, 4:16:24 PM2/19/18
to kaldi-help
Thanks, Daniel, for the clarification!
> Now, suppose there were no max-change.  The learning rates per job are set to the 'effective learning rate' times the num-jobs.
I understood this part from the code in steps/libs/nnet3/train/common.py:get_learning_rate,
but I don't quite get the idea of max-change.

>Suppose you are dominated by the max-change so the effective learning rates in the individual jobs is independent of the num-jobs.  In this case, the more jobs you have, the less 'noisy' the parameter values are.
Can you explain a bit more about max-change: when does it dominate, and when is there effectively no max-change?

Daniel Povey

Feb 19, 2018, 8:20:16 PM2/19/18
to kaldi-help
The --max-param-change argument to nnet3-chain-train (usually 2.0) limits the l2-norm change in model parameters on each minibatch.  There are also per-component limits (usually set at the script level to 0.75, again in l2-norm change; or to 1.5 for the final layer).  Look at the code if you're still confused.
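[A minimal sketch of the l2-norm clipping idea described above; this is my simplification, not the actual nnet3 code, which enforces the limit both globally and per-component:]

```python
import numpy as np

def apply_max_change(delta, max_change=2.0):
    # If the l2-norm of the proposed parameter change exceeds max_change,
    # scale the change down so its norm equals max_change; otherwise
    # apply it unmodified.
    norm = float(np.linalg.norm(delta))
    if norm > max_change:
        delta = delta * (max_change / norm)
    return delta
```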
