The idea is that if you are training with, say, 4 jobs, then when you average over the 4 jobs the contribution of any given minibatch is diluted by a factor of 4, so for training to make the same rate of progress (per minibatch seen, not per unit time) you need to multiply the learning rate by 4. Setting it that way makes the training less sensitive to the num-jobs (i.e. the intention is that you won't have to re-tune the learning rate when you change the num-jobs). If the learning rate is high enough that instability might be an issue, you will tend to hit the "max-change" constraints. That, of course, will make training slower than it otherwise would be, which is why, when you use more jobs, you tend to have to increase the number of epochs a bit.
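To make the arithmetic concrete, here is a minimal sketch (not Kaldi's actual implementation; the function names, the use of a single global max-change norm, and the simple gradient-step update are all simplifications for illustration). It shows that scaling the learning rate by the number of jobs exactly cancels the 1/num-jobs dilution from model averaging, and how a "max-change" style constraint caps how far any one update can move the parameters:

```python
import numpy as np

def clip_update(delta, max_change):
    # "max-change"-style constraint (simplified): if the update's norm
    # exceeds max_change, scale the whole update down to that norm.
    norm = np.linalg.norm(delta)
    if norm > max_change:
        delta = delta * (max_change / norm)
    return delta

def parallel_step(params, grads_per_job, base_lr, max_change=2.0):
    # One synchronization step with model averaging (illustrative).
    # Each job applies its own update; the resulting models are averaged.
    # The learning rate is multiplied by the number of jobs so that each
    # minibatch keeps the same effective weight after the averaging
    # dilutes it by 1/num_jobs.
    num_jobs = len(grads_per_job)
    lr = base_lr * num_jobs  # compensate for the dilution
    new_models = []
    for g in grads_per_job:
        delta = clip_update(-lr * g, max_change)
        new_models.append(params + delta)
    return np.mean(new_models, axis=0)

params = np.zeros(3)
g = np.array([0.1, 0.0, 0.0])
zeros = np.zeros(3)

# A minibatch seen by only 1 of 4 jobs ends up with the same effective
# step as serial training with base_lr, thanks to the lr scaling.
averaged = parallel_step(params, [g, zeros, zeros, zeros], base_lr=0.5)
serial = params - 0.5 * g
print(np.allclose(averaged, serial))  # → True

# A huge gradient is capped by the max-change constraint (norm <= 2.0
# here), which keeps training stable but slows progress.
big = parallel_step(params, [np.array([100.0, 0.0, 0.0])], base_lr=0.5)
print(np.linalg.norm(big))  # → 2.0
```

The second case is the trade-off described above: when the (scaled) learning rate is large, updates keep getting clipped at the max-change limit, so each update moves the model less than the learning rate alone would suggest, and more epochs are needed to compensate.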