Hi all,
I just looked at the code in ParallelCriterion.lua, and I would like to know how the weights in ParallelCriterion influence the convergence of the training process. Do the weights need to sum to 1?
I know the weights affect the output (the weighted sum of the individual criterion outputs) and the gradInput (each criterion's usual gradInput multiplied by its weight). So if a weight is less than 1, the resulting gradInput is smaller than the one that would normally be computed. As far as I can tell, gradInput is the only thing that matters for convergence, so will a smaller gradInput hurt convergence?
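
To make the question concrete, here is a minimal sketch in plain Python of the behaviour described above. It is not the Torch code: the mean-squared-error helpers, the function names, and the example numbers are all assumptions chosen for illustration. It only shows that the combined loss is a weighted sum and that each gradient is scaled by its criterion's weight.

```python
# Illustrative sketch only; names and numbers are hypothetical, not Torch's API.

def mse_forward(pred, target):
    """Mean squared error between two equal-length lists."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / n


def mse_backward(pred, target):
    """Gradient of the mean squared error with respect to pred."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


def parallel_forward(preds, targets, weights):
    """Weighted sum of the per-criterion losses."""
    return sum(w * mse_forward(p, t)
               for p, t, w in zip(preds, targets, weights))


def parallel_backward(preds, targets, weights):
    """Per-criterion gradients, each multiplied by its weight."""
    return [[w * g for g in mse_backward(p, t)]
            for p, t, w in zip(preds, targets, weights)]


# Two criteria with weights that do not sum to 1 (assumed example values).
preds = [[1.0, 2.0], [0.5, 0.5]]
targets = [[0.0, 0.0], [1.0, 0.0]]
weights = [0.25, 1.0]

loss = parallel_forward(preds, targets, weights)
grads = parallel_backward(preds, targets, weights)
unweighted = [mse_backward(p, t) for p, t in zip(preds, targets)]

print("weighted loss:", loss)
print("weighted grads:", grads)
print("unweighted grads:", unweighted)
```

In this sketch the first criterion's gradient is a quarter of its unweighted value, while the second is unchanged. Nothing in the arithmetic requires the weights to sum to 1.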