From some old discussions (link1, link2) I got the idea that the 'weight_decay' parameter is the regularization coefficient for the L2 penalty on the weights. For example, in the cifar10 solver the weight_decay value is 0.004. Does that mean the loss being minimized is "cross-entropy + 0.004*sum_of_L2_Norm_of_all_weights"? Or is it, by any chance, "cross-entropy + 0.004/2*sum_of_L2_Norm_of_all_weights"?
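
To make the two candidates concrete, here is a minimal sketch in plain Python (not Caffe code). The function names `l2_penalty` and `l2_penalty_grad` are made up for illustration, and I am assuming "L2 norm" here means the squared L2 norm of each weight blob. It only shows how the two readings differ; it does not say which one the solver actually uses.

```python
def l2_penalty(weights, weight_decay, halved):
    """Penalty term added to the data loss, summed over all weight blobs.

    halved=False: weight_decay * sum(w^2)
    halved=True:  (weight_decay / 2) * sum(w^2)
    """
    sq_norm = sum(w * w for blob in weights for w in blob)
    scale = weight_decay / 2.0 if halved else weight_decay
    return scale * sq_norm


def l2_penalty_grad(w, weight_decay, halved):
    """Gradient of the penalty with respect to a single weight w."""
    # d/dw [lambda * w^2]     = 2 * lambda * w
    # d/dw [lambda/2 * w^2]   = lambda * w
    scale = weight_decay if halved else 2.0 * weight_decay
    return scale * w


# Hypothetical toy weights, two "blobs"; 0.004 is the value from the solver.
weights = [[1.0, -2.0], [0.5]]
weight_decay = 0.004

full = l2_penalty(weights, weight_decay, halved=False)
half = l2_penalty(weights, weight_decay, halved=True)
print("penalty without 1/2:", full)
print("penalty with 1/2:   ", half)
print("grad at w=-2 without 1/2:", l2_penalty_grad(-2.0, weight_decay, False))
print("grad at w=-2 with 1/2:   ", l2_penalty_grad(-2.0, weight_decay, True))
```

The practical difference is a factor of 2 in the gradient: with the 1/2 convention the update adds exactly weight_decay * w to each weight's gradient, without it the added term is 2 * weight_decay * w.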