Hello everyone, I am trying to experimentally show that two nets A and B have the same loss and gradients during training, given the same sequence of training samples. This can be shown in theory, but I also need to show it empirically. I trained the two nets A and B on CPU with the same RNG seed; the losses of the two nets differed on the order of 10^(-13) at each training iteration, which can be explained as floating-point numerical error.
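As a side note, here is a minimal illustration (in plain Python, not Torch) of why tiny discrepancies on the order of 10^(-13) are expected: floating-point addition is not associative, so summing the same numbers in a different order can produce slightly different results. The values and sizes here are arbitrary, chosen just to demonstrate the effect.

```python
import random

# Floating-point addition is not associative, so accumulating the
# same values in a different order can yield a slightly different sum.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)             # accumulate in original order
backward = sum(reversed(values))  # same values, reversed order

diff = abs(forward - backward)
print(diff)  # a tiny (possibly zero) difference, far below 1e-9
```

Differences at this scale are harmless rounding noise, which is why the CPU runs can still be considered equivalent.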
However, when I trained them on GPU, the difference was 'huge' (on the order of 10^(-1)); this exceeds what numerical error alone can explain. I suspect that the training sequence is not guaranteed to be the same on GPU, since the work may be scheduled across different GPU devices. I tried to synchronise and set the same RNG seed for all GPU devices with cutorch.synchronizeAll() and cutorch.manualSeedAll(seed), but this did not work.
Do you think my suspicion is right? If so, do you have any suggestions on how to fix the training sequence so that the two runs are comparable?
Thank you!