Hi all,
In short, I am talking about scale[index] in the following kernel in lrn_layer.cu:

template <typename Dtype>
__global__ void LRNComputeOutput(const int nthreads, const Dtype* const in,
    const Dtype* const scale, const Dtype negative_beta, Dtype* const out)

but the same problem also appears when using the CPU path.
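For context, here is my understanding of what that kernel (and its CPU counterpart) computes per element. This is paraphrased from memory rather than copied from the tree, and the function name below is just for illustration:

    #include <cmath>

    // My reading of the across-channels LRN forward, per element
    // (negative_beta is -beta):
    //
    //   scale[i] = k + (alpha / local_size) * (sum of squared inputs in the
    //              channel window around i)
    //   out[i]   = in[i] * pow(scale[i], negative_beta)
    //
    // so scale[i] is the value that gets raised to the -0.75 power.
    template <typename Dtype>
    void lrn_output_sketch(int n, const Dtype* in, const Dtype* scale,
                           Dtype negative_beta, Dtype* out) {
      for (int i = 0; i < n; ++i) {
        out[i] = in[i] * std::pow(scale[i], negative_beta);
      }
    }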
With custom data I observe NaNs after as few as 4 iterations. They occur during the forward pass in the LRN layer with across_channels normalization. It is definitely the forward pass and definitely the LRN layer; I can code in C++ (for CPUs, no experience with GPU programming) and was able to trace it back.
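For anyone who wants to trace something similar, a throwaway check along these lines can be dropped into the forward pass and pointed at a blob's cpu_data(). The helper is my own, not from the Caffe tree:

    #include <cmath>
    #include <cstdio>

    // Returns the index of the first non-finite or negative entry, or -1 if
    // everything looks sane. For the LRN scale_ blob, entries should never
    // drop below k_ (here 1), so negative values are a clear red flag.
    template <typename Dtype>
    int first_bad_entry(const Dtype* data, int count) {
      for (int i = 0; i < count; ++i) {
        if (!std::isfinite(data[i]) || data[i] < Dtype(0)) {
          std::printf("bad value %g at index %d\n",
                      static_cast<double>(data[i]), i);
          return i;
        }
      }
      return -1;
    }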
It already happens in iteration 4 with base_lr = 0.01 and momentum = 0.9.
I am doing retraining, so switching to within_channel normalization is not an option for me. Also, I am under time pressure, which makes learning GPU programming in 3 hours a bit hard.
I am using a Caffe master from 3rd October 2015. I cannot switch to a newer Caffe master because my version of Caffe computes extra quantities in the backward pass (LRP, Bach et al., PLOS ONE 2015).
My question is: has anybody had similar trouble?
How did you fix it?
Things I have checked: my mean file gets used in data_transformer.cpp, so that is not the problem.
k_ in lrn_layer.cpp is set to 1.
I have found the problem: scale[index] becomes negative, on the order of -10^3. Of course, taking a pow with beta = 0.75 then yields funny NaNs; what is (-1000)^0.75? :)
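A two-line check confirms the arithmetic side of it: std::pow with a negative base and the non-integer exponent 0.75 has no real result and returns NaN:

    #include <cmath>
    #include <cstdio>

    int main() {
      // A negative base with a non-integer exponent is a domain error over
      // the reals, so std::pow returns NaN.
      double bad = std::pow(-1000.0, 0.75);
      std::printf("%f\n", bad);  // prints nan (or -nan)
      return 0;
    }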
My question is: should scale[index] in
lrn_layer.cu ever become negative, or not?
Best, Alex