Inconsistent output by loaded caffemodel


Krishna Kumar singh

Oct 5, 2015, 4:17:28 PM
to torch7
Hi,

I am trying to load the standard CaffeNet model and do a forward pass. If I run the forward pass twice on the same input, I get different outputs, even though the parameters do not change between the two passes and I have removed the dropout layers. I have attached the code and output below.

Code:-

require 'nn'
require 'cutorch'
require 'inn'
require 'cudnn'
require 'loadcaffe'

-- load caffenet model
model = loadcaffe.load(
    '/home/vision3/Downloads/caffe-master/models/bvlc_reference_caffenet/deploy.prototxt',
    '/home/vision3/Downloads/caffe-master/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
    'cudnn')

-- remove dropout layers to avoid randomness
model:remove(22)
model:remove(19)
print(model)

-- declare input
inp = torch.Tensor(3,224,224)

-- transfer model and data to GPU
model = model:cuda()
inp = inp:cuda()

-- store current network parameters
p_old = model:getParameters():clone()

-- first forward pass
local o1 = model:forward(inp):clone():float()

-- store parameters after first pass
p_new = model:getParameters():clone()

-- second forward pass
local o2 = model:forward(inp):clone():float()

-- print number of output elements that differ between the two passes
print('Number of non-equal output elements: ' .. torch.sum(torch.ne(o1, o2)))

-- print number of parameter elements that differ between the two passes
print('Number of non-equal parameter elements: ' .. torch.sum(torch.ne(p_old, p_new)))






Output:-

Successfully loaded /home/vision3/Downloads/caffe-master/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
MODULE data UNDEFINED
warning: module 'data [type 5]' not found
conv1: 96 3 11 11
conv2: 256 48 5 5
conv3: 384 256 3 3
conv4: 384 192 3 3
conv5: 256 192 3 3
fc6: 1 1 9216 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 4,4)
  (2): cudnn.ReLU
  (3): cudnn.SpatialMaxPooling
  (4): inn.SpatialCrossResponseNormalization
  (5): cudnn.SpatialConvolution(96 -> 256, 5x5, 1,1, 2,2)
  (6): cudnn.ReLU
  (7): cudnn.SpatialMaxPooling
  (8): inn.SpatialCrossResponseNormalization
  (9): cudnn.SpatialConvolution(256 -> 384, 3x3, 1,1, 1,1)
  (10): cudnn.ReLU
  (11): cudnn.SpatialConvolution(384 -> 384, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(384 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialMaxPooling
  (16): nn.View
  (17): nn.Linear(9216 -> 4096)
  (18): cudnn.ReLU
  (19): nn.Linear(4096 -> 4096)
  (20): cudnn.ReLU
  (21): nn.Linear(4096 -> 1000)
  (22): nn.SoftMax
}
Number of non-equal output elements: 1000
Number of non-equal parameter elements: 0




I also found that the output is consistent up to layer 4 and starts changing from layer 5 (the second convolution layer). Can someone please explain this behavior?
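
For reference, the per-layer comparison can be done roughly like this (a minimal sketch reusing model and inp from the code above; each nn module keeps its last output in its .output field):

-- snapshot every layer's output after the first pass
model:forward(inp)
local outs = {}
for i = 1, #model.modules do
  outs[i] = model:get(i).output:clone()
end

-- run again and report how many elements differ at each layer
model:forward(inp)
for i = 1, #model.modules do
  local diff = torch.ne(outs[i]:float(), model:get(i).output:float()):sum()
  print(('layer %d (%s): %d differing elements'):format(i, torch.type(model:get(i)), diff))
end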

Francisco Vitor Suzano Massa

Oct 5, 2015, 4:56:39 PM
to torch7
Could you try initializing your input data, e.g. with inp = torch.Tensor(3,224,224):uniform()?
Maybe SpatialCrossResponseNormalization is having problems with the uninitialized data?
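
For example (a minimal sketch; any random fill should do):

inp = torch.Tensor(3,224,224):uniform()  -- uniform values in [0,1)
-- or: inp = torch.randn(3,224,224)      -- standard normal values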

Francisco Vitor Suzano Massa

Oct 5, 2015, 5:08:12 PM
to torch7
Maybe, because of the uninitialized data, you are getting NaNs after the normalization layer (inf/inf or something like that)?
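
One quick way to check: NaN compares unequal to itself, so counting self-inequalities counts NaNs (a minimal sketch using the o1 from the original code):

-- number of NaN entries in the first output; 0 means none
print('NaNs in o1: ' .. o1:ne(o1):sum())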

Krishna Kumar singh

Oct 5, 2015, 6:06:43 PM
to torch7
Hi Francisco,

Thanks for your suggestion. I tried initializing the data, but it didn't help; I am still getting different outputs.

Sergey Zagoruyko

Oct 5, 2015, 6:38:14 PM
to torch7 on behalf of Krishna Kumar singh
Hi, you should update the cudnn bindings; there was a bug in the R3 branch in how it handled groups.
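
For anyone hitting the same problem: the Torch cudnn bindings live at https://github.com/soumith/cudnn.torch, and updating is usually just a matter of reinstalling the rock, e.g.:

luarocks install cudnn

The exact procedure depends on your cuDNN release and setup.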


Sadegh Aliakbarian

Oct 5, 2015, 8:01:34 PM
to torch7
Hi,

I think it is because of the dropout in the network, which randomly drops values passed to the next layer. Dropout should only be active during training; if you want to test your model, call model:evaluate() first.
I hope it works for you.

Sadegh Aliakbarian

Oct 5, 2015, 8:06:16 PM
to torch7
I forgot to say: put model:evaluate() right after loading your model (or at least before model:forward()). The rest of the code should remain unchanged.
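
Concretely, relative to the code in the first post (a minimal sketch; prototxt_path and caffemodel_path stand for the paths used there):

model = loadcaffe.load(prototxt_path, caffemodel_path, 'cudnn')
model:evaluate()  -- puts dropout and similar modules into test mode
-- the rest of the script (cuda conversion, forward passes) stays the same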

Krishna Kumar singh

Oct 5, 2015, 11:06:34 PM
to torch7
Hi,

I have already used model:evaluate(), and it still gives different outputs. Also, the output changes from the fifth layer (the second convolution layer), well before the dropout layers. I will try updating the cudnn bindings.

Krishna Kumar singh

Oct 6, 2015, 12:51:56 AM
to torch7
Hi,

Updating the cudnn bindings worked, and now I am getting consistent output. Thanks for the help.