cunn timings
0.01 *
7.5459
0.0938
0.0825
0.0796
0.0792
0.0811
0.0787
0.0789
0.0778
0.0877
[torch.DoubleTensor of size 10]
cudnn times
1.3346
0.0008
0.0008
0.0008
0.0008
0.0008
0.0008
0.0008
0.0008
0.0009
[torch.DoubleTensor of size 10]
My interest is in understanding what is going on during that first inference call, and how long the "warmup" period lasts. For example, if I want to deploy an app around this forward inference, is it enough to make at least one "warmup" call up front, or do I need to repeat one every X seconds? Or is there something deeper going on? It acts an awful lot like some state is being loaded onto the GPU during the first run and reused for subsequent runs, which would be fine, but I just want confirmation that this is expected behavior and that my "warmup" idea will work. Platform is a TX1. The script I used is below, followed by a sketch of the warmup approach I have in mind.
require 'nn'
require 'cunn'
require 'cutorch'
require 'cudnn'

-- small two-convolution network, moved to the GPU
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net = net:cuda()

-- single 3x1000x1000 random input, also on the GPU
input = torch.rand(1, 3, 1000, 1000):cuda()
nruns = 10
times = torch.zeros(nruns)

-- time nruns forward passes with the cunn backend
-- (note: os.clock() measures CPU time, and GPU kernels launch
-- asynchronously; a cutorch.synchronize() before each stop reading
-- would give more faithful GPU timings)
for i = 1, nruns do
   start = os.clock()
   out = net:forward(input)
   stoptime = os.clock()
   times[i] = stoptime - start
end
print('\n\ncunn timings\n\n')
print(times)
-- switch the same network to the cudnn backend and repeat the timing
cudnn.fastest = true -- let cudnn pick its fastest algorithms
cudnn.convert(net, cudnn)
for i = 1, nruns do
   start = os.clock()
   out = net:forward(input)
   stoptime = os.clock()
   times[i] = stoptime - start
end
print('\n\ncudnn times\n\n')
print(times)
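For reference, here is roughly the warmup I have in mind at deployment time. It is only a sketch reusing the toy network from above; the single throwaway call and the explicit cutorch.synchronize() calls are my assumptions about what a correct warmup and measurement should look like, not something I've confirmed is sufficient.

require 'nn'
require 'cunn'
require 'cutorch'
require 'cudnn'

-- same toy network as above, on the cudnn backend
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net = net:cuda()
cudnn.fastest = true
cudnn.convert(net, cudnn)

-- one throwaway forward pass at the production input size, intended to
-- absorb the one-time setup cost (CUDA context creation, cudnn handles,
-- algorithm selection) before any real requests arrive
local warmup = torch.rand(1, 3, 1000, 1000):cuda()
net:forward(warmup)
cutorch.synchronize() -- block until the warmup work has actually finished

-- time a "real" call; synchronizing before reading the timer makes the
-- measurement cover completed GPU work, not just the kernel launch
local input = torch.rand(1, 3, 1000, 1000):cuda()
local timer = torch.Timer()
local out = net:forward(input)
cutorch.synchronize()
print(string.format('steady-state forward: %.4f s', timer:time().real))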