how to pinpoint a memory leak in Torch?

Alexander Weiss

Jun 10, 2016, 12:06:58 AM
to torch7
I keep running into GPU memory leaks in Torch, and I often don't know how to track them down.  Right now I'm training an encoder-decoder network on two GPUs -- one is encoding, the other is decoding -- and the decoder is leaking horribly.  I sprinkle print statements with cutorch.getMemoryUsage around the training loop and can watch the available GPU memory shrink at each mini-batch iteration until it finally crashes, but that alone can't tell me what isn't being deallocated.  Is there a tool that can help me understand exactly which allocations the garbage collector is failing to pick up?
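For concreteness, the kind of instrumentation I'm talking about is roughly the following (just a sketch -- the helper name and the per-device loop are mine, but cutorch.getDeviceCount and cutorch.getMemoryUsage are the actual cutorch calls):

local cutorch = require 'cutorch'

-- print free/total memory for every visible GPU; "tag" is just a label
-- so the output can be matched to a point in the training loop
local function report_gpu_memory(tag)
   for dev = 1, cutorch.getDeviceCount() do
      local freeBytes, totalBytes = cutorch.getMemoryUsage(dev)
      print(string.format('[%s] GPU %d: %.1f MB free of %.1f MB',
                          tag, dev, freeBytes / 2^20, totalBytes / 2^20))
   end
end

-- e.g. call report_gpu_memory('after batch ' .. i) once per mini-batch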

Alexander Weiss

Jun 10, 2016, 6:27:34 PM
to torch7
I found my memory leak.  This leaves me with two questions:

1)  I'm still curious whether there is a preferred tool for tracking down memory leaks in Torch.

2)  I found the leak, but I don't completely understand it.  The code is posted below (with the fix included).

The problem originated from a sloppy hack I was using to move the stored max-pooling indices from the encoder GPU to the decoder GPU.  During construction of the encoder-decoder network, I built two tables, self.pooling_modules and self.unpooling_modules, whose elements point to all of the SpatialMaxPooling layers in the encoder and the associated SpatialMaxUnpooling layers in the decoder, respectively.  After each forward pass through the encoder, I transferred the max-pooling indices to the decoder GPU by cloning the encoder's SpatialMaxPooling layers into the "pooling" elements of the associated SpatialMaxUnpooling layers in the decoder.  These clones were the culprits behind my memory leak.  To fix it, I had to explicitly erase all of the Torch tensors in the cloned modules after each use, even though the "pooling" element was overwritten in each training step.  Can anyone provide insight into why the garbage collector skipped over these Tensors?

function encoder_decoder:training_step(encoder_input, decoder_target)
   -- forward propagate through the encoder
   cutorch.setDevice(self.gpuID.encoder)
   local encoder_output = self.encoder:forward(encoder_input)

   -- move max-pool indices from encoder GPU to decoder GPU
   cutorch.setDevice(self.gpuID.decoder)
   for pooling_idx, unpool_mod in ipairs(self.unpooling_modules) do
      unpool_mod.pooling = self.pooling_modules[pooling_idx]:clone()
   end

   -- move compressed input to decoder GPU and forward propagate through the decoder
   local decoder_input = encoder_output:clone()
   local decoder_output = self.decoder:forward(decoder_input)

   -- compute cost and its gradient with respect to the output
   local decoder_cost = self.criterion:forward(decoder_output, decoder_target)
   local decoder_dcost_dout = self.criterion:backward(decoder_output, decoder_target)

   -- back propagate through the decoder
   local decoder_backward_output = self.decoder:backward(decoder_input, decoder_dcost_dout)

   -- transfer the back-propagated gradient to the encoder GPU and continue back propagation
   cutorch.setDevice(self.gpuID.encoder)
   local encoder_dcost_dout = decoder_backward_output:clone()
   self.encoder:backward(encoder_input, encoder_dcost_dout)

   -- clean up max-pool data from the decoder GPU (required to avoid the memory leak!)
   cutorch.setDevice(self.gpuID.decoder)
   for _, unpool_mod in ipairs(self.unpooling_modules) do
      unpool_mod.pooling:empty()
   end
   cutorch.setDevice(self.gpuID.encoder)

   return decoder_output
end
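
One more note in case it helps anyone hitting the same thing: my understanding is that the Lua garbage collector only sees the tiny userdata wrapper around each CUDA tensor, not the device memory it owns, so it feels little pressure to run even while the GPU is filling up.  A workaround I've seen suggested (I haven't verified it fixes this particular leak) is to force a collection every iteration, roughly like this, where the loop over "batches" is just a placeholder:

for batch_idx, batch in ipairs(batches) do
   encoder_decoder:training_step(batch.input, batch.target)
   -- force a full GC cycle; the first pass runs finalizers on unreachable
   -- tensors, the second reclaims the objects those finalizers released
   collectgarbage()
   collectgarbage()
end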