Re: Memory question on Knet


Deniz Yuret

Jun 9, 2020, 5:17:34 AM
to Jan Vargas, knet-users
Hi Jan,

On Sat, Jun 6, 2020 at 4:17 PM Jan Vargas <j.varga...@gmail.com> wrote:
Dr Yuret,
First of all, thank you for your work on Knet. You have clearly put a lot of work into it, and it is well developed.

Thank you for your kind words.

I was hoping you could help me with some of my code. First off, I'm a hobby programmer, so please bear with me if some of my questions or issues are naive.

I work with a lot of 3D medical imaging, and as a result my datasets and models are quite large. I'd previously implemented a 3D version of UNet and Fabian Isensee's modified UNet in Keras (Python). I've been trying to move my code over to Julia since I like the speed and the language far more than Python.

In Keras I was successfully able to load and train a model on my GPU (I have a Tesla K80); however, the model and training pushed the limits of GPU memory. I'm training on 160 x 192 x 192 3D volumes, so as you can imagine, I could only train with a batch size of 1.

I've successfully ported a lot of my code and architecture into Julia and Knet. I can train a single batch on the GPU; however, when I try to move on to a second batch, I run into out-of-memory problems.

This is a common problem in Julia: GPU memory management is not as strict, since buffers are only reclaimed when the garbage collector runs.
My training loop looks like:

for batch in train_gen
    x = KnetArray(batch[1])
    y = KnetArray(batch[2])

    Knet.progress!(adam(model, [(x, y)]))
    # ... loss calculations, etc.
end

Like I said, my first batch runs fine, but on the second iteration I don't think Knet is freeing memory.

You can try two things:
(1) By default Knet uses CuArrays for memory management. You can force CuArrays to garbage collect with GC.gc() before the second iteration.
(2) Alternatively, you can try the older Knet memory manager, which you activate with `Knet.cuallocator()=false` and which garbage collects with `Knet.gc()`.
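
For example, a minimal sketch of both options against your loop above (train_gen, model, etc. as in your code):

using Knet

# Option 1: keep the default CuArrays allocator, but force a
# collection at the end of every iteration.
for batch in train_gen
    x = KnetArray(batch[1])
    y = KnetArray(batch[2])
    Knet.progress!(adam(model, [(x, y)]))
    GC.gc()   # reclaim unused GPU buffers before the next batch
end

# Option 2: switch to the older Knet memory manager instead
# (set once, before any KnetArrays are allocated):
# Knet.cuallocator() = false
# ...and call Knet.gc() inside the loop in place of GC.gc().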

Finally, if all else fails, you can open a CuArrays issue with a minimal example.

I think MXNet and TF solve this by moving parameters back to the CPU. Since Knet is so fast, I'd be OK with losing some time doing this on each training cycle.

I would think the activations take more space than the parameters, no?
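
For example (with a hypothetical layer width of 32 channels): a single Float32 activation map of size 160 x 192 x 192 x 32 takes 160*192*192*32*4 bytes, roughly 755 MB, whereas a 3x3x3 convolution kernel mapping 32 channels to 32 channels holds only 3*3*3*32*32*4 bytes, roughly 110 KB of parameters, so offloading parameters to the CPU would not buy you much.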

I was hoping you could provide some tips on how to aggressively reclaim memory during my training cycle. I suspect that in order to do this I'd probably have to unpack the code in progress!...

Thank you for all your help and work so far.

Best,
Jan Vargas


best,
deniz

Iulian-Vasile Cioarca

Jun 9, 2020, 6:58:27 AM
to knet-users

You can also check the CuArrays allocator settings. I didn't have enough time to continue investigating, but if you keep Knet.cuallocator()=true, you can switch the CuArrays allocator to 'none' (most aggressive at reclaiming memory) or 'split' (more balanced); instructions are at the bottom of the CuArrays documentation page.
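
If I remember right, the pool is selected with an environment variable that CuArrays reads at load time; a sketch (the variable name is from memory, so treat it as an assumption and check those docs):

# Assumption: CuArrays picks its memory pool from CUARRAYS_MEMORY_POOL
# when it is first loaded, so set it before `using Knet`.
ENV["CUARRAYS_MEMORY_POOL"] = "none"   # or "split"
using Knet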
I'm curious to see if this helps.

BR,
Iulian

Deniz Yuret

Jun 10, 2020, 7:38:26 AM
to Jan Vargas, knet-users
Hi Jan,

I can try to debug the problem if you send me a minimal self-running example.

The only possible memory leak I know of happens when the results computed during `@diff` do not get freed across iterations. This happens, for example, in RNNs when going from one minibatch to the next while keeping the state. In these cases the solution is to detach the values from the rest of the tape using `x = value(x)`. I don't know if something like this may be going on in your case.
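
A minimal sketch of that pattern (loss, model, data, and hiddenstate below are hypothetical placeholders, not Knet API):

using Knet

h = nothing
for (x, y) in data
    # h participates in the loss, so without detaching it the state would
    # keep a reference into the previous iteration's tape.
    ld = @diff loss(model, x, y, h)
    for w in params(model)
        update!(w, grad(ld, w))
    end
    h = value(hiddenstate(model))   # detach: a plain array with no tape link
end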

best,
deniz


On Wed, Jun 10, 2020 at 3:39 AM Jan Vargas <j.varga...@gmail.com> wrote:
Hi Dr. Yuret,
So I gave all that a try, and it doesn't seem to help.

For what it's worth, I can train the same model in Flux without the memory problem.


# Soft Dice loss; smooth avoids division by zero on empty masks.
dice_loss(x, y::AbstractArray; smooth::Float32=1.f0) =
    1 - (2*sum(y .* x) + smooth) / (sum(y.^2) + sum(x.^2) + smooth)

function calc(model, x, y::AbstractArray)
    # Forward pass on the GPU; the result is copied to the CPU with
    # Array() before the loss is computed.
    ld = @diff dice_loss(Array(model(KnetArray(x))), y)
    for w in params(model)
        Knet.update!(w, grad(ld, w))
    end
    loss = value(ld)
    GC.gc()
    return loss
end

function on_batch(model::UNet3Dv2KNet, generator)
    for batch in generator
        x = batch[1]
        y = batch[2]
        @time loss = calc(model, x, y)
        println(loss)
    end
end

model = UNet3Dv2KNet()
setoptim!(model, opt)

for i in 1:epochs
    @info "Epoch $i of $epochs"
    on_batch(model, train_gen)
end

I split my code into several functions to try to track down what is going on. I think the hang-up is that, for some reason, memory isn't cleared after update!. I've tried GC.gc(), Knet.gc(), and CuArrays.reclaim() + GC.gc(). Is it possible there is some memory leak within update!?
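
One variant I haven't tried yet (just a guess on my part): dropping the reference to the tape before collecting, since ld is still live when GC.gc() runs inside calc:

function calc(model, x, y::AbstractArray)
    ld = @diff dice_loss(Array(model(KnetArray(x))), y)
    for w in params(model)
        Knet.update!(w, grad(ld, w))
    end
    loss = value(ld)
    ld = nothing   # drop the tape reference so GC.gc() can actually free it
    GC.gc()
    return loss
end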

Thanks,
Jan