Dr. Yuret,

First of all, thank you for your work on Knet. You have clearly put a lot of work into it, and it is well developed.
I was hoping you could help me with some of my code. First off, I'm a hobby programmer when it comes to coding, so please bear with me if some of my questions or issues are naive.

I work with a lot of 3D medical imaging, and as a result my datasets and models are quite large. I had previously implemented a 3D version of UNet and Fabian Isensee's modified UNet in Keras on Python. I've been trying to move my code over to Julia since I like the speed and the language far more than Python.

In Keras, I was able to load and train a model on my GPU (I have a Tesla K80), although the model and training pushed the limits of GPU memory. I'm training on 160 x 192 x 192 3D volumes, so as you can imagine, I could only train with a batch size of 1.

I've successfully ported a lot of my code and architecture to Julia and Knet. I can train a single batch on the GPU, but when I try to move on to a second batch, I run into out-of-memory errors.
My training loop looks like:

```julia
for batch in train_gen
    x = KnetArray(batch[1])
    y = KnetArray(batch[2])
    Knet.progress!(adam(model, [(x, y)]))
    # ... loss calculations, etc.
end
```

Like I said, my first batch runs fine, but on the second iteration I don't think Knet is freeing memory.
I think MXNet and TF solve this by moving parameters back to the CPU. Since Knet is so fast, I'd be OK with losing some time doing this on each training cycle.
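For concreteness, here is a sketch of the kind of per-cycle reclamation I'd be willing to pay for. Plain CPU Arrays stand in for KnetArrays here so it runs anywhere; on the device side I assume Knet.gc() would play the role that GC.gc() plays below (that substitution is my guess, not something from the Knet docs):

```julia
# Sketch: drop the reference to each big batch buffer as soon as the step
# is done, then force a collection between batches (costing some time).
function train_sketch(nbatches)
    losses = Float32[]
    for i in 1:nbatches
        x = rand(Float32, 16, 16, 16)  # stands in for one batch volume
        push!(losses, sum(abs2, x))    # stands in for the actual training step
        x = nothing                    # drop the reference to the big buffer
        GC.gc()                        # reclaim between batches
    end
    return losses
end

losses = train_sketch(3)
```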
I was hoping you could provide some tips on how to aggressively reclaim memory during my training cycle. I suspect that in order to do this I'd probably have to unpack the code in progress!...

Thank you for all your help and work so far.

Best,
Jan Vargas
You can also check:
Hi Dr. Yuret,

So I gave all that a try, and it doesn't seem to help. For what it's worth, I can train the same model in Flux without the memory problem.

```julia
dice_loss(x, y::AbstractArray; smooth::Float32=1.f0) =
    1 - (2*sum(y .* x) + smooth) / (sum(y.^2) + sum(x.^2) + smooth)

function calc(model, x, y::AbstractArray)
    ld = @diff dice_loss(Array(model(KnetArray(x))), y)
    for w in params(model)
        Knet.update!(w, grad(ld, w))
    end
    loss = value(ld)
    GC.gc()
    return loss
end

function on_batch(model::UNet3Dv2KNet, generator)
    for batch in generator
        x = batch[1]
        y = batch[2]
        @time loss = calc(model, x, y)
        println(loss)
    end
end

model = UNet3Dv2KNet()
setoptim!(model, opt)
for i in 1:epochs
    @info "Epoch $i of $epochs"
    on_batch(model, train_gen)
end
```

I split my code into several functions to try to track down what was going on. I think the hang-up is that, for some reason, memory isn't cleared after update!. I've tried GC.gc(), Knet.gc(), and CuArrays.reclaim() + GC.gc(). Is it possible there is a memory leak within update!?

Thanks,
Jan
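P.S. To rule out the loss function itself, here is a quick sanity check of dice_loss on tiny CPU arrays (pure Julia, no Knet involved):

```julia
# Same dice_loss as in my training code, evaluated on plain Float32 vectors.
dice_loss(x, y::AbstractArray; smooth::Float32=1.f0) =
    1 - (2*sum(y .* x) + smooth) / (sum(y.^2) + sum(x.^2) + smooth)

x = Float32[1, 0, 1, 0]
println(dice_loss(x, x))       # perfect overlap: loss is 0
println(dice_loss(x, 1 .- x))  # disjoint masks: 1 - smooth/(sum(x.^2)+sum(y.^2)+smooth) = 0.8
```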