The situation was: normally I ran it with gc enabled, and I decided to pull the latest master to test the new improvements. I kept gc on and enabled cnmem at 0.45 (since you mentioned you might use that as a default), which gave the cnmem out-of-memory error. The same happened when I set it to 0.87 (the maximum free memory available on my GPU). When I disabled cnmem, it worked again. It never worked without gc, so I kept gc on by default. I can't really comment on speed-ups at the moment.
Hopefully that helps a bit. I thought I would quickly try it out, and since your proposed 0.45 failed for me, I figured I should mention it. I might try to put together a minimal example next week if needed.
No, I have gc on by default because my recent experiments didn't run without it. I have not tried disabling it recently. I should be able to try that later tonight and will report back.
Hi Jeffrey,

If you still see the problem with cnmem and you can share a repro with me, I'm interested (I wrote CNMEM and helped Frédéric integrate it into Theano). I can tweak the internal policy of cnmem to deal with "hard" cases (assuming your case is hard). We also have the freedom to add new strategies and let the user choose the best strategy to claim/reclaim memory.

Thanks,
Julien
Hi Frédéric,
No, it only works with allow_gc True and cnmem False (=0). Hence, part of my concern.
Jeffrey
--
Hi Frédéric,
I already emailed Julien a few weeks ago (soon after I posted here) with a minimal example replicating the issue. Haven't heard back but he might be on holiday. I can put it here as well if you're interested.
Best,
Jeffrey
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed:
initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1
[lib]
cnmem = 1
Julien has an example that causes this type of behavior. If you can keep a way to test it on your side, please do. It would be good if you could test it again when Julien updates cnmem.
Fred
--
with allow_gc either True or False and cnmem enabled:

[lib]
cnmem = 1
p.s. We will enable it by default in 1 or 2 weeks if we don't have reports of problems. We aren't sure of the default % of the GPU to allocate. We thought of using 45% by default. What do you think? This would allow 2 jobs by default (it needs some memory for the driver).
with allow_gc either True or False and cnmem enabled:

[lib]
cnmem = 1
If you set cnmem to 1, it will ask CNMeM to initially use 100% of GPU memory, which is impossible (even an unused K40c has like 23 MiB already in use). You should either set it to a fraction (such as 0.5) or a number of megabytes (such as 500).
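The interpretation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Theano's actual implementation; the function name `cnmem_bytes` and the exact rounding are assumptions:

```python
def cnmem_bytes(cnmem, total_bytes):
    """Translate a cnmem setting into a pool size in bytes.

    Values in (0, 1] are treated as a fraction of total GPU memory;
    larger values are treated as a size in megabytes, as described
    above. Sketch only -- not Theano's real code.
    """
    if cnmem <= 0:
        return 0  # cnmem disabled
    if cnmem <= 1:
        return int(cnmem * total_bytes)  # fraction of the card
    return int(cnmem) * 1024 * 1024  # size in MiB

# Example: on a 12 GiB card, cnmem = 0.45 would reserve about 5.4 GiB,
# while cnmem = 1 asks for the full 12 GiB (which fails, as noted above).
total = 12 * 1024 ** 3
print(cnmem_bytes(0.45, total))
print(cnmem_bytes(1, total))
```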
p.s. We will enable it by default in 1 or 2 weeks if we don't have reports of problems. We aren't sure of the default % of the GPU to allocate. We thought of using 45% by default. What do you think? This would allow 2 jobs by default (it needs some memory for the driver).
Sounds like a plausible default, but maybe you should set an upper limit as well? Users may be confused if even their simplest models already take up close to 6 GiB on their Tesla or Titan X. Maybe use cnmem=1 for a default heuristic: 45%, but at most 2 GiB? (cnmem=1 is not a useful value otherwise, as seen above.)
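The suggested "45% but at most 2 GiB" heuristic could be sketched as follows. `default_pool_bytes` is a hypothetical helper used here for illustration, not an existing Theano function:

```python
GIB = 1024 ** 3

def default_pool_bytes(total_bytes, fraction=0.45, cap_bytes=2 * GIB):
    """Proposed default pool size: 45% of GPU memory, capped at 2 GiB.

    Hypothetical helper illustrating the heuristic discussed above.
    """
    return min(int(fraction * total_bytes), cap_bytes)

# On a 12 GiB Titan X, 45% would be ~5.4 GiB, so the 2 GiB cap applies:
print(default_pool_bytes(12 * GIB) / GIB)  # 2.0
```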
By the way, similar to Doug, I also observe a significant speedup even if combined with allow_gc=False. Good job!
So we don't know exactly what a safe max would be.
I don't understand how we can get such a good speed-up with cnmem compared to allow_gc=False. Do you have an idea?