I've determined that TensorFlow is leaking memory on each iteration of training. If I remove the call to `sess.run`, memory usage remains constant; otherwise, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` shows that every so often a few MB are leaked. The amount leaked appears to depend on the size of the graph (larger graph/batch size -> more leaked) and does not appear to plateau: if I run for long enough (on the order of a day), I can even exhaust the system memory of 64 GB.

I am using Python 3.5, and I've tried multiple TensorFlow versions (0.10, 0.11, 0.12) on different OSes (Ubuntu, CentOS) with both CPU and GPU, and the leak always seems to occur. Are there any ways of debugging this?
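For anyone who wants to reproduce the measurement, here is a minimal sketch of how the `ru_maxrss` readings can be collected per iteration. The helper names (`peak_rss_mb`, `log_memory_growth`) and the no-op step function are placeholders for illustration, not part of the actual training code; the step function stands in for the real `sess.run` call. Note that `ru_maxrss` is a peak value, so it can only show growth, never a drop.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB (a high-water mark)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory_growth(step_fn, num_steps, report_every=100):
    """Call step_fn num_steps times and sample (step, peak RSS in MB) periodically."""
    samples = []
    for step in range(num_steps):
        step_fn()
        if step % report_every == 0:
            samples.append((step, peak_rss_mb()))
    return samples


if __name__ == "__main__":
    # Placeholder step: replace the lambda with the real training call.
    for step, mb in log_memory_growth(lambda: None, num_steps=300):
        print("step %d: peak RSS %.1f MB" % (step, mb))
```

Running the same loop with the training call swapped out for a no-op gives the baseline to compare against.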
--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/1da90762-9d2a-47f2-acdd-90d9744eb673%40tensorflow.org.
There are some suggestions for how to debug a memory leak here:

Using tcmalloc can be particularly useful, because many common TensorFlow allocation patterns can lead to heap fragmentation with the standard malloc implementation.

If none of these suggestions work, please open a GitHub issue with a minimal program that reproduces the issue, and someone on the team will take a look.

Derek
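One additional check, which is an assumption on my part and not necessarily among the suggestions linked above: Python's standard `tracemalloc` module can show whether the growth comes from Python-level objects. It only sees memory allocated through Python's own allocators, not memory allocated inside TensorFlow's C++ runtime, so if the process RSS keeps growing while this reports nothing significant, the leak is more likely on the native side. The helper name `top_python_allocations` and the step function are hypothetical.

```python
import tracemalloc


def top_python_allocations(step_fn, num_steps, limit=5):
    """Run step_fn num_steps times and return the largest Python-heap differences.

    Only allocations made through Python's memory allocators are traced;
    native allocations made by C/C++ extensions are not visible here.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(num_steps):
        step_fn()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences are grouped by source line, largest change first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    # Placeholder step: replace the lambda with the real training call.
    for diff in top_python_allocations(lambda: None, num_steps=100):
        print(diff)
```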
On Tue, Dec 27, 2016 at 5:11 PM, Vlad Firoiu <vla...@gmail.com> wrote:
I've determined that TensorFlow is leaking memory on each iteration of training. If I remove the call to `sess.run`, memory usage remains constant; otherwise, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` shows that every so often a few MB are leaked. The amount leaked appears to depend on the size of the graph (larger graph/batch size -> more leaked) and does not appear to plateau: if I run for long enough (on the order of a day), I can even exhaust the system memory of 64 GB.

I am using Python 3.5, and I've tried multiple TensorFlow versions (0.10, 0.11, 0.12) on different OSes (Ubuntu, CentOS) with both CPU and GPU, and the leak always seems to occur. Are there any ways of debugging this?