valgrind will complete if leveldb is used instead of lmdb.
Compiled with GPU and run with GPU
==4315== LEAK SUMMARY:
==4315== definitely lost: 96 bytes in 4 blocks
==4315== indirectly lost: 0 bytes in 0 blocks
==4315== possibly lost: 70,778,113 bytes in 20,095 blocks
==4315== still reachable: 114,126,748 bytes in 153,749 blocks
==4315== suppressed: 0 bytes in 0 blocks
Compiled with GPU and run as CPU
==5232== LEAK SUMMARY:
==5232== definitely lost: 96 bytes in 4 blocks
==5232== indirectly lost: 0 bytes in 0 blocks
==5232== possibly lost: 70,775,503 bytes in 20,075 blocks
==5232== still reachable: 114,116,564 bytes in 153,675 blocks
==5232== suppressed: 0 bytes in 0 blocks
Compiled as CPU only
==12882== LEAK SUMMARY:
==12882== definitely lost: 0 bytes in 0 blocks
==12882== indirectly lost: 0 bytes in 0 blocks
==12882== possibly lost: 86,909 bytes in 1,906 blocks
==12882== still reachable: 552,984 bytes in 6,332 blocks
==12882== suppressed: 0 bytes in 0 blocks
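To compare the three summaries without reading the figures by eye, the LEAK SUMMARY lines can be parsed into numbers. The following is only a sketch: the regular expression and the `parse_leak_summary` helper are illustrative names, not part of valgrind or Caffe, and the two sample strings are copied from the summaries above.

```python
import re

# Matches lines such as "==4315== possibly lost: 70,778,113 bytes in 20,095 blocks".
# The pattern is an assumption based on the output pasted above.
LEAK_LINE = re.compile(r"==\d+==\s+([a-z ]+?):\s+([\d,]+) bytes in ([\d,]+) blocks")


def parse_leak_summary(text):
    """Return {category: (bytes, blocks)} for each summary line found in text."""
    summary = {}
    for match in LEAK_LINE.finditer(text):
        category, n_bytes, n_blocks = match.groups()
        summary[category] = (
            int(n_bytes.replace(",", "")),
            int(n_blocks.replace(",", "")),
        )
    return summary


# 'Compiled with GPU and run with GPU' summary, copied from above.
GPU_RUN = """\
==4315== LEAK SUMMARY:
==4315== definitely lost: 96 bytes in 4 blocks
==4315== indirectly lost: 0 bytes in 0 blocks
==4315== possibly lost: 70,778,113 bytes in 20,095 blocks
==4315== still reachable: 114,126,748 bytes in 153,749 blocks
==4315== suppressed: 0 bytes in 0 blocks
"""

# 'Compiled as CPU only' summary, copied from above.
CPU_ONLY = """\
==12882== LEAK SUMMARY:
==12882== definitely lost: 0 bytes in 0 blocks
==12882== indirectly lost: 0 bytes in 0 blocks
==12882== possibly lost: 86,909 bytes in 1,906 blocks
==12882== still reachable: 552,984 bytes in 6,332 blocks
==12882== suppressed: 0 bytes in 0 blocks
"""

gpu = parse_leak_summary(GPU_RUN)
cpu = parse_leak_summary(CPU_ONLY)

# How many times larger the GPU build's 'possibly lost' figure is.
ratio = gpu["possibly lost"][0] / cpu["possibly lost"][0]
print(f"possibly lost, GPU build vs CPU-only build: {ratio:.0f}x")
```

With these two summaries the 'possibly lost' figure of the GPU build comes out several hundred times larger than that of the CPU-only build.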
The valgrind FAQ at http://valgrind.org/docs/manual/faq.html#faq.deflost gives the following for 'possibly lost':
"possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes.
The mnist GPU run was repeated 273 times (just under 100 minutes), at which point the computer was freezing up from lack of memory. 'cat /proc/meminfo' was used to collect memory stats every 10 seconds during the sequence. Charts of the 42 stats in /proc/meminfo are combined onto a single page at https://github.com/neilnelson/misc/blob/master/meminfo.png. (Right click, select View Image, click on the image to expand.)
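The 10-second sampling described above could be scripted roughly as follows. This is a sketch, not the script that was actually used: `parse_meminfo` and `sample_meminfo` are hypothetical helper names, and the `EXAMPLE` text contains made-up values that only show the line format of /proc/meminfo.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields carry a 'kB' suffix; a few (e.g. HugePages_Total) are bare counts.
    Only the leading number of each line is kept.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        fields = rest.split()
        if fields:
            stats[name.strip()] = int(fields[0])
    return stats


def sample_meminfo(count, interval=10.0, path="/proc/meminfo"):
    """Collect `count` snapshots, `interval` seconds apart, as (timestamp, stats)."""
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append((time.time(), parse_meminfo(f.read())))
        if i + 1 < count:
            time.sleep(interval)
    return samples


# Made-up values, for illustrating the format only.
EXAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1234567 kB
Buffers:          100000 kB
HugePages_Total:       0
"""

example_stats = parse_meminfo(EXAMPLE)
print(example_stats["MemFree"])
```

On a Linux machine, `sample_meminfo(600)` would cover about 100 minutes at the default 10-second interval; each snapshot can then be charted per field.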
The interesting charts are:
MemFree declines to about zero (175372 bytes) at run 219.
Buffers stays the same until MemFree gets to zero and then declines to zero.
Cached behaves similarly to Buffers.
SwapFree declines (an increase in swap usage) once MemFree gets to zero.
Just before the LEAK SUMMARY of the 'Compiled with GPU and run as CPU' run is the following 'possibly lost' record.
==5232== 22,144,025 bytes in 2,207 blocks are possibly lost in loss record 2,543 of 2,544
==5232== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5232== by 0x28E90F1D: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232== by 0x28E4134C: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232== by 0x28E5211F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232== by 0x28F3FCCF: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232== by 0x28F3FFDF: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232== by 0xE43FD2C: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE432AAF: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE43EDB6: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE443570: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE4371DB: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE4256A1: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE458C9E: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0xE146C11: cudnnCreate (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232== by 0x51CE276: caffe::CuDNNConvolutionLayer<float>::LayerSetUp(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (cudnn_conv_layer.cpp:53)
==5232== by 0x518EC4B: caffe::Layer<float>::SetUp(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (layer.hpp:71)
==5232== by 0x51946E0: caffe::Net<float>::Init(caffe::NetParameter const&) (net.cpp:139)
==5232== by 0x5192A76: caffe::Net<float>::Net(caffe::NetParameter const&, caffe::Net<float> const*) (net.cpp:27)
==5232== by 0x516BB72: caffe::Solver<float>::InitTrainNet() (solver.cpp:105)
==5232== by 0x516B395: caffe::Solver<float>::Init(caffe::SolverParameter const&) (solver.cpp:57)
This seems a little odd in that the 22,144,025 'possibly lost' bytes are attributed to Cuda libs called from caffe::CuDNNConvolutionLayer. That is, if the GPU is not being used in a CPU run, the Cuda libs would not be expected to be used as shown.
The 'possibly lost' figure of the 'Compiled as CPU only' LEAK SUMMARY is insignificant when compared to the figures from the Cuda compiled code.
The memory decline over the first 192 Cuda mnist runs averages to 59.8 megabytes per run. valgrind shows 67.5 megabytes 'possibly lost' for a single run.
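The 67.5 figure can be checked from the GPU-run LEAK SUMMARY, assuming "megabytes" here means units of 1024 * 1024 bytes (with units of 10^6 bytes the same count would be about 70.8). The variable names below are illustrative only.

```python
MIB = 1024 * 1024  # assumption: "megabyte" in the text means 2**20 bytes

possibly_lost_bytes = 70_778_113  # 'possibly lost' in the GPU-run LEAK SUMMARY
observed_decline_mib = 59.8       # average memory decline per run reported above

possibly_lost_mib = possibly_lost_bytes / MIB
ratio = observed_decline_mib / possibly_lost_mib

print(f"possibly lost per run: {possibly_lost_mib:.1f} MiB")  # 67.5 MiB
print(f"observed decline / possibly lost: {ratio:.2f}")
```

The observed per-run decline is a little under nine tenths of the single-run 'possibly lost' figure, so the two measurements are of the same order.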