Indications of Caffe memory leaks


Neil Nelson

Mar 22, 2016, 11:24:59 AM3/22/16
to Caffe Users
This study uses the zip of caffe-master from https://github.com/BVLC/caffe. INSTALL.md in the zip file has a date-time of Mar 9 13:42.

The runs use the tutorial mnist training procedure with all files on a RAM drive. RAM drive memory usage was constant throughout the runs.

The usual location of the Caffe executable is at caffe-master/.build_release/tools/caffe. After noting the memory issue I recompiled Caffe with

DEBUG ?= 1

set in the Makefile and used the Caffe executable in caffe-master/build/tools/caffe. This does not appear to affect the results.

solver.prototxt file contains

--------
# The train/test solver.prototxt net protocol buffer definition
# net: "/voxforge/mnist/train_val.prototxt"

train_net: "/voxforge/caffe/train.prototxt"
test_net: "/voxforge/caffe/test.prototxt"

# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 25 training iterations.
test_interval: 25
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 10 iterations
display: 10
# The maximum number of iterations
max_iter: 200
# snapshot intermediate results
# snapshot: 5000
# snapshot_prefix: "/voxforge/mnist/lenet"
# solver_mode: CPU or GPU
--------

The first picture shows computer memory use over 30 consecutive runs of the Caffe mnist procedure. The blue dots, which give the stair-step appearance, come from a program that records the computer's memory use every two seconds using 'free' on Ubuntu 14.04, running concurrently with the 30 Caffe runs.

The red line through the blue steps is a regression line. The vertical green lines show when each Caffe run stops and another starts. The regression line shows that just over 54 megabytes of memory are consumed and not released on each Caffe run.
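
For illustration, a minimal sketch of that kind of sampler (a hypothetical script, not the one that produced the plots): it polls the 'used' column of free every two seconds and fits a least-squares line through the samples.

--------
# Hypothetical sampler: poll 'free' every two seconds while the Caffe
# runs execute, then fit a least-squares line through the samples.
import subprocess, time

samples = []          # (elapsed seconds, used kB)
start = time.time()
for _ in range(900):  # roughly 30 minutes of samples
    out = subprocess.check_output(['free']).decode().splitlines()
    used_kb = int(out[1].split()[2])   # 'Mem:' row, 'used' column
    samples.append((time.time() - start, used_kb))
    time.sleep(2)

# Least-squares slope: kB consumed per second across all runs.
n = len(samples)
sx = sum(t for t, _ in samples)
sy = sum(u for _, u in samples)
sxx = sum(t * t for t, _ in samples)
sxy = sum(t * u for t, u in samples)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
print('regression slope: %.1f kB/s' % slope)
--------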

The second picture is a close-up of a short span of the first picture. It shows that memory is consumed at the beginning of a Caffe run and then holds constant for the remainder of that run.

Under valgrind, Caffe fails just after train.prototxt is loaded; the same command does not fail when run without valgrind.

valgrind --leak-check=yes /home/nnelson/Documents/caffe-master/build/tools/caffe train --solver=/voxforge/runs/test/solver.prototxt

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
I0321 21:41:54.168203  2866 layer_factory.hpp:77] Creating layer data
I0321 21:41:54.430069  2866 net.cpp:91] Creating Layer data
I0321 21:41:54.441284  2866 net.cpp:399] data -> data
I0321 21:41:54.532873  2866 net.cpp:399] data -> label
F0321 21:41:54.643106  2876 db_lmdb.hpp:14] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument

/home/nnelson/Documents/caffe-master/build/tools/caffe train --solver=/voxforge/runs/test/solver.prototxt

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
I0321 21:42:47.929996  2893 layer_factory.hpp:77] Creating layer data
I0321 21:42:47.931532  2893 net.cpp:91] Creating Layer data
I0321 21:42:47.931649  2893 net.cpp:399] data -> data
I0321 21:42:47.931855  2893 net.cpp:399] data -> label
I0321 21:42:47.932879  2899 db_lmdb.cpp:38] Opened lmdb /voxforge/caffe/mnist_train_lmdb
I0321 21:42:47.979403  2893 data_layer.cpp:41] output data size: 64,1,28,28

Caffe appears to leak about 54 megabytes per mnist run, and valgrind, the usual tool for finding memory leaks, does not work with Caffe. Any ideas on how to track down the apparent leaks?


caffe_memory_used.pdf
close_up.pdf

Howard Chu

Mar 23, 2016, 11:24:17 AM3/23/16
to Caffe Users
Your study methodology is fundamentally flawed. If you only tally up the used/free columns in free's output, then you're ignoring the most critical column, "shared". LMDB uses a shared memory map for its data, and it will use as much RAM as is available in the system for its storage. That is not a leak, though: the OS will automatically reclaim the memory when there is memory pressure from other processes.

Using the free command doesn't give you any useful data on the caffe process itself. You could use top and compare VIRT, RES, and SHR: if the majority of the memory use you think you're seeing is in the SHR (shared memory) column, then that's just the database storage and there is no leak.
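
For example, a minimal sketch of that per-process check (hypothetical script; the caffe pid is assumed to be known), reading /proc/<pid>/statm rather than screen-scraping top:

--------
# Hypothetical per-process check: split the caffe process's resident
# memory into shared and private parts via /proc/<pid>/statm.
import os

PAGE_MB = os.sysconf('SC_PAGE_SIZE') / (1024.0 * 1024.0)

def process_memory_mb(pid):
    # statm fields (in pages): size resident shared text lib data dt
    with open('/proc/%d/statm' % pid) as f:
        size, resident, shared = [int(x) for x in f.read().split()[:3]]
    return {'VIRT': size * PAGE_MB,
            'RES': resident * PAGE_MB,
            'SHR': shared * PAGE_MB,
            'private': (resident - shared) * PAGE_MB}
--------

If RES grows across runs but RES minus SHR stays flat, the growth is the LMDB memory map, not a leak.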

Neil Nelson

Mar 24, 2016, 1:06:15 AM3/24/16
to Caffe Users
Howard Chu,

Thank you for your response.

Here is the output of free.

             total       used       free     shared    buffers     cached
Mem:      16421424    4541252   11880172      91916     139572    1263480
-/+ buffers/cache:    3138200   13283224
Swap:     32767996          0   32767996

Top will show shared memory per process. Perhaps we might agree that the figure under 'shared' above includes all shared memory. On that assumption I have rerun the procedure and, in the attached mem_usage2.csv, have subtracted from 'used': shared, buffers, cached, the RAM disk usage, and the memory used by the program that assembles mem_usage2.csv.

caffe_memory_used2.pdf gives much the same result.
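
For reference, the core of the adjustment is simple arithmetic on the 'free' columns (a hypothetical helper; it omits the RAM-disk and logger corrections):

--------
# Adjusted application memory, in kB, from the 'free' columns above.
def adjusted_used(used, shared, buffers, cached):
    return used - shared - buffers - cached

# With the numbers shown above:
print(adjusted_used(4541252, 91916, 139572, 1263480))  # 3046284 kB
--------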

Perhaps there is a way to run valgrind or similar to verify that Caffe is not leaking memory.

I can provide all the procedures here to obtain this result if anyone wants to try it.

caffe_memory_used2.pdf
mem_usage2.csv

Howard Chu

Mar 24, 2016, 7:28:04 AM3/24/16
to Caffe Users


On Thursday, March 24, 2016 at 5:06:15 AM UTC, Neil Nelson wrote:
Perhaps there is a way to run valgrind or similar to verify that Caffe is not leaking memory.

My memory leak tracer is multiple orders of magnitude faster than valgrind (or any other tracer, for that matter).
 https://github.com/hyc/mleak

Valgrind has a hardcoded size limit on how large a memory map it will support. You can patch these hardcoded limits and recompile, if you really want to use it.
http://valgrind.10908.n7.nabble.com/Cannot-create-large-memory-map-on-64-bit-linux-td43287.html

Neil Nelson

Apr 19, 2016, 10:39:56 PM4/19/16
to Caffe Users
valgrind will run to completion if LevelDB is used instead of LMDB.

Compiled with GPU and run with GPU
==4315== LEAK SUMMARY:
==4315==    definitely lost: 96 bytes in 4 blocks
==4315==    indirectly lost: 0 bytes in 0 blocks
==4315==      possibly lost: 70,778,113 bytes in 20,095 blocks
==4315==    still reachable: 114,126,748 bytes in 153,749 blocks
==4315==         suppressed: 0 bytes in 0 blocks

Compiled with GPU and run as CPU
==5232== LEAK SUMMARY:
==5232==    definitely lost: 96 bytes in 4 blocks
==5232==    indirectly lost: 0 bytes in 0 blocks
==5232==      possibly lost: 70,775,503 bytes in 20,075 blocks
==5232==    still reachable: 114,116,564 bytes in 153,675 blocks
==5232==         suppressed: 0 bytes in 0 blocks

Compiled as CPU only
==12882== LEAK SUMMARY:
==12882==    definitely lost: 0 bytes in 0 blocks
==12882==    indirectly lost: 0 bytes in 0 blocks
==12882==      possibly lost: 86,909 bytes in 1,906 blocks
==12882==    still reachable: 552,984 bytes in 6,332 blocks
==12882==         suppressed: 0 bytes in 0 blocks

This valgrind page http://valgrind.org/docs/manual/faq.html#faq.deflost gives the following for 'possibly lost'.

"possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes.

The mnist GPU run was repeated 273 times (just under 100 minutes), at which point the computer was freezing up from lack of memory. 'cat /proc/meminfo' was used to collect memory stats every 10 seconds during the sequence. Charts of the 42 stats in /proc/meminfo are combined onto a single page at https://github.com/neilnelson/misc/blob/master/meminfo.png. (Right click, select View Image, then click on the image to expand.)
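
The collector could be as simple as the following sketch (a hypothetical version, not the exact script used): parse every /proc/meminfo field and append one CSV row every ten seconds.

--------
# Hypothetical /proc/meminfo collector: one CSV row every ten seconds.
import csv, time

def read_meminfo():
    stats = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, rest = line.split(':', 1)
            stats[key] = int(rest.split()[0])   # value (usually kB)
    return stats

with open('meminfo_log.csv', 'w') as out:
    fields = ['time'] + sorted(read_meminfo())
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    while True:
        row = read_meminfo()
        row['time'] = '%.0f' % time.time()
        writer.writerow(row)
        out.flush()
        time.sleep(10)
--------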

The interesting charts are:
  MemFree declines to about zero (175372 kB) at run 219.
  Buffers stays the same until MemFree reaches zero and then declines to zero.
  Cached behaves like Buffers.
  SwapFree declines (an increase in swap usage) when MemFree reaches zero.

Just before the LEAK SUMMARY in the 'Compiled with GPU and run as CPU' instance is the following 'possibly lost' section.

==5232== 22,144,025 bytes in 2,207 blocks are possibly lost in loss record 2,543 of 2,544
==5232==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5232==    by 0x28E90F1D: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232==    by 0x28E4134C: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232==    by 0x28E5211F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232==    by 0x28F3FCCF: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232==    by 0x28F3FFDF: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.352.63)
==5232==    by 0xE43FD2C: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE432AAF: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE43EDB6: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE443570: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE4371DB: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE4256A1: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE458C9E: ??? (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0xE146C11: cudnnCreate (in /usr/local/cuda-7.0/targets/x86_64-linux/lib/libcudnn.so.4)
==5232==    by 0x51CE276: caffe::CuDNNConvolutionLayer<float>::LayerSetUp(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (cudnn_conv_layer.cpp:53)
==5232==    by 0x518EC4B: caffe::Layer<float>::SetUp(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (layer.hpp:71)
==5232==    by 0x51946E0: caffe::Net<float>::Init(caffe::NetParameter const&) (net.cpp:139)
==5232==    by 0x5192A76: caffe::Net<float>::Net(caffe::NetParameter const&, caffe::Net<float> const*) (net.cpp:27)
==5232==    by 0x516BB72: caffe::Solver<float>::InitTrainNet() (solver.cpp:105)
==5232==    by 0x516B395: caffe::Solver<float>::Init(caffe::SolverParameter const&) (solver.cpp:57)

This seems a little odd in that the 22,144,025 possibly lost bytes are tied to the CUDA libraries via caffe::CuDNNConvolutionLayer. That is, if the GPU is not being used in a CPU run, the CUDA libraries would not be expected to be used as shown.

The 'possibly lost' figure in the 'Compiled as CPU only' LEAK SUMMARY is insignificant compared to the figures from the CUDA-compiled code.

The memory decline over the first 192 CUDA mnist runs averages 59.8 megabytes per run; valgrind shows 67.5 megabytes 'possibly lost' for a single run.
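
As a rough consistency check (assuming the free output posted earlier is representative of this machine): at about 59.8 megabytes lost per run, 219 runs account for roughly 219 × 59.8 MB ≈ 13.1 gigabytes, close to the ~13.3 gigabytes free reported available after buffers/cache, which fits MemFree reaching zero at about run 219.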

Jan

Apr 21, 2016, 3:43:16 AM4/21/16
to Caffe Users
Mhm, I suppose that is something to discuss on GitHub, because it is related to development rather than usage of caffe. Personally, I am not well versed in finding memory leaks, so I will not be much help in the discussion here.

Jan