Memory use with in-place layers

DozyC

Nov 1, 2017, 3:10:42 PM
to Caffe Users
Hello, I'm training a net that has in-place ReLUs. I expected that in-place layers would not increase memory use.

However, this is what I observe during net initialization:


I1031 14:26:32.659144 28921 layer_factory.hpp:77] Creating layer conv2
I1031 14:26:32.659160 28921 net.cpp:106] Creating Layer conv2
I1031 14:26:32.659164 28921 net.cpp:454] conv2 <- conv1
I1031 14:26:32.659171 28921 net.cpp:411] conv2 -> conv2
I1031 14:26:32.661525 28921 net.cpp:150] Setting up conv2
I1031 14:26:32.661543 28921 net.cpp:157] Top shape: 4 32 136 241 (4195328)
I1031 14:26:32.661546 28921 net.cpp:165] Memory required for data: 1244256320
I1031 14:26:32.661554 28921 layer_factory.hpp:77] Creating layer conv2/relu
I1031 14:26:32.661563 28921 net.cpp:106] Creating Layer conv2/relu
I1031 14:26:32.661567 28921 net.cpp:454] conv2/relu <- conv2
I1031 14:26:32.661572 28921 net.cpp:397] conv2/relu -> conv2 (in-place)
I1031 14:26:32.661732 28921 net.cpp:150] Setting up conv2/relu
I1031 14:26:32.661741 28921 net.cpp:157] Top shape: 4 32 136 241 (4195328)
I1031 14:26:32.661743 28921 net.cpp:165] Memory required for data: 1261037632

Why does the memory required for data increase after an in-place ReLU layer? Shouldn't conv2/relu use the same blob as conv2, so that no additional memory needs to be allocated?
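For reference, the conv2 / conv2/relu pair is just the standard in-place pattern. Written with pycaffe's net_spec it would look roughly like the sketch below (the convolution parameters are made up and the Input layer is only a stand-in; what matters is that the ReLU reuses the conv2 blob):

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.conv1 = L.Input(shape=dict(dim=[4, 32, 136, 241]))   # stand-in for the real conv1 output
n.conv2 = L.Convolution(n.conv1, num_output=32, kernel_size=3, pad=1)
n.relu2 = L.ReLU(n.conv2, in_place=True)               # "conv2 -> conv2", no new blob
print(n.to_proto())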


Thanks.

Przemek D

Nov 2, 2017, 4:06:00 AM
to Caffe Users
It looks like this is because the code that calculates memory use does not care whether a layer operates in-place or produces a new blob. Fortunately, the number you see is not the actual amount of allocated memory, only an estimate of it (it counts just the data blobs, not the overhead of the objects themselves), and that estimate is what's misleading here.
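If you want the real data footprint, you can sum the unique blobs from Python: net.blobs holds each named blob exactly once, so an in-place ReLU adds no entry to it. A quick sketch (the prototxt path is a placeholder):

import caffe

caffe.set_mode_cpu()
net = caffe.Net('train.prototxt', caffe.TRAIN)   # placeholder path
# every named blob appears once here, no matter how many layers write to it in-place
data_mb = sum(b.data.nbytes for b in net.blobs.values()) / 2.0**20
print('activation data: %.1f MB' % data_mb)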

DozyC

Nov 2, 2017, 1:39:22 PM
to Caffe Users
Too bad that number is only a rough estimate.

Is there a better way to analyze a net's memory use during training? I'm having trouble fitting the batch sizes I would expect into my GPU memory. I have a GTX 1080 Ti with 11 GB of RAM, and I can only train with batch size 1; anything larger fails with cudaSuccess (2 vs. 0) out of memory. I've trained other, similar nets before with batch size 12.
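Extending your snippet to count diffs and parameters as well, the sketch below is roughly the accounting I have in mind (same placeholder path; the solver's own buffers and the cuDNN convolution workspace are still not counted):

import caffe

caffe.set_mode_cpu()                              # CPU mode, so even a too-big net can be inspected
net = caffe.Net('train.prototxt', caffe.TRAIN)    # placeholder path

acts = sum(b.data.nbytes for b in net.blobs.values())                   # activation data
params = sum(p.data.nbytes for ps in net.params.values() for p in ps)   # weights
# during training every activation and weight blob also carries a diff of the same size;
# solver history/update buffers and cuDNN workspace come on top of this lower bound
rough_total = 2 * acts + 2 * params
print('activations + diffs: %.1f MB' % (2 * acts / 2.0**20))
print('weights + diffs:     %.1f MB' % (2 * params / 2.0**20))
print('rough lower bound:   %.1f MB' % (rough_total / 2.0**20))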