I am trying to learn SegNet, but whenever I train any model on the GPU, I get an "out of memory" error:
I0908 06:45:47.229077 3659 net.cpp:247] Network initialization done.
I0908 06:45:47.229081 3659 net.cpp:248] Memory required for data: 1043082268
I0908 06:45:47.229259 3659 solver.cpp:42] Solver scaffolding done.
I0908 06:45:47.229348 3659 solver.cpp:250] Solving VGG_ILSVRC_16_layer
I0908 06:45:47.229353 3659 solver.cpp:251] Learning Rate Policy: step
F0908 06:45:47.533745 3659 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f8ff8ceadaa (unknown)
@ 0x7f8ff8ceace4 (unknown)
@ 0x7f8ff8cea6e6 (unknown)
@ 0x7f8ff8ced687 (unknown)
@ 0x7f8ff903073a caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f8ff8ff2393 caffe::Blob<>::mutable_gpu_diff()
@ 0x7f8ff912eadd caffe::ConvolutionLayer<>::Backward_gpu()
@ 0x7f8ff9111efc caffe::Net<>::BackwardFromTo()
@ 0x7f8ff9112141 caffe::Net<>::Backward()
@ 0x7f8ff9106f4d caffe::Solver<>::Step()
@ 0x7f8ff910786f caffe::Solver<>::Solve()
@ 0x4086c8 train()
@ 0x406c61 main
@ 0x7f8ff81fcf45 (unknown)
@ 0x40720d (unknown)
@ (nil) (unknown)
Aborted (core dumped)
I reduced the batch size in the prototxt files to 1, but the error still appears. I ran nvidia-smi to check the status of the GPU; the output is:
Thu Sep 8 06:39:13 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M4000M       Off  | 0000:01:00.0      On |                  N/A |
| N/A   42C    P0    25W / 100W |    229MiB /  4087MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1252    G   /usr/bin/X                                     154MiB |
|    0      2208    G   compiz                                          59MiB |
+-----------------------------------------------------------------------------+
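As a rough sanity check, here is a small sketch that puts the numbers from the log and the nvidia-smi output above into the same unit. The figures are copied from those outputs. Doubling the data size to account for gradient (diff) buffers is only my assumption, based on the crash happening in mutable_gpu_diff() during the backward pass; it is not something Caffe reports, and it ignores parameters and any other buffers.

```python
# Figures copied from the Caffe log and the nvidia-smi output above.
DATA_BYTES = 1043082268   # "Memory required for data" (net.cpp:248)
GPU_TOTAL_MIB = 4087      # total memory reported by nvidia-smi
GPU_USED_MIB = 229        # already in use before training starts

MIB = 1024 * 1024


def bytes_to_mib(n_bytes):
    """Convert a byte count to MiB (the unit nvidia-smi uses)."""
    return n_bytes / MIB


data_mib = bytes_to_mib(DATA_BYTES)
free_mib = GPU_TOTAL_MIB - GPU_USED_MIB

# ASSUMPTION: each data blob gets an equally sized diff buffer in the
# backward pass. This is a rough guess, not a figure from the log.
data_plus_diff_mib = 2 * data_mib

print(f"data blobs:            {data_mib:.1f} MiB")
print(f"data + assumed diffs:  {data_plus_diff_mib:.1f} MiB")
print(f"free before training:  {free_mib} MiB")
```

Even with that assumption, the estimate stays below the memory that nvidia-smi shows as free, which is why the error confuses me.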
When I train on the CPU instead, the problem does not occur.
Can anyone help me solve this problem?
Thank you!