Out of memory when training with GPU but works fine with CPU.

Kathy Weiyi Li

Sep 8, 2016, 4:49:19 PM
to Caffe Users
I was trying to learn SegNet, but whenever I trained any model on the GPU, I got an "out of memory" error:
I0908 06:45:47.229077  3659 net.cpp:247] Network initialization done.
I0908 06:45:47.229081  3659 net.cpp:248] Memory required for data: 1043082268
I0908 06:45:47.229259  3659 solver.cpp:42] Solver scaffolding done.
I0908 06:45:47.229348  3659 solver.cpp:250] Solving VGG_ILSVRC_16_layer
I0908 06:45:47.229353  3659 solver.cpp:251] Learning Rate Policy: step
F0908 06:45:47.533745  3659 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f8ff8ceadaa  (unknown)
    @     0x7f8ff8ceace4  (unknown)
    @     0x7f8ff8cea6e6  (unknown)
    @     0x7f8ff8ced687  (unknown)
    @     0x7f8ff903073a  caffe::SyncedMemory::mutable_gpu_data()
    @     0x7f8ff8ff2393  caffe::Blob<>::mutable_gpu_diff()
    @     0x7f8ff912eadd  caffe::ConvolutionLayer<>::Backward_gpu()
    @     0x7f8ff9111efc  caffe::Net<>::BackwardFromTo()
    @     0x7f8ff9112141  caffe::Net<>::Backward()
    @     0x7f8ff9106f4d  caffe::Solver<>::Step()
    @     0x7f8ff910786f  caffe::Solver<>::Solve()
    @           0x4086c8  train()
    @           0x406c61  main
    @     0x7f8ff81fcf45  (unknown)
    @           0x40720d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

I tried reducing the batch size in the prototxt files to 1, but the error still appears. I ran nvidia-smi to check the status of the GPU. The output is:

Thu Sep  8 06:39:13 2016      
+------------------------------------------------------+                      
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                      
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M4000M       Off  | 0000:01:00.0      On |                  N/A |
| N/A   42C    P0    25W / 100W |    229MiB /  4087MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1252    G   /usr/bin/X                                     154MiB |
|    0      2208    G   compiz                                          59MiB |
+-----------------------------------------------------------------------------+

When I train on the CPU, the problem does not occur.
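
For reference, here is a rough back-of-the-envelope comparison of the numbers above, written as a small Python sketch. It only converts the "Memory required for data" figure from the log to MiB and compares it with the memory nvidia-smi reports as not yet in use. The doubling for gradient (diff) blobs is an assumption, not something the log states, and the variable names are made up for illustration. The stack trace shows the failed allocation inside Blob<>::mutable_gpu_diff() during Backward_gpu(), which is why gradients are included in the estimate. As far as I understand it, the logged figure does not cover layer weights, solver history, or internal convolution buffers, so the result is a lower bound and cannot by itself show whether the model fits.

```python
# Figures copied from the log and the nvidia-smi output above.
DATA_BYTES = 1043082268   # "Memory required for data" (bytes)
GPU_TOTAL_MIB = 4087      # total memory on the Quadro M4000M
GPU_USED_MIB = 229        # already in use (X and compiz)

BYTES_PER_MIB = 1024 * 1024


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / BYTES_PER_MIB


data_mib = bytes_to_mib(DATA_BYTES)
free_mib = GPU_TOTAL_MIB - GPU_USED_MIB

# ASSUMPTION: during training each data blob gets a gradient (diff)
# blob of the same size, so the footprint is at least doubled.
data_plus_diff_mib = 2 * data_mib

print("data blobs:        %.1f MiB" % data_mib)
print("data + diff blobs: %.1f MiB (assumed)" % data_plus_diff_mib)
print("free on GPU:       %d MiB" % free_mib)
print("headroom left for weights and buffers: %.1f MiB"
      % (free_mib - data_plus_diff_mib))
```

The logged figure reflects whatever batch size was active for that run, so it should change after editing the prototxt. If it stays the same, the edited file may not be the one the solver actually loads.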

Can anyone help me solve this problem?

Thank you!


