Out of memory when training with GPU but works fine with CPU.

Kathy Weiyi Li

Sep 8, 2016, 4:49:19 PM
to Caffe Users
I was trying to learn SegNet, but when I train ANY model on the GPU I always get an "out of memory" error:
I0908 06:45:47.229077  3659 net.cpp:247] Network initialization done.
I0908 06:45:47.229081  3659 net.cpp:248] Memory required for data: 1043082268
I0908 06:45:47.229259  3659 solver.cpp:42] Solver scaffolding done.
I0908 06:45:47.229348  3659 solver.cpp:250] Solving VGG_ILSVRC_16_layer
I0908 06:45:47.229353  3659 solver.cpp:251] Learning Rate Policy: step
F0908 06:45:47.533745  3659 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f8ff8ceadaa  (unknown)
    @     0x7f8ff8ceace4  (unknown)
    @     0x7f8ff8cea6e6  (unknown)
    @     0x7f8ff8ced687  (unknown)
    @     0x7f8ff903073a  caffe::SyncedMemory::mutable_gpu_data()
    @     0x7f8ff8ff2393  caffe::Blob<>::mutable_gpu_diff()
    @     0x7f8ff912eadd  caffe::ConvolutionLayer<>::Backward_gpu()
    @     0x7f8ff9111efc  caffe::Net<>::BackwardFromTo()
    @     0x7f8ff9112141  caffe::Net<>::Backward()
    @     0x7f8ff9106f4d  caffe::Solver<>::Step()
    @     0x7f8ff910786f  caffe::Solver<>::Solve()
    @           0x4086c8  train()
    @           0x406c61  main
    @     0x7f8ff81fcf45  (unknown)
    @           0x40720d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

I tried reducing the batch size in the prototxt files to 1, but the error still appears.
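
For reference, the data layer in my training prototxt now looks roughly like this. I'm using the DenseImageData layer from the SegNet caffe fork, as in the tutorial; the layer name and source path below are just placeholders, not my exact values:

layer {
  name: "data"
  type: "DenseImageData"
  top: "data"
  top: "label"
  dense_image_data_param {
    source: "/path/to/train.txt"   # placeholder for my training list
    batch_size: 1                  # reduced from the tutorial default
    shuffle: true
  }
}
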
I ran nvidia-smi to check the status of the GPU; the output is:

Thu Sep  8 06:39:13 2016      
+------------------------------------------------------+                      
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                      
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M4000M       Off  | 0000:01:00.0      On |                  N/A |
| N/A   42C    P0    25W / 100W |    229MiB /  4087MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1252    G   /usr/bin/X                                     154MiB |
|    0      2208    G   compiz                                          59MiB |
+-----------------------------------------------------------------------------+

But if I train with the CPU instead, this problem doesn't occur.
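
The only change I make to switch between CPU and GPU is the solver_mode line in my solver.prototxt (excerpt below; the net path is a placeholder and I've left out the other fields):

net: "segnet_train.prototxt"   # placeholder path to my training prototxt
solver_mode: GPU               # training runs without errors when this is set to CPU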

Can anyone help me solve this problem?

Thank you!


