Out of Memory when testing via the Python interface


lu.zh...@gmail.com

Apr 17, 2018, 11:12:17 AM
to Caffe Users

Issue summary

I have built a network myself and use it to train facial landmark detection. Training with my own train.prototxt and solver.prototxt succeeds. But something strange happens in the testing phase. To get the network's output on test images, I wrote a Python script following the official example (00-classification.ipynb). When I load the caffemodel and the network for testing, Python reports the error: Out Of Memory!

Steps to reproduce

  1. Training with the prototxt files succeeds.
  2. Testing images in Python with the same network runs out of memory.

System configuration

  • Operating system: Ubuntu 16.04.4 LTS
  • CUDA version (if applicable): cuda 8
  • Python version (if using pycaffe): Python 2.7 (PyCharm 2018.1.1)
I wrote the test script just like the officially provided example:
import matplotlib.pyplot as plt
import caffe
from caffe import *

# set display defaults
plt.rcParams['figure.figsize'] = (5, 5)        # default figure size
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

root = '/home/ga47kes/master/300w/data/python/'
test_net = root+'deploy.prototxt'
caffe_model = root+'deploy.caffemodel'
net = caffe.Net(test_net, caffe_model, caffe.TEST)

### import test images
data_dir = '/home/ga47kes/master/300w/data/image_path/test/'

transformer=caffe.io.Transformer({'data':net.blobs['data'].data.shape})
transformer.set_transpose('data',(2,0,1))


im = caffe.io.load_image('/home/ga47kes/master/300w/data/300wData/IBUG_(300-W)/rot0/96x96/img/image_0001.png')
transformed_image = transformer.preprocess('data', im)
transformed_image = transformed_image[0,:,:]
net.blobs['data'].data[...] = transformed_image

caffe.set_mode_gpu()
output = net.forward()
print("finish testing")
(attached: deploy.prototxt, test.caffemodel)
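For reference, the set_transpose('data', (2, 0, 1)) call above just converts the H×W×C array that caffe.io.load_image returns into Caffe's C×H×W blob layout. In plain NumPy terms (a toy-shaped sketch, not the actual image):

```python
import numpy as np

hwc = np.zeros((96, 96, 3))          # skimage layout: height, width, channels
chw = np.transpose(hwc, (2, 0, 1))   # Caffe layout: channels, height, width
print(chw.shape)  # (3, 96, 96)
```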

Przemek D

Apr 18, 2018, 2:54:34 AM
to Caffe Users
This is a little weird, because this network uses less than 500 MB of GPU memory on my machine. What GPU are you using?
The only obvious mistake I see in your code is that you call set_mode_gpu() after instantiating the net. This is wrong: you should set the device and mode before constructing any Net object (this includes Solvers and Classifiers too). To be safe, call set_device and set_mode_gpu right after importing the caffe module (also, it's better to import caffe rather than from caffe import *).
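In code, the safe ordering looks like this (a minimal sketch of the initialization order, with placeholder file names, not your full script):

```python
import caffe

# Set device and mode FIRST, before any Net/Solver/Classifier exists.
caffe.set_device(0)      # GPU id; adjust to your machine
caffe.set_mode_gpu()

# Only now construct the net, so its memory is allocated on the GPU
# and in the mode selected above.
net = caffe.Net('deploy.prototxt', 'deploy.caffemodel', caffe.TEST)
output = net.forward()
```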

lu.zh...@gmail.com

Apr 18, 2018, 6:22:07 AM
to Caffe Users
Hi everyone!
I have also uploaded an image for testing. Could you try whether testing in Python runs out of memory on your machines too?

On Tuesday, April 17, 2018 at 5:12:17 PM UTC+2, lu.zh...@gmail.com wrote:
(attached: image_0001.png)

Przemek D

Apr 18, 2018, 6:24:55 AM
to Caffe Users
1. Can you run caffe test with this network? What's the output of nvidia-smi while you test?
2. What's the output of nvidia-smi during training?
3. Did you follow my advice regarding set_mode_gpu and how did that change things?
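For points 1 and 2, a convenient way to watch memory while the process runs (assuming a standard NVIDIA driver install) is to poll nvidia-smi once per second:

```shell
# print used/total GPU memory every second while training or testing runs
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```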

Xun Victor

Apr 19, 2018, 7:35:43 AM
to Caffe Users
Do you have a different batch size between training and testing?

lu.zh...@gmail.com

Apr 19, 2018, 11:24:16 AM
to Caffe Users
Hello,
at first I used different batch sizes. Now I set both of them to 1, but it still runs out of memory.

On Thursday, April 19, 2018 at 1:35:43 PM UTC+2, Xun Victor wrote:

lu.zh...@gmail.com

Apr 19, 2018, 11:39:08 AM
to Caffe Users
Hi, thanks a lot for your help.
1. I can test the network successfully from the command line. nvidia-smi showed about 500 MB of memory in use with a test batch size of 10.
2. I can train only with batch size 1; nvidia-smi shows the expected usage.
3. I have changed the order of set_mode_gpu, and I now use plain import caffe instead of from caffe import *.

I was also wondering: is there any setting that stores the output in binary (not ASCII)?

On Wednesday, April 18, 2018 at 12:24:55 PM UTC+2, Przemek D wrote:

Przemek D

Apr 23, 2018, 4:45:51 AM
to Caffe Users
I don't think I understand you right... your test net only takes 500 MB with batch size 10, but you can only train with batch size 1? Why is that? What GPU do you have?
Regarding point 3, I meant that you should import caffe first, then call caffe.set_device() (if needed), then caffe.set_mode_gpu(), and only after that construct a Net and call forward.

lu.zh...@gmail.com

Apr 23, 2018, 9:04:13 AM
to Caffe Users
Hi, thanks for your kindness!
Sorry, that was a typo. I also validate with the same batch size 1. But when I use the trained model to test images through the Python interface, it runs out of memory.
So I was wondering: is there any setting that stores the output in binary (not ASCII)?

On Monday, April 23, 2018 at 10:45:51 AM UTC+2, Przemek D wrote:

Xun Victor

Apr 23, 2018, 9:41:08 AM
to Caffe Users
You can save the caffemodel in binary via the HDF5 format.
To do so, use net.save_hdf5(str(filename)) instead of net.save(str(filename)) (this applies if you use the Python wrapper for training).
If you train your model from the command line directly, you can set snapshot_format: HDF5 in solver.prototxt.
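For the command-line case, the relevant solver.prototxt fragment would look roughly like this (a sketch; the net path and snapshot prefix are placeholders):

```
net: "train.prototxt"
snapshot: 10000
snapshot_prefix: "snapshots/mynet"
snapshot_format: HDF5   # default is BINARYPROTO
```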

However, I don't understand how you expect this to solve your GPU memory issue. Try reducing the image size first.
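For a rough sense of scale (a back-of-the-envelope sketch, assuming float32 blobs): a blob takes batch × channels × height × width × 4 bytes, so the 96×96 input blob itself is tiny, and the intermediate feature maps are what actually dominate GPU memory:

```python
def blob_bytes(batch, channels, height, width, bytes_per_val=4):
    """Approximate size of one float32 blob in bytes."""
    return batch * channels * height * width * bytes_per_val

# Batch of 10 grayscale 96x96 images: about 0.35 MiB for the data blob,
# nowhere near enough to explain an out-of-memory error on its own.
print(blob_bytes(10, 1, 96, 96) / (1024.0 ** 2))  # 0.3515625
```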