Hi everyone,
I am running into an issue that I do not know how to solve. I am trying to classify a single image with the Python code attached below. I am using NVIDIA Caffe GPU Docker containers and have tried all of the images listed below, even the official one from NGC, but classifying one image still takes more than 1 second. The model is an AlexNet model that I trained with NVIDIA DIGITS. When I converted it to a Core ML model, it ran pretty fast, even on mobile.
However, when I run it in Python, it takes at least 1.9 seconds in the bvlc/caffe:gpu container.
All the other containers take more than 3 seconds, or even 5 seconds. I have already turned on GPU mode.
Is there a default initial delay when classifying a single image? Is there any way to improve the performance, given that I have 3 P100s? I have also tried running just 1 Docker container.
Or is something wrong with the CUDA configuration on my host?
I desperately need help. Any advice is greatly appreciated.
Here are the specs:
Docker Containers I have tried
nvidia/caffe:latest
yangcha/caffe-gpu-conda:latest
nvcr.io/nvidia/caffe:18.01-py2
bvlc/caffe:gpu
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
nvidia-smi
Wed Feb 28 19:06:44 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   25C    P0    28W / 250W |   2289MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:00:06.0 Off |                    0 |
| N/A   27C    P0    29W / 250W |   2289MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   24C    P0    29W / 250W |   2289MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:53 2017
OS/Arch: linux/amd64
Experimental: false
dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============================================================-====================================-====================================-=================================================================================================================================
ii libnvidia-container-tools 1.0.0~alpha.3-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.0.0~alpha.3-1 amd64 NVIDIA container runtime library
rc nvidia-384 384.111-0ubuntu0.16.04.1 amd64 NVIDIA binary driver - version 384.111
ii nvidia-390 390.30-0ubuntu1 amd64 NVIDIA binary driver - version 390.30
ii nvidia-390-dev 390.30-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
ii nvidia-container-runtime 1.1.1+docker17.12.0-1 amd64 NVIDIA container runtime
pi nvidia-cuda-dev 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development files
un nvidia-cuda-doc (no description available)
un nvidia-cuda-toolkit (no description available)
un nvidia-current (no description available)
un nvidia-docker (no description available)
ii nvidia-docker2 2.0.2+docker17.12.0-1 all nvidia-docker CLI wrapper
un nvidia-driver-binary (no description available)
un nvidia-legacy-340xx-vdpau-driver (no description available)
un nvidia-libopencl1 (no description available)
un nvidia-libopencl1-384 (no description available)
un nvidia-libopencl1-390 (no description available)
un nvidia-libopencl1-dev (no description available)
ii nvidia-modprobe 390.30-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-opencl-dev:amd64 7.5.18-0ubuntu1 amd64 NVIDIA OpenCL development files
un nvidia-opencl-icd (no description available)
rc nvidia-opencl-icd-384 384.111-0ubuntu0.16.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-opencl-icd-390 390.30-0ubuntu1 amd64 NVIDIA OpenCL ICD
un nvidia-persistenced (no description available)
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 390.30-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary (no description available)
un nvidia-smi (no description available)
un nvidia-vdpau-driver (no description available)
ii nvidia-visual-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
nvidia-container-cli -V
version: 1.0.0
build date: 2018-01-11T00:16+00:00
build revision: 4a618459e8ba522d834bb2b4c665847fae8ce0ad
build compiler: gcc-5 5.4.0 20160609
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
Here is the Python code:
`import argparse
import os
import time

# Silence Caffe's glog output; this must be set before caffe is imported.
os.environ['GLOG_minloglevel'] = '2'

import caffe
import cv2
import numpy as np
import skimage.transform


def classify(caffemodel, deploy_file, image_files,
             mean_file=None, labels_file=None, batch_size=None, use_gpu=True):
    if use_gpu:
        # Each set_device() call replaces the previous selection, so the
        # original three calls (0, 1, 2) are reduced to a single device here.
        caffe.set_device(0)
        caffe.set_mode_gpu()
    else:
        caffe.set_mode_cpu()

    net = caffe.Net(deploy_file, caffemodel, caffe.TEST)
    meanData = caffe.io.load_image(mean_file, color=True)
    # print('mean shape =====>>>>>>>>>>>>', meanData.shape)

    # Transpose HxWxC to CxHxW, rescale [0, 1] to [0, 255],
    # and reverse the channel order.
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))
    transformer.set_raw_scale('data', 255)
    transformer.set_channel_swap('data', (2, 1, 0))

    im3 = cv2.imread(image_files)
    im2 = skimage.transform.resize(im3, (256, 256))
    # print('img shape =====>>>>>>>>>>>>', im3.shape)
    dataaa = im2 - meanData
    net.blobs['data'].data[...] = transformer.preprocess('data', dataaa)
    out = net.forward()

    labels = np.loadtxt(labels_file, str, delimiter='\s')
    print(labels)

    # Indices of the five highest-scoring classes, best first.
    prob1 = net.blobs['softmax'].data[0].flatten().argsort()[-1:-6:-1]
    print(out['softmax'][-1])
    print('prob1: ', prob1)
    highestProb = out['softmax'][-1][prob1][0]
    print('highestProb: ', highestProb)

    # Treat low-confidence predictions as unknown.
    if highestProb < 0.6:
        print('UNKNOWN')
    else:
        print(labels[prob1][0])
    # im4 = cv2.imshow('image',im2)
    # cv2.waitKey()


if __name__ == '__main__':
    script_start_time = time.time()

    parser = argparse.ArgumentParser(description='Classification example - DIGITS')

    # Positional arguments
    parser.add_argument('caffemodel', help='Path to a .caffemodel')
    parser.add_argument('deploy_file', help='Path to the deploy file')
    parser.add_argument('image_file', help='Path[s] to an image')

    # Optional arguments
    parser.add_argument('-m', '--mean', help='Path to a mean jpg (*.jpg)')
    parser.add_argument('-l', '--labels', help='Path to a labels file')
    parser.add_argument('--batch-size', type=int)
    parser.add_argument('--nogpu', action='store_true', help="Don't use the GPU")

    args = vars(parser.parse_args())

    classify(
        args['caffemodel'],
        args['deploy_file'],
        args['image_file'],
        args['mean'],
        args['labels'],
        args['batch_size'],
        not args['nogpu'],
    )

    # This is the total run time: imports, model loading, and the forward pass.
    print('Script took %f seconds.' % (time.time() - script_start_time,))`
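The number printed by the script is the total run time, so it cannot tell a one-time start-up cost apart from the forward pass itself. One way to narrow this down might be to time the stages separately. Below is a minimal sketch of that idea. `time_stage` and `time_repeated` are hypothetical helper names, not part of Caffe. The Caffe calls in the trailing comments are the same ones used in the script above and are left commented out so the helpers can be tried without a model.

```python
from timeit import default_timer


def time_stage(label, fn, *args, **kwargs):
    """Call fn once, print the elapsed time, and return (result, seconds)."""
    start = default_timer()
    result = fn(*args, **kwargs)
    elapsed = default_timer() - start
    print('%-14s %.4f s' % (label, elapsed))
    return result, elapsed


def time_repeated(label, fn, warmup=1, runs=10):
    """Call fn `warmup` times untimed, then return the mean time of `runs` calls."""
    for _ in range(warmup):
        fn()
    start = default_timer()
    for _ in range(runs):
        fn()
    mean = (default_timer() - start) / runs
    print('%-14s %.4f s (mean of %d runs)' % (label, mean, runs))
    return mean


# Assumed usage inside classify(), with the names from the script above:
# net, _ = time_stage('load net', caffe.Net, deploy_file, caffemodel, caffe.TEST)
# time_stage('first forward', net.forward)
# time_repeated('later forwards', net.forward)
```

If the first forward pass turns out to be much slower than the later ones, that would point to initialization rather than to the model.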