OpenCL Future Support


Artyom Beilis

Aug 21, 2021, 6:05:50 PM
to Caffe Users
Hello Caffe Users,

One of the main reasons I started using Caffe in the first place was its decent out-of-the-box OpenCL support.

Unfortunately, Caffe is no longer developed. Another project that provided OpenCL support, PlaidML + Keras, was killed by Google once Keras dropped multi-backend support.

So I started working on dlprimitives:  https://github.com/artyom-beilis/dlprimitives

The project's goals:

1. Create an OpenCL alternative to cuDNN that provides decent performance
2. Create an inference library with minimal dependencies
3. Provide a micro deep-learning framework as a proof of concept
4. Long-shot goal: integrate an OpenCL backend into existing frameworks like PyTorch/TF/MXNet

It is similar in many ways to Caffe: a static graph with layers that provide forward/backward functionality, but with some major improvements: better memory-management optimization, minimal dependencies (just CBLAS and an OpenCL SDK), out-of-the-box Windows support, JSON network definitions instead of prototxt, and more.
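To give a taste of the JSON format, here is a rough sketch of what a small network definition might look like. The field and operator names below are illustrative guesses based on the description above, not the exact schema; see the repository for real examples:

{
  "inputs": [
    { "name": "data", "shape": [ 16, 3, 32, 32 ] }
  ],
  "operators": [
    { "name": "conv1", "type": "Convolution2D",
      "inputs":  [ "data" ],  "outputs": [ "conv1" ],
      "options": { "channels_out": 32, "kernel": [3, 3], "pad": 1 } },
    { "name": "relu1", "type": "Activation",
      "inputs":  [ "conv1" ], "outputs": [ "relu1" ],
      "options": { "activation": "relu" } }
  ]
}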

It is an early work in progress, but it already:

1. Outperforms both Caffe/OpenCL and Keras/PlaidML OpenCL by significant margins
2. Provides performance similar to Caffe+cuDNN on some networks like ResNet18, while using much less memory, and is comparable to TensorFlow

One of the reasons Caffe remains a very valuable project is its OpenCL support, so I thought this might be of interest to Caffe users.

Artyom Beilis

P.S.: Sorry if you think this isn't appropriate here, but the current situation with Caffe/OpenCL is one of the reasons I started the project, after contributing several fixes to Caffe.

Artyom Beilis

Aug 29, 2021, 4:34:13 PM
to Caffe Users
Small update.

I have started integrating the dlprim library as an alternative to libdnn for the OpenCL branch.
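To give an idea of the shape of the integration, here is a minimal sketch of how a Caffe convolution layer's GPU forward pass can delegate to a dlprim kernel. All dlprim-side identifiers below (dlprim::Tensor, wrap_blob, conv_, exec_ctx_) are hypothetical placeholders for illustration, not the actual dlprim API:

// Sketch only: Caffe's OpenCL ConvolutionLayer forwarding to dlprim.
// The dlprim-side names are assumed, not the real interface.
template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_gpu(
    const std::vector<Blob<Dtype>*>& bottom,
    const std::vector<Blob<Dtype>*>& top) {
  // Wrap Caffe's existing OpenCL buffers as dlprim tensors without copying;
  // wrap_blob is an assumed adapter over the underlying cl_mem handles.
  dlprim::Tensor x = wrap_blob(*bottom[0]);        // input activations
  dlprim::Tensor w = wrap_blob(*this->blobs_[0]);  // convolution weights
  dlprim::Tensor y = wrap_blob(*top[0]);           // output activations
  // Enqueue the dlprim convolution kernel on the same command queue that
  // Caffe uses, so both libraries share one in-order OpenCL stream.
  conv_->enqueue({x, w}, {y}, exec_ctx_);
}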

So far so good. The benchmarks below are in ms ("bs" is the batch size column), comparing against native Caffe+cuDNN and standalone dlprim. As a POC I have so far integrated dlprim for the convolution layer only; I'll continue with deconvolution and batch normalization, since Caffe's BN implementation has very poor performance.

network     bs  gpu     caffe/cudnn     caffe/libdnn    caffe/dlprim    dlprim
alexnet     16  gtx960  49.6            118.697         90.0764         83.942
resnet18    16  gtx960  211.372         410.811         311.831         197.387
vgg         8   gtx960  399.394         1030.95         582.993         574.845


In summary, caffe+dlprim achieves 55%, 67%, and 68% of cuDNN's performance on alexnet, resnet18, and vgg16,
and improves on caffe/libdnn by 31%, 32%, and 77% for these networks.
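(These figures follow directly from the table: for alexnet, 49.6 / 90.0764 ≈ 0.55, i.e. 55% of cuDNN speed, and 118.697 / 90.0764 ≈ 1.32, i.e. the ~31% improvement over libdnn. Since these are times, lower is faster and speed ratios are inverse time ratios.)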


To be continued.

Artyom

Artyom Beilis

Sep 22, 2021, 3:39:30 PM
to Caffe Users
Additional update

I tested Caffe/OpenCL with the dlprimitives [1] convolution improvements [2] on several
GPUs and compared it to several frameworks, including the dlprimitives micro-framework itself:
pytorch, tensorflow2, caffe+cudnn, caffe-opencl, caffe-opencl+dlprim, and keras/plaidml.
All times are in ms for batch size 16.

Summary
=======

Using dlprimitives convolutions instead of caffe-opencl's native ones gives the following performance boost:

gpu        alexnet resnet18  vgg16
rtx2060s    9%     26%       41%
gtx1080    23%     34%       84%
rx6600xt   16%     11%       54%
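(Boost here means the caffe/ocl time divided by the caffe/dlprim time, minus one, taken from the full tables below; e.g. vgg16 on the gtx1080: 657.81 / 358.25 ≈ 1.84, i.e. 84% faster.)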

I still don't reach cuda+cudnn performance, but in the best cases I get ~75-78% of it:

gpu         alexnet resnet18    vgg16
rtx2060s    42%     70%         54%
gtx1080     54%     78%         77%

I also have to note that one of the issues that makes Caffe much weaker than tf/pytorch
and dlprim itself on ResNet-like networks is Caffe's implementation of batch
normalization, which is split into two independent layers.
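For reference, this is the usual Caffe pattern in prototxt: a BatchNorm layer that only normalizes, followed by a separate Scale layer for the learned gamma/beta, so every batch goes through two independent layers with their own kernel launches:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  # A second, independent layer just for the learned scale/shift.
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}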


Full Benchmarks
================

RTX 2060S       alexnet/16  resnet18/16     vgg16/16
pt/cuda         11.078      30.969          148.916
tf/cuda         26.42       55.62           157.13
caffe/cuda      14.36       61.06           232.97  
dlprim          35.58       66.30           421.93
caffe/dlprim    33.83       86.79           431.19
caffe/ocl       36.71       109.21          606.29
keras/plaidml   69.1        199.53          911.29


GTX 1080        alexnet/16  resnet18/16     vgg16/16
pt/cuda         15.763      38.359          125.902
tf/cuda         33.38       69.52           196.94
caffe/cuda      17.39       79.97           274.12
dlprim          30.65       69.32           353.19
caffe/dlprim    32.24       102.54          358.25
caffe/ocl       39.58       137.12          657.81
keras/plaidml   89.57       229.14          972.82

RX 6600XT       alexnet/16  resnet18/16     vgg16/16
dlprim          26.760      59.881          295.190
caffe/dlprim    26.238      85.880          302.403
caffe/ocl       30.364      95.523          465.640
keras/plaidml   183.230     429.874         3612.744

Best Regards,
Artyom Beilis
