GPU delegate with batch_size > 1?

Robert Arnesson

Jul 4, 2020, 7:07:38 AM
to TensorFlow Lite
Hello!

How are models with a batch size greater than 1 intended to be used with the GPU delegate?

In gl_delegate.h there is the option dynamic_batch_enabled, but that header appears to be deprecated, and delegate.h has no equivalent option.

I am testing on Android with TensorFlow 2.2.0, using the C/C++ API directly.
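
For reference, a minimal sketch of the two APIs as I understand them from the 2.2 headers (interpreter setup and error handling omitted; the exact layout of the deprecated options struct is my assumption):

    #include "tensorflow/lite/delegates/gpu/delegate.h"     // current API
    #include "tensorflow/lite/delegates/gpu/gl_delegate.h"  // deprecated API

    // Deprecated GL-backend API: exposes dynamic_batch_enabled.
    TfLiteGpuDelegateOptions gl_options = {};  // zero-initialise, then set what we need
    gl_options.compile_options.dynamic_batch_enabled = true;
    TfLiteDelegate* gl_delegate = TfLiteGpuDelegateCreate(&gl_options);
    // interpreter->ModifyGraphWithDelegate(gl_delegate);
    // ... run inference ...
    TfLiteGpuDelegateDelete(gl_delegate);

    // Current API: TfLiteGpuDelegateOptionsV2 has no batch-related field.
    TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
    TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
    // interpreter->ModifyGraphWithDelegate(delegate);
    // ... run inference ...
    TfLiteGpuDelegateV2Delete(delegate);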

Sachin Joglekar

Aug 10, 2020, 12:34:11 PM
to TensorFlow Lite, Robert Arnesson, Raman Sarokin
+Raman to comment on GPU/batch specifics

Dan Parnham

Mar 25, 2021, 7:23:33 AM
to TensorFlow Lite, Sachin Joglekar, Robert Arnesson, Raman Sarokin
What is the status of this?

We're interested in running TFLite with the OpenCL backend on x64 hardware. We develop an industrial quality-control system, a small part of which involves defect classification. We recently migrated from Caffe to TensorFlow for training, but found that deploying full TensorFlow + CUDA requires >3 GB of libraries on each server, and we are concerned that the GPU memory usage may make it difficult to run multiple models on our existing hardware.

We then tested TFLite with the GPU delegate, which requires only ~10 MB to install, and the memory usage per model is an order of magnitude smaller.
Benchmarking showed that TFLite can beat full TensorFlow for speed when processing one or two samples at a time, and since our system typically classifies a small number of samples per frame (at around 8-10 fps), this could work for us.

We believe performance would improve further still if the GPU delegate supported batch sizes greater than 1. It would not even need to be dynamic; allowing a fixed batch size to be configured at the initialisation stage would suffice. The sketch below shows roughly what the current batch-1 limitation forces us to do instead.
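
For illustration, a hypothetical per-sample loop over a batch-1 interpreter that already has the GPU delegate applied (the tensor indices and num_classes are placeholders for our model):

    #include <cstring>
    #include <vector>
    #include "tensorflow/lite/interpreter.h"

    // Hypothetical helper: classify `samples` one at a time, copying each
    // sample into the (batch-1) input tensor and invoking the interpreter.
    std::vector<std::vector<float>> ClassifyBatch(
        tflite::Interpreter* interpreter,
        const std::vector<std::vector<float>>& samples,
        int num_classes) {
      std::vector<std::vector<float>> results;
      for (const auto& sample : samples) {
        float* input = interpreter->typed_input_tensor<float>(0);
        std::memcpy(input, sample.data(), sample.size() * sizeof(float));
        if (interpreter->Invoke() != kTfLiteOk) break;  // one GPU dispatch per sample
        const float* output = interpreter->typed_output_tensor<float>(0);
        results.emplace_back(output, output + num_classes);
      }
      return results;
    }

With a fixed batch size configured at initialisation, the per-sample copies and dispatches above could collapse into a single Invoke().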