Dan Parnham
Mar 25, 2021, 7:23:33 AM
to TensorFlow Lite, Sachin Joglekar, Robert Arnesson, Raman Sarokin
What is the status of this?
We're interested in running tflite with the OpenCL backend on x64 hardware. We develop an industrial quality control system, a small part of which involves defect classification. We recently migrated from Caffe to TensorFlow for training, but found that deploying full TensorFlow + CUDA requires >3GB of libraries to be installed on each server, and there are concerns that the GPU memory usage may make it difficult for us to run multiple models on the existing hardware.
We then tested tflite with the GPU delegate, which requires only ~10MB to install! The memory usage per model is also an order of magnitude smaller.
Our benchmarks showed that tflite can beat full TensorFlow for speed when dealing with one or two samples at a time, and since our system is often classifying a limited number of samples per frame (at around 8-10 fps), this could work for us.
We believe that performance would improve further still if the GPU delegate supported batch sizes greater than 1. It would not even need to be dynamic; allowing a fixed batch size to be configured at the initialisation stage would be enough.