I am exploring ways to improve the performance (single-thread and multi-thread) of tesseract-ocr inference. Profiling tesseract-ocr CLI inference with
Intel VTune showed very little use of the CPU's vector registers (AVX2 and AVX-512).
By default, tesseract runs inference on a single image at a time (batch_size=1). I suspect batched inference would drive better utilization of the vector registers and thus improve the inference throughput of tesseract-ocr.
Is there a way to enable batched inference in tesseract-ocr?
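For reference, the closest I have gotten so far is process-level parallelism over the CLI, which improves aggregate throughput but does nothing for per-process vector-register utilization. A rough sketch (the `imgs/` directory and file names are placeholders):

```shell
# Run one tesseract process per image, up to $(nproc) at a time.
# Each output goes to <input>.txt; this is NOT true batched inference.
ls imgs/*.png | xargs -P "$(nproc)" -I{} tesseract {} {}
```

This keeps all cores busy, but each process still performs batch_size=1 LSTM inference internally.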
Thanks in advance
Regards
Vishnu
Other details:
- system: Intel Xeon 8380 (Ice Lake), Ubuntu 22.04 (kernel 5.15), GCC 11.3
- tesseract: built from source (5.3.0), with tessdata
- configure flags: --enable-float32 --disable-opencl --disable-graphics 'CXXFLAGS=-O3 -mavx512f -mfma'
Other performance observations:
- considerable OpenMP pause times observed in the multi-threaded runs
- in the LSTM code path, Tanh lookup times are considerably high (~50 ms per lookup)
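To isolate the OpenMP pause overhead, I have been comparing a single-threaded run against the default run, roughly like this (image name is a placeholder; `OMP_THREAD_LIMIT` is a standard OpenMP environment variable that caps the thread count of the process):

```shell
# Single-threaded baseline: cap OpenMP at one thread.
OMP_THREAD_LIMIT=1 tesseract input.png out_single

# Default run: OpenMP chooses its own thread count.
tesseract input.png out_default
```

Timing the two invocations (e.g. with `time`) shows how much of the multi-threaded wall time is spent in OpenMP synchronization rather than useful work.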