I am exploring ways to improve the performance (single-thread and multi-thread) of tesseract-ocr inference. Profiling tesseract-ocr CLI inference with
Intel VTune showed very little use of the CPU's vector registers (AVX2 and AVX-512).
By default, tesseract runs inference on a single image at a time (batch_size=1). I suspect batched inference would drive better utilization of the vector registers and thus improve the inference throughput of tesseract-ocr.
Is there a way to enable batched inference in tesseract-ocr?
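For reference, the closest I have gotten so far is process-level parallelism over the CLI, which improves aggregate throughput but does nothing for per-process vector-register utilization. A rough sketch (the `imgs/` directory and file names are placeholders):

```shell
# Run one tesseract process per image, up to $(nproc) at a time.
# Each output goes to <input>.txt; this is NOT true batched inference.
ls imgs/*.png | xargs -P "$(nproc)" -I{} tesseract {} {}
```

This keeps all cores busy, but each process still performs batch_size=1 LSTM inference internally.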
Thanks in advance
Regards
Vishnu
Other details:
- system: Intel Xeon 8380 (Ice Lake), Ubuntu 22.04 (kernel 5.15), GCC 11.3
- tesseract: built from source (5.3.0), with tessdata
- configure flags: --enable-float32 --disable-opencl --disable-graphics 'CXXFLAGS=-O3 -mavx512f -mfma'
Other performance observations:
- considerable OpenMP pause times observed in the multi-threaded runs
- in the LSTM code path, Tanh lookup times are considerably high (~50 ms per lookup)
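To isolate the OpenMP pause overhead, I have been comparing a single-threaded run against the default run, roughly like this (image name is a placeholder; `OMP_THREAD_LIMIT` is a standard OpenMP environment variable that caps the thread count of the process):

```shell
# Single-threaded baseline: cap OpenMP at one thread.
OMP_THREAD_LIMIT=1 tesseract input.png out_single

# Default run: OpenMP chooses its own thread count.
tesseract input.png out_default
```

Timing the two invocations (e.g. with `time`) shows how much of the multi-threaded wall time is spent in OpenMP synchronization rather than useful work.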