Hardware optimisation


adamuk73

Apr 15, 2020, 3:13:03 PM
to tesseract-ocr
I'm interested to know how sensitive Tesseract 4 is to hardware. For example, will doubling the number of cores double the processing speed? What about dual-CPU setups?

Does Tesseract mostly run in RAM, or does it make heavy use of the hard disk?

Any guidance would be appreciated. I'm looking to run an instance of Tesseract on an x86 box running CentOS.

Thanks in advance.

adamuk73

Apr 16, 2020, 5:07:15 AM
to tesseract-ocr
It looks as though Tesseract uses a maximum of 4 cores, but it can be set to use fewer via the OMP_THREAD_LIMIT environment variable.

Tuan Ardouin

Apr 16, 2020, 6:07:36 AM
to tesseract-ocr
I'm also interested in this question. Have you read this issue on the tesseract repository?
https://github.com/tesseract-ocr/tesseract/issues/263

The two main things I can gather from the various issues are custom kernel configuration and single threading.

An example of custom kernel configuration: https://make-linux-fast-again.com/
As for single threading, the tesseract executable can use multithreading to speed up the OCR processing of a single page. The gain is not large and it costs a lot of CPU overhead, so the suggested solution is to disable it, either at compile time (--disable-openmp) or at run time (OMP_THREAD_LIMIT=1). You then have to run as many tesseract workers as you have cores.
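
To make the "one single-threaded worker per core" idea concrete, here is a rough Python sketch. It is only an illustration under a few assumptions: that the tesseract executable is on the PATH, that each page is a separate image file, and that writing the text to stdout is acceptable. The helper names (single_thread_env, ocr_one, ocr_many) are made up for this example and are not part of Tesseract.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def single_thread_env(base_env=None):
    """Copy the environment and cap OpenMP at one thread per Tesseract process."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def ocr_one(image_path, run=subprocess.run):
    """OCR one image with the tesseract CLI and return the recognised text."""
    result = run(
        # "stdout" as the output base makes tesseract print the text
        # instead of writing a .txt file.
        ["tesseract", str(image_path), "stdout"],
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_many(image_paths, workers=None, run=subprocess.run):
    """Run single-threaded Tesseract processes in parallel, one per core by default."""
    workers = workers or os.cpu_count() or 1
    # Python threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as image_paths.
        return list(pool.map(lambda path: ocr_one(path, run=run), image_paths))
```

Calling ocr_many(["page1.png", "page2.png"]) would then start up to os.cpu_count() tesseract processes at a time, each limited to a single OpenMP thread. Whether this beats the default multithreaded mode on a given machine is something to measure rather than assume.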

That's my understanding, but an expert point of view on the matter would be greatly appreciated.

adamuk73

Apr 16, 2020, 2:51:40 PM
to tesseract-ocr
That's really useful, thanks!