Optimize tesseract

Sergey

Jun 26, 2020, 4:04:05 AM
to tesseract-ocr
We are building a service for text detection from documents using tesseract-ocr from Python.
Each document is split into 4 parts, and each part is processed in parallel with multiprocessing.Process.
Each part of the document contains 4-6 text fields. We extract these fields and send them to python tesseract 4.0.

We see very high CPU load when we process 2 or more documents at the same time. Can you tell us how to optimize Tesseract's performance?
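
For reference, the setup described above might look roughly like the sketch below. It is only an illustration under assumptions: the names `split_fields`, `ocr_part`, `process_document` and `peak_processes` are hypothetical, and the worker is a placeholder that does not call Tesseract. It shows that with 4 processes per document, 2 documents in flight already mean 8 worker processes competing for the CPU.

```python
import multiprocessing as mp

PARTS_PER_DOCUMENT = 4  # taken from the post: each document is split into 4 parts


def split_fields(fields, parts=PARTS_PER_DOCUMENT):
    """Distribute a document's text fields round-robin over `parts` groups."""
    return [fields[i::parts] for i in range(parts)]


def ocr_part(part_fields, out_queue):
    """Placeholder worker: a real service would run Tesseract on each field here."""
    out_queue.put([f"text:{name}" for name in part_fields])


def process_document(fields):
    """Start one multiprocessing.Process per part and collect the results."""
    queue = mp.Queue()
    workers = [
        mp.Process(target=ocr_part, args=(part, queue))
        for part in split_fields(fields)
    ]
    for worker in workers:
        worker.start()
    # Read the results before joining so that no worker blocks on a full queue.
    results = [queue.get() for _ in workers]
    for worker in workers:
        worker.join()
    return results


def peak_processes(documents_in_flight, parts=PARTS_PER_DOCUMENT):
    """Number of worker processes alive when several documents are processed at once."""
    return documents_in_flight * parts


if __name__ == "__main__":
    print(process_document([f"field{i}" for i in range(16)]))
    print(peak_processes(2))
```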

juanjo....@letsrebold.com

Jun 26, 2020, 5:48:58 AM
to tesseract-ocr
You can set the environment variable OMP_THREAD_LIMIT=1.
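
A minimal sketch of applying this from Python, assuming Tesseract is invoked as the `tesseract` command-line program in a child process. The helper names `single_thread_env` and `run_tesseract` and the file name `page.png` are hypothetical. The variable has to be present in the environment of the Tesseract process itself, so it is passed explicitly to the child.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def run_tesseract(image_path, lang="eng"):
    """Run the tesseract CLI on one image and return the recognized text.

    Assumes the `tesseract` executable is on PATH.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_tesseract("page.png"))  # hypothetical input image
```

If a wrapper library starts the Tesseract process for you, setting `os.environ["OMP_THREAD_LIMIT"] = "1"` before the first call should have the same effect, because child processes inherit the parent's environment by default.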

Сергей Кузнецов

Jun 26, 2020, 5:58:44 AM
to tesseract-ocr
Yes, I have already set this environment variable.

On Friday, June 26, 2020 at 12:48:58 UTC+3, juanjo....@letsrebold.com wrote:

Zdenko Podobny

Jun 26, 2020, 9:15:39 AM
to tesser...@googlegroups.com
There is no magic command or parameter that solves issues like this.
And you did not provide enough information (e.g. what "python tesseract 4.0" is) to analyze whether you are following best practices.

If you are really interested in help, you have to provide more information (e.g. hardware and OS specification, Tesseract version, language model, example images with example code, your speed/CPU usage measurements, how you measured them, etc.).
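
One possible way to collect part of this information from Python is sketched below, using only the standard library. The names `measure` and `system_report` are hypothetical, and the timed call in the usage line is a stand-in for the real OCR call. CPU time of child processes is reported separately because Tesseract is often run as a child process; on some platforms (for example Windows) that value stays at zero.

```python
import os
import platform
import time


def measure(func, *args, **kwargs):
    """Call func and return (result, stats) with wall-clock and CPU seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    stats = {
        "wall_s": wall,
        # CPU time spent in this Python process.
        "cpu_self_s": (cpu_after.user - cpu_before.user)
        + (cpu_after.system - cpu_before.system),
        # CPU time spent in child processes that have finished and been waited for.
        "cpu_children_s": (cpu_after.children_user - cpu_before.children_user)
        + (cpu_after.children_system - cpu_before.children_system),
    }
    return result, stats


def system_report():
    """Basic hardware and OS facts to include in a report."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    print(system_report())
    print(measure(sum, range(1_000_000)))  # replace with the real OCR call
```

The Tesseract version and the available language models can be read from the command line with `tesseract --version` and `tesseract --list-langs`.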

Zdenko


