Optimize tesseract

Sergey

Jun 26, 2020, 4:04:05 AM

to tesseract-ocr

We are building a service that extracts text from documents using tesseract-ocr from Python.
Each document is split into 4 parts, and each part is processed in parallel with multiprocessing.Process.
Each part of the document contains 4-6 text fields. We extract these fields and pass them to python tesseract 4.0.

CPU load becomes very high when we process 2 or more documents at the same time. Can you tell us how to optimize Tesseract's performance?

juanjo....@letsrebold.com

Jun 26, 2020, 5:48:58 AM

to tesseract-ocr

You can set the environment variable OMP_THREAD_LIMIT=1.
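
A minimal sketch of one way to apply this from Python, assuming Tesseract is invoked through its command-line binary with `subprocess` (the thread does not say which Python wrapper is in use). The helper names `tesseract_env` and `ocr_file` are hypothetical and exist only for illustration; the variable has to be present in the environment of the process that actually runs Tesseract.

```python
import os
import subprocess


def tesseract_env(thread_limit=1):
    """Return a copy of the current environment with OMP_THREAD_LIMIT set."""
    env = os.environ.copy()
    env["OMP_THREAD_LIMIT"] = str(thread_limit)
    return env


def ocr_file(image_path, lang="eng"):
    """Run the tesseract CLI on one image and return the recognized text.

    Assumes a `tesseract` binary is on PATH; "stdout" as the output base
    makes the CLI print the result instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        env=tesseract_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the variable through `env=` keeps the setting local to the OCR child process instead of changing the environment of the whole service.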

Сергей Кузнецов

Jun 26, 2020, 5:58:44 AM

to tesseract-ocr

Yes, I have already set this environment variable.

Zdenko Podobny

Jun 26, 2020, 9:15:39 AM

to tesser...@googlegroups.com

There is no magic command/parameter that solves issues like this.

And you did not provide enough information (e.g. what "python tesseract 4.0" is) to analyze whether you follow best practices.

If you are really interested in help, you have to provide more information (e.g. HW & OS specification, tesseract version, language model, example images with example code, your speed/CPU usage measurements, how you measured them, etc.)

Zdenko
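
For the speed/CPU measurements mentioned above, one possible way to collect numbers from Python is sketched below. It is an illustration only, not a method prescribed in this thread: the helper name `measure` is hypothetical, and it relies on `os.times()`, whose child-process fields are only meaningful on Unix-like systems and only count children that have already been waited for.

```python
import os
import time


def measure(fn, *args):
    """Call fn(*args) and return (result, wall_seconds, cpu_seconds).

    cpu_seconds sums user and system time of this process and of any
    child processes that finished (and were waited for) during the call.
    """
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = fn(*args)
    cpu_end = os.times()
    wall = time.perf_counter() - wall_start
    cpu = (
        (cpu_end.user - cpu_start.user)
        + (cpu_end.system - cpu_start.system)
        + (cpu_end.children_user - cpu_start.children_user)
        + (cpu_end.children_system - cpu_start.children_system)
    )
    return result, wall, cpu
```

Comparing CPU seconds against wall-clock seconds for one document, then for two documents processed at once, gives concrete figures to include in a report.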
