Use multiple threads in Tesseract


nick

May 28, 2018, 3:20:01 AM
to tesseract-ocr
Hi,

I want to run Tesseract on a CPU with 20 cores, but Tesseract uses only a few cores when it OCRs a page.
How can I change the settings to force Tesseract to use all 20 cores?

Thanks

nick

May 28, 2018, 3:27:00 AM
to tesseract-ocr
I found OMP_THREAD_LIMIT, but I don't know how to set it to 20.

Jakob Salomonsson

May 28, 2018, 5:11:22 AM
to tesseract-ocr
Hello,

I'm having roughly the same problem, but with pytesseract (maybe the same answer applies to both).
I have tried several things, for example setting OMP_THREAD_LIMIT=4 when calling the pytesseract function, or adding "OMP_THREAD_LIMIT 4" to one or several of the config files.

But still, nothing changes. Maybe I'm just setting it in the wrong way.


Does anyone know how to help us move forward with this? It would be a great help.

ShreeDevi Kumar

May 28, 2018, 5:16:00 AM
to tesser...@googlegroups.com

Set the maximum number of threads using the environment variable OMP_THREAD_LIMIT.
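As a rough illustration of what "setting the environment variable" can look like when Tesseract is started from a script, here is a minimal Python sketch. It assumes the tesseract command-line program is on the PATH; the helper names tesseract_env and run_tesseract are made up for this example and are not part of Tesseract or pytesseract.

```python
import os
import subprocess


def tesseract_env(thread_limit):
    """Return a copy of the current environment with OMP_THREAD_LIMIT set."""
    env = os.environ.copy()
    env["OMP_THREAD_LIMIT"] = str(thread_limit)  # environment values must be strings
    return env


def run_tesseract(image_path, output_base, thread_limit=4):
    """Run the tesseract CLI (assumed to be on PATH) with the given thread limit."""
    cmd = ["tesseract", image_path, output_base]
    return subprocess.run(cmd, env=tesseract_env(thread_limit), check=True)
```

Calling run_tesseract("page.png", "page", thread_limit=4) would start Tesseract with OMP_THREAD_LIMIT=4 visible only to that child process; the file names here are placeholders.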

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/02d2b68d-993a-46e3-a362-4a982f4d7de5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar

May 28, 2018, 5:19:15 AM
to tesser...@googlegroups.com


Jakob Salomonsson

May 28, 2018, 5:55:58 AM
to tesser...@googlegroups.com
Thank you for your responses, ShreeDevi.

I have read the documentation you are referring to, but I still can't get my head around how, and where, to set this variable. A concrete example would be highly appreciated.

Jakob


nick

May 28, 2018, 6:11:02 AM
to tesseract-ocr
How and where can we change this variable?

Jakob Salomonsson

May 28, 2018, 6:40:25 AM
to tesser...@googlegroups.com
Calling the help function in Python through help(pytesseract.pytesseract) yields this result, among others:

DATA
    OMP_NUM_THREADS = 3
    OMP_THREAD_LIMIT = 3
    RGB_MODE = 'RGB'
    __warningregistry__ = {'version': 332, ('unclosed file <_io.BufferedWr...
    numpy_installed = True
    tesseract_cmd = '/anaconda3/envs/Work/bin/tesseract'


I'm setting tesseract_cmd (through: pytesseract.pytesseract.tesseract_cmd = "/anaconda3/envs/Work/bin/tesseract") and it works as intended.
But when I try to do the same with OMP_NUM_THREADS or OMP_THREAD_LIMIT (through: pytesseract.pytesseract.OMP_NUM_THREADS = 3 or pytesseract.pytesseract.OMP_THREAD_LIMIT = 3), no multithreading happens.
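One possible explanation, sketched below, is that assigning an attribute on a Python module only creates a Python name and does not touch the process environment that a child program inherits. The sketch uses a stand-in module object and a made-up variable name (DEMO_THREAD_LIMIT) so that it runs without pytesseract or Tesseract installed; it assumes pytesseract does not itself copy such attributes into the environment, which is not confirmed here.

```python
import os
import subprocess
import sys
import types


def child_sees(name):
    """Return the value of environment variable `name` as seen by a child process."""
    code = "import os, sys; print(os.environ.get(sys.argv[1], '<unset>'))"
    out = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


# Start from a clean state for the made-up variable name.
os.environ.pop("DEMO_THREAD_LIMIT", None)

# Hypothetical stand-in for a wrapper module such as pytesseract.pytesseract.
wrapper = types.ModuleType("wrapper")
wrapper.DEMO_THREAD_LIMIT = 3  # plain Python attribute, not an environment variable
before = child_sees("DEMO_THREAD_LIMIT")

os.environ["DEMO_THREAD_LIMIT"] = "3"  # real environment variable, inherited by children
after = child_sees("DEMO_THREAD_LIMIT")

print(before, after)
```

The child process only sees the value after it has been written to os.environ, which is why the variable has to be set there (or in the shell) before Tesseract is launched.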




nick

May 29, 2018, 1:56:05 AM
to tesseract-ocr
Hi,
I don't know which file I should change to set OMP_NUM_THREADS, or which command I should run to test it.

Jakob Salomonsson

Jun 8, 2018, 5:09:45 AM
to tesser...@googlegroups.com
After speaking to one of the tesseract contributors:

import os

os.environ['OMP_THREAD_LIMIT'] = '2'

Should do the job, as OMP_THREAD_LIMIT is an environment variable. However, the speed difference is very small. It might be better to process several images in parallel than to process one as fast as possible.
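The idea of processing several images in parallel could look roughly like the sketch below. It assumes the tesseract command-line program is on the PATH and relies on its default behaviour of writing the text to the output base name plus ".txt"; the function names ocr_one and ocr_many are made up for this example. Each Tesseract process is limited to one OpenMP thread, and a thread pool is enough on the Python side because the real work happens in the child processes.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ocr_one(image_path):
    """Run the tesseract CLI on one image with OpenMP limited to one thread."""
    env = os.environ.copy()
    env["OMP_THREAD_LIMIT"] = "1"
    output_base = os.path.splitext(image_path)[0]
    subprocess.run(["tesseract", image_path, output_base], env=env, check=True)
    return output_base + ".txt"


def ocr_many(image_paths, worker=ocr_one, max_workers=None):
    """Apply `worker` to every path concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, image_paths))
```

For example, ocr_many(["a.png", "b.png"], max_workers=20) would keep up to 20 single-threaded Tesseract processes busy at once; the file names are placeholders.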





ShreeDevi Kumar

Jun 12, 2018, 3:35:15 AM
to tesser...@googlegroups.com



cosmin....@gmail.com

Jan 25, 2019, 5:56:59 AM
to tesseract-ocr
I got the best results by combining it with GNU parallel:
export OMP_THREAD_LIMIT=1
ls -U | parallel 'tesseract -l <lang> ./{} ./../extraction/{}'