OMP_THREAD_LIMIT=1 gives improvement in 4.1 version


Sarath C P

Sep 30, 2020, 8:05:31 AM
to tesseract-ocr
OpenMP is disabled by default in tesseract-ocr, but when I set OMP_THREAD_LIMIT it still gives a performance improvement. Why?

Zdenko Podobny

Sep 30, 2020, 8:09:48 AM
to tesser...@googlegroups.com
1. OMP_THREAD_LIMIT is an environment variable, so it affects "everything", not only tesseract.
2. How did you measure performance? Provide details, including OS, hardware, etc.

Zdenko



--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/66b3ec4b-a9ce-4bb1-8bd8-e4f858204106n%40googlegroups.com.

Sarath C P

Oct 1, 2020, 1:53:54 AM
to tesser...@googlegroups.com
Hi,
Please see the steps we followed.

OS: Linux, 4 CPUs, 2 cores
Tesseract version: 4.1

1. Our Python web application runs behind nginx and gunicorn (3 workers and 1 thread).

2. We simulated 3 requests in parallel using a script.

3. With tesseract 4.1 (OpenMP disabled by default) and without setting OMP_THREAD_LIMIT, tesseract took 20 seconds on average to process the 3 requests. After setting OMP_THREAD_LIMIT=1, it processed the 3 requests in 10 seconds.

Previously, tesseract 4.0 (OpenMP enabled by default) gave the worst performance. We then decided to disable it with OMP_THREAD_LIMIT=1, and that gave good performance.

We used pytesseract==0.3.0 to call tesseract-ocr. Why this contradiction? Please comment on this.
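The measurement described above could be reproduced outside the web stack with a rough sketch like the following. It is only an illustration, not the script actually used: it assumes the `tesseract` executable is on the PATH, and `sample.png` is a hypothetical test image. It starts three OCR processes at once, first with OMP_THREAD_LIMIT unset and then with OMP_THREAD_LIMIT=1, and prints the wall-clock time for each case.

```python
import os
import shutil
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_env(omp_thread_limit=None):
    """Return a copy of the current environment, optionally pinning OMP_THREAD_LIMIT."""
    env = dict(os.environ)
    if omp_thread_limit is None:
        env.pop("OMP_THREAD_LIMIT", None)  # make sure the variable is really unset
    else:
        env["OMP_THREAD_LIMIT"] = str(omp_thread_limit)
    return env


def run_ocr(image_path, env):
    """Run the tesseract CLI once on image_path and discard the recognized text."""
    subprocess.run(
        ["tesseract", image_path, "stdout"],
        env=env,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def time_parallel(job, n_parallel=3):
    """Start n_parallel copies of job() at once and return the wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        futures = [pool.submit(job) for _ in range(n_parallel)]
        for future in futures:
            future.result()  # re-raise any error from the worker
    return time.perf_counter() - start


if __name__ == "__main__":
    image = "sample.png"  # hypothetical test image, replace with a real one
    if shutil.which("tesseract") and os.path.exists(image):
        for limit in (None, 1):
            env = build_env(limit)
            elapsed = time_parallel(lambda: run_ocr(image, env))
            print(f"OMP_THREAD_LIMIT={limit}: {elapsed:.2f} s")
```

Running each configuration several times and comparing the averages gives numbers that others can check on their own hardware.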

Can you also comment on the following question:
If we do not set OMP_THREAD_LIMIT, is multithreading enabled in Tesseract 4.1?



Sarath C P

Oct 1, 2020, 3:11:56 AM
to tesser...@googlegroups.com

We also want to know: without OMP_THREAD_LIMIT=1, does tesseract 4.1 run multithreaded or not?
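
One possible way to investigate this question is to check whether the installed build links an OpenMP runtime at all. The sketch below is an assumption-laden illustration, not an official tesseract feature: it assumes a Linux system with `ldd` available, a dynamically linked `tesseract` executable, and the usual runtime library names (`libgomp`, `libomp`, `libiomp`). A statically linked build would not show up this way.

```python
import shutil
import subprocess

# Common OpenMP runtime library names (GNU, LLVM, Intel); assumed, not exhaustive.
OPENMP_RUNTIMES = ("libgomp", "libomp", "libiomp")


def links_openmp(ldd_output):
    """Return True if any line of ldd output names a known OpenMP runtime."""
    return any(
        name in line
        for line in ldd_output.splitlines()
        for name in OPENMP_RUNTIMES
    )


def tesseract_links_openmp():
    """Run ldd on the tesseract executable; return None if either tool is missing."""
    exe = shutil.which("tesseract")
    if exe is None or shutil.which("ldd") is None:
        return None
    result = subprocess.run(["ldd", exe], capture_output=True, text=True)
    return links_openmp(result.stdout)


if __name__ == "__main__":
    print("OpenMP runtime linked:", tesseract_links_openmp())
```

If an OpenMP runtime is linked, that would be consistent with OMP_THREAD_LIMIT having an effect on this build, whatever the documented default is.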

shree

Oct 1, 2020, 8:46:23 AM
to tesseract-ocr

Zdenko Podobny

Oct 2, 2020, 5:40:19 AM
to tesser...@googlegroups.com
1. If you want to talk about tesseract performance, you should get rid of all the wrappers around it (you wrap tesseract with a Python library, put that into a Python app, and then put all of that behind a web server...).

2. Next: provide a test case so others can check your performance measurements.

3. If you care about OCR performance, you should use the tesseract library directly, e.g. for Python https://github.com/sirfz/tesserocr. pytesseract just wraps the tesseract executable, so each time you need to OCR something you waste time on process initialization...
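
As a minimal sketch of point 3, assuming tesserocr is installed: the engine is initialized once and reused for several images, instead of starting a new tesseract process per image. The function name `ocr_many` and the `api_factory` parameter are hypothetical helpers introduced for illustration; `PyTessBaseAPI`, `SetImageFile` and `GetUTF8Text` come from tesserocr.

```python
def ocr_many(image_paths, api_factory=None):
    """OCR several images with one initialized engine instead of one process per image."""
    if api_factory is None:
        # Imported lazily so this module loads even where tesserocr is not installed.
        from tesserocr import PyTessBaseAPI
        api_factory = PyTessBaseAPI
    texts = []
    with api_factory() as api:  # engine initialization happens once, here
        for path in image_paths:
            api.SetImageFile(path)
            texts.append(api.GetUTF8Text())
    return texts
```

Calling `ocr_many(["page1.png", "page2.png"])` (hypothetical file names) would return one string of recognized text per image.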


Zdenko

