Speed up Tesseract OCR

vadansh kulshreshtha

Nov 29, 2023, 7:53:34 AM
to tesseract-ocr
Hello Everyone,

I am using Tesseract OCR 5.2 and I want to speed up my OCR process. Could anyone help me with this? It would be a great help. Also, can anyone tell me which parameters affect OCR speed?

Thank you

Dellu Bw

Nov 29, 2023, 9:08:11 AM
to tesser...@googlegroups.com
Using the fast model is the only setting I am aware of that speeds up OCR.
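A minimal sketch of selecting the fast model per call, assuming the `tesseract` executable is on `PATH` and `eng.traineddata` from the `tessdata_fast` repository has been downloaded into a local directory (the `/opt/tessdata_fast` path is hypothetical). The `--psm 8` (treat the image as a single word) value is an extra assumption chosen for word-sized ROIs; it was not suggested in this thread.

```python
import subprocess
from pathlib import Path

# Hypothetical directory holding the models from the tessdata_fast repository.
FAST_TESSDATA = Path("/opt/tessdata_fast")


def build_fast_cmd(image_path, tessdata_dir=FAST_TESSDATA, lang="eng", psm=8):
    """Build a tesseract command line that loads the fast LSTM model."""
    return [
        "tesseract",
        str(image_path),
        "stdout",                             # print the text instead of writing a file
        "--tessdata-dir", str(tessdata_dir),  # where the fast .traineddata lives
        "-l", lang,
        "--oem", "1",                         # LSTM engine only
        "--psm", str(psm),                    # 8 = treat the image as a single word
    ]


def ocr_file(image_path, **kwargs):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_fast_cmd(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Usage would be something like `ocr_file("roi.png")`; comparing its timing against the same call with `tessdata_dir` pointing at the "best" models shows how much the model choice matters on your own images.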

Zdenko Podobny

Nov 29, 2023, 9:34:25 AM
to tesser...@googlegroups.com
Your request is too general; for example, a reply could be "upgrade your hardware"... ;-)

Unless you provide details about your test environment, how you measure speed, and your test images, there is just one piece of general advice: read the docs and the issue tracker (including closed issues); there are several discussions (and hints) regarding speed.

Zdenko

vadansh kulshreshtha

Nov 30, 2023, 8:57:38 AM
to tesseract-ocr
I am using an i3 quad-core CPU. My goal is to process 100 images in 1 second, including image processing and cropping. I create an ROI, crop it, preprocess it, and then run OCR. The problem is that the same ROI sometimes takes more than 1 second and sometimes only 150-200 ms. I use the "best" trained data for Tesseract. Each ROI is no larger than a single word, e.g. "super@145&4califragilisticexpialidocious".

For image processing, I apply thresholding and, if required, zoom the images.

Please suggest ways to get a consistent OCR processing time and ways to speed up OCR.
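To see whether the 150-200 ms vs. more-than-1 s variation comes from the OCR call or from cropping, thresholding, or zooming, each stage can be timed separately. A minimal standard-library sketch; the stage names are illustrative, and the workloads in the `__main__` part are placeholders standing in for the real preprocessing and OCR steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock durations per pipeline stage (crop, threshold, zoom, OCR)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        """Return {stage: (count, mean_seconds, max_seconds)}."""
        return {
            name: (len(v), sum(v) / len(v), max(v))
            for name, v in self.samples.items()
        }


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("threshold"):
            sum(range(10_000))   # placeholder for the real thresholding step
        with timer.stage("ocr"):
            time.sleep(0.01)     # placeholder for the real OCR call
    for name, (n, mean, worst) in timer.summary().items():
        print(f"{name}: n={n} mean={mean * 1000:.1f} ms max={worst * 1000:.1f} ms")
```

Comparing the mean and the max per stage shows which step produces the slow spikes before any tuning is attempted.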

Thank you

Tom Morris

Dec 1, 2023, 6:55:45 PM
to tesseract-ocr
As was mentioned earlier, for higher performance you should consider using the "fast" models instead of the "best" models. Other than that, analyzing the outliers to see what makes them different may give you some clues; performance is always going to depend to some degree on the image content.
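A minimal sketch for picking out the outliers mentioned above, assuming per-ROI OCR durations have already been collected into a dict. The threshold of 3 times the median is an arbitrary starting point, not a recommended value.

```python
import statistics


def find_outliers(timings, factor=3.0):
    """Return ROI ids whose OCR time exceeds `factor` times the median, slowest first.

    timings: dict mapping a ROI identifier to its OCR duration in seconds.
    """
    if not timings:
        return []
    median = statistics.median(timings.values())
    return sorted(
        (roi for roi, secs in timings.items() if secs > factor * median),
        key=lambda roi: timings[roi],
        reverse=True,
    )
```

The flagged crops can then be saved and inspected side by side with typical ones to see what the slow images have in common.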

Tom

Zdenko Podobny

Dec 14, 2023, 2:41:15 PM
to tesser...@googlegroups.com
A more effective approach to addressing the issue is to create a test/example case. Advanced users can then evaluate it and potentially offer solutions.

It would be helpful if you could provide details on how you obtain and process the input images, as well as the OCR execution method (API, wrapper, executable). Examining this could reveal opportunities for speed improvements, particularly by minimizing IO operations.
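As one illustration of reducing IO when the executable is used, a sketch assuming Tesseract 5.x, whose command-line tool accepts `stdin` as the image name and `stdout` as the output base: the preprocessed ROI is piped in as encoded bytes (e.g. PNG) instead of being written to a temporary file. Note that this still starts one process per ROI, which likely reloads the model each time; at 100 ROIs per second, keeping a single initialized engine in-process through the API or a wrapper is probably the bigger saving.

```python
import subprocess


def build_stdin_cmd(lang="eng", psm=8):
    """Command that reads the image from stdin and writes the text to stdout."""
    return ["tesseract", "stdin", "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image_bytes(image_bytes, lang="eng", psm=8):
    """OCR an encoded image held in memory, without writing a temporary file."""
    result = subprocess.run(
        build_stdin_cmd(lang, psm),
        input=image_bytes,   # encoded image bytes, e.g. a PNG produced in memory
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8").strip()
```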

It's worth noting that there have been reported problems with OpenMP on Linux and Mac in the context of extensive OCR tasks, as outlined in these GitHub issues: [1], [2].
Investigating these and other performance-related issues may offer insights into potential optimizations.
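A commonly suggested workaround for OpenMP overhead, assuming your Tesseract build uses OpenMP, is to set `OMP_THREAD_LIMIT=1` and parallelize across images instead of within one image. A minimal sketch; the worker count and `--psm 8` are assumptions, and whether this reduces the variance on an i3 would have to be measured.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def single_thread_env(base=None):
    """Copy of the environment that limits OpenMP to one thread per process."""
    env = dict(os.environ if base is None else base)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def ocr_one(image_path, env):
    """Run one single-threaded tesseract process and return its text."""
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout", "--psm", "8"],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.strip()


def ocr_many(image_paths, workers=None):
    """OCR images in parallel, one tesseract process per image, `workers` at a time."""
    env = single_thread_env()
    workers = workers or os.cpu_count() or 1
    # Threads suffice here: the heavy work runs in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: ocr_one(p, env), image_paths))
```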