Tesseract 3.x multiprocessing weird behaviour

igna...@gmail.com

Aug 28, 2018, 2:40:51 AM
to tesseract-ocr

I am not sure whether my infrastructure is causing this odd behaviour or tesseract-ocr itself.


Whenever I use image_to_string in a single-process environment, tesseract-ocr works fine. But when I spawn multiple workers with gunicorn and all of them do OCR work at the same time, tesseract-ocr starts reading very poorly (in terms of accuracy, not performance). Even after the load has finished, tesseract never gets back to the same accuracy. I need to restart all the workers to get tesseract working well again.


This is super weird. Has anyone experienced or heard of this issue?

Adrian Owen

Aug 28, 2018, 3:20:57 AM
to tesser...@googlegroups.com

When multiprocessing using V4 (and TessAPI), I had to make multiple copies of tessdata and give each worker its own unique copy.

Now it works okay. Hope this is helpful.
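
For anyone trying the same workaround from Python, here is a minimal sketch of the "one tessdata copy per worker" idea. It assumes the workers call pytesseract's image_to_string (the original post does not say which wrapper is used) and passes the tesseract command-line option --tessdata-dir through the config argument. The helper names worker_tessdata_dir, tesseract_config and ocr, and the per-PID directory naming, are made up for illustration. The thread does not confirm that this fixes the accuracy problem on 3.x.

```python
import os
import shutil
import tempfile
from pathlib import Path


def worker_tessdata_dir(shared_tessdata, base_dir=None):
    """Return a tessdata directory private to the current process.

    The shared tessdata directory is copied once per process ID
    (hypothetical naming scheme); later calls reuse the existing copy.
    """
    base = Path(base_dir or tempfile.gettempdir())
    private = base / f"tessdata-worker-{os.getpid()}"
    if not private.exists():
        shutil.copytree(shared_tessdata, private)
    return private


def tesseract_config(tessdata_dir):
    """Build the extra command-line arguments for tesseract."""
    # --tessdata-dir is a tesseract CLI option; the quotes keep a path
    # containing spaces together when the string is split into arguments.
    return f'--tessdata-dir "{tessdata_dir}"'


def ocr(image, shared_tessdata):
    """Run OCR with this worker's own tessdata copy (assumes pytesseract)."""
    import pytesseract  # imported lazily so the helpers work without it

    config = tesseract_config(worker_tessdata_dir(shared_tessdata))
    return pytesseract.image_to_string(image, config=config)
```

Each gunicorn worker is a separate process with its own PID, so calling ocr(image, shared_tessdata) inside a request handler gives every worker a different directory. The copies are not cleaned up here; removing them on worker exit is left out of the sketch.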
