I am not sure whether this strange behavior comes from my infrastructure or from tesseract-ocr itself.
Whenever I use image_to_string in a single-process environment, tesseract-ocr works fine. But when I spawn multiple workers with gunicorn and all of them do OCR work, tesseract-ocr starts reading very poorly (not performance-wise, but accuracy-wise). Even after the load is over, tesseract never regains its original accuracy. I have to restart all the workers to get it working well again.
This is super weird. Has anyone experienced or heard of this issue?
When multiprocessing with Tesseract V4 (and TessAPI), I had to make multiple copies of tessdata and give each worker its own unique copy.
Now it works okay. Hope this is helpful.
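The per-worker tessdata workaround described above could look roughly like this in a gunicorn + pytesseract setup. This is a minimal sketch under several assumptions: the shared tessdata path is only an example, `WORKER_TESSDATA_DIR` is a hypothetical environment variable, and `make_worker_tessdata` / `tesseract_config` are hypothetical helpers. `post_fork(server, worker)` is gunicorn's server hook that runs in each worker after forking, and `--tessdata-dir` is the Tesseract option that pytesseract forwards through its `config=` argument. Whether this fixes the accuracy drift described in the question is not confirmed.

```python
import os
import shutil
import tempfile
from typing import Optional

# Assumption: location of the shared tessdata directory on this machine.
# This path is only an example; adjust it for your installation.
SHARED_TESSDATA = "/usr/share/tesseract-ocr/4.00/tessdata"

# Hypothetical environment variable used to pass the private path to app code.
WORKER_TESSDATA_ENV = "WORKER_TESSDATA_DIR"


def make_worker_tessdata(shared_dir: str, worker_id: int,
                         base_dir: Optional[str] = None) -> str:
    """Copy shared_dir into a private directory for one worker and return its path."""
    base_dir = base_dir or tempfile.gettempdir()
    worker_dir = os.path.join(base_dir, f"tessdata-worker-{worker_id}")
    if not os.path.isdir(worker_dir):
        # Each worker gets its own copy of every *.traineddata file.
        shutil.copytree(shared_dir, worker_dir)
    return worker_dir


def tesseract_config(tessdata_dir: str) -> str:
    """Build the string passed to pytesseract's `config=` argument."""
    return f'--tessdata-dir "{tessdata_dir}"'


# gunicorn server hook: define it in gunicorn.conf.py.
# gunicorn calls it inside each newly forked worker process.
def post_fork(server, worker):
    private_dir = make_worker_tessdata(SHARED_TESSDATA, os.getpid())
    os.environ[WORKER_TESSDATA_ENV] = private_dir
```

Inside the application, each OCR call would then pass its worker's copy, e.g. `pytesseract.image_to_string(img, config=tesseract_config(os.environ["WORKER_TESSDATA_DIR"]))`. Copies are keyed by worker PID and are not cleaned up here. Deleting them on worker exit, for example in gunicorn's `worker_exit` hook, is left out of the sketch.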
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/3b1859ad-5c26-4688-b5e6-ceb7ae984c8f%40googlegroups.com.