Tesseract preprocessing -- Performance and Speed

perito...@gmail.com

Apr 12, 2019, 11:17:12 AM
to tesseract-ocr
Hey community,
I use Tesseract for text extraction but find it slow, so I have some questions to work out where I could contribute to make it faster:

- Does Tesseract perform any image processing or preprocessing/cleanup at the start (is that what Leptonica is needed for)? If so, what are those steps, how much time do you think they take, and how can we disable them?
- Does Tesseract convert every image to TIFF before processing it?
- Which part of Tesseract is the most time-consuming, and which functions do you think we could remove or disable to make it faster?
- I found this article, which proposes parallelizing some functions to speed it up: **[Performance Characterization and Parallelization of Tesseract Optical Character Recognition on Multicore Architectures](https://pdfs.semanticscholar.org/dab1/23de2a9c25eaeaf7b6456116cea1e509f3f7.pdf)**. Is that implemented?
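
For context on how I would measure "slow", here is a minimal sketch of a wall-clock timing helper in Python. It only times a whole command from the outside, so it says nothing about which internal stage is expensive. The helper name `time_command` and the file name `page.png` are my own placeholders, and it assumes the `tesseract` binary is on the PATH.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times.

    Returns a list with the wall-clock duration of each run, in seconds.
    Output of the command is discarded; a non-zero exit raises an error.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `min(time_command(["tesseract", "page.png", "stdout"]))` would give the best of three full runs on one image, which could then be compared before and after any change.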

Thanks 