Tesseract preprocessing -- Performance and Speed

perito...@gmail.com

Apr 12, 2019, 11:17:12 AM

to tesseract-ocr

Hey community,

I use Tesseract for text extraction, but I find it slow, so I have some questions to find out where I could contribute to make it faster:

- Does Tesseract perform any image processing or preprocessing/cleanup at the start (is this why Leptonica is needed)? If so, what are those processing steps, how much time do you think they consume, and how could we disable them?

- Does Tesseract convert every image to TIFF before processing it?

- Which part of Tesseract is the most time-consuming, and which functions do you think we could remove or disable to make it faster?

- I found this article, which proposes parallelizing some functions to speed it up: **[Performance Characterization and Parallelization of Tesseract Optical Character Recognition on Multicore Architectures](https://pdfs.semanticscholar.org/dab1/23de2a9c25eaeaf7b6456116cea1e509f3f7.pdf)**. Is it implemented?