Hey community,
I use Tesseract for text extraction, but I find it slow, so I have some questions to help me figure out where I could contribute to make it faster:
- Does Tesseract perform any image processing and preprocessing/cleanup at the start (hence the need for Leptonica)? If so, what are those steps, how much time do you think they take, and how could we disable them?
- Does Tesseract convert all images to TIFF before processing them?
- Which part of Tesseract is the most time-consuming, and which functions do you think we could remove or disable to make it faster?
- I found this article, which proposes parallelizing some functions to speed it up: **[Performance Characterization and Parallelization of Tesseract Optical Character
Thanks