pytesseract has high accuracy but performs very slowly

Vidya Chitragar

Mar 25, 2021, 3:49:08 AM

to tesseract-ocr

Hi everyone,

I am using pytesseract with tesseract-ocr version 3.05.02 to convert a scanned PDF document of 1000k pages into a searchable PDF, but my code takes 5 to 6 hours or more to produce the searchable PDF. Any suggestions would be very helpful.

Thanks,

Vidya
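The original post does not include the code. For reference, a common pytesseract pipeline renders each PDF page to an image and asks Tesseract for a one-page searchable PDF per image, then concatenates the pages. The sketch below assumes pdf2image (with poppler), pytesseract and pypdf are installed; `page_batches` and `ocr_pdf_to_searchable` are hypothetical names, not the poster's code, and the DPI and batch size are illustrative defaults.

```python
import io


def page_batches(total_pages, batch_size):
    """Yield inclusive, 1-based (first_page, last_page) ranges covering total_pages."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_pdf_to_searchable(src_pdf, dst_pdf, dpi=300, batch_size=50, lang="eng"):
    # Hypothetical helper. Third-party imports are lazy; assumes pdf2image
    # (plus poppler), pytesseract (plus a tesseract binary) and pypdf exist.
    from pdf2image import convert_from_path, pdfinfo_from_path
    import pytesseract
    from pypdf import PdfReader, PdfWriter

    total = pdfinfo_from_path(src_pdf)["Pages"]
    writer = PdfWriter()
    for first, last in page_batches(total, batch_size):
        # Render only one batch of pages at a time to keep memory bounded.
        images = convert_from_path(src_pdf, dpi=dpi, first_page=first, last_page=last)
        for image in images:
            # Each call starts a separate tesseract process for this page.
            page_pdf = pytesseract.image_to_pdf_or_hocr(image, lang=lang, extension="pdf")
            for page in PdfReader(io.BytesIO(page_pdf)).pages:
                writer.add_page(page)
    with open(dst_pdf, "wb") as fh:
        writer.write(fh)
```

Every `image_to_pdf_or_hocr` call writes the image to a temporary file and launches tesseract once, so this per-page overhead is multiplied by the page count.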

Shree Devi Kumar

Mar 25, 2021, 4:29:11 AM

to tesseract-ocr

Try a newer version of Tesseract.
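To confirm which Tesseract version is actually being called, before and after upgrading, one can parse the output of `tesseract --version`. This is a sketch: `parse_tesseract_version` and `installed_tesseract_version` are hypothetical names, and merging stderr into stdout is an assumption meant to cover releases that print the version on either stream. pytesseract also offers `pytesseract.get_tesseract_version()` for the same purpose.

```python
import re
import subprocess


def parse_tesseract_version(output):
    """Return (major, minor) parsed from `tesseract --version` output."""
    text = output.strip()
    first_line = text.splitlines()[0] if text else ""
    # Accepts e.g. "tesseract 3.05.02" or "tesseract v5.0.0.20190623".
    match = re.match(r"tesseract\s+v?(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"unrecognised version output: {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def installed_tesseract_version(binary="tesseract"):
    # Merge stderr into stdout because the stream used for the version
    # banner is assumed to differ between releases.
    result = subprocess.run(
        [binary, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return parse_tesseract_version(result.stdout)
```

For the 3.05.02 release mentioned in the question, `installed_tesseract_version() < (4, 0)` would evaluate to true.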

Zdenko Podobny

Mar 25, 2021, 5:07:36 AM

to tesser...@googlegroups.com

1 000 000 pages in one PDF? Seriously?

Also, post your code. pytesseract is not an efficient tool for processing many images, because every run (i.e. every page) involves disk I/O.

Zdenko
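One way to avoid starting one tesseract process per page is to give the tesseract CLI a text file listing many image paths, so a single process OCRs all of them into one PDF, and to run a few such processes side by side. The sketch below assumes the pages have already been rendered to image files (for example with poppler's `pdftoppm`); `chunk`, `tesseract_command` and `ocr_images_in_parallel` are hypothetical helpers, and the worker count and the `OMP_THREAD_LIMIT=1` setting (relevant to OpenMP builds of Tesseract 4 and later) are assumptions to tune.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk(items, n_chunks):
    """Split items into at most n_chunks (>= 1) contiguous, near-equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when items < n_chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def tesseract_command(list_file, output_base, lang="eng"):
    # A text file of image paths as input makes one tesseract process OCR
    # every listed image; the trailing "pdf" config requests PDF output.
    return ["tesseract", str(list_file), str(output_base), "-l", lang, "pdf"]


def ocr_images_in_parallel(image_paths, work_dir, workers=4, lang="eng"):
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ, OMP_THREAD_LIMIT="1")  # one thread per process
    jobs = []
    for i, part in enumerate(chunk(list(image_paths), workers)):
        list_file = work_dir / f"part{i:03d}.txt"
        list_file.write_text("\n".join(str(p) for p in part) + "\n")
        jobs.append(tesseract_command(list_file, work_dir / f"part{i:03d}", lang))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads only wait on the external tesseract processes.
        list(pool.map(lambda cmd: subprocess.run(cmd, env=env, check=True), jobs))
    return [work_dir / f"part{i:03d}.pdf" for i in range(len(jobs))]
```

The per-chunk PDFs still need to be concatenated in order (for example with pypdf's `PdfWriter`), and whether a single PDF of that many pages is practical at all is a separate question.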

On Thu, 25 Mar 2021 at 8:49, Vidya Chitragar <vidya.c...@lucidatechnologies.com> wrote: