pytesseract has high accuracy but is very slow


Vidya Chitragar

Mar 25, 2021, 3:49:08 AM
to tesseract-ocr
Hi everyone,
I am using pytesseract with tesseract-ocr version 3.05.02 to convert a scanned PDF document of 1000k pages into a searchable PDF, but my code takes more than 5 to 6 hours to produce the searchable PDF. Any suggestions would be very helpful.
Thanks,
Vidya

Shree Devi Kumar

Mar 25, 2021, 4:29:11 AM
to tesseract-ocr
Try a newer version of Tesseract.


Zdenko Podobny

Mar 25, 2021, 5:07:36 AM
to tesser...@googlegroups.com
1,000,000 pages in one PDF? Seriously?
Also, post your code. pytesseract is not an efficient tool when you have multiple images (it incurs disk I/O for each run/page).

Zdenko
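
To illustrate the per-page overhead point, here is a minimal sketch of one way to avoid launching Tesseract once per page: write all page image paths into a text file and pass that file to a single `tesseract` run with the `pdf` config, which produces one multi-page searchable PDF. This is not the original poster's code (it was never posted). It assumes the PDF pages have already been rasterized to image files, that the `tesseract` binary is on the PATH, and that the language is `eng`. The helper names (`write_image_list`, `build_command`, `ocr_pages_to_pdf`) are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def write_image_list(image_paths, list_path):
    """Write one image path per line; tesseract accepts such a list file as input."""
    list_path = Path(list_path)
    list_path.write_text("\n".join(str(p) for p in image_paths) + "\n")
    return list_path


def build_command(list_path, output_base, lang="eng"):
    """Build a single tesseract invocation covering every listed page.

    The trailing "pdf" config asks tesseract for a searchable PDF,
    written to <output_base>.pdf.
    """
    return ["tesseract", str(list_path), str(output_base), "-l", lang, "pdf"]


def ocr_pages_to_pdf(image_paths, output_base, lang="eng"):
    """Run tesseract once over all pages (assumes tesseract is installed)."""
    with tempfile.TemporaryDirectory() as tmp:
        list_path = write_image_list(image_paths, Path(tmp) / "pages.txt")
        subprocess.run(build_command(list_path, output_base, lang), check=True)
    return Path(str(output_base) + ".pdf")
```

For example, `ocr_pages_to_pdf(sorted(Path("pages").glob("*.png")), "searchable")` would write `searchable.pdf` from a hypothetical `pages` directory. For a very large document it would probably still make sense to split the pages into chunks and run several such processes in parallel, then merge the resulting PDFs.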

