tesseract-ocr pdf input to searchable pdf (ocr-ed) and djvu input to searchable pdf


tuxcrafter

Oct 21, 2019, 5:14:46 AM
to tesseract-ocr
Hello everybody,

Our Xerox machines, which had the option to do standalone scans to searchable PDF, have died again. We only have Linux workstations. I am now looking at buying a cheaper scanning solution and doing the PDF to searchable PDF conversion on the workstations.

Can tesseract-ocr be used to reliably convert PDFs to searchable PDFs? Does anybody have a professional case study?

Can we convert our legacy DjVu files to searchable PDFs with tesseract?

I tried to use tesseract for this many years ago, but had no luck back then.

Kind regards,

Jelle de Jong

Zdenko Podobny

Oct 21, 2019, 5:59:10 AM
to tesser...@googlegroups.com
Yes, tesseract can create searchable PDFs (I am not sure how you define whether a process is reliable...).

Tesseract's input must be an image (or a text file listing images), so you cannot directly convert PDF or DjVu files to searchable PDF.
But there are tools like OCRmyPDF[1] that can help you convert PDF to searchable PDF (using tesseract).
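
To make the two routes above concrete, here is a minimal sketch in Python that only builds and runs the command lines. It rests on several assumptions that are not from this thread: pdftoppm (poppler-utils) rasterizes the PDF pages, ddjvu (DjVuLibre) exports DjVu to a multi-page TIFF, and tesseract and ocrmypdf are on the PATH. The helper names, file names, the 300 dpi default and the "eng" language are illustrative only.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def rasterize_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> List[str]:
    # Assumption: pdftoppm (poppler-utils) writes one PNG per page,
    # named <prefix>-<page number>.png.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def djvu_to_tiff_cmd(djvu: Path, tiff: Path) -> List[str]:
    # Assumption: ddjvu (DjVuLibre) exports all pages into one multi-page TIFF,
    # which tesseract can take as its image input.
    return ["ddjvu", "-format=tiff", str(djvu), str(tiff)]


def write_image_list(images: Iterable[Path], list_file: Path) -> Path:
    # Tesseract accepts a text file with one image path per line.
    list_file.write_text("\n".join(str(p) for p in images) + "\n")
    return list_file


def tesseract_cmd(source: Path, out_base: Path, lang: str = "eng") -> List[str]:
    # The trailing "pdf" config selects searchable-PDF output;
    # tesseract appends ".pdf" to out_base itself.
    return ["tesseract", str(source), str(out_base), "-l", lang, "pdf"]


def ocrmypdf_cmd(src: Path, dst: Path, lang: str = "eng") -> List[str]:
    # OCRmyPDF takes a PDF directly and drives tesseract internally.
    return ["ocrmypdf", "-l", lang, str(src), str(dst)]


def pdf_to_searchable(pdf: Path, out_base: Path, workdir: Path) -> Path:
    """Manual route: rasterize, list the page images, then run tesseract."""
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    pages = sorted(workdir.glob("page-*.png"))
    list_file = write_image_list(pages, workdir / "pages.txt")
    subprocess.run(tesseract_cmd(list_file, out_base), check=True)
    return Path(str(out_base) + ".pdf")
```

For a DjVu file the same idea would be to run djvu_to_tiff_cmd first and pass the resulting TIFF to tesseract_cmd. With OCRmyPDF the manual route collapses to a single subprocess.run(ocrmypdf_cmd(src, dst), check=True). Whether the result is good enough for production depends on scan quality, so it is worth testing on a sample of your own documents.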


Zdenko

