tesseract-ocr pdf input to searchable pdf (ocr-ed) and djvu input to searchable pdf


tuxcrafter

Oct 21, 2019, 5:14:46 AM

to tesseract-ocr

Hello everybody,

Our Xerox machines, which could do standalone scans to searchable PDF, have died again. We only have Linux workstations. I am now looking at buying a cheaper scanning solution and doing the conversion from PDF to searchable PDF on the workstations.

Can tesseract-ocr be used to reliably convert PDFs to searchable PDFs? Does anybody have a professional case study?

Can we convert our legacy DjVu files to searchable PDFs with Tesseract?

I tried to use Tesseract for this many years ago, but had no luck back then.

Kind regards,

Jelle de Jong

Zdenko Podobny

Oct 21, 2019, 5:59:10 AM

to tesser...@googlegroups.com

Yes, Tesseract can create searchable PDFs (I am not sure how you define whether the process is reliable...).

Tesseract's input must be an image (or a text file listing images), so you cannot directly convert PDF or DjVu files to searchable PDF.
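To illustrate the image-list route, here is a rough Python sketch, not a tested production workflow. It assumes that `pdftoppm` (from poppler-utils) and `ddjvu` (from djvulibre) are installed to turn the PDF or DjVu pages into images; neither tool is mentioned above, they are just one common choice. The function names, the `pages.txt` file name, and the 300 dpi default are made up for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def rasterize_command(source: Path, out_dir: Path, dpi: int = 300) -> List[str]:
    """Build the command that turns a PDF or DjVu file into page images."""
    suffix = source.suffix.lower()
    if suffix == ".pdf":
        # Assumption: pdftoppm is available; it writes <prefix>-<n>.png per page.
        return ["pdftoppm", "-r", str(dpi), "-png", str(source), str(out_dir / "page")]
    if suffix in (".djvu", ".djv"):
        # Assumption: ddjvu is available; it writes one multi-page TIFF
        # (dpi is not applied here, the file's own resolution is used).
        return ["ddjvu", "-format=tiff", str(source), str(out_dir / "pages.tif")]
    raise ValueError(f"unsupported input type: {source}")


def write_image_list(images: List[Path], list_file: Path) -> Path:
    """Write one image path per line, the list format Tesseract accepts."""
    list_file.write_text("".join(f"{p}\n" for p in images))
    return list_file


def tesseract_command(list_file: Path, out_base: Path, lang: str = "eng") -> List[str]:
    """Build the Tesseract call; the trailing 'pdf' selects searchable PDF output."""
    return ["tesseract", str(list_file), str(out_base), "-l", lang, "pdf"]


def convert(source: Path, out_dir: Path, out_base: Path, lang: str = "eng") -> Path:
    """Rasterize the input, list the page images, and run Tesseract on the list."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(source, out_dir), check=True)
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    images = sorted(p for p in out_dir.iterdir() if p.suffix in (".png", ".tif"))
    list_file = write_image_list(images, out_dir / "pages.txt")
    subprocess.run(tesseract_command(list_file, out_base, lang), check=True)
    return Path(str(out_base) + ".pdf")  # Tesseract appends the extension itself
```

Something like `convert(Path("scan.pdf"), Path("work"), Path("scan-ocr"))` would then be expected to leave `scan-ocr.pdf` next to the script, provided the external tools are on the PATH.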

But there are tools like OCRmyPDF [1] that can help you convert PDFs to searchable PDFs (using Tesseract).

[1] https://github.com/jbarlow83/OCRmyPDF
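For a folder of scans coming off a workstation scanner, a small wrapper around the `ocrmypdf` command line might look like the sketch below. It assumes OCRmyPDF is installed and on the PATH; the function names and the folder layout are hypothetical, and `--skip-text` is included on the assumption that some inputs may already contain a text layer.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ocrmypdf_command(src: Path, dst: Path, lang: str = "eng",
                     skip_text: bool = True) -> List[str]:
    """Build an ocrmypdf call that writes a searchable copy of src to dst."""
    cmd = ["ocrmypdf", "-l", lang]
    if skip_text:
        # Leave pages that already have a text layer untouched.
        cmd.append("--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def ocr_folder(in_dir: Path, out_dir: Path, lang: str = "eng") -> List[Path]:
    """Run ocrmypdf on every PDF in in_dir and return the output paths."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        # check=True stops the batch on the first file that fails.
        subprocess.run(ocrmypdf_command(src, dst, lang), check=True)
        done.append(dst)
    return done
```

Keeping the command construction separate from the batch loop makes it easy to print the command and try it by hand on one file before running a whole folder.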

Zdenko

On Mon, Oct 21, 2019 at 11:14, tuxcrafter <jon...@gmail.com> wrote:

