Michael Kadziela
Aug 3, 2022, 11:59:13 PM
to tesseract-ocr
Hey all, and thanks for assisting.
I'm currently working on a pipeline that takes in PDFs, converts them to images, feeds them to Tesseract, and outputs a combined PDF at the end with a readable text layer.
I've reached the Tesseract step and I'm stuck on the API. Essentially, I want to give Tesseract an image held in memory, such as a Pix from Leptonica. This already works for getting the recognized text as a string, but I can't find any API method that renders a PDF from the image that was already set on the Tesseract instance. The PDF-related methods I've found all seem to want a file path rather than using the image set on the instance.
Is there an API somewhere for this, or a workaround?
Thanks!