Aug 3, 2022, 11:59:13 PM
Hey all, and thanks for assisting.
I'm currently working on a pipeline that takes in PDFs, converts them to images, feeds them to Tesseract, and outputs a combined PDF at the end with a readable text layer.
I've reached the Tesseract step, and I'm stuck on the API and unsure how to continue. Essentially, I want to give Tesseract an image from memory, such as a Pix from Leptonica. This currently works for getting the recognized text as a string, but I can't find any method in the API that takes the image already set on the Tesseract instance and renders a PDF from it. The PDF-related methods all seem to want a file path rather than using the image set on the instance.
Is there an API for this, or a workaround?