Hello.
I have an input document, input.pdf. It comes straight from a scanner, so I do some preprocessing to improve OCR accuracy (e.g., unpaper, conversion to black/white, increased contrast), which yields preprocessed.png.
When I run the command
tesseract preprocessed.png output pdf
I get a PDF with the OCR'ed text embedded. Great! However, can I tell tesseract to use the original document, input.pdf (i.e., the one without preprocessing), as the background of the generated PDF while still performing OCR on the preprocessed input?
Thanks,
Jonas