Hello,
For some documents it would make sense to create a text-only PDF with tesseract (cf. -c textonly_pdf=1) and merge it with an image-only PDF, as described in
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#integrate-original-image-file-and-detected-text-into-pdf and the linked GitHub issue comment.
Use case: let tesseract run its OCR on very high-quality images, but put post-processed (smaller) images into the resulting PDF file. That way you get high-quality OCR results and a relatively small PDF file.
The approach described in the FAQ/issue sounds nice, but how do I actually merge the two PDF files (on Linux)?
When googling for PDF merge tools I only find ones that concatenate PDF files ...
For this kind of merge, the two PDF files have to be laid 'on top' of each other page by page, i.e. the number of pages in the resulting PDF doesn't change; each page 'just' gets the text layer added.
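
To make clearer what I mean by merging 'on top', here is a minimal sketch in Python. It assumes the third-party pypdf library is installed (that is my assumption; the FAQ doesn't mention it), and the function names and file names are just made up for illustration. It also assumes that both PDFs have the same number of pages and the same page sizes, which I haven't verified for tesseract's text-only output.

```python
def check_page_counts(n_image_pages, n_text_pages):
    """Return the common page count, or raise if the two PDFs differ."""
    if n_image_pages != n_text_pages:
        raise ValueError(
            f"page count mismatch: {n_image_pages} image pages "
            f"vs {n_text_pages} text pages"
        )
    return n_image_pages


def overlay_text_layer(image_pdf, text_pdf, out_pdf):
    """Put each page of text_pdf on top of the same page of image_pdf."""
    # Assumption: pypdf is installed (e.g. via pip); imported here so the
    # helper above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    image_reader = PdfReader(image_pdf)
    text_reader = PdfReader(text_pdf)
    check_page_counts(len(image_reader.pages), len(text_reader.pages))

    writer = PdfWriter()
    for image_page, text_page in zip(image_reader.pages, text_reader.pages):
        # Copy the image page into the output, then merge the text page's
        # content into it; the page count of the output stays the same.
        out_page = writer.add_page(image_page)
        out_page.merge_page(text_page)

    writer.write(out_pdf)
```

It would be called like overlay_text_layer("images.pdf", "text.pdf", "merged.pdf"), where the three file names are hypothetical.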
Best regards
Georg