Why does the output PDF of textonly_pdf=1 not contain any data?

Sharp Subbu

Apr 29, 2021, 2:57:47 PM
to tesseract-ocr
Dear Friends,

Please find attached the PDF file "TextOnlyPDF_NoData.pdf".
This PDF file was created with Tesseract OCR v5.0.0 using the command below:
Command: tesseract Invoice.tiff TextOnlyPDF_NoData -l eng -c textonly_pdf=1 pdf

But this PDF does not contain any data. It is empty.

Kindly let us know whether there is a bug in the latest Tesseract OCR v5.0.0 source that causes it to generate the above output PDF file with textonly_pdf=1.

NOTE:
For your reference, we are attaching a text-only PDF file, "Invoice--Adobe-PDF-(ABBYY-OCR).pdf", generated by ABBYY OCR.
We are trying to generate a similar text-only PDF file using Tesseract OCR v5.0.0.

Kindly help us fix the above text-only PDF issue on the Tesseract OCR v5.0.0 side.

Thank you very much in advance.

Regards,
Subramanyam
TextOnlyPDF_NoData.pdf
Invoice--Adobe-PDF-(ABBYY-OCR).pdf

Zdenko Podobny

Apr 30, 2021, 6:36:11 AM
to tesser...@googlegroups.com
I am not sure what your problem is: the file is not empty, and tesseract gave you exactly the output you asked for [1], [2].
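
For readers who want to check this themselves: Tesseract describes textonly_pdf as creating a PDF with only one invisible text layer, so a viewer shows a blank page even though the text can still be selected and extracted. The following is a minimal Python sketch, not part of Tesseract, that scans a PDF's content streams for text-showing operators (`Tj`/`TJ`) and for the invisible text rendering mode (`3 Tr`). The function name `inspect_text_layer` is hypothetical, and the approach assumes the streams are either uncompressed or Flate-compressed; it uses regular expressions rather than a real PDF parser, so treat it as a rough diagnostic only.

```python
import re
import zlib

# Body of each PDF stream object: "stream" EOL ... EOL "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A string, hex string, or array operand followed by a text-showing operator.
SHOW_TEXT_RE = re.compile(rb"[)>\]]\s*T[jJ]\b")
# Text rendering mode 3 means "neither fill nor stroke", i.e. invisible text.
INVISIBLE_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def inspect_text_layer(pdf_bytes):
    """Return (has_text, has_invisible_mode) for raw PDF bytes.

    Simplifications (assumptions of this sketch): only unencoded or
    Flate-compressed streams are understood, and binary streams such as
    images are scanned as-is, which could in rare cases give a false match.
    """
    has_text = False
    has_invisible_mode = False
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates truncated input and trailing bytes.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate data; fall back to scanning the raw bytes
        if SHOW_TEXT_RE.search(data):
            has_text = True
        if INVISIBLE_RE.search(data):
            has_invisible_mode = True
    return has_text, has_invisible_mode


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        text, invisible = inspect_text_layer(handle.read())
    print("text operators found:", text)
    print("invisible rendering mode found:", invisible)
```

Saved under any file name and run with the path to TextOnlyPDF_NoData.pdf as its only argument, it reports whether text operators and the invisible rendering mode appear in the file. Selecting all text in a PDF viewer, or running a text extraction tool on the file, is a simpler way to confirm the same thing.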

On Thu, Apr 29, 2021 at 20:57, Sharp Subbu <sharp...@gmail.com> wrote: