PDF to HTML conversion using Tesseract


Madhu

Jun 17, 2021, 9:51:18 AM
to tesseract-ocr
Hi all,
I am able to convert a PDF into images, and I then use Tesseract to convert the JPG images into hOCR. However, the hOCR output does not have any CSS. Is there any way to get an hOCR output file that is an exact copy of the image layout? I am using pytesseract for the conversion.
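To make the missing-CSS part concrete: hOCR keeps each word's coordinates in the `title` attribute as `bbox x0 y0 x1 y1`, so one possible approach is to post-process the markup and turn those boxes into inline styles. The following is only a rough sketch of that idea, not something Tesseract provides. The function name `add_position_css` and the choice of style properties are hypothetical, and it assumes the word spans look like the ones Tesseract normally emits (`class='ocrx_word'` with a `bbox` in the title).

```python
import re

# Opening tag of a word-level hOCR element, e.g.
# <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>
WORD_TAG = re.compile(r"<span\b[^>]*\bclass=['\"]ocrx_word['\"][^>]*>")
# Bounding box in pixels: left, top, right, bottom.
BBOX = re.compile(r"\bbbox (\d+) (\d+) (\d+) (\d+)")


def add_position_css(hocr: str) -> str:
    """Return hOCR markup with an inline style on every ocrx_word span.

    Hypothetical helper: the style properties are an assumption about
    what "looks like the image" means, not part of the hOCR format.
    """

    def restyle(match: re.Match) -> str:
        tag = match.group(0)
        bbox = BBOX.search(tag)
        if bbox is None:
            return tag  # no coordinates found: leave the tag untouched
        x0, y0, x1, y1 = map(int, bbox.groups())
        style = (
            f"position:absolute;left:{x0}px;top:{y0}px;"
            f"width:{x1 - x0}px;height:{y1 - y0}px;font-size:{y1 - y0}px"
        )
        # Insert the style attribute just before the closing '>' of the tag.
        return f'{tag[:-1]} style="{style}">'

    return WORD_TAG.sub(restyle, hocr)
```

With pytesseract, the input could come from `pytesseract.image_to_pdf_or_hocr(image, extension='hocr')`, which returns bytes that need decoding before being passed in. Even with this, the result only approximates the page: fonts, images, and exact glyph sizes are not recovered, and the enclosing page element would need `position:relative` for the absolute offsets to line up.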

Thanks in advance
Madhu
