hocr and resizing image

Proctor MacBelle

Aug 15, 2014, 6:19:51 PM
to tesser...@googlegroups.com
When I scan a document for OCR, tesseract requires a high-DPI image. However, I do not need such a high DPI in my target PDF file, and keeping it there seems like a waste of disk space, since I do not need the same resolution to read the document as tesseract needs to recognize it. I am therefore imagining a scenario where I increase the resolution of the original image for tesseract to run OCR on, and then apply the resulting hOCR information to the original (lower-resolution) image. However, I cannot find a way to accomplish this: the hOCR coordinates refer to the dimensions of the upscaled image, so the text in the image and the text in the hOCR data are no longer aligned.
It would be great to be able to apply the hOCR layer properly fitted to the original document. Is there some way to do this that I have missed, or do you think this would be a useful addition to the program?
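
To illustrate what I mean, here is a rough sketch of the rescaling step done as post-processing on the hOCR output. It is not an existing tesseract feature; `rescale_hocr` is a hypothetical helper name, and the sketch assumes that the coordinates appear as `bbox x0 y0 x1 y1` entries in the `title` attributes of the hOCR file and that the image was scaled uniformly in both directions. It only touches `bbox` values, so any other pixel-based properties in the file would still need separate handling.

```python
import re

# Matches the "bbox x0 y0 x1 y1" property found in hOCR title attributes.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def rescale_hocr(hocr_text: str, scale: float) -> str:
    """Multiply every bbox coordinate by `scale` (hypothetical helper).

    `scale` is original_size / upscaled_size, e.g. 0.5 if the image
    was doubled in width and height before OCR.
    """
    def _scale(match: re.Match) -> str:
        coords = [round(int(value) * scale) for value in match.groups()]
        return "bbox " + " ".join(str(c) for c in coords)

    return BBOX_RE.sub(_scale, hocr_text)


# Example: a word box from an image that was upscaled 2x before OCR.
sample = '<span class="ocrx_word" title="bbox 200 400 600 480">word</span>'
print(rescale_hocr(sample, 0.5))
```

The rescaled hOCR could then be combined with the original low-resolution image when building the PDF.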

Sincerely,
Proctor