Supplying a different DPI param per page

Alec McLean

Mar 5, 2020, 12:41:57 PM
to tesseract-ocr
If you have a multipage TIFF file, is it possible for each page to have a different DPI? If so, is there a way to extract the DPI values and then supply them as per-page parameters when you OCR the images to produce hOCR?

Zdenko Podobny

Mar 9, 2020, 4:21:27 AM
to tesser...@googlegroups.com
Just a quick reply (I did not test it :-) ):
  • TIFF is a "container of images", and AFAIK each image can have its own resolution (DPI is just information for correctly printing/displaying the image)
  • tesseract should read a multi-page TIFF image by image and process each one individually (including its DPI information)
=> If each page/image in a multi-page TIFF has DPI info, tesseract should process it correctly.
I am not sure how the "--dpi" option would work in this case. IMO the multi-page TIFF scenario was not considered.

Zdenko
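
A minimal, untested-against-tesseract sketch of the workflow asked about above: read the resolution stored in each page of a multi-page TIFF, then build one tesseract command per page that passes that value via "--dpi". Assumptions, none of which come from the thread: the file is a classic (non-BigTIFF) TIFF, every top-level IFD is treated as a page, the helper names read_page_dpis and tesseract_commands, the "page_N" output base, and the 300 DPI fallback are all made up for illustration, and a single page is selected with the tessedit_page_number config variable. Whether "--dpi" actually overrides the embedded resolution is the open question from the reply above.

```python
import struct

# TIFF 6.0 tag numbers that carry resolution metadata.
TAG_X_RESOLUTION = 282
TAG_Y_RESOLUTION = 283
TAG_RESOLUTION_UNIT = 296  # 1 = no absolute unit, 2 = inch (default), 3 = centimeter


def read_page_dpis(data):
    """Return one (x_dpi, y_dpi) tuple per IFD, or None where no usable DPI is stored."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF is handled in this sketch")

    pages = []
    while ifd_offset:  # IFDs form a linked list; offset 0 ends it
        (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
        x_res = y_res = None
        unit = 2  # the spec's default when the tag is absent
        for i in range(count):
            entry = ifd_offset + 2 + 12 * i
            tag, typ, _n = struct.unpack_from(endian + "HHI", data, entry)
            if tag in (TAG_X_RESOLUTION, TAG_Y_RESOLUTION) and typ == 5:
                # RATIONAL: the entry holds an offset to numerator/denominator.
                (value_offset,) = struct.unpack_from(endian + "I", data, entry + 8)
                num, den = struct.unpack_from(endian + "II", data, value_offset)
                value = num / den if den else None
                if tag == TAG_X_RESOLUTION:
                    x_res = value
                else:
                    y_res = value
            elif tag == TAG_RESOLUTION_UNIT and typ == 3:
                # SHORT: stored inline in the value field.
                (unit,) = struct.unpack_from(endian + "H", data, entry + 8)
        scale = {2: 1.0, 3: 2.54}.get(unit)  # pixels/cm -> pixels/inch
        if scale is None or x_res is None or y_res is None:
            pages.append(None)
        else:
            pages.append((x_res * scale, y_res * scale))
        (ifd_offset,) = struct.unpack_from(
            endian + "I", data, ifd_offset + 2 + 12 * count
        )
    return pages


def tesseract_commands(tiff_path, dpis, fallback_dpi=300):
    """Build one argument list per page; fallback_dpi is an arbitrary assumption."""
    commands = []
    for page, dpi in enumerate(dpis):
        x_dpi = round(dpi[0]) if dpi else fallback_dpi
        commands.append([
            "tesseract", tiff_path, f"page_{page}",   # hypothetical output base
            "--dpi", str(x_dpi),
            "-c", f"tessedit_page_number={page}",     # 0-based page selector
            "hocr",
        ])
    return commands
```

With a file on disk this would be used roughly as read_page_dpis(open("scan.tif", "rb").read()), where "scan.tif" is a placeholder name, and each returned argument list could be handed to subprocess.run. If the per-page values all look right, simply letting tesseract read the embedded resolution, as suggested above, may be enough.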


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a2adad49-6207-4a30-b73f-c9c2ecf18ffc%40googlegroups.com.

Alec McLean

Mar 12, 2020, 11:35:27 AM
to tesseract-ocr
Thanks for confirming that! That's what the documentation hints at, but it was a little unclear to me, so I wanted to reach out.