SetRectangle change?

CraigLandrum

Aug 1, 2023, 2:40:47 PM

to tesseract-ocr

We use Tesseract in our document imaging app. We started with version 2.x and recently upgraded from 3.05 to 5.3.1, and something broke. We supply images to Tesseract using SetImage and then SetRectangle. In one of our apps, we often OCR the top third of invoices to gather info on a vendor. This worked fine in 3.05 but not in 5.3.1. If I pass the full image dimensions to SetRectangle (the same ones given to SetImage), all works fine, but if I pass dimensions covering just the top third of the image, I get total garbage back. We provide one-bit B&W images to SetImage (white = 1) and specify the target area in pixels. Something changed between 3.05 and 5.3.1 to make this not work. Is there something I missed in the interim? Perhaps SetRectangle(x,y,w,h) wants coordinates that start on 8-bit boundaries or something equally restrictive? Any suggestions welcome.
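For reference, the region described above can be written as a small helper. This is only a sketch: `Rect` and `TopFraction` are hypothetical names, not part of the Tesseract API, and it assumes the top-left origin that the SetRectangle(left, top, width, height) parameter names imply.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper type for illustration; not part of the Tesseract API.
struct Rect {
  int left;
  int top;
  int width;
  int height;
};

// Returns the rectangle covering the top `fraction` of an image, in pixels,
// assuming (0, 0) is the top-left corner of the image.
Rect TopFraction(int image_width, int image_height, double fraction) {
  // Clamp so the rectangle never extends past the image.
  fraction = std::min(std::max(fraction, 0.0), 1.0);
  // Round to the nearest whole row.
  const int rows = static_cast<int>(std::lround(image_height * fraction));
  return Rect{0, 0, image_width, rows};
}
```

The resulting values would be passed as SetRectangle(r.left, r.top, r.width, r.height) after the SetImage call.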

Zdenko Podobny

Aug 1, 2023, 4:23:49 PM

to tesser...@googlegroups.com

Yes, there is a problem with SetRectangle, or a mismatch between it and other API functions (e.g. GetThresholdedImage).

It can be demonstrated with the attached simple code.

According to the API [1], SetRectangle(left, top, width, height), called e.g. as SetRectangle(left, top, width, height * .3), should OCR the first 30% of the image. Indeed, GetThresholdedImage returns that region correctly.

But GetUTF8Text() OCRs the "last" 30% of the image (i.e. it acts like SetRectangle(left, bottom, width, height)).

IMO the safer solution is to pass the cropped image to SetImage.

[1] https://github.com/tesseract-ocr/tesseract/blob/0768e4ff4c21aaf0b9beb297e6bb79ad8cb301b0/include/tesseract/baseapi.h#L340
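As a rough illustration of the cropping workaround for callers who hand raw buffers to SetImage: the sketch below copies whole scanlines out of a row-major buffer. `CropRows` is a hypothetical helper, not a Tesseract or Leptonica function, and it assumes the rows are stored top to bottom with a fixed `bytes_per_line`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies `rows` full scanlines starting at row `top`
// (row 0 is assumed to be the top of the image) out of a row-major buffer.
// Only whole rows are copied, so the bit depth of the pixels does not matter.
std::vector<std::uint8_t> CropRows(const std::vector<std::uint8_t>& data,
                                   int bytes_per_line, int height,
                                   int top, int rows) {
  if (bytes_per_line <= 0 || height < 0 || top < 0 || rows < 0 ||
      top > height - rows ||
      data.size() < static_cast<std::size_t>(bytes_per_line) *
                        static_cast<std::size_t>(height)) {
    throw std::invalid_argument("crop is outside the image buffer");
  }
  const auto first =
      data.begin() + static_cast<std::ptrdiff_t>(top) * bytes_per_line;
  const auto last =
      first + static_cast<std::ptrdiff_t>(rows) * bytes_per_line;
  return std::vector<std::uint8_t>(first, last);
}
```

The cropped buffer would then go to SetImage with `rows` as the height and the same width and `bytes_per_line`, so no SetRectangle call is needed.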

Zdenko

SetRectangle_test.cpp

Tom Morris

Aug 2, 2023, 2:37:08 PM

to tesseract-ocr

On Tuesday, August 1, 2023 at 4:23:49 PM UTC-4 zdenop wrote:

IMO safer solution is to use the cropped image for SetImage.

That's a good workaround suggestion, but it clearly sounds like a bug (and something not covered by the unit tests).

Tom

CraigLandrum

Aug 2, 2023, 3:03:57 PM

to tesseract-ocr

I appreciate getting confirmation that an issue exists. Zdenko's observation that SetRectangle acts as if it treats the "top" argument as "bottom" is likely a clue, and it probably has to do with the image coordinate system. I looked at the Tesseract API docs and could not find where the internal coordinate system is discussed, but SetRectangle, as defined, suggests that internally Tesseract considers the top-left corner of the image to be the x=0, y=0 point, with x=width, y=height being the lower-right corner. This is how the Apple Mac world originally defined its image coordinate system, but with the move to OS X/Darwin they adopted a coordinate system in which x=0, y=0 is the lower-left corner of the image and x=width, y=height is the upper-right corner. Because of these differences, our Mac developers have to translate coordinates when making Tesseract API calls that take them. If a new developer joined the Tesseract team and contributed a SetRectangle code change, it would be easy to confuse the two coordinate systems and get it wrong.

And yes, elsewhere we use cropped portions of a full image and have no problem with recognition, so we may decide not to use the damaged SetRectangle call at all. Thanks guys!
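The translation between the two coordinate systems mentioned above can be sketched as follows. `FlipY` is a hypothetical helper, not anything from Tesseract; it only illustrates the arithmetic and makes no claim about where the actual bug lies.

```cpp
// Hypothetical helper: converts the y coordinate of a rectangle between a
// top-left-origin system (y grows downward) and a bottom-left-origin system
// (y grows upward).
//
// Given the rectangle's y in one system, its height, and the image height,
// it returns the rectangle's y in the other system. Applying it twice
// returns the original value.
int FlipY(int y, int rect_height, int image_height) {
  return image_height - (y + rect_height);
}
```

For a 300-row image, a rectangle with top = 0 and height = 100 in the top-left system has its lower edge at y = 200 in the bottom-left system. If that flip is skipped or applied twice somewhere, a request for the top of the page ends up selecting the bottom, which matches the symptom Zdenko described.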
