Accuracy decreases when a Region of Interest is used


ilochray

Aug 2, 2017, 10:47:32 AM

to tesseract-ocr

I am using Tesseract 3.0.4 with the Tesseract-OCR .Net wrapper. I am reading a page which contains account numbers and payment amounts along with other data.

If I read the entire page using ...

var Page2 = Engine.Process(PixPage, PageSegMode.Auto)

the account numbers and payment amounts are 100% accurate.

If I select and read the zones on the page where they appear by using ...

var Page = Engine.Process(PixPage, Region, PageSegMode.Auto)

the accuracy drops to about 85%.

Is there a way round this?

Isaias Barroso

Aug 12, 2017, 1:52:24 PM

to tesseract-ocr

Hi.

I think you can try some things like:

1 - Set the page segmentation mode to PSM_SINGLE_LINE. I don't know the wrapper, but maybe the rectangle is extracted before the OCR process is applied.

2 - Extract the image for your area of interest and save it, to verify whether the coordinates are correct or whether the isolated area is skewed, for example. In that case you can extract the area, save it, and run tesseract -psm 7 on it to see the results.
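
Both suggestions could be tried together along these lines. This is only a rough sketch, assuming the wrapper is the Tesseract NuGet package (namespace Tesseract) and that System.Drawing is available. The file names, the tessdata path, the language and the zone coordinates are all made-up placeholders, not values from the original post.

```csharp
using System;
using System.Drawing;
using Tesseract;

class RegionCheck
{
    static void Main()
    {
        // Hypothetical inputs: substitute your own page image and zone.
        const string pagePath = "page.png";
        int x = 100, y = 200, width = 400, height = 40;

        // Suggestion 2: save the zone as its own image so it can be inspected
        // for wrong coordinates, clipped characters or skew.
        // Bitmap.Clone throws if the rectangle lies outside the image bounds.
        using (var bitmap = new Bitmap(pagePath))
        using (var zone = bitmap.Clone(new Rectangle(x, y, width, height), bitmap.PixelFormat))
        {
            // Fully qualified because the Tesseract namespace also defines ImageFormat.
            zone.Save("zone.png", System.Drawing.Imaging.ImageFormat.Png);
        }

        // Suggestion 1: OCR the same zone as a single line of text
        // (PSM_SINGLE_LINE, which is -psm 7 on the command line).
        using (var engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
        using (var pix = Pix.LoadFromFile(pagePath))
        using (var page = engine.Process(pix, new Rect(x, y, width, height), PageSegMode.SingleLine))
        {
            Console.WriteLine(page.GetText());
            Console.WriteLine("Mean confidence: " + page.GetMeanConfidence());
        }
    }
}
```

The saved zone.png can then be checked by eye and also run through the command line with Tesseract 3.x, for example tesseract zone.png zone_out -psm 7, which writes the result to zone_out.txt. If the saved image shows characters touching or cut off at the edges, the region coordinates are the likely culprit rather than the segmentation mode.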