OCR various fields of bank check in TIFF format

Keith Smith

Aug 8, 2023, 12:56:33 PM

to tesseract-ocr

Hello,

I have several X9.37 files and would like to use tesseract to OCR the check images in TIFF format and compare the OCR results with those fields in the X9.37 file. If the results of my tesseract OCR do not match the values in the X9.37 file, then I'd like to flag the check for manual review.

The exact fields which I would like to OCR from the TIFF image are:

* the MICR line fields including routing number, On Us, and Auxiliary On Us;

* the legal check amount (in cursive);

* the courtesy check amount.
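The compare-and-flag step described above could be sketched roughly as follows. This is only an illustration: the field names are hypothetical, and both the OCR results and the X9.37 values are assumed to have already been extracted into plain dictionaries of strings by other code.

```python
# Hypothetical field names; the X9.37 values are assumed to be parsed
# into a dict of strings elsewhere.
MICR_FIELDS = ("routingNumber", "onUs", "auxiliaryOnUs")


def normalize(value):
    """Drop all whitespace so spacing differences do not cause false flags."""
    return "".join((value or "").split())


def mismatched_fields(ocr, x937, fields=MICR_FIELDS):
    """Return the names of fields whose OCR text differs from the X9.37 value."""
    return [f for f in fields if normalize(ocr.get(f)) != normalize(x937.get(f))]


def needs_manual_review(ocr, x937):
    """A check is flagged as soon as any compared field disagrees."""
    return bool(mismatched_fields(ocr, x937))
```

A missing field and an empty string are treated as equal here; whether that is acceptable depends on how strict the review process needs to be.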

I have tried running tesseract 5 as follows:

tesseract --tessdata-dir tessdata input output

where my "tessdata" directory contains eng.traineddata, mcr.traineddata, and ocr.traineddata, and "input" contains some of my TIFF-formatted check files.

I have the following questions:

1. This of course simply prints some free-form text to the "*.ocr.txt" file. Is there a standard way of generating output in JSON format similar to:

{
  "onUs": "...",
  "auxiliaryOnUs": "...",
  "legalAmount": "...",
  "courtesyAmount": "..."
}

2. Is there a standard way of converting the "legalAmount" to a numeric value?

3. The results that I am getting for the MICR line fields are very poor. What is recommended for best results? These checks use the E13B MICR font.

4. If I need to do my own training, what is the best way to create the ground truth for my use case?
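To make question 2 concrete, here is a minimal sketch of the kind of conversion meant, assuming the legal amount has already been recognized as plain English text with the cents written as a fraction over 100. The function name and the set of accepted words are illustrative only, not part of tesseract or any standard tool.

```python
import re
from decimal import Decimal

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}
FILLER = {"and", "dollar", "dollars", "only"}


def legal_amount_to_decimal(text):
    """Convert text such as 'One hundred and 45/100 dollars' to a Decimal."""
    text = text.lower()
    cents = 0
    fraction = re.search(r"\b(\d{1,2})\s*/\s*100\b", text)
    if fraction:
        cents = int(fraction.group(1))
        text = text[:fraction.start()]  # keep only the dollar words
    total = current = 0
    for word in re.findall(r"[a-z]+", text):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        elif word not in FILLER:
            # OCR noise is likely; fail loudly instead of guessing.
            raise ValueError(f"unrecognized word: {word!r}")
    return Decimal(total + current) + Decimal(cents) / 100
```

Raising on any unrecognized word fits the manual-review workflow: a legal amount that cannot be parsed cleanly is itself a reason to flag the check.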

Thank you in advance,

Keith
