bank card OCR

Olivier Demin

Feb 15, 2018, 7:03:44 AM

to tesseract-ocr

Hi all. I'm completely new to tesseract, so please excuse any potential "dummy" questions. You're free to give "dummy" answers as well :-)

I would like to OCR pictures of bank cards in order to extract bank account numbers. I can easily post-process the recognized text with regular expressions to extract the bank account number, but the quality of the OCR is not good enough with tesseract's default parameters. Here is an example of an original picture, and the resulting image after processing by tesseract. I want to extract the number starting with "BE19 3770", but tesseract returns "$193,770 7513" instead. Is it possible to improve this by tuning the tesseract parameters, or do I need another image processing library to prepare my images before passing them to tesseract?
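
For readers following the thread, the regular-expression post-processing mentioned above could look roughly like the sketch below. It is only an illustration, not the poster's actual code: the function names are hypothetical, the pattern assumes Belgian IBANs ("BE", two check digits, twelve digits, optionally printed in groups of four), and the sample number in the usage note is a commonly published example IBAN rather than anything from the card in this thread. The mod-97 check is the standard IBAN checksum and helps reject OCR misreads.

```python
import re

# Assumption: Belgian IBAN layout, "BE" + 2 check digits + 12 digits,
# with an optional single space between groups of four.
BE_IBAN = re.compile(r"\bBE\d{2}(?: ?\d{4}){3}\b")


def iban_is_valid(iban):
    """Standard IBAN check: move the first 4 chars to the end,
    map letters to numbers (A=10 ... Z=35), and test mod 97 == 1."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_be_ibans(ocr_text):
    """Return checksum-valid Belgian IBANs found in OCR text, spaces removed."""
    return [
        match.group(0).replace(" ", "")
        for match in BE_IBAN.finditer(ocr_text)
        if iban_is_valid(match.group(0))
    ]
```

For example, `extract_be_ibans("IBAN BE68 5390 0754 7034")` yields one match, while the misread string `"$193,770 7513"` yields none, so a regex alone cannot recover the number if the OCR output is that far off.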

Mateusz Dudek

Feb 16, 2018, 11:30:06 AM

Hello Olivier, could you tell me which file I should use to get results like yours?

ada...@turningcloud.com

Feb 21, 2018, 5:05:28 AM

Olivier,
You would need to change your image processing before sending the image to Tesseract. Training will not make Tesseract read this image any better, because the pixels around the text are noise that has to be removed before the image is passed to Tesseract.

ada...@turningcloud.com

Feb 22, 2018, 12:15:55 AM

Hi Olivier

To expand on my earlier point: you need to process the image differently. Binarize it (convert it to pure black and white), then run erode and dilate on it so that the text is cleanly isolated. You might also need to correct the skew angle. After that, the text output can be expected to be okay.
Keep up the good work.

Regards
Adarsh
