using tesseract for a credit card reader


mamoos1

Dec 27, 2011, 7:11:07 AM
to tesseract-ocr
Hi,

I want to use tesseract to extract the cardholder name and card
number from photos of credit cards.
I tried training it with 3 credit card pictures (that seems like too
few to me, but I have no idea) and then used the resulting traineddata
to re-scan the same pictures and check whether it could now extract
the data from them.
The result was even worse than with the original eng.traineddata.

My questions are:

1. Do you think tesseract is even able to do such a thing with proper
training, etc.? Or is this task simply not what tesseract was designed
to do?

If the answer is yes:

2. What type of training, and how much of it, do you think I need
before I get good results?

Thank you very much!
Roy.

La Monte H. P. Yarroll

Dec 27, 2011, 4:49:11 PM
to tesser...@googlegroups.com

I think there is an image processing problem to solve first. Extracting raised printing from a largely arbitrary background is hard, and it isn't really an OCR problem.


--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

Sven Pedersen

Dec 27, 2011, 12:24:09 PM
to tesser...@googlegroups.com
Hi Roy,
I think tesseract could do it, but you'll need to process the image
correctly so that the pixel height and contrast of the characters are
in range. Then you should be able to train with just the recommended
number of trials. However, you may need post-processing to clean
things up. Tesseract has trouble with single words, which is what the
text on credit cards generally appears to be.
--Sven
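
To make the "pixel height and contrast in range" advice a bit more concrete, here is a rough Python sketch. It is not from the thread and is only one way to read the advice: it assumes the photo is already a grayscale image held as a list of rows of 0-255 values, and the names stretch_contrast, scale_factor and the default target_height_px are hypothetical, chosen for illustration only.

```python
def stretch_contrast(gray):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255.

    gray is a list of rows, each a list of integers in 0..255.
    """
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def scale_factor(measured_height_px, target_height_px=30):
    """Resize factor that brings the characters to the target height.

    target_height_px is an assumed, illustrative value, not a figure
    from the thread; tune it against your own images.
    """
    if measured_height_px <= 0:
        raise ValueError("measured height must be positive")
    return target_height_px / measured_height_px
```

For example, scale_factor(15) gives 2.0, meaning the image would be enlarged to twice its size before OCR. A plain contrast stretch will probably not be enough for embossed digits on a busy card background, but it shows the kind of normalization meant here.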

--
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

zdenko podobny

Dec 28, 2011, 1:51:36 AM
to tesser...@googlegroups.com
http://code.google.com/p/tesseract-ocr/issues/detail?id=574&can=1&q=card 

Maybe I am wrong, but I cannot imagine a legal reason to OCR credit cards...
For legal uses, I guess there are ready-made solutions...

Zdenko

mamoos1

Dec 28, 2011, 5:02:04 AM
to tesseract-ocr
Hi,

Thank you all for your answers.
Zdenko, there is no need to worry: the need is legitimate and legal (a
small startup company) and does not involve any fraud or illegal
activity.
As for the other solutions available today, I have only been able to
find ones that charge money per extraction.

Sven / La Monte,

I am fairly new to image processing. To the best of your knowledge,
can you recommend a way to do this kind of image pre-processing (an
application or a known method of some sort)?

Thank you!
Roy.

Sven Pedersen

Dec 28, 2011, 2:27:30 PM
to tesser...@googlegroups.com
Hi Roy,
Image enhancement usually means adjusting contrast and the clarity or
smoothness of the text. ImageMagick (free) or Photoshop are often
mentioned on this discussion list. You should show us an example image
so we can see what issues you will have to deal with. Since you are
dealing with sensitive images, just a fragmentary segment of numbers
would be fine. Since you can use checksums (readily available in open
source software) to determine if the card number is correct, you could
easily eliminate possibilities.
--Sven
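
The checksum idea above can be sketched in Python. Payment card numbers carry a Luhn check digit, so a sketch along these lines could filter out misread numbers. The function names and the per-position candidate format are hypothetical (they are not part of tesseract or of anything posted in this thread); the sketch assumes your OCR step can give you, for each position, the digits it considers possible.

```python
from itertools import product


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum.

    Non-digit characters such as spaces or dashes are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def plausible_numbers(candidates_per_position):
    """Expand per-position OCR guesses and keep only Luhn-valid numbers.

    candidates_per_position is a list of strings, one per digit
    position, each holding the digits considered possible there
    (hypothetical input format).
    """
    combos = ("".join(c) for c in product(*candidates_per_position))
    return [n for n in combos if luhn_valid(n)]
```

For example, with the well-known test number 4111 1111 1111 1111, if the OCR cannot decide whether the last digit is a 1 or a 7, then plausible_numbers(list("411111111111111") + ["17"]) returns ['4111111111111111'], because only the 1 passes the checksum. The number of combinations grows quickly with the number of uncertain positions, so this only helps when a few digits are in doubt.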