How to train Tesseract?


Cenk KIZILDAĞ

Jun 29, 2015, 4:20:48 PM
to tesser...@googlegroups.com
Hi,

I would like to recognize only digits with the code sample below:

try
{
    // Take the image shown in pictureBox1 and convert it to 24bpp RGB
    System.Drawing.Bitmap imagee = new Bitmap(pictureBox1.Image);
    System.Drawing.Bitmap imagee2 = AForge.Imaging.Image.Clone(imagee, System.Drawing.Imaging.PixelFormat.Format24bppRgb);

    // Convert the image to text, restricting recognition to digits
    tessnet2.Tesseract ocr = new tessnet2.Tesseract();
    ocr.SetVariable("tessedit_char_whitelist", "0123456789");
    ocr.Init(@"C:\Users\197199\Documents\Visual Studio 2013\Projects\OCR\OCR\bin\Debug\tessdata", "eng", true);

    List<tessnet2.Word> res = ocr.DoOCR(imagee2, Rectangle.Empty);

    // Append each recognized word to the text box on its own line
    foreach (tessnet2.Word word in res)
    {
        textBox1.Text = textBox1.Text + word.Text + Environment.NewLine;
    }
}
catch (Exception ex)
{
    // Show the error message instead of the OCR result
    textBox1.Text = ex.Message;
}

Here is the image that I would like to recognize:


And the outcome is:

88
12
18
28
41
48

How can I fix this? Any help please.

Thanks in advance & Best Regards.

Dmitri Silaev

Jun 29, 2015, 4:39:33 PM
to tesser...@googlegroups.com
As a mandatory first step, you need to do perspective correction, e.g. using the paper sheet boundaries (is it a lottery ticket?).

Then, depending on how Tesseract performs after that, you may need to do one or more of the following:
- Train for this particular font
- Blur a bit to make the characters more "fleshy"
- Scale down vertically by a factor of 1.5 to match the standard trained fonts more closely

Each step in turn is a multi-step process. PM me if you're interested.
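
For readers following along, here is a rough sketch of how the preprocessing suggestions above (perspective correction, a light blur, vertical down-scaling by 1.5) might be chained with the AForge.NET filters that the original code already depends on. It is not Dmitri's actual procedure. The class name `OcrPreprocessing`, the helper `ScaledHeight`, the blur settings (sigma 1.0, kernel size 3), and the corner order are all assumptions made for illustration; the four sheet corners have to be detected or picked by hand, and the input is assumed to be a 24bpp RGB bitmap such as `imagee2` from the first post.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using AForge;
using AForge.Imaging.Filters;

// Hypothetical helper class, not part of tessnet2 or AForge.
static class OcrPreprocessing
{
    // corners: the four corners of the paper sheet in the source image
    // (assumed order: top-left, top-right, bottom-right, bottom-left).
    // sheetWidth/sheetHeight: size in pixels of the rectified output.
    public static Bitmap Prepare(Bitmap source, List<IntPoint> corners,
                                 int sheetWidth, int sheetHeight)
    {
        // 1. Perspective correction: map the sheet quadrilateral to a rectangle.
        var warp = new QuadrilateralTransformation(corners, sheetWidth, sheetHeight);
        using (Bitmap flat = warp.Apply(source))
        {
            // 2. Light blur to thicken thin strokes (values are guesses to tune).
            var blur = new GaussianBlur(1.0, 3);
            using (Bitmap blurred = blur.Apply(flat))
            {
                // 3. Shrink the height by a factor of 1.5, keep the width.
                int newHeight = ScaledHeight(blurred.Height, 1.5);
                var resize = new ResizeBilinear(blurred.Width, newHeight);
                return resize.Apply(blurred);
            }
        }
    }

    // Height after dividing by the given factor, never less than 1 pixel.
    public static int ScaledHeight(int height, double factor)
    {
        return Math.Max(1, (int)Math.Round(height / factor));
    }
}
```

The result of `OcrPreprocessing.Prepare(imagee2, corners, width, height)` could then be passed to `ocr.DoOCR` in place of `imagee2`. Whether each step helps depends on the image, so it is worth checking the OCR output after every stage rather than applying all three blindly.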

Best regards,
Dmitri Silaev
www.CustomOCR.com






Cenk KIZILDAĞ

Jun 30, 2015, 2:19:03 AM
to tesser...@googlegroups.com
Hi Dmitri,

Thanks for your reply. I need your assistance. Would you please help me?

Thanks in advance & Best Regards.

On Monday, June 29, 2015 at 23:39:33 UTC+3, Dmitri Silaev wrote:

Dmitri Silaev

Jun 30, 2015, 2:04:33 PM
to tesser...@googlegroups.com
Hi Cenk,

Sure, do not hesitate to contact me directly.

-Dmitri





