Numeric recognition accuracy

Vasa Serafin

Apr 4, 2016, 2:42:45 AM

to tesseract-ocr

Hi community,

I have been playing around with the engine and have run into issues with some images. I am using computer-generated bitmaps of diagrams that I create, and these change regularly.

The issue is that the text, which is numeric, is either not identified at all or identified incorrectly (not by much, but enough).

Attached is an example image. It shows 13.00%, which is sometimes identified as I3.00%, I 3.00X, or I3.0096.

I can understand why this occurs, since these characters look similar to the engine. When I increase the image size it works better, which is expected and consistent with the optimization documentation, which gives 300 DPI as the optimal resolution.

I would like some guidance on any flags or similar settings, or even more advanced numeric training data, that could help here.

Any advice, tips, or even a guide to making better use of the engine would be appreciated.

Thanks.

PS. Current code:

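// The engine is created once; "config" is the name of the config file it loads.
engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractOnly, "config");

private string Decypher_add_entries(Bitmap bitmap, int blowupW, int blowupH)
{
    // Upscale the bitmap before OCR. ResizeImage is a helper defined elsewhere (not shown).
    bitmap = ResizeImage(bitmap, bitmap.Width * blowupW, bitmap.Height * blowupH);

    // Run recognition and return the raw text of the page.
    using (var page = engine.Process(bitmap))
    {
        return page.GetText();
    }
}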

I might not be using all the available options that could help me. That is all the code I use for the implementation, a fairly simple 3-4 lines.

image_example.JPG

Vasa Serafin

Apr 4, 2016, 4:30:18 AM

to tesseract-ocr

Can anyone help me, please?

Vasa Serafin

Apr 4, 2016, 5:39:25 AM

to tesseract-ocr

Also, just as a side note: is there a way to change the default accuracy of the TesseractEngine?

Alex Szeto

Apr 4, 2016, 10:05:33 AM

to tesseract-ocr

You should restrict Tesseract so that it outputs only numbers and the % sign, not letters.
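
One way to apply such a restriction from the .NET wrapper is the tessedit_char_whitelist variable, set through SetVariable on the engine. The following is only a minimal sketch, not a tested fix for the attached image. It assumes the whitelist should also include the decimal point, that the image holds a single line of text (hence PageSegMode.SingleLine), and that the attachment is available at the relative path shown. The class name NumericOcr is hypothetical.

```csharp
using System;
using Tesseract;

class NumericOcr
{
    static void Main()
    {
        // Same data path, language and engine mode as in the original post,
        // but without the custom "config" file.
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractOnly))
        {
            // Restrict recognition to digits, the decimal point and the percent sign.
            engine.SetVariable("tessedit_char_whitelist", "0123456789.%");

            // Assumption: the attachment from the thread sits next to the executable.
            using (var img = Pix.LoadFromFile("image_example.JPG"))
            // Assumption: the image contains one line of text.
            using (var page = engine.Process(img, PageSegMode.SingleLine))
            {
                Console.WriteLine(page.GetText().Trim());
            }
        }
    }
}
```

The same variable can also be placed in a Tesseract config file as a line of the form "tessedit_char_whitelist 0123456789.%", which may fit better if the "config" file from the original post is already in use.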

Vasa Serafin

Apr 4, 2016, 11:45:31 AM

to tesseract-ocr

It still does not work well at all. It only shows 3/10, and the results are not very accurate.
