Simple Tesseract OCR in .NET 4+?

Cetor Notorious

Feb 28, 2017, 2:37:47 AM

to tesseract-ocr

Hi everybody,

I was wondering if anyone has a tutorial or example code that is really simple.
It just needs to recognize text in an image from the web and return the recognized text.

I would like to package the whole thing in a single DLL so it's easy to use.

Is anyone able to help me?

Have a wonderful day :)

Quan Nguyen

Feb 28, 2017, 8:43:21 AM

Check out the .NET wrapper for Tesseract:

https://github.com/charlesw/tesseract

Cory Blissitte

Feb 28, 2017, 2:15:54 PM

Building on Quan Nguyen's suggestion:

CharlesW's Tesseract wrapper project is pretty easy to work with. In its simplest form, you can do the following, provided the image file is saved to your filesystem:

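                // Requires: using System; using System.IO; using Tesseract;
                // The engine, image, and page are all IDisposable, so wrap them in using blocks.
                using (var engine = new TesseractEngine(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata"), "eng",
                    EngineMode.TesseractOnly))
                using (var image = Pix.LoadFromFile(fileName))
                {
                    // Automatic page segmentation with orientation and script detection.
                    engine.DefaultPageSegMode = PageSegMode.AutoOsd;

                    using (var pageOutput = engine.Process(image))
                    {
                        var hOcr = pageOutput.GetHOCRText(0);   // text plus layout, as hOCR (HTML)
                        var imageText = pageOutput.GetText();   // plain recognized text
                    }
                }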

The hOcr string is an HTML document (in hOCR format) that contains the recognized text along with its position on the page, which is most useful for building searchable PDFs. The imageText string is just the recognized text from the image.
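
Since the original question was about an image on the web, here is a rough sketch of how the same idea might look without saving the file first. It assumes your version of the wrapper exposes Pix.LoadFromMemory, and that a tessdata folder containing eng.traineddata sits next to your executable. The class name WebImageOcr and the method name Recognize are made up for illustration; they are not part of the wrapper.

```csharp
using System;
using System.IO;
using System.Net;
using Tesseract;

public static class WebImageOcr
{
    // Hypothetical helper: downloads the image at imageUrl and returns the recognized text.
    // Assumes a "tessdata" folder with eng.traineddata next to the executable.
    public static string Recognize(string imageUrl)
    {
        byte[] imageBytes;
        using (var client = new WebClient())
        {
            // Fetch the raw image bytes over HTTP(S).
            imageBytes = client.DownloadData(imageUrl);
        }

        var tessdataPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");

        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.TesseractOnly))
        using (var image = Pix.LoadFromMemory(imageBytes)) // assumed to be available in your wrapper version
        using (var page = engine.Process(image))
        {
            return page.GetText();
        }
    }
}
```

If you compile this into a class library project, callers would only need something like WebImageOcr.Recognize(url). Note that the wrapper also depends on native Tesseract and Leptonica binaries plus the tessdata files, so those still have to ship alongside your DLL rather than inside it.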

Cory
