Simple Tesseract OCR in .NET 4+?


Cetor Notorious

Feb 28, 2017, 2:37:47 AM
to tesseract-ocr
Hi everybody,

I was wondering if anyone had a tutorial / example code that is really simple.
It just needs to recognize text from a web image and return the recognized text.

I would like to package the whole thing in one DLL so it's easy to use.

Is anyone able to help me?

Have a wonderful day :)

Quan Nguyen

Feb 28, 2017, 8:43:21 AM
Check out .NET wrapper for Tesseract:

Cory Blissitte

Feb 28, 2017, 2:15:54 PM
Building on Quan Nguyen's suggestion:

CharlesW's Tesseract wrapper project is pretty easy to work with. In its simplest form you can do the following, provided the image file is saved to your filesystem:

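                // Load the English data from the "tessdata" folder next to the executable.
                using (var engine = new TesseractEngine(
                    Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata"), "eng",
                    EngineMode.TesseractOnly))
                {
                    // AutoOsd: automatic page segmentation with orientation and script detection.
                    engine.DefaultPageSegMode = PageSegMode.AutoOsd;

                    // The image and the page hold native resources, so dispose them when done.
                    using (var image = Pix.LoadFromFile(fileName))
                    using (var pageOutput = engine.Process(image))
                    {
                        var hOcr = pageOutput.GetHOCRText(0);   // hOCR markup for page 0
                        var imageText = pageOutput.GetText();   // plain recognized text
                    }
                }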

The hOcr string is an HTML document that contains the text and the placement of that text on the page (most useful for incorporation into searchable PDFs). The imageText string is just the recognized text from the image.
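
Since the original question was about an image on the web rather than a file on disk, here is a rough sketch of how the same calls could be wrapped in one helper that downloads the image first. The class name WebImageOcr and the method name Recognize are made up for illustration. It assumes the wrapper version in use exposes Pix.LoadFromMemory (otherwise save the bytes to a temporary file and use Pix.LoadFromFile as above) and that a tessdata folder containing eng.traineddata sits next to the executable.

```csharp
using System;
using System.IO;
using System.Net;
using Tesseract; // CharlesW's wrapper, referenced from the project

// Hypothetical helper, not part of the wrapper itself.
public static class WebImageOcr
{
    // Downloads the image at imageUrl and returns the text recognized in it.
    public static string Recognize(string imageUrl)
    {
        // Assumption: "tessdata" (with eng.traineddata) is next to the executable.
        string tessdata = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");

        // WebClient is available in .NET 4; DownloadData returns the raw image bytes.
        byte[] imageBytes;
        using (var client = new WebClient())
        {
            imageBytes = client.DownloadData(imageUrl);
        }

        // Same engine setup as above, but the image is loaded from memory.
        using (var engine = new TesseractEngine(tessdata, "eng", EngineMode.TesseractOnly))
        using (var image = Pix.LoadFromMemory(imageBytes))
        using (var page = engine.Process(image))
        {
            return page.GetText();
        }
    }
}
```

Calling it would then look like string text = WebImageOcr.Recognize(url); where url points at the image. Creating the engine is relatively slow, so if you process many images it is probably better to create it once and reuse it. As far as I know the wrapper also depends on native Tesseract and Leptonica libraries plus the tessdata folder, so your own code can live in one DLL but those files still have to ship alongside it.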


Cory