to tesseract-ocr
Hi everybody,
I was wondering if anyone has a really simple tutorial or example code. It just needs to recognize the text in a web image and return the recognized text.
I would like to package the whole thing in a single DLL so it's easy to use.
Is anyone able to help me?
Have a wonderful day :)
Quan Nguyen
Feb 28, 2017, 8:43:21 AM
Building on Quan Nguyen's suggestion:
CharlesW's Tesseract .NET wrapper project is pretty easy to work with. In its simplest form, provided the image file is saved to your filesystem, you can do the following:
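// Needs: using System; using System.IO; using Tesseract;
using (var engine = new TesseractEngine(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"tessdata"), "eng", EngineMode.TesseractOnly) { DefaultPageSegMode = PageSegMode.AutoOsd })
using (var pix = Pix.LoadFromFile(fileName))
using (var pageOutput = engine.Process(pix)) // engine, image and page wrap native objects, so dispose them
{
    var hOcr = pageOutput.GetHOCRText(0); // 0 = page number
    var imageText = pageOutput.GetText();
}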
The hOcr string is an HTML (hOCR) document that contains the recognized text together with its position on the page, which is mostly useful for building searchable PDFs. The imageText string is just the plain text recognized in the image.
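For the original question of packaging everything in one DLL, the snippet above could be wrapped in a small class library along these lines. This is a minimal sketch, not part of the wrapper itself: the namespace `OcrHelper`, the class `SimpleOcr` and its two methods are hypothetical names, and it assumes the Tesseract NuGet package is referenced and a `tessdata` folder containing `eng.traineddata` is deployed next to the DLL. To handle a web image it simply downloads the bytes with `HttpClient` and saves them to a temporary file, so the same `Pix.LoadFromFile` call can be reused.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Tesseract;

namespace OcrHelper // hypothetical class-library name
{
    /// <summary>Minimal OCR facade intended to be compiled into its own DLL.</summary>
    public static class SimpleOcr
    {
        // Assumption: a "tessdata" folder with eng.traineddata sits next to the application.
        private static readonly string TessDataPath =
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");

        // One shared HttpClient instance, as recommended for HttpClient.
        private static readonly HttpClient Http = new HttpClient();

        /// <summary>Recognizes the text in an image file that is already on disk.</summary>
        public static string RecognizeFile(string fileName)
        {
            // EngineMode.Default lets Tesseract pick a mode supported by the installed traineddata.
            using (var engine = new TesseractEngine(TessDataPath, "eng", EngineMode.Default))
            using (var pix = Pix.LoadFromFile(fileName))
            using (var page = engine.Process(pix))
            {
                return page.GetText();
            }
        }

        /// <summary>Downloads an image from a URL to a temporary file, then runs OCR on it.</summary>
        public static async Task<string> RecognizeUrlAsync(string imageUrl)
        {
            byte[] bytes = await Http.GetByteArrayAsync(imageUrl).ConfigureAwait(false);
            string tempFile = Path.GetTempFileName();
            try
            {
                File.WriteAllBytes(tempFile, bytes);
                return RecognizeFile(tempFile);
            }
            finally
            {
                // Always clean up the temporary copy of the downloaded image.
                File.Delete(tempFile);
            }
        }
    }
}
```

A caller would then only need something like `string text = await SimpleOcr.RecognizeUrlAsync(imageUrl);`. Creating a `TesseractEngine` on every call keeps the sketch simple but is slow; if you OCR many images, you may prefer to keep an engine alive and reuse it (one per thread, since an engine can only process one page at a time).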