Tesseract 3 doesn't recognize portion of the image with one word inside


thakobyan

Jun 5, 2014, 6:10:27 AM
to tesser...@googlegroups.com
I am trying to OCR a portion of an image. For some reason, if I crop only one word (see the attached Fail.png and Fail 2.png), Tesseract returns an empty string.
But when I crop both words together, it works fine. I do not think this is a limitation on word length, because it recognizes even shorter portions of the same image without problems.

Please help.
Fail.png
Fail 2.png
Success.png

zdenko podobny

Jun 5, 2014, 7:52:00 AM
to tesser...@googlegroups.com
Please provide more detail, e.g. the exact version of Tesseract and how you ran OCR (API, executable, parameters, etc.).

Zdenko



Nick White

Jun 5, 2014, 11:07:03 AM
to tesser...@googlegroups.com
On Thu, Jun 05, 2014 at 01:51:24PM +0200, zdenko podobny wrote:
> On Thu, Jun 5, 2014 at 12:10 PM, 'thakobyan' via tesseract-ocr
> <tesser...@googlegroups.com> wrote:
>>
>> Trying to OCR the portion of the image. For some reason if I
>> cut only one word (see Fail.png and Fail2.png attached) it
>> returns empty string.
>
> Please provide more detail - e.g. exact version of tesseract, how did you run
> OCR (API, executable, parameters etc..)

Indeed, give us more information on how you get the failure. Using
the latest SVN version works well for me:

; tesseract Fail.png stdout
FEMALE
; tesseract 'Fail 2.png' stdout
SINGLE

thakobyan

Mar 19, 2015, 12:41:47 PM
to tesser...@googlegroups.com
I use tesseract-ocr 3 in a .NET application.

Here are the settings I apply before calling Init:
m_tesseract.SetVariable("load_system_dawg", "0");
m_tesseract.SetVariable("load_freq_dawg", "0");
m_tesseract.SetPageSegMode(ePageSegMode.PSM_AUTO);

Here is the fragment of the code:

        private TesseractProcessor m_tesseract = null;

        private const string m_path = @"data\";
        private const string m_lang = "eng";

        private void InitOCR()
        {
            m_tesseract = new TesseractProcessor();

            m_tesseract.SetVariable("load_system_dawg", "0");
            m_tesseract.SetVariable("load_freq_dawg", "0");
            //m_tesseract.SetVariable("tessedit_char_whitelist", "0123456789");
            //m_tesseract.SetVariable("tessedit_pageseg_mode", ((int)TesseractPageSegMode.PSM_AUTO).ToString());
            m_tesseract.SetPageSegMode(ePageSegMode.PSM_AUTO);

            bool succeed = m_tesseract.Init(m_path, m_lang, (int)TesseractEngineMode.DEFAULT);
            if (!succeed)
            {
                MessageBox.Show("Tesseract initialization failed. The application will exit.");
                Application.Exit();
            }

            //System.Environment.CurrentDirectory = System.IO.Path.GetFullPath(m_path);
        }

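        // Note: sw (a stopwatch) and label1 are fields declared elsewhere in the form (not shown).
        private string Ocr(Image image)
        {
            string retVal = string.Empty;
            sw.Reset();
            sw.Start();

            // Reset the engine state so earlier images do not influence this result.
            m_tesseract.Clear();
            m_tesseract.ClearAdaptiveClassifier();

            retVal = m_tesseract.Recognize(image);

            sw.Stop();
            label1.Text = string.Format("Elapsed time: {0}", sw.ElapsedMilliseconds);
            return retVal;
        }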
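
A possible way to narrow this down, offered as a guess rather than a confirmed fix: the code above uses PSM_AUTO, and automatic page segmentation sometimes finds no text block in a tight single-word crop. The Tesseract 3.x command line accepts a -psm option, where 8 means "treat the image as a single word" and 7 means "treat the image as a single text line". The helper name ocr_with_psm below is hypothetical; the file names are the attachments from the first post. Nick's run with default settings succeeded, so the cause may well lie elsewhere (for example in the disabled dawgs, or in how the image is passed from .NET).

```shell
# Hypothetical helper: run the Tesseract 3.x CLI on one image with an
# explicit page segmentation mode and print the recognized text to stdout.
ocr_with_psm() {
    image="$1"   # path to the cropped image
    psm="$2"     # page segmentation mode, e.g. 3 (auto), 7 (line), 8 (word)
    tesseract "$image" stdout -psm "$psm"
}

# Example calls (assumes tesseract 3.x is on PATH and the attachments are
# in the current directory):
#   ocr_with_psm 'Fail.png' 3     # automatic segmentation, as in the .NET code
#   ocr_with_psm 'Fail.png' 8     # single-word mode
#   ocr_with_psm 'Fail 2.png' 8
```

If the crops are recognized in mode 8 but not in mode 3, the page segmentation mode is the likely culprit; if both modes work on the command line, the difference is probably in the .NET setup instead.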

On Thursday, June 5, 2014 at 19:07:03 UTC+4, Nick White wrote: