Tesseract completely fails to recognize Consolas font in a high-resolution image


Are

Apr 28, 2023, 1:18:05 PM
to tesseract-ocr
Hello,

I have this simple Tesseract code, which takes the attached image and prints the result to the console.
I cropped the image to include only the necessary information (the full document contains sensitive information). With either the cropped image or the full one, it successfully reads most of the text, except for the text in the Consolas font.

The output I get from the attached image is: ">BUWVveAmæUw >» >> U U"
Although, when I use the full image, it is able to read the bot

I'm using nor.traineddata, but the result is very similar with eng.traineddata.



Here's my code:

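using System;
using Tesseract;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the Norwegian model ("nor") from the local tessdata folder.
            using (var engine = new TesseractEngine(@"./tessdata", "nor", EngineMode.Default))
            {
                // Load the cropped image from disk.
                using (var img = Pix.LoadFromFile(@"./images/unnamed2.jpg"))
                {
                    // Run OCR with the default page segmentation mode.
                    using (var page = engine.Process(img))
                    {
                        // Print the recognized text to the console.
                        var text = page.GetText();
                        Console.WriteLine(text);
                    }
                }
            }
        }
    }
}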



Here's the image:

unnamed2.jpg

Zdenko Podobny

May 1, 2023, 5:10:27 AM
to tesser...@googlegroups.com
  1. Try the tesseract executable first whenever you have problems with the API or a tesseract wrapper.
  2. Did you try image processing (as suggested by the tesseract documentation)?
  3. Did you try custom image segmentation? Your image looks like a table, and tesseract's layout analysis has problems with tables.
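
One lightweight way to probe the layout-analysis point, short of writing custom segmentation, is to run the same image through a few page segmentation modes and compare the results. The following is a minimal sketch, assuming the same Tesseract .NET wrapper, tessdata folder, and image path as in the original post. The choice of modes is an assumption for illustration; nothing in this thread confirms that any of them fixes the Consolas text.

```csharp
using System;
using Tesseract;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Modes to compare. Which one (if any) suits this table-like
            // layout is an assumption that has to be checked on the image.
            var modes = new[]
            {
                PageSegMode.Auto,        // full automatic layout analysis
                PageSegMode.SingleBlock, // treat the image as one uniform block of text
                PageSegMode.SparseText   // find as much text as possible, in no particular order
            };

            using (var engine = new TesseractEngine(@"./tessdata", "nor", EngineMode.Default))
            using (var img = Pix.LoadFromFile(@"./images/unnamed2.jpg"))
            {
                foreach (var mode in modes)
                {
                    // The wrapper allows only one active Page per engine,
                    // so each page is disposed before the next Process call.
                    using (var page = engine.Process(img, mode))
                    {
                        Console.WriteLine($"--- {mode} (mean confidence {page.GetMeanConfidence():F2}) ---");
                        Console.WriteLine(page.GetText());
                    }
                }
            }
        }
    }
}
```

If one mode reads the Consolas cells noticeably better, that points at layout analysis rather than the font or the traineddata. If all modes fail in the same way, image preprocessing or cropping to individual cells is the next thing to try.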

Zdenko

