Tesseract not working for some individual examples


Filip Bry

Jul 30, 2024, 10:45:56 AM
to tesseract-ocr

I'm trying to use Tesseract in a project written in C#. I have a problem reading text from part of an image. I'm trying to find the 4 characters (in the example, 0000) and the number after "e". For some examples it works perfectly, but for others it prints "Empty page!!!". The only difference between the examples is the background color; the image processing is the same for every attempt. What should I do to minimize the probability of errors?


That's the image where OCR works correctly:

working.jpg

and here is the one where it does not work:

not working.jpg



Part of the code in C#:


using System;
using System.Drawing;
using OpenCvSharp;
using Tesseract;

public static class Sign
{
    public static void Verify()
    {
        string imagePath = "path.bmp";
        string imagePathA = "2nd image path.bmp";
        Mat imageSign = new Mat(imagePath);

        // Crop the region with the marking: 1-30% of the width, 60-90% of the height.
        int h = imageSign.Rows;
        int w = imageSign.Cols;
        int point1 = (int)(0.01 * w);
        int point2 = (int)(0.6 * h);
        int point3 = (int)(0.3 * w);
        int point4 = (int)(0.9 * h);
        imageSign = new Mat(imageSign, new OpenCvSharp.Rect(point1, point2, point3 - point1, point4 - point2));

        // Upscale 2x, then save. Note: this overwrites the source image with the crop.
        Cv2.Resize(imageSign, imageSign, new OpenCvSharp.Size(), 2, 2);
        imageSign.SaveImage(imagePath);

        // Re-save with 300 DPI metadata (the pixel data itself is unchanged).
        using (Bitmap bitmap = (Bitmap)System.Drawing.Image.FromFile(imagePath))
        using (Bitmap newBitmap = new Bitmap(bitmap))
        {
            newBitmap.SetResolution(300, 300);
            newBitmap.Save(imagePathA);
        }

        using (var engine = new TesseractEngine(@"C:\Program Files\Tesseract-OCR\tessdata", "eng", EngineMode.Default))
        using (var pixFromFile = Pix.LoadFromFile(imagePathA))
        {
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            // Process(Pix, string) treats the string as an input *name*, so the old
            // "--psm 10 --oem 3" string was never applied. Pass the mode as an enum
            // instead (EngineMode.Default already corresponds to --oem 3).
            // SingleBlock (psm 6) is an assumption: psm 10 (SingleChar) reads only one
            // character, which does not fit the line-by-line parsing below.
            using (var page = engine.Process(pixFromFile, PageSegMode.SingleBlock))
            {
                string text = page.GetText();
                Console.Write(text);

                string[] lines = text.Split('\n');
                bool linijka = false;

                foreach (string rawLine in lines)
                {
                    // Trim '\r' and stray spaces so the length checks are reliable.
                    string line = rawLine.Trim();
                    if (line.Length == 4 || line.Length == 5)
                    {
                        Console.WriteLine("Marking e5: ");
                        Console.WriteLine(line);
                        linijka = true;
                    }
                    if (line.Length == 1)
                    {
                        Console.WriteLine("e_:");
                        Console.WriteLine(line);
                    }
                }

                Cv2.ImShow("end", imageSign);
                Cv2.WaitKey(0);
            }
        }
    }
}
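A minimal sketch of a more tolerant way to pull the two values out of `page.GetText()`, assuming the layout described above: the 4- or 5-digit marking on one line and the single digit after "e" on its own line, with only digits whitelisted. `SignParser` and `Parse` are hypothetical names. The regexes replace the bare length checks, so a line such as "0a00" is not accepted as a marking, and spaces that Tesseract may insert between digits are removed first.

```csharp
using System.Text.RegularExpressions;

public static class SignParser
{
    // Hypothetical helper: extracts the 4-5 digit marking and the single digit
    // after "e" from Tesseract's raw text. A value that is not found is returned as null.
    public static (string Marking, string ENumber) Parse(string ocrText)
    {
        string marking = null;
        string eNumber = null;

        foreach (string raw in ocrText.Split('\n'))
        {
            // Trim '\r' and outer spaces, then drop spaces between digits.
            string line = raw.Trim().Replace(" ", "");

            if (marking == null && Regex.IsMatch(line, "^[0-9]{4,5}$"))
                marking = line;
            else if (eNumber == null && Regex.IsMatch(line, "^[0-9]$"))
                eNumber = line;
        }
        return (marking, eNumber);
    }
}
```

Usage: `var (marking, eNumber) = SignParser.Parse(page.GetText());`, then treat a null result as "not recognized" instead of printing whatever line happens to have the right length.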


I tried cropping the image, and for some reason making the crop bigger or smaller than it is now makes the results worse. I also tried some other Tesseract PSM configurations and changed the DPI of the image to 300.
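Since the fixed 2x resize makes the outcome depend on how large the characters happen to be in each crop, one option is to derive the scale factor from the measured character height instead. A minimal sketch; `OcrScale.ScaleFor`, the 32 px target, and the clamp limits are assumptions to tune on real images, and the character height has to be measured separately (for example from the bounding box of the digits).

```csharp
using System;

public static class OcrScale
{
    // Hypothetical helper: returns the resize factor that brings the measured
    // character height close to targetHeightPx. The default target (32 px) and
    // the clamp range are assumptions, not values taken from Tesseract.
    public static double ScaleFor(int measuredCharHeightPx, int targetHeightPx = 32,
                                  double minScale = 0.5, double maxScale = 4.0)
    {
        if (measuredCharHeightPx <= 0)
            throw new ArgumentOutOfRangeException(nameof(measuredCharHeightPx));

        double scale = (double)targetHeightPx / measuredCharHeightPx;
        // Clamp so a bad measurement cannot blow the image up or shrink it away.
        return Math.Max(minScale, Math.Min(maxScale, scale));
    }
}
```

The result would replace the hard-coded factor, e.g. `double s = OcrScale.ScaleFor(measuredHeight); Cv2.Resize(imageSign, imageSign, new OpenCvSharp.Size(), s, s);`, so every crop reaches Tesseract with roughly the same character size.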

Danny

Aug 4, 2024, 8:36:51 PM
to tesseract-ocr
If you can, try pre-processing and inverting the image so it is black text on a white background. I found that recognition works much better with that preprocessing (probably because the models were trained on that kind of input).
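A minimal sketch of that preprocessing in plain C#, assuming the cropped region has already been converted to an 8-bit grayscale array (`gray`, one byte per pixel) and that the background covers more than half of the region. It binarizes with Otsu's method and inverts only when the result is mostly dark, so both background colours end up as black text on white. `Binarize`, `OtsuThreshold`, and `ToBlackOnWhite` are hypothetical names.

```csharp
public static class Binarize
{
    // Otsu's method: choose the threshold that maximizes the between-class
    // variance of the 256-bin grayscale histogram.
    public static int OtsuThreshold(byte[] gray)
    {
        var hist = new int[256];
        foreach (byte p in gray) hist[p]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * hist[i];

        double sumBack = 0, bestVar = -1;
        long wBack = 0;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            wBack += hist[t];                 // pixels <= t
            if (wBack == 0) continue;
            long wFore = total - wBack;       // pixels > t
            if (wFore == 0) break;
            sumBack += (double)t * hist[t];
            double meanBack = sumBack / wBack;
            double meanFore = (sumAll - sumBack) / wFore;
            double between = (double)wBack * wFore * (meanBack - meanFore) * (meanBack - meanFore);
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    // Binarize, then invert if most pixels are dark (assumed: light text on a
    // dark background), so the output is always dark text on a light background.
    public static byte[] ToBlackOnWhite(byte[] gray)
    {
        int t = OtsuThreshold(gray);
        var bw = new byte[gray.Length];
        int white = 0;
        for (int i = 0; i < gray.Length; i++)
        {
            bw[i] = gray[i] > t ? (byte)255 : (byte)0;
            if (bw[i] == 255) white++;
        }
        if (white * 2 < gray.Length)
            for (int i = 0; i < bw.Length; i++) bw[i] = (byte)(255 - bw[i]);
        return bw;
    }
}
```

In OpenCvSharp the same steps map roughly onto `Cv2.CvtColor(..., ColorConversionCodes.BGR2GRAY)`, `Cv2.Threshold(gray, bw, 0, 255, ThresholdTypes.Binary | ThresholdTypes.Otsu)`, and a `Cv2.BitwiseNot(bw, bw)` when `Cv2.CountNonZero(bw) * 2 < bw.Total()`, applied to the crop before it is saved for Tesseract.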