How to recognize text with a background

Andrus Moor

Aug 24, 2025, 8:41:14 AM
to tesseract-ocr
I tried https://github.com/Sicos1977/TesseractOCR and Leptonica to convert a JPG receipt slip to text:

    using TesseractOCR;
    using TesseractOCR.Enums;

    byte[] imageStream = <jpg image>;
    var img = TesseractOCR.Pix.Image.LoadFromMemory(imageStream);

    // Detect the page orientation and rotate the image if it is non-zero.
    Engine osdengine = new(@"./tessdata", Language.Osd, EngineMode.Default);
    using TesseractOCR.Page osdpage = osdengine.Process(img, PageSegMode.AutoOsd);
    int orie = -1;
    float conf = 0;
    osdpage.DetectOrientation(out orie, out conf);
    if (orie != 0)
        img = img.Rotate(ConvertDegreesToRadians(360 - orie));

    // Recognize the text as a single block with the Estonian language data.
    Engine engine = new(@"./tessdata", Language.Estonian, EngineMode.Default);
    using TesseractOCR.Page page = engine.Process(img, PageSegMode.SingleBlock);
    Console.WriteLine("Result " + page.Text);
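
ConvertDegreesToRadians is my own helper and is not shown above. A minimal sketch of it, assuming it is nothing more than a plain degrees-to-radians conversion returning a float:

```csharp
using System;

// Sketch of the helper called in the snippet above (its real body is not
// part of this post): converts an angle in degrees to radians.
static float ConvertDegreesToRadians(float degrees) => degrees * MathF.PI / 180f;

// Example: the angle passed to Rotate when the detected orientation is 90.
Console.WriteLine(ConvertDegreesToRadians(360 - 90));
```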

The receipt image contains a background:

https://i.sstatic.net/XXlaWJcg.jpg

The recognized text contains random characters. If the background is removed manually:

https://i.sstatic.net/wimOc8CY.png

the text is mostly recognized, but the VAT sum 18,37

https://i.sstatic.net/Um7RLlmE.png

is not recognized.

How can I properly digitize this receipt? How can I remove the background from the image, or force the OCR engine to ignore it?
What pre-processing should be applied to receipt slips before OCR?

taust.jpg