Extracting black & white text from image

Edoardo Conti

Aug 23, 2019, 11:42:48 PM
to tesseract-ocr

I am using Tesseract to extract a set of sparse numbers from an image for a poker application I am working on. I have tweaked the settings a bit and am getting decent results, but I am still missing several numbers that I need. Specifically, I am missing all the player numbers (the 1-6 labels in the small circles) and the small $ values ($0.05, $0.15, $0.37, etc.). I suspect the issue is that the image contains both black and white text.
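A minimal sketch of one way to test that hypothesis, assuming you can crop each label region yourself (no crop boxes are given in this thread): Tesseract 4.x generally does best with dark text on a light background, so a crop that is mostly dark can be inverted before OCR. The `normalize_polarity` helper and its 128 cut-off are illustrative choices, not settings anyone here has confirmed.

```python
from PIL import Image, ImageOps, ImageStat


def normalize_polarity(region: Image.Image, cutoff: int = 128) -> Image.Image:
    """Return a grayscale crop with (hopefully) dark text on a light background.

    Assumption: the background covers most of the crop, so a low mean
    brightness suggests light text on a dark background.
    """
    gray = region.convert("L")
    mean = ImageStat.Stat(gray).mean[0]  # average brightness, 0-255
    # Invert mostly-dark crops; leave mostly-light crops untouched.
    return ImageOps.invert(gray) if mean < cutoff else gray
```

Each crop, e.g. `normalize_polarity(img.crop(box))` for a hypothetical `box = (left, upper, right, lower)` around one player circle, could then be passed to `image_to_string` on its own.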


Any advice on preprocessing I could do to improve this or settings to change in tesseract would be appreciated.


Code below:

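from PIL import Image
import pytesseract

path = 'table.png'  # placeholder: path to the poker-table image

# Convert to 8-bit grayscale before OCR.
img = Image.open(path).convert('L')

# --psm 11 = sparse text; the whitelist restricts output to digits, '$' and '.'.
print(pytesseract.image_to_string(
    img, lang='eng',
    config='--psm 11 -c tessedit_char_whitelist=0123456789$.'))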
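Building on the code above, a cruder option that needs no cropping is to run the same call on the grayscale image and on its inverse, then merge the results. This is a sketch: `tesseract_ocr` and `ocr_both_polarities` are hypothetical helper names, and the OCR function is a parameter only so the merging logic can be exercised without Tesseract installed.

```python
from typing import Callable, List

from PIL import Image, ImageOps

CONFIG = "--psm 11 -c tessedit_char_whitelist=0123456789$."


def tesseract_ocr(image: Image.Image) -> str:
    """Default OCR step: the same pytesseract call as in the question."""
    import pytesseract  # imported lazily so the helper below can be tested without it
    return pytesseract.image_to_string(image, lang="eng", config=CONFIG)


def ocr_both_polarities(
    img: Image.Image, ocr: Callable[[Image.Image], str] = tesseract_ocr
) -> List[str]:
    """OCR the grayscale image and its inverse, merging tokens in order."""
    gray = img.convert("L")
    tokens: List[str] = []
    # Pass 1 should catch dark text; pass 2 (inverted) should catch white text.
    for variant in (gray, ImageOps.invert(gray)):
        tokens.extend(ocr(variant).split())
    # Drop exact duplicates (read in both passes), keeping first-seen order.
    # Caveat: this also collapses values that genuinely appear twice.
    return list(dict.fromkeys(tokens))
```

Because of that caveat (two stacks of $1.50 would merge into one), a fuller version would probably need token positions, e.g. from `pytesseract.image_to_data`.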


And output:

$ python test.py
08

$0.02$0.05

$1.50

$4.12

$2.56

3

$2.39

$4.33

$1.52
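Separately, the output above has two amounts fused on one line (`$0.02$0.05`). A small post-processing step can split raw output like this into dollar amounts and bare integers. This sketch assumes amounts always carry exactly two decimals, as every amount in the output above does; `parse_ocr_output` is a hypothetical helper name.

```python
import re
from decimal import Decimal
from typing import List, Tuple

# Either a dollar amount with two decimals, or a bare run of digits.
TOKEN_RE = re.compile(r"\$\d+\.\d{2}|\d+")


def parse_ocr_output(text: str) -> Tuple[List[Decimal], List[int]]:
    """Split raw Tesseract output into dollar amounts and bare integers."""
    amounts: List[Decimal] = []
    numbers: List[int] = []
    for token in TOKEN_RE.findall(text):
        if token.startswith("$"):
            amounts.append(Decimal(token[1:]))  # Decimal avoids float rounding for money
        else:
            numbers.append(int(token))  # e.g. player labels or misreads like "08"
    return amounts, numbers
```

Run on the output above, this separates `$0.02` from `$0.05` and leaves `08` and `3` as bare integers to inspect.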



Clint William Theron

Aug 24, 2019, 10:24:17 AM
to tesser...@googlegroups.com
Did you try inverting the image, like the attached one? Maybe also reduce it to black and white, like so:

function solutionOCR_2() {
    // outputCtx, cameraSensor, cameraSensor2, outputCtx2 and ocrNumbers
    // are defined elsewhere in my app.
    var imageData = outputCtx.getImageData(0, 0, cameraSensor.width, cameraSensor.height);
    var data = imageData.data; // RGBA bytes, 4 per pixel

    for (var i = 0; i < data.length; i += 4) {
        // Threshold on the red channel only: bright pixels become white,
        // everything else becomes black. Increase/decrease 100 as needed.
        var value = data[i] > 100 ? 255 : 0;
        data[i] = value;     // R
        data[i + 1] = value; // G
        data[i + 2] = value; // B
        // Alpha (data[i + 3]) is left unchanged.
    }

    cameraSensor2.width = imageData.width;
    cameraSensor2.height = imageData.height;
    outputCtx2.putImageData(imageData, 0, 0);

    // Hand the binarized image to the OCR routine as a PNG data URL.
    ocrNumbers(cameraSensor2.toDataURL('image/png'));
}
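A rough Pillow equivalent of the snippet above, sketched for the Python code in the original question. It is not code from the thread: it thresholds the grayscale value rather than only the red channel, keeps the same cut-off of 100, and adds an optional inversion step for white-on-dark text. `binarize` is a hypothetical helper name.

```python
from PIL import Image, ImageOps


def binarize(img: Image.Image, threshold: int = 100, invert: bool = False) -> Image.Image:
    """Grayscale, threshold to pure black/white, then optionally invert."""
    gray = img.convert("L")
    # Pixels brighter than the threshold become white (255), the rest black (0).
    bw = gray.point(lambda p: 255 if p > threshold else 0)
    return ImageOps.invert(bw) if invert else bw
```

Calling it once with `invert=False` and once with `invert=True` gives two candidate images to pass to `image_to_string`.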

Attachment: PS6AR.png