Removing colors

Deepak Sharma

Jan 6, 2021, 11:43:20 AM
to tesseract-ocr
I am trying to preprocess resumes to build an OCR model. Please see the reference image attached to this message.
As you can see, every skill in the Skills section is surrounded by a bluish-green patch. How can I remove those colors from the image?
Ideally, after preprocessing, the image should contain only black text on a white background.
des_resume3.jpeg

Balasundaram Chinnaiyan

Jan 6, 2021, 1:33:19 PM
to tesser...@googlegroups.com
Convert the image to grayscale and remove the gray.

Or use the HSV colour space to remove it.

Regards,
Bala

Zdenko Podobny

Jan 6, 2021, 3:12:28 PM
to tesser...@googlegroups.com
Try experimenting with Leptonica's pixAutoPhotoinvert function [1].
A quick test with the following C snippet produced the attached result:

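#include <leptonica/allheaders.h>
PIX *pix = pixRead("des_resume3.png");
PIX *pix8 = pixConvertTo8(pix, 0);  /* pixThresholdToBinary() needs 4 or 8 bpp input */
PIX *pix1 = pixThresholdToBinary(pix8, 170);
PIX *autoinverted = pixAutoPhotoinvert(pix1, 0, NULL, NULL);  /* thresh: 0 = Leptonica default; original value not given */
pixWrite("autoinverted.png", autoinverted, IFF_PNG);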

autoinverted.png

Deepak Sharma

Jan 7, 2021, 6:39:06 AM
to tesseract-ocr
Can you suggest an alternative to Leptonica for Python on Windows?

Zdenko Podobny

Jan 7, 2021, 1:50:29 PM
to tesser...@googlegroups.com
Unfortunately, I am not aware of any maintained Python bindings for Leptonica (any volunteers?), but you can call Leptonica and Tesseract directly from Python via cffi.
See some examples:

Deepak Sharma

Jan 8, 2021, 9:12:02 AM
to tesseract-ocr
Is there an equivalent function in OpenCV that can do what you did with Leptonica?