Problem with Binarization


davejaw

Sep 15, 2014, 3:23:24 PM
to tesser...@googlegroups.com
Hello All,

I am trying to parse a reasonable-looking text image. After a bit of searching and reading, I suspect the problem is due to difficulty binarizing the image.

I've been experimenting with black/white threshold levels in GIMP, but even small changes in the threshold give very different results, none of them very accurate.

If anyone has some insight/experience with this issue, I'd really appreciate a nudge in the right direction.

I've attached the original "cruciate.png", versions of it at different black/white thresholds, and the corresponding Tesseract results.


Thanks,
Dave
cruciate.PNG
CruciateTH160.txt.txt
Cruciate.txt.txt
CruciateTH102.PNG
CruciateTH102.txt.txt
CruciateTH127.PNG
CruciateTH127.txt.txt
CruciateTH150.PNG
CruciateTH150.txt.txt
CruciateTH160.PNG

Albrecht Hilker

Sep 16, 2014, 9:11:35 PM
to tesser...@googlegroups.com
Tesseract works best when the text is at 300 DPI or more.

Resize your images by a factor of 3 and try again.
The built-in thresholder should be enough for your samples (no GIMP required).
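
For illustration only, here is a rough sketch of the two steps mentioned above: enlarging the image by a factor of 3 and then choosing a threshold automatically instead of by hand. It assumes the image is already loaded as a 2-D list of grey values (0-255). The function names `upscale`, `otsu_threshold` and `binarize` are made up for this example, and Otsu's method is used as a stand-in for an automatic thresholder. It is not a copy of Tesseract's internal code. A real workflow would resize with an image library that interpolates rather than with the nearest-neighbour copy shown here.

```python
def upscale(pixels, factor):
    """Nearest-neighbour upscale of a 2-D list of grey values (0-255)."""
    out = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide = [v for v in row for _ in range(factor)]
        # ...and every widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(pixels):
    """Return the grey level that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the grey values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet, nothing to split
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in pixels]


# Hypothetical 2x3 sample: three dark "ink" pixels and three light "paper" pixels.
sample = [[20, 30, 200],
          [220, 25, 210]]
enlarged = upscale(sample, 3)
black_and_white = binarize(enlarged, otsu_threshold(enlarged))
```

Because the threshold is derived from the image's own histogram, there is no hand-picked value such as 102, 127, 150 or 160 to tune.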

Tom Morris

Sep 18, 2014, 12:23:04 PM
to tesser...@googlegroups.com
Or better yet, rescan at the higher resolution so that the program doesn't have to deal with touching characters, loops with filled-in centers, etc.