
Hello All,
I am trying to extract characters from a PNG image using Tesseract. The attached image is a screenshot of a program written in VS2012. I crop the code editor section of the screenshot and save it. I then run tesseract from the command prompt with the makebox parameter to retrieve the bounding box of each individual character. The output I am getting is shown below.
# startcolumn startrow endcolumn endrow
However, the desired output is given below.
3 startcolumn startrow endcolumn endrow
# startcolumn startrow endcolumn endrow
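
To make the comparison above concrete, here is a minimal Python sketch of how the box output could be read to check which characters were detected. It assumes each line has the form shown above (one character followed by four integer coordinates); as far as I know, real Tesseract box files also append a page number, which the sketch ignores. The names `Box` and `parse_box_lines` and the sample coordinates are made up for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One detected character and its bounding box (hypothetical helper type)."""
    char: str
    start_col: int
    start_row: int
    end_col: int
    end_row: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse lines of the form '<char> <startcolumn> <startrow> <endcolumn> <endrow> [...]'.

    Lines with fewer than five fields or with non-integer coordinates are skipped.
    Any extra trailing fields (for example a page number) are ignored.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue
        try:
            coords = [int(p) for p in parts[1:5]]
        except ValueError:
            continue
        boxes.append(Box(parts[0], *coords))
    return boxes


# Made-up sample values, only to show the shape of the data.
sample = "3 10 20 18 32 0\n# 30 20 40 32 0\n"
detected = [b.char for b in parse_box_lines(sample)]
print(detected)
```

With the output I actually get, the list of detected characters would contain `#` but not the line number `3`, which is the problem described here.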
I have tried changing the font in VS2012 and also saving the screenshot in TIFF format, but the problem persists. Tesseract does not detect the line numbers or all of the characters correctly. Is this because cropping the image file reduces its pixel depth? If so, how can I increase it so that all characters are extracted correctly?