Tesseract not able to recognize characters even for a high quality Image

138 views
Skip to first unread message

ninad mundalik

unread,
Jul 24, 2014, 5:26:04 AM7/24/14
to tesser...@googlegroups.com
I am doing the process of cleaning up and image using leptonica and then passing it to tesseract for OCR.However it is not able to recognize the characters even though the image is of high quality.The image specifications are as follows.

1 bpp, uncompressed, 1280 * 960 , 300dpi horizontal and vertical resolution

Following are the image processing operations I carry out in sequence using leptonica
pixConvertTo8
pixBackgroundNormSimple
pixOtsuAdaptiveThreshold
pixContrastTRC {Regarding this - I am passing high values like 1.0 or even 5.0 but image doesnt really change}
pixFindSkew
pixRotate { rotate by angle found by pixFindSkew}
pixRotate90 {do this 4 times to read image in all 4 orientations}
pixClipRectangle {crop image}
Finally tesseract command.
I get garbage characters in the output.A sample Input Image is as follows

The output that i get is as follows
Final K-1
II]
s h d | K-1 ,.,
(F°o.~?n‘i&1) 5/>.©12 mm E2‘;
Deparlrnenl of tho Treasury , ,
I 1 I l I
‘mama, Ravenuo SGMW For cnlundm your 201), ‘ " °F°$ "'100fTIO
or lax yum boqmnnnq 7 _ 20\Q_
‘ 7660
and ondmg _  W vv I go
Beneï¬ciary's Share of Income, Deductions,
cl'editS, etc. F 800 buck 01 loam nnd lnstruoflons»
___lnformatI0n About mo Estate or Trust
‘ Ordmary d|v|dm
i 12113
 _
‘; Quahfmd dlVIdG
\ 8132
3 1
Net shun-term
A Estate's at trust's omgiuym ldonnlmnluon numbol
56-0987654
B Estate's u trust‘: namo
ESTATE OF MARTHA SMITH
0 Fiduc§ary's name, address, clly, smlu‘ and /IP codo
N01 long~lerm c
\ 24043 
u 
‘ 28% vale gann
Ti
Unreptumd 5
Omar porfloho 4
nonbuslness lfll
/\..4........ L. ._.._ ,.

What Should i do to improve the accuracy.

Part 2:

I tried to follow this link.And created a eng.user-words.traineddata file and bazaar.train file and tried to run with "bazaar" as additional parameter.but i get "read_params_file: can't open bazaar error". Any suggestions?

Reply all
Reply to author
Forward
0 new messages