Need help for this OCR processing

Eggo chen

Nov 23, 2016, 4:12:56 PM

to tesseract-ocr

Hi All,
I am new to Tesseract and need to process a PDF file with it. I converted the PDF to PNG (shown below) and ran it through Tesseract, but the result is not perfect. How can I make this work? Thank you very much in advance.

I am using Tesseract Open Source OCR Engine v3.04.00 with Leptonica.

Auto Generated Inline Image 1

Reinaldo Crespo

Nov 23, 2016, 6:10:44 PM

to tesseract-ocr

Have you tried different page segmentation modes?

Run tesseract from the command prompt and take a look at the -psm parameter options. Report here what works best.
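
One way to compare the modes side by side is to script the runs. The following is only a rough sketch, not a tested recipe: the helper name `build_psm_commands`, the input file `page.png`, and the `out_psmN` output names are all made up for illustration. It assumes the 3.x command-line form `tesseract imagename outputbase -psm N` and the mode range 0 to 10; check the usage text of your own build for the exact list.

```python
import shutil
import subprocess


def build_psm_commands(image_path, modes=range(0, 11), psm_flag="-psm"):
    """Return one tesseract command (as an argument list) per segmentation mode.

    Hypothetical helper: "-psm" matches the 3.x command line; the mode
    range 0..10 is an assumption, adjust it to what your build lists.
    """
    commands = []
    for mode in modes:
        # tesseract appends ".txt" to the output base name itself
        output_base = "out_psm{}".format(mode)
        commands.append(["tesseract", image_path, output_base, psm_flag, str(mode)])
    return commands


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH")
    else:
        # "page.png" is a placeholder for the PNG converted from the PDF
        for cmd in build_psm_commands("page.png"):
            result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            print(" ".join(cmd), "-> exit code", result.returncode)
```

Each run writes its own text file, so the outputs can be opened next to each other and compared. Some modes may legitimately produce no text file.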

Eggo chen

Nov 28, 2016, 9:20:52 AM

to tesseract-ocr

I tried -psm with options 0 through 5, and only 1, 3, 4 and 5 produced an output file. Options 1 and 3 produced the most readable output, as follows.

KFS :Procuﬂment Card Imps://kfs-prod.adminspps.comell.=d|l/kfs/ﬁnxncizl.1’mcmment€ard.d .

mm," 1W m Hm

Pmnmmt cml IE

mam as am: mammal/241mm
explm .. whpu .u
, ' mama ma
Dou-lull m V m N
on"... m...

- mm: mmummmmmlm \
