Need help for this OCR processing

Eggo chen

Nov 23, 2016, 4:12:56 PM
to tesseract-ocr
Hi All,
     I am new to Tesseract and need to process a PDF file with it.  I converted the PDF file to a PNG (shown below) and ran it through Tesseract, but the result is not perfect.  How can I make this work?  Thank you very much in advance.
 
I am using Tesseract Open Source OCR Engine v3.04.00 with Leptonica.

[Inline image 1: the PNG converted from the PDF]

Reinaldo Crespo

Nov 23, 2016, 6:10:44 PM
to tesseract-ocr
Have you tried different page segmentation modes?

Run tesseract from the command prompt and take a look at the -psm parameter options.  Report here what works best.
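
To compare the modes side by side, something like the following Python sketch could run tesseract once per -psm value and write each result to its own file. It assumes the tesseract executable is on the PATH and uses the 3.04-style `-psm` flag (later releases spell it `--psm`). The helper names `build_command` and `run_all`, the mode list, and the file names are made up for illustration.

```python
import shutil
import subprocess

# Modes that attempt full recognition in Tesseract 3.04. Modes 0 (orientation and
# script detection only) and 2 (segmentation without OCR) are skipped here because
# they are not expected to yield recognized text. This list is an assumption.
PSM_MODES = (1, 3, 4, 5, 6)


def build_command(image_path, output_base, psm):
    """Build the command line for one page segmentation mode.

    Tesseract appends ".txt" to the output base name itself.
    """
    return ["tesseract", image_path, f"{output_base}_psm{psm}", "-psm", str(psm)]


def run_all(image_path, output_base, modes=PSM_MODES):
    """Run tesseract once per mode and return the commands that were executed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    executed = []
    for psm in modes:
        command = build_command(image_path, output_base, psm)
        # check=False: a failing mode should not stop the remaining runs
        subprocess.run(command, check=False)
        executed.append(command)
    return executed
```

Calling `run_all("page.png", "result")` should leave files such as result_psm1.txt and result_psm3.txt next to each other, which makes it easier to see which mode reads the page best.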

Eggo chen

Nov 28, 2016, 9:20:52 AM
to tesseract-ocr
I tried -psm with options 0 through 5, and only 1, 3, 4 and 5 produced an output file. Options 1 and 3 produced the most readable result, as follows.

KFS :Procuflment Card Imps://kfs-prod.adminspps.comell.=d|l/kfs/finxncizl.1’mcmment€ard.d .



mm," 1W m Hm



Pmnmmt cml IE













mam as am: mammal/241mm
explm .. whpu .u
, ' mama ma
Dou-lull m V m N
on"... m...



- mm: mmummmmmlm \