Invoice processing


Eduardo Peña

Jul 30, 2015, 9:14:55 AM
to tesseract-ocr
Hello,

I've been trying various pre-processing techniques to get the best results from Tesseract OCR, but I'm still getting errors in parts of the data.
I'm using the AForge libraries for C#.

The image I'm using is this one:


These are the steps I'm doing right now:

1) Detect where the text starts and crop the image, to avoid problems when binarizing it

2) Apply a median filter (processing square size 3)

3) Apply a Gaussian sharpen (sigma 0.6)

4) Apply brightness correction (adjustValue 6)

5) Apply contrast stretch

6) Apply contrast correction (factor 3)

7) Apply saturation correction

8) Convert the image to grayscale

9) Apply gamma correction (gamma 0.85)

10) Apply Bradley local thresholding

11) Get the skew angle

12) Apply an opening filter

13) Rotate the image by the angle obtained in step 11

14) OCR
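Step 2 replaces each pixel with the median of its 3x3 neighbourhood, which removes isolated specks while blurring edges less than a mean filter would. A minimal sketch of that operation in plain Python, assuming a grayscale image stored as a list of rows of 0-255 ints; it is not AForge's `Median` filter, and the border handling (clip the window to the pixels inside the image) is an assumption:

```python
from statistics import median_low


def median_filter(img, size=3):
    """Median filter for a grayscale image given as a list of rows of ints.

    Near the border the window is clipped to the pixels inside the image.
    median_low always returns one of the window's own values.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(median_low(window))
        out.append(row)
    return out


# A flat patch with one "salt" pixel: the speck disappears.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 255
print(median_filter(patch)[2][2])  # 10
```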
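Steps 8 and 9 are per-pixel operations. The sketch below uses the BT.709 luma weights for grayscale and a 256-entry lookup table for gamma, with the convention out = 255 * (in / 255) ^ (1 / gamma), under which gamma = 0.85 darkens mid-tones. Both the weights and the exponent convention are assumptions for illustration; check AForge's `Grayscale` and `GammaCorrection` documentation for what your build actually does.

```python
def to_grayscale(rgb_img):
    """Rows of (r, g, b) tuples -> rows of 0-255 luma values (BT.709 weights)."""
    return [[round(0.2126 * r + 0.7152 * g + 0.0722 * b) for (r, g, b) in row]
            for row in rgb_img]


def gamma_table(gamma):
    """Lookup table for out = 255 * (in / 255) ** (1 / gamma) (assumed convention)."""
    return [min(255, round(255 * (i / 255) ** (1 / gamma))) for i in range(256)]


def apply_table(img, table):
    """Map every pixel through a 256-entry lookup table."""
    return [[table[v] for v in row] for row in img]


gray = to_grayscale([[(255, 255, 255), (0, 0, 0), (128, 128, 128)]])
print(gray)                                  # [[255, 0, 128]]
print(apply_table(gray, gamma_table(0.85)))  # [[255, 0, 113]]
```

A lookup table is the natural design here: every one of the 256 input levels is computed once, and the per-pixel work is a single index.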
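Step 10, Bradley local thresholding, compares each pixel with the mean of a window around it instead of with one global cut-off, so uneven lighting across a photographed ticket does not turn whole regions black. The window sums come from an integral (summed-area) table, so each mean costs four lookups. A minimal sketch, assuming a grayscale list-of-rows image; `window` and `t` play the role of the window size and brightness-difference limit, and t = 0.15 is only a commonly used starting value, not a tuned setting for this image:

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        run = 0
        for x in range(w):
            run += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + run
    return ii


def bradley_threshold(img, window=41, t=0.15):
    """Return a binary image: 0 = ink, 255 = background."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            count = (y1 - y0) * (x1 - x0)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Ink if the pixel is darker than (1 - t) times the local mean.
            row.append(0 if img[y][x] * count <= total * (1 - t) else 255)
        out.append(row)
    return out


# Background brightens from 100 (left) to 195 (right); column 10 is a dark stroke.
img = [[100 + 5 * x for x in range(20)] for _ in range(20)]
for row in img:
    row[10] //= 2
out = bradley_threshold(img, window=7)
print([x for x, v in enumerate(out[5]) if v == 0])    # [10]
print([x for x, v in enumerate(img[5]) if v <= 128])  # [0, 1, 2, 3, 4, 5, 10]
```

The second line shows what a single global cut at 128 would do on the same row: the darker left edge of the background would be marked as ink along with the stroke.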
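Step 12's opening is an erosion followed by a dilation. With grayscale min/max morphology (assumed here to match AForge's Erosion and Dilation, which take the neighbourhood minimum and maximum), what it removes depends on polarity: on dark text over a white background, opening removes small white specks, while an isolated dark dot survives and needs a closing (dilation then erosion) instead. A small sketch with a 3x3 square window, assuming a grayscale list-of-rows image:

```python
def _rank_filter(img, pick, size=3):
    """Apply pick (min or max) over a size x size window clipped to the image."""
    h, w = len(img), len(img[0])
    r = size // 2
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - r), min(h, y + r + 1))
                  for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def erode(img, size=3):
    return _rank_filter(img, min, size)


def dilate(img, size=3):
    return _rank_filter(img, max, size)


def opening(img, size=3):
    return dilate(erode(img, size), size)


def closing(img, size=3):
    return erode(dilate(img, size), size)


# White page (255) with one isolated black speck (0) in the middle.
page = [[255] * 7 for _ in range(7)]
page[3][3] = 0
print(opening(page)[3][3])  # 0   (opening keeps the dark speck)
print(closing(page)[3][3])  # 255 (closing removes it)
```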
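Steps 11 and 13 estimate and undo the skew. One simple way to estimate it, shown here as a projection-profile search rather than whatever AForge's skew checker does internally, is to rotate the ink pixels by each candidate angle, count ink per row, and keep the angle where the counts are most concentrated (largest sum of squares), because that is where each text line collapses onto the fewest rows. The +/-5 degree range and 0.5 degree step are assumptions, and so is the sign convention: lines run along y = y0 + x * tan(angle) with y pointing down, so check it against your rotation filter before feeding the angle to step 13.

```python
import math
from collections import Counter


def ink_points(binary):
    """(x, y) coordinates of ink pixels (value 0) in a binary image."""
    return [(x, y) for y, row in enumerate(binary)
            for x, v in enumerate(row) if v == 0]


def profile_score(points, angle_deg):
    """Sum of squared row counts after rotating the points by -angle_deg."""
    a = math.radians(angle_deg)
    s, c = math.sin(a), math.cos(a)
    rows = Counter(math.floor(-x * s + y * c + 0.5) for x, y in points)
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_angle=5.0, step=0.5):
    """Candidate angle (degrees) with the most concentrated row profile."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))


# Synthetic "text lines" tilted by 3 degrees, snapped to the pixel grid.
tilt = math.tan(math.radians(3.0))
pts = [(x, round(y0 + x * tilt)) for y0 in (30, 60, 90, 120) for x in range(400)]
print(estimate_skew(pts))  # 3.0
```

On a real ticket the points would come from `ink_points` applied to the thresholded image from step 10.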


Result:

CAFETERIA ESCLAT

AV.LLuis Companys,s/n.— T]f.934 772 965 j

FCO.GONZALEZ CASTRO

SANT JOAN D' ESPI

NIF:09747552Z-IVA NCLUIDO

TICKET 617431

VIE 24 JUL 2015 16:07

Cant Descripcion P.U. Tatil

2 MENU iiiiiiiii 10.00 20.00

l CAFE CON LEUHE 1.30 1.30

1 CAFE 1.10 1.10

TOTAL 22.40

EFECTIVO 22.40

**BASE IMPONIBLE 20.36 IVA 10% 2.04 **

... GRACIAS POR SU VISITA ...

JERU CAJA 1
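The numbers recognised on this ticket are internally consistent (2 x 10.00 = 20.00; 20.00 + 1.30 + 1.10 = 22.40; 20.36 + 2.04 = 22.40; 10% of 20.36 rounds to 2.04), which suggests a cheap post-OCR check: recompute the totals and flag the ticket for review when they disagree. A sketch, assuming the line items have already been parsed into strings; the parsing step and the `check_receipt` name are hypothetical:

```python
from decimal import Decimal


def check_receipt(items, total, base, vat_rate, vat):
    """Return a list of arithmetic inconsistencies (empty list = consistent).

    items: (quantity, unit_price, line_total) strings as read by OCR.
    """
    problems = []
    running = Decimal("0")
    for qty, unit, line in items:
        if Decimal(qty) * Decimal(unit) != Decimal(line):
            problems.append(f"{qty} x {unit} != {line}")
        running += Decimal(line)
    if running != Decimal(total):
        problems.append(f"line totals sum to {running}, TOTAL reads {total}")
    if Decimal(base) + Decimal(vat) != Decimal(total):
        problems.append(f"{base} + {vat} != {total}")
    expected_vat = (Decimal(base) * Decimal(vat_rate) / 100).quantize(Decimal("0.01"))
    if expected_vat != Decimal(vat):
        problems.append(f"{vat_rate}% of {base} is {expected_vat}, not {vat}")
    return problems


items = [("2", "10.00", "20.00"), ("1", "1.30", "1.30"), ("1", "1.10", "1.10")]
print(check_receipt(items, "22.40", "20.36", "10", "2.04"))       # []
print(len(check_receipt(items, "22.46", "20.36", "10", "2.04")))  # 2
```

`Decimal` avoids binary floating-point rounding in the money comparisons. `Decimal("l")` raises `decimal.InvalidOperation`, so a misread quantity such as the `l` in the result above is itself a useful signal.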




Is there anything I could do to improve the output?

Thanks

Anshul Maheshwari

Aug 12, 2015, 10:40:58 AM
to tesseract-ocr

You could try lens correction.
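Lens correction undoes the radial (barrel or pincushion) distortion that a camera lens adds, which bends straight text lines near the edges of a photo. A minimal sketch under a one-coefficient radial model, assuming the coefficient `k1` is known from calibrating the camera (the value and the model choice are assumptions here; real calibration tools typically use more coefficients): for every pixel of the corrected image it looks up the source pixel at normalised radius r * (1 + k1 * r^2) in the distorted photo, using nearest-neighbour sampling.

```python
def undistort(img, k1, fill=255):
    """Undo one-coefficient radial distortion with nearest-neighbour sampling.

    k1 is a hypothetical calibration value: k1 > 0 samples further from the
    centre (correcting barrel distortion), k1 < 0 samples closer to it.
    Pixels whose source falls outside the photo are set to `fill`.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    norm = max(cx, cy) or 1.0  # radius 1.0 = half the larger image side
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            nx, ny = (x - cx) / norm, (y - cy) / norm
            scale = 1 + k1 * (nx * nx + ny * ny)
            sx = round(cx + nx * scale * norm)
            sy = round(cy + ny * scale * norm)
            row.append(img[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill)
        out.append(row)
    return out


img = [[x + 10 * y for x in range(5)] for y in range(5)]
print(undistort(img, 0.0) == img)  # True: k1 = 0 means no distortion
print(undistort(img, 0.2)[2][2])   # 22: the optical centre never moves
```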

sriranga(82yrsold)

Aug 14, 2015, 3:08:28 AM
to tesseract-ocr

 @ Nohinn,
Regarding "binarize the image": could you kindly explain, step by step, the procedure for binarizing an image? I would be thankful to you.
sriranga(82yrs old)
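For reference, a basic global binarization takes two steps: choose a threshold, then map every pixel at or below it to black and every other pixel to white. Otsu's method is a standard way to choose the threshold automatically: it tries every gray level and keeps the one that best separates the histogram into two classes (maximum between-class variance). This is a different technique from the Bradley local thresholding in step 10 of the original post, which uses a separate threshold per neighbourhood; the sketch below assumes a grayscale image stored as a list of rows of 0-255 ints.

```python
def otsu_threshold(img):
    """Gray level t maximising the between-class variance of {<= t} vs {> t}."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(img, t):
    """Pixels <= t become 0 (black), the rest 255 (white)."""
    return [[0 if v <= t else 255 for v in row] for row in img]


img = [[50] * 10 + [200] * 10 for _ in range(5)]
t = otsu_threshold(img)
print(t)                  # 50
print(binarize(img, t)[0])
```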


On Thursday, July 30, 2015 at 6:44:55 PM UTC+5:30, Nohinn wrote: