Hi,
I am building a small server-side application that processes business card images and extracts the text. I am using Tesseract for OCR. Two sample business cards are attached.
If I run Tesseract directly on the original image, I don't get any text, so I am using ImageMagick to improve the image quality first. I follow these steps:
Step 1. Increase image resolution:
magick bas_eng_sm.jpg -colorspace RGB -alpha off -units PixelsPerInch -resample 600 bas_eng_sm_resize.tiff
Step 2. Convert to gray colorspace:
magick bas_eng_sm_resize.tiff -colorspace gray bas_eng_sm_gray.tiff
Step 3. Apply OCR:
tesseract bas_eng_sm_gray.tiff bas_eng_sm_gray
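For anyone who wants to reproduce this, the three steps can be wrapped into one small script. This is only a rough sketch: the function names `run` and `ocr_card` and the `DRY_RUN` switch are my own illustration and not part of ImageMagick or Tesseract, and it assumes `magick` (ImageMagick 7) and `tesseract` are both on the PATH.

```shell
#!/bin/sh
# Sketch only: chains the three steps for one input image.
# run, ocr_card and DRY_RUN are hypothetical helpers, not tool features.

# Run a command, or just print it when DRY_RUN=1 (handy for checking the pipeline).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ocr_card() {
    src=$1
    base=${src%.*}   # strip the extension, e.g. bas_eng_sm.jpg -> bas_eng_sm

    # Step 1: resample to 600 pixels per inch
    run magick "$src" -colorspace RGB -alpha off -units PixelsPerInch -resample 600 "${base}_resize.tiff"

    # Step 2: convert to gray colorspace
    run magick "${base}_resize.tiff" -colorspace gray "${base}_gray.tiff"

    # Step 3: OCR; tesseract appends .txt to the output base name
    run tesseract "${base}_gray.tiff" "${base}_gray"
}
```

Calling `ocr_card bas_eng_sm.jpg` should leave the recognized text in `bas_eng_sm_gray.txt`; with `DRY_RUN=1` set, it only prints the three commands.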
Output: I get some text, but with many errors:
Peter M. Btcining
Pr:-sauna: 8 CEO
pmbilbasmodlcalacom
(650) 235-4000 (direct)
QBAS
Mf:DI(‘.»‘\l.
I660 S. Amphletl Blvd. 82(1)
San M8100. CA 94402-2525
Main Pb: (650) 235-41)!
cl’-‘ax: (650) 2-I0-«KID
www.bnsmedical.com
I need at least the name, phone number, and email address to come out correctly. What else can I do, in either ImageMagick or Tesseract, to improve the results?