Effectiveness of Image Pre-Processing with Google Cloud Vision OCR


Tanner Jones

Mar 15, 2017, 9:22:34 AM
to Google Cloud Developers
We use GCV OCR to extract text from receipts - and the picture quality and receipt print quality vary a lot from image to image.  

Would doing some image pre-processing before submitting the image to GCV OCR increase accuracy levels?  I'm asking because I'm guessing there is some form of pre-processing that already happens within GCV OCR and I don't know if our own pre-processing would just be redundant.

Alex (Cloud Platform Support)

Mar 15, 2017, 3:34:45 PM
to Google Cloud Developers

Hi Tanner,


You’re right, it is in fact advisable to inspect and pre-process any images submitted to the Cloud Vision API in order to improve efficiency, accuracy and response time. A good set of recommendations is provided on this GCV Best Practices page.


Here’s a list of the main guidelines that would apply to using the OCR feature of this service:

  1. Submit images in a lossless file type, chosen from the supported GCV image types.

  2. If the image is submitted via a JSON request, encode the image file in base64 before sending it to the service.

  3. The image sizing section advises the use of images of at least 1024 x 768 pixels.

  4. The same section also recommends that images be large enough that important features within the request can be easily distinguished.
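
As a rough illustration of guidelines 2 and 3, the sketch below checks the pixel size of a PNG and wraps it in a base64-encoded JSON body for the `images:annotate` REST method with a `TEXT_DETECTION` feature. The helper names `png_dimensions` and `build_ocr_request` are hypothetical, and comparing the long and short sides (so that portrait receipts pass) is an assumption, not something the guidelines state.

```python
import base64
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(image_bytes):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(image_bytes) < 24 or image_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then 4-byte big-endian width and 4-byte big-endian height.
    return struct.unpack(">II", image_bytes[16:24])


def build_ocr_request(image_bytes, min_long=1024, min_short=768):
    """Build the JSON body for a TEXT_DETECTION request (hypothetical helper).

    Assumption: the 1024 x 768 guideline is applied to the long and short
    sides, so portrait images are accepted as well as landscape ones.
    """
    width, height = png_dimensions(image_bytes)
    long_side, short_side = max(width, height), min(width, height)
    if long_side < min_long or short_side < min_short:
        raise ValueError(f"image is {width}x{height}, below the recommended size")
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def to_json(request_body):
    """Serialize the request body for an HTTP POST."""
    return json.dumps(request_body)
```

The resulting string would be sent as the body of a POST to the API's `images:annotate` endpoint; authentication and the HTTP call itself are left out here.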


Additionally, as part of the Cloud Vision 1.1 (Beta) API features, a new Crop Hints feature was introduced and could be used to crop images around their dominant object (possibly the receipt in your use case). Note, however, that this is a Beta release of Google Cloud Vision API Crop Hints. This feature might be changed in backward-incompatible ways and is not subject to any SLA or deprecation policy. It is not intended for real-time usage in critical applications.
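
A minimal sketch of how a Crop Hints result might be turned into a crop rectangle. It assumes the response follows the shape `responses[0].cropHintsAnnotation.cropHints[].boundingPoly.vertices`, and that a vertex may omit `x` or `y` when the value is zero; the function names are hypothetical, and since the feature was in Beta the shape should be checked against the current API reference.

```python
def build_crop_hints_request(base64_content):
    """Request body asking for crop hints on a base64-encoded image."""
    return {
        "requests": [
            {
                "image": {"content": base64_content},
                "features": [{"type": "CROP_HINTS"}],
            }
        ]
    }


def crop_box_from_response(response, index=0):
    """Return (left, top, right, bottom) for the chosen crop hint.

    Assumption: a vertex with a zero coordinate may leave that key out,
    so missing keys are read as 0.
    """
    hints = response["responses"][0]["cropHintsAnnotation"]["cropHints"]
    vertices = hints[index]["boundingPoly"]["vertices"]
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)
```

The returned box can then be passed to whatever image library does the actual cropping before the cropped image is resubmitted for OCR.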

Tanner Jones

Mar 21, 2017, 5:42:17 PM
to Google Cloud Developers
Alex, we'll make sure to follow the best practices guidelines you linked to... But by pre-processing, I meant something more along the lines of this https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality - does image binarization, deskewing, etc already happen on the Vision OCR side, or would doing it on our side improve accuracy?
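
For readers unfamiliar with the binarization step mentioned here, this is a minimal pure-Python sketch of Otsu's method, one common way to choose a global threshold. It is only an illustration of the kind of client-side step being asked about, not a description of what the Vision API does internally. It assumes the image has already been converted to a flat sequence of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    pixels: iterable of integer grayscale values in 0..255.
    Values <= t form the dark class, values > t the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        between_var = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (dark) or 255 (bright) using the Otsu threshold."""
    pixels = list(pixels)
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

A global threshold like this tends to struggle with uneven lighting, which is common in receipt photos, so an adaptive (local) threshold may be a better candidate to test.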

Alex (Cloud Platform Support)

Mar 24, 2017, 1:51:03 PM
to Google Cloud Developers

Sorry for the delay. Client-side image preprocessing is good practice, given the wide range of image settings that can be submitted to the API. In particular, it is important to remove as much noise as possible. In the meantime, I would recommend testing candidate preprocessing steps to establish the best combination, so that your receipt images consistently return the best results with the API’s OCR feature.


Note that we don’t have any established procedures for this, and I have submitted your question to the backline team. I will update this thread as soon as more information becomes available.
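
One way to run the comparison suggested above is to score each preprocessing variant against hand-transcribed receipts using character error rate (edit distance divided by the length of the true text). The sketch below assumes the OCR output for each variant has already been collected; the names `rank_pipelines`, `results` and `truths` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text, truth):
    """Edit distance normalized by the length of the ground-truth text."""
    if not truth:
        raise ValueError("ground-truth text must not be empty")
    return levenshtein(ocr_text, truth) / len(truth)


def rank_pipelines(results, truths):
    """Rank preprocessing variants by mean character error rate.

    results: {pipeline_name: {image_id: ocr_text}}
    truths:  {image_id: ground_truth_text}
    A variant with no output for an image is scored as if it returned "".
    Returns a list of (pipeline_name, mean_cer), best first.
    """
    scores = {}
    for name, outputs in results.items():
        rates = [
            character_error_rate(outputs.get(image_id, ""), truth)
            for image_id, truth in truths.items()
        ]
        scores[name] = sum(rates) / len(rates)
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, with `truths = {"r1": "TOTAL 12.50"}`, a variant that returns `"T0TAL 12.5O"` scores 2/11 while one that returns the exact text scores 0, so the second ranks first.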

Alex (Cloud Platform Support)

Mar 31, 2017, 5:54:14 PM
to Google Cloud Developers

I received feedback from the backline team. They offered no recommendations for image pre-processing, as these steps (including deskewing) are already performed internally. They did, however, recommend making sure that the characters to be detected are among the dominant objects in the image.


I hope this information helps, and feel free to share your observations if you test any pre-processing steps.


Regards,

Alex

tanner...@gmail.com

Apr 3, 2017, 7:23:08 PM
to Google Cloud Developers
Awesome - thanks for the feedback, we'll give the 'characters as dominant objects' suggestion a shot!

Erik Christiansen

Nov 27, 2017, 9:04:12 AM
to Google Cloud Developers
Hi Alex,

I'm a little confused here. Are you saying that deskewing is handled internally by the Cloud Vision API? My initial experiments with the API made it seem like skew really threw off the results, but now (2 months later) it is performing much better. Also, can you give any guidance as to whether image thresholding (binarization) would improve OCR results, or whether that is handled internally?