[google cloud vision] How to detect the areas of an invoice

r/ Wobben

Oct 21, 2016, 11:44:52 AM

to Google Cloud Developers

Hello,

I would like to use the Google API for a financial app in Haskell that I have in mind.

One of the features I'm thinking of is scanning invoices instead of manually entering all the data.

For the detection I can use the Google Vision API.

But the problem is that I receive all the text, whereas I only need the sender of the invoice and the amount I have to pay.

Is there a smart way to extract only this data from any particular invoice?

Roelof

Adam Rafuse

Oct 23, 2016, 5:21:56 PM

to Google Cloud Developers

The Cloud Vision API is limited in that it doesn't allow you to train your own models. There's only a single method, images.annotate, which can perform a specific set of detection tasks.
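To make the single-method point concrete, here is a minimal sketch of the JSON body that images.annotate expects for one image with one feature. The helper name `build_annotate_request` and the placeholder image bytes are my own for illustration; sending the request (API key or OAuth token, HTTP client) is left out on purpose.

```python
import base64
import json

# REST endpoint for the images.annotate method (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, feature_type="TEXT_DETECTION"):
    """Return a JSON-serialisable body for a single-image annotate call.

    Hypothetical helper: it only builds the payload, it does not send it.
    """
    return {
        "requests": [
            {
                # Image content is sent as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Each feature selects one of the fixed detection tasks.
                "features": [{"type": feature_type}],
            }
        ]
    }


# Placeholder bytes stand in for a real scanned invoice.
body = build_annotate_request(b"fake image bytes")
print(json.dumps(body, indent=2))
```

You would POST this body to `VISION_ENDPOINT` with whatever credentials your project uses. Since the payload is plain JSON over HTTP, the same structure should carry over to a Haskell client.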

One option is to implement some additional heuristics on the result. For TEXT_DETECTION, each resulting EntityAnnotation includes a boundingPoly for the detected word, which can help with this. For example, the text containing a currency symbol spatially closest to the word 'Total' is most likely to be the total due. If you're scanning a lot of invoices that use the same format, you could add logic to only submit the relevant areas of the invoice as images to the Vision API (basically defining your own image template).
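A rough sketch of the "closest to 'Total'" heuristic, working on the `textAnnotations` list from a TEXT_DETECTION response (the first entry is the full text, the rest are individual words). The function names, the regex, and the sample data are made up for illustration. It assumes the currency symbol and the number come back as one word; if OCR splits them, you would need to merge neighbouring words first.

```python
import math
import re

# Assumption: amounts look like "$1,234.56" or "€99", symbol attached to the digits.
AMOUNT_RE = re.compile(r"^[$€£]\d[\d.,]*$")


def center(annotation):
    """Centre point of an annotation's boundingPoly (missing x/y default to 0)."""
    vertices = annotation["boundingPoly"]["vertices"]
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def find_total(text_annotations, label="total"):
    """Return the amount-like word whose centre is nearest to a label word."""
    words = text_annotations[1:]  # skip entry 0, which holds the whole text
    labels = [w for w in words if w["description"].strip(":").lower() == label]
    amounts = [w for w in words if AMOUNT_RE.match(w["description"])]
    if not labels or not amounts:
        return None
    pairs = ((math.dist(center(l), center(a)), a) for l in labels for a in amounts)
    return min(pairs, key=lambda pair: pair[0])[1]["description"]


def box(x, y, w=40, h=10):
    """Build a rectangular boundingPoly for the made-up sample below."""
    return {"vertices": [{"x": x, "y": y}, {"x": x + w, "y": y},
                         {"x": x + w, "y": y + h}, {"x": x, "y": y + h}]}


# Made-up annotations shaped like a TEXT_DETECTION response.
sample = [
    {"description": "Subtotal $100.00 Total: $121.00", "boundingPoly": box(0, 0, 400, 60)},
    {"description": "Subtotal", "boundingPoly": box(200, 10)},
    {"description": "$100.00", "boundingPoly": box(300, 10)},
    {"description": "Total:", "boundingPoly": box(200, 40)},
    {"description": "$121.00", "boundingPoly": box(300, 40)},
]
print(find_total(sample))  # prints $121.00
```

Plain Euclidean distance between centres is the simplest choice here. On real invoices you may want to prefer amounts to the right of or below the label, since layouts vary.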

On the more complex side of things, if you want to learn TensorFlow, you could use the more generalized Cloud Machine Learning API to train your own models, though that may be overkill for what you need to do.

Vish Kish

Mar 16, 2017, 10:11:12 PM

to Google Cloud Developers, wobb...@gmail.com

I have some questions on this. Please email me your number.

On Friday, October 21, 2016 at 11:44:52 AM UTC-4, r/ Wobben wrote:

Vivek Mahesh Sharma

Dec 18, 2017, 11:45:58 AM

to Google Cloud Developers

Hi Vish,

Can you please let me know if you were able to resolve the issue? I have a similar requirement for scanning invoice documents using the Google Vision API; the documents will be in a fixed format. If you have found a solution, can you please elaborate on it?