Google Vision and OCR in PDFs

David Gossett

Nov 12, 2018, 6:24:12 PM

to cloud-vision-discuss

Very sorry if this has been asked and answered numerous times. I did try to search on various keywords.

Is JSON still the only output format for Google Vision? https://cloud.google.com/vision/docs/request#json_response_format

Based on this Stack Overflow post from 11/7, it looks like a no-go: https://stackoverflow.com/questions/52343909/ocr-pdf-files-using-google-cloud-vision/53195020#53195020

We want to OCR some PDFs and then use Tabula (https://tabula.technology/) to grab the tables. No other OCR capability comes close to Google Vision, in our opinion.

If we can't OCR the PDFs (in place) with Google Vision, does anyone have any ideas about how to convert the JSON back into a table? We understand that the x,y coordinates are in the JSON, but they are given for every single word.
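One rough way to approach this, sketched in Python: group words into rows by their vertical position, then split each row into cells wherever the horizontal gap between neighbouring words is large. This is only a heuristic under stated assumptions, not an established tool. The flattened word shape (`text`, `x_min`, `x_max`, `y_mid`) and the function names are hypothetical; the coordinates would have to be derived from each word's bounding box in the Vision JSON first, and the `y_tol` and `gap` thresholds would need tuning per document.

```python
from typing import Any, Dict, List

# Hypothetical flattened word: {"text": str, "x_min": float, "x_max": float, "y_mid": float}
Word = Dict[str, Any]


def group_rows(words: List[Word], y_tol: float = 5.0) -> List[List[Word]]:
    """Cluster words into rows: a word joins the current row when its
    vertical midpoint is within y_tol of the previously added word."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w["y_mid"]):
        if rows and abs(word["y_mid"] - rows[-1][-1]["y_mid"]) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order each row left to right.
    return [sorted(row, key=lambda w: w["x_min"]) for row in rows]


def split_cells(row: List[Word], gap: float = 15.0) -> List[str]:
    """Split one left-to-right row into cells at horizontal gaps wider than gap."""
    cells: List[str] = []
    current = [row[0]["text"]]
    for prev, cur in zip(row, row[1:]):
        if cur["x_min"] - prev["x_max"] > gap:
            cells.append(" ".join(current))
            current = [cur["text"]]
        else:
            current.append(cur["text"])
    cells.append(" ".join(current))
    return cells


def words_to_table(words: List[Word], y_tol: float = 5.0, gap: float = 15.0) -> List[List[str]]:
    """Turn a flat list of positioned words into a list of rows of cell strings."""
    return [split_cells(row, gap) for row in group_rows(words, y_tol)]
```

A fixed gap threshold will break on tables with tightly packed columns or skewed scans; clustering the x positions across all rows is a likely next step in that case.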

We were kind of hoping there was some software out there that would take the JSON output and recreate the input image, with structure that we could then scrape using a different technology.

Again, sorry if this has been asked and answered a bunch. You might have a link to the exact post that will answer my question(s).

Thanks! And congrats on Google Vision. The OCR output is really stunning!

Duane Chen

Nov 12, 2018, 6:50:57 PM

to david....@gmail.com, cloud-visi...@googlegroups.com

Hi David,

We have a PDF/TIFF API which can do OCR on PDFs. https://cloud.google.com/vision/docs/pdf

This still gives JSON x,y word positions, but we are working on capabilities to parse more interesting structures like tables and forms from the PDF. Coming soon!
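For readers following along, here is a minimal sketch of the request body for the PDF/TIFF feature described at the link above, built as a plain Python dict. It assumes the asynchronous `files:asyncBatchAnnotate` REST method, which reads the PDF from Cloud Storage and writes JSON results back to Cloud Storage. The helper name `build_pdf_ocr_request` and the bucket paths in the usage line are hypothetical, and the field names should be checked against the current documentation before use.

```python
from typing import Any, Dict


def build_pdf_ocr_request(
    gcs_source_uri: str,
    gcs_destination_uri: str,
    batch_size: int = 2,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for an async PDF OCR request.

    gcs_source_uri:      gs:// path of the input PDF (assumed to exist already).
    gcs_destination_uri: gs:// prefix where the JSON output files are written.
    batch_size:          number of PDF pages grouped into each output JSON file.
    """
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": gcs_source_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": gcs_destination_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


# Hypothetical bucket and object names, for illustration only.
body = build_pdf_ocr_request("gs://my-bucket/input.pdf", "gs://my-bucket/ocr-output/")
```

The body would be sent with an authenticated POST; authentication and polling of the returned long-running operation are left out of this sketch.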

Best,

- Duane

David Gossett

Nov 12, 2018, 8:11:41 PM

to cloud-vision-discuss

Many thanks for your reply. I think this is going to be a game changer when ready. Will put the entire OCR industry on notice. We tested Google Vision against many OCR providers and it left our mouths hanging open. Let's put it this way, Azure was a very short test!

I should have put this in my original post, but I found Dustin's post really intriguing: https://stackoverflow.com/questions/51972479/get-lines-and-paragraphs-not-symbols-from-google-vision-api-ocr-on-pdf/52086299#52086299

While patiently waiting, do you think there is anything we could do with the JSON to tease out more structure than just every word with x and y?

Thanks!

Duane Chen

Nov 13, 2018, 3:02:36 PM

to cloud-vision-discuss

Hi David,

Right now the JSON also exposes paragraphs and blocks as higher-level structures above words. Do those help with your use case?
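To illustrate the hierarchy mentioned here, a minimal Python sketch that walks one parsed response and yields the text of each paragraph along with its page and block index. It assumes the response has already been loaded into a dict and follows the `fullTextAnnotation` layout of pages, blocks, paragraphs, words, and symbols. The function name `iter_paragraphs` is hypothetical, and joining words with a single space is a simplification that ignores the break information attached to symbols.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_paragraphs(response: Dict[str, Any]) -> Iterator[Tuple[int, int, str]]:
    """Yield (page_index, block_index, paragraph_text) for one parsed response."""
    annotation = response.get("fullTextAnnotation", {})
    for page_idx, page in enumerate(annotation.get("pages", [])):
        for block_idx, block in enumerate(page.get("blocks", [])):
            for paragraph in block.get("paragraphs", []):
                # Each word is a list of symbols; concatenate them to get the word text.
                words = [
                    "".join(sym.get("text", "") for sym in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                # Simplification: separate words with one space.
                yield page_idx, block_idx, " ".join(words)
```

Blocks and paragraphs carry their own bounding boxes as well, so the same traversal could collect positions at those levels instead of per word.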