Google Vision OCR API parses images differently on Linux and Windows 10


unni mana

May 26, 2022, 6:58:25 AM
to cloud-vision-discuss
Hi All,

I am calling the Google Vision OCR Java API to parse bills from PDF files. It works fine when I make requests from my Windows 10 laptop. But when the same code is deployed in a Linux environment and parses the same PDF files, the results are incomplete: some of the text is missing.

Any idea why this is happening?

Thanks

Unni

Krishnanunni K

May 26, 2022, 7:00:51 AM
to cloud-vision-discuss

Are you converting the PDF to images and then doing OCR on them?

unni mana

May 26, 2022, 8:08:05 AM
to cloud-vision-discuss
Yes, I am using Apache PDFBox to convert the PDF into images.

Krishnanunni K

May 26, 2022, 8:58:45 AM
to cloud-vision-discuss
Most probably the fonts the PDF requires are missing in the Linux environment. Check whether the rendered images look OK.
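
One rough way to check for missing fonts is to list the font families the JVM can see on each machine and compare the two lists. The sketch below uses only the standard `java.awt` API; the class name `FontLister` and the helper `missingFrom` are made up for illustration. Note the assumption: this shows what the JVM sees, which may not match exactly the set of fonts a PDF rendering library scans for itself.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of PDFBox or the Vision API.
public class FontLister {

    /** Returns the sorted font family names visible to the JVM on this machine. */
    static List<String> availableFontFamilies() {
        String[] names = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // TreeSet sorts the names and drops duplicates.
        return new ArrayList<>(new TreeSet<>(Arrays.asList(names)));
    }

    /** Returns the families that are in 'reference' but not in 'candidate'. */
    static Set<String> missingFrom(Collection<String> reference,
                                   Collection<String> candidate) {
        Set<String> missing = new TreeSet<>(reference);
        missing.removeAll(candidate);
        return missing;
    }

    public static void main(String[] args) {
        // Print one family per line so the output can be saved and diffed.
        availableFontFamilies().forEach(System.out::println);
    }
}
```

Run it on both machines (on a server without a display, `-Djava.awt.headless=true` may be needed), save the output to a file on each, and diff the two files. Families that appear only in the Windows list are candidates to install on Linux.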

unni mana

May 26, 2022, 9:14:44 AM
to Krishnanunni K, cloud-vision-discuss

The images look good. How can we find out which fonts need to be installed?


unni mana

May 27, 2022, 1:15:42 AM
to Krishnanunni K, cloud-vision-discuss
I am getting the following message in both the Windows and Linux environments:

Rendering of type3 fonts isn't supported in PDFBox 1.8.x. It will be available in the 2.0 version!.

I then upgraded PDFBox to version 2.0.25, but I still get the above message in both environments.
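
Since the message itself names PDFBox 1.8.x, one possible explanation (an assumption, not something confirmed in this thread) is that an older PDFBox jar is still on the classpath and is being loaded instead of 2.0.25. A minimal sketch for checking where a class is actually loaded from, using only standard Java APIs; the class name `JarLocator` is made up for illustration:

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of PDFBox.
public class JarLocator {

    /** Returns the location (usually a jar path) a class was loaded from, if known. */
    static Optional<String> locationOf(String className) {
        try {
            // 'false' means: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Some classes (for example JDK classes) report no code source.
            if (source == null || source.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Defaults to a core PDFBox class; pass another class name to check it.
        String name = args.length > 0 ? args[0]
                : "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + " -> "
                + locationOf(name).orElse("not found or no code source"));
    }
}
```

Run it with the same classpath as the application. If the printed jar path contains a 1.8.x version number, the old dependency is still being picked up, for example through a transitive dependency or a stale jar in the deployment directory.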