PermissionDenied: 403 Error opening file: gs://myocrbucket-v/2.pdf.


V tangutoori

May 7, 2018, 2:29:19 PM
to cloud-vision-discuss

Hi,
I am trying to prepare some NLP test data by running OCR on a batch of PDF documents with Google Cloud Vision. To test it on a single file, I created a bucket in Google Cloud Storage, uploaded a PDF to it, and tried to OCR that file with the code below. However, I am getting a PermissionDenied error. I tried my best to find documentation on how to grant the proper permissions, and I followed the instructions in the Google documentation by activating the service account as a bearer. Can anyone please look at my code and tell me if I am missing something?

## I activated the service account key with the following command:
gcloud auth activate-service-account --key-file C:\Python27\GoogleOCR\key.json

import argparse
import re
import os
from google.cloud import storage
from google.cloud import vision_v1p2beta1 as vision
from google.protobuf import json_format

## I have taken sections of this code from the Google documentation on the PDF and TIFF file formats.
mime_type = 'application/pdf'
batch_size = 2
client = vision.ImageAnnotatorClient()

feature = vision.types.Feature(type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

# Input PDF, given as gs://BUCKET/OBJECT
gcs_source = vision.types.GcsSource(uri='gs://myocrbucket-v/2.pdf')

input_config = vision.types.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

# Output JSON files are written to GCS with this URI as their name prefix
gcs_destination = vision.types.GcsDestination(uri='gs://myocrbucket-v/2.pdf')
output_config = vision.types.OutputConfig(gcs_destination=gcs_destination, batch_size=batch_size)

async_request = vision.types.AsyncAnnotateFileRequest(features=[feature],input_config=input_config,output_config=output_config)
operation = client.async_batch_annotate_files(requests=[async_request])

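# Wait (up to 180 seconds) for the asynchronous operation to finish and print its result
print(operation.result(timeout=180))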
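A `gs://` URI has the form `gs://BUCKET/OBJECT`; a console label such as "Buckets" is not part of the bucket name. As a sketch, the hypothetical helper `split_gcs_uri` below (not part of any Google library) splits a URI and rejects a malformed bucket name before the request ever reaches the Vision API.

```python
import re

# Hypothetical helper, not part of any Google client library.
_GCS_URI = re.compile(r'^gs://([^/]+)/(.*)$')
# Bucket names use only lowercase letters, digits, '-', '_' and '.',
# and must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r'^[a-z0-9][a-z0-9._-]*[a-z0-9]$')


def split_gcs_uri(uri):
    """Split 'gs://BUCKET/OBJECT' into (bucket, object_name).

    An empty object name ('gs://BUCKET/') is allowed, since output
    destinations are prefixes. Raises ValueError for malformed URIs.
    """
    match = _GCS_URI.match(uri)
    if match is None:
        raise ValueError('not a gs://BUCKET/OBJECT URI: {!r}'.format(uri))
    bucket, object_name = match.groups()
    if not _BUCKET_NAME.match(bucket):
        raise ValueError('invalid bucket name: {!r}'.format(bucket))
    return bucket, object_name


if __name__ == '__main__':
    print(split_gcs_uri('gs://myocrbucket-v/2.pdf'))  # ('myocrbucket-v', '2.pdf')
```

For example, `split_gcs_uri('gs://Buckets/myocrbucket-v/2.pdf')` raises ValueError, because `Buckets` (with a capital letter) is not a valid bucket name.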

Duane Chen

May 7, 2018, 3:45:15 PM
to vamsidhar....@gmail.com, cloud-visi...@googlegroups.com
Hi,

Can you try hitting the REST API directly and let me know if you get the same error? (See "https://cloud.google.com/vision/docs/pdf" section "Protocol" under sample code)

I just want to rule out the Python code itself as a potential cause.
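The REST call can also be made from Python without the Vision client library, which separates the library from the credentials and bucket setup. This is a sketch under assumptions: it relies on the `google-auth` and `requests` packages, uses the `v1p2beta1` endpoint discussed in this thread, and the helper names `build_async_request` and `post_async_request` are hypothetical. Running the script only builds and prints the request body; sending it needs your own key file.

```python
import json

VISION_URL = 'https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate'
SCOPES = ['https://www.googleapis.com/auth/cloud-platform']


def build_async_request(input_uri, output_uri, batch_size=2):
    """Build the JSON body for files:asyncBatchAnnotate (one PDF, document text)."""
    return {
        'requests': [{
            'inputConfig': {
                'gcsSource': {'uri': input_uri},
                'mimeType': 'application/pdf',
            },
            'features': [{'type': 'DOCUMENT_TEXT_DETECTION'}],
            'outputConfig': {
                'gcsDestination': {'uri': output_uri},
                'batchSize': batch_size,
            },
        }]
    }


def post_async_request(key_file, body):
    """POST the body using service-account credentials; return the JSON reply."""
    # Imported here so build_async_request works without these packages installed.
    from google.oauth2 import service_account
    from google.auth.transport.requests import AuthorizedSession

    credentials = service_account.Credentials.from_service_account_file(
        key_file, scopes=SCOPES)
    session = AuthorizedSession(credentials)  # attaches the Bearer token
    response = session.post(VISION_URL, json=body)
    # A 403 here reproduces PERMISSION_DENIED without the Vision client library.
    return response.json()


if __name__ == '__main__':
    body = build_async_request('gs://myocrbucket-v/2.pdf', 'gs://myocrbucket-v/')
    print(json.dumps(body, indent=2))
    # To send it, point this at your own key file:
    # print(post_async_request(r'C:\Python27\GoogleOCR\key.json', body))
```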

Thanks,

Duane

--
You received this message because you are subscribed to the Google Groups "cloud-vision-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-vision-dis...@googlegroups.com.
To post to this group, send email to cloud-visi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-vision-discuss/6c3ebe89-1151-4df3-8316-3861cb189ee9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

V tangutoori

May 8, 2018, 9:43:48 AM
to cloud-vision-discuss
Hi Duane,
I tried making a curl request from the command line. I do not know much about REST APIs, so I apologize in advance if this is not what you were suggesting. I have attached my code and the response below; from the looks of it, this did not work either.

curl: (6) Could not resolve host: POST
Error 404 (Not Found)!!1
404. That’s an error.
The requested URL /v1p2beta1/files:asyncBatchAnnotate was not found on this server. That’s all we know.

C:\WINDOWS\system32>Authorization: Bearer ACCESS_TOKEN{"requests":[{"inputConfig": {"gcsSource": {"uri": "gs://myocrbucket-vamsi/2017 Form 3W-2.pdf"},"mimeType": "application/pdf"},"features": [{"type": "DOCUMENT_TEXT_DETECTION"}],"outputConfig": {"gcsDestination": {"uri": "gs://myocrbucket-vamsi/"},"batchSize": 2}}]}
'Authorization:' is not recognized as an internal or external command,
operable program or batch file.

Duane Chen

May 8, 2018, 2:09:47 PM
to V tangutoori, cloud-visi...@googlegroups.com
Hi,

Your curl parameters are not quite right, but I'm not familiar enough with the Windows command line to give much advice.

There is a useful Chrome extension for REST testing called Advanced REST client: https://chrome.google.com/webstore/detail/advanced-rest-client/hgmloofddffdnphfgcellkdfbfbjeloo?hl=en-US

If you use Firefox, there is Poster: https://addons.mozilla.org/en-US/firefox/addon/poster/

Let me know if those work out for you.

Thanks,

Duane



V tangutoori

May 8, 2018, 2:41:33 PM
to Duane Chen, cloud-visi...@googlegroups.com
Hi Duane,
Thanks for your patience. I tried the extension you suggested and made the REST API call, but I am still getting the error below.

{
  "error": {
    "code": 403,
    "message": "Error opening file: gs://myocrbucket-v/2.pdf.",
    "status": "PERMISSION_DENIED"
  }
}


--
Tangutoori Vamsidhar.

Duane Chen

May 8, 2018, 2:54:30 PM
to V tangutoori, cloud-visi...@googlegroups.com
Hi,

Thanks for the info. Now that we have ruled out the Python code as a potential issue, the next thing to check is whether your service account has access to your GCS file.

Try using the GCS REST API to read the file in question using the same credentials: https://cloud.google.com/storage/docs/json_api/v1/ 
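One way to run that check with the same key is an `objects.get` request for the object's metadata through the Cloud Storage JSON API. The object name has to be URL-encoded as a single path segment, so a slash becomes `%2F` and a space `%20`. A sketch, assuming the `google-auth` and `requests` packages; `object_metadata_url` and `check_object_access` are hypothetical helpers.

```python
from urllib.parse import quote

STORAGE_API = 'https://storage.googleapis.com/storage/v1'


def object_metadata_url(bucket, object_name):
    """URL of the JSON API objects.get call for one object (metadata only)."""
    # safe='' so that a '/' inside the object name is encoded as %2F.
    return '{}/b/{}/o/{}'.format(STORAGE_API, quote(bucket, safe=''),
                                 quote(object_name, safe=''))


def check_object_access(key_file, bucket, object_name):
    """Return the HTTP status code of objects.get (for example 200, 403 or 404)."""
    # Imported here so object_metadata_url works without these packages.
    from google.oauth2 import service_account
    from google.auth.transport.requests import AuthorizedSession

    credentials = service_account.Credentials.from_service_account_file(
        key_file,
        scopes=['https://www.googleapis.com/auth/devstorage.read_only'])
    session = AuthorizedSession(credentials)
    return session.get(object_metadata_url(bucket, object_name)).status_code


if __name__ == '__main__':
    print(object_metadata_url('myocrbucket-v', '2.pdf'))
```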

Thanks,

- Duane


V tangutoori

May 9, 2018, 10:15:45 AM
to Duane Chen, cloud-visi...@googlegroups.com
Hi Duane,
I tried the get_bucket method and got the error below.

# Imports the Google Cloud client library
from google.cloud import storage
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:/Python27/GoogleOCR/key1.json"

# Instantiates a client
storage_client = storage.Client()

# The name of the existing bucket
bucket_name = 'myocrbucket-vamsi'

# Fetches the existing bucket's metadata (this does not create a bucket)
bucket = storage_client.get_bucket(bucket_name)
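Note that `get_bucket` reads bucket metadata, which requires the `storage.buckets.get` permission. A role such as Storage Object Viewer (`roles/storage.objectViewer`) allows reading objects but not that permission, so a 403 from `get_bucket` does not by itself show whether the PDF is readable. A closer check of what the Vision API needs is to fetch the object itself. The sketch below assumes the `google-cloud-storage` client, where `Client.bucket` builds a local reference without an API call and `Bucket.get_blob` returns `None` for a missing object; `object_is_readable` is a hypothetical helper that takes the client as a parameter so it can be tried with a stand-in object.

```python
def object_is_readable(client, bucket_name, object_name):
    """Return True if the object exists and its metadata can be read.

    `client` is expected to behave like google.cloud.storage.Client.
    Returns False if the object does not exist; a permission problem
    surfaces as an exception (e.g. google.api_core.exceptions.Forbidden)
    raised by get_blob.
    """
    bucket = client.bucket(bucket_name)  # local reference, no API call
    blob = bucket.get_blob(object_name)  # objects.get request
    return blob is not None
```

With real credentials this would be called as `object_is_readable(storage.Client(), 'myocrbucket-vamsi', '2.pdf')`, after setting `GOOGLE_APPLICATION_CREDENTIALS` as in the code above.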



Duane Chen

May 11, 2018, 2:36:18 PM
to V tangutoori, cloud-visi...@googlegroups.com
Hi,

It appears that your service account does not have access to this file, so you need to set up the proper permissions to use the PDF API (or any other API involving this file).
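Access is granted to the service account's email address, which is stored in the `client_email` field of the JSON key file. As a sketch, the hypothetical helpers below read that address and print `gsutil iam ch` commands granting it bucket-level roles. The default roles (Storage Object Viewer to read the PDF, Storage Object Creator to write the JSON results) are an assumption about what this job needs; the same roles can also be added on the bucket's permissions page in the Cloud Console.

```python
import json


def service_account_email(key_file):
    """Return the service account address stored in a JSON key file."""
    with open(key_file, encoding='utf-8') as f:
        return json.load(f)['client_email']


def grant_commands(email, bucket, roles=('objectViewer', 'objectCreator')):
    """Build gsutil commands granting `email` the given roles on `bucket`."""
    return ['gsutil iam ch serviceAccount:{}:{} gs://{}'.format(email, role, bucket)
            for role in roles]


if __name__ == '__main__':
    import sys
    # Usage: python grant.py PATH_TO_KEY.json BUCKET_NAME
    if len(sys.argv) == 3:
        for command in grant_commands(service_account_email(sys.argv[1]), sys.argv[2]):
            print(command)
```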


Thanks,

- Duane



V tangutoori

May 14, 2018, 3:24:39 PM
to cloud-vision-discuss
Hey Duane,
Sorry for the late response; I have been busy with some important work. I will try to read through the document this week and correct the permissions, and I will reply as soon as I am done or if I have any further questions. Thank you so much for your help.



V tangutoori

May 17, 2018, 2:44:53 PM
to cloud-vision-discuss
Hey Duane,
Thank you for the helpful reply. I was able to solve the permission issues, read the file, and perform OCR on it. Now I am getting an encoding error; can you please give me directions on how to debug it? I am fairly new to Python and still figuring out the different errors and their causes.
# Once the request has completed and the output has been
# written to GCS, we can list all the output files.
storage_client = storage.Client()

match = re.match(r'gs://([^/]+)/(.+)', 'gs://myocrbucket-v/2.pdf')
bucket_name = match.group(1)
prefix = match.group(2)

bucket = storage_client.get_bucket(bucket_name)

# List objects with the given prefix.
blob_list = list(bucket.list_blobs(prefix=prefix))
print('Output files:')
for blob in blob_list:
    print(blob.name)

# Process the first output file from GCS.
# Since we specified batch_size=2, the first response contains
# the first two pages of the input file.
output = blob_list[0]

json_string = output.download_as_string()
response = json_format.Parse(json_string, vision.types.AnnotateFileResponse())

# The actual response for the first page of the input file.
first_page_response = response.responses[0]
annotation = first_page_response.full_text_annotation

# Here we print the full text from the first page.
# The response contains more information:
# annotation/pages/blocks/paragraphs/words/symbols
# including confidence scores and bounding boxes
print(u'Full text:\n{}'.format(annotation.text))

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
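One likely cause, offered as an assumption rather than a confirmed diagnosis: `list_blobs(prefix='2.pdf')` matches every object whose name starts with `2.pdf`, including the input PDF itself, and `2.pdf` sorts before names such as `2.pdfoutput-1-to-2.json`, so `blob_list[0]` may be the PDF rather than a JSON result. `json_format.Parse` decodes byte input as UTF-8, and the binary bytes near the start of a PDF would be consistent with this UnicodeDecodeError. The sketch below, using a hypothetical `output_json_names` helper and assuming the API's `output-<first>-to-<last>.json` file naming, keeps only the JSON results and orders them by first page.

```python
import re

# Assumed naming of Vision's async output files:
# <destination prefix>output-<first page>-to-<last page>.json
_OUTPUT_NAME = re.compile(r'output-(\d+)-to-(\d+)\.json$')


def _first_page(name):
    """First page number encoded in an output file name."""
    return int(_OUTPUT_NAME.search(name).group(1))


def output_json_names(names):
    """Keep only Vision output JSON names, ordered by first page.

    Input files such as '2.pdf' that merely share the prefix are dropped.
    Pages are compared as integers, so 'output-3-to-4.json' sorts before
    'output-11-to-12.json' (a plain string sort would reverse them).
    """
    json_names = [name for name in names if _OUTPUT_NAME.search(name)]
    return sorted(json_names, key=_first_page)


if __name__ == '__main__':
    listed = ['2.pdf', '2.pdfoutput-3-to-4.json', '2.pdfoutput-1-to-2.json']
    print(output_json_names(listed))
    # ['2.pdfoutput-1-to-2.json', '2.pdfoutput-3-to-4.json']
```

In the posted code, the `blob_list` line could become `blobs = {b.name: b for b in bucket.list_blobs(prefix=prefix)}` followed by `output = blobs[output_json_names(blobs)[0]]`. Writing the results under a destination prefix that does not also match the input (for example a hypothetical `gs://myocrbucket-v/ocr-output/`) avoids the overlap in the first place.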

