Unable to read a CSV file uploaded to a Google Cloud Storage bucket


Hiral Mehta

Aug 20, 2016, 12:56:12 PM
to gce-dis...@googlegroups.com
Hi,

Goal - To read a CSV file uploaded to a Google Cloud Storage bucket.

Environment - A Jupyter notebook running over an SSH session on the master node. From Python in the notebook, I am trying to access a simple CSV file that was uploaded to a Google Cloud Storage bucket.

Approaches - 

1st approach - A simple Python program

I wrote the following program:

import csv
f = open('gs://python_test_hm/train.csv', 'rb')
csv_f = csv.reader(f)
for row in csv_f:
    print row

Results - Error message "No such file or directory"

2nd approach - I tried to access the train.csv file using the gcloud package. The sample code is shown below; it is not my actual code. In my version, the file on Google Cloud Storage was referred to as "gs://<bucket Name>/Filename.csv".
Results - Error message "No such file or directory"

Load data from CSV

import csv

from gcloud import bigquery
from gcloud.bigquery import SchemaField

client = bigquery.Client()

dataset = client.dataset('dataset_name')
dataset.create()  # API request

SCHEMA = [
    SchemaField('full_name', 'STRING', mode='required'),
    SchemaField('age', 'INTEGER', mode='required'),
]
table = dataset.table('table_name', SCHEMA)
table.create()

with open('csv_file', 'rb') as readable:
    table.upload_from_file(
        readable, source_format='CSV', skip_leading_rows=1)

3rd Approach -

import csv
import urllib



response = urllib.urlopen(url)
cr = csv.reader(response)
print cr

for row in cr:
    print row

Results - The above code does not raise an error, but it prints the HTML of a Google page, as shown below, instead of the contents of the train.csv file, which is what I want to see.

['<!DOCTYPE html>']
['<html lang="en">']
['  <head>']
['  <meta charset="utf-8">']
['  <meta content="width=300', ' initial-scale=1" name="viewport">']
['  <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">']
['  <title>Sign in - Google Accounts</title>']


Can someone shed some light on what could be wrong here and how I can achieve my goal? Your help is highly appreciated.

Thanks,
Hiral Mehta

George (Google Cloud Support)

Aug 22, 2016, 1:58:24 PM
to gce-dis...@googlegroups.com
Hello Hiral,

This is possibly a permission issue on your bucket.
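
The sign-in page printed by the third approach is consistent with this: a request that carries no credentials for a non-public object can be answered with an HTML login page instead of the file. As a rough sketch (the helper names looks_like_html and rows_or_error are made up for illustration, and the check is only a heuristic), a guard like the following stops an HTML page from being parsed as if it were CSV:

```python
import csv
import io


def looks_like_html(payload):
    """Return True if the bytes start like an HTML page rather than CSV data."""
    head = payload.lstrip()[:200].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def rows_or_error(payload):
    """Parse CSV rows from bytes, refusing to parse an HTML page by mistake."""
    if looks_like_html(payload):
        raise ValueError("got an HTML page (possibly a sign-in page), not the CSV file")
    return list(csv.reader(io.StringIO(payload.decode("utf-8"))))
```

If rows_or_error raises, the request most likely never reached the object, and the access settings described below are the place to look.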

You can control who has access to your Cloud Storage buckets and objects as well as what level of access they have. Below is a summary of the access control options available to you, along with links to learning more about each:

  • Identity and Access Management (IAM) permissions: Grant access to all of a project's buckets and objects. IAM permissions give you broad control over your projects, but not fine-grained control over individual buckets or objects.

  • Access Control Lists (ACLs): Grant read or write access to users for individual buckets or objects. In many cases, you can use IAM permissions instead of ACLs. Use ACLs only when you need fine-grained control over individual resources. To learn how to use ACLs, see Create and Manage Access Control Lists.

  • Signed URLs (query string authentication): Give time-limited read or write access to an object through a URL you generate. Anyone with whom you share the URL can access the object for the duration of time you specify, regardless of whether or not they have a Google account.

  • Signed Policy Documents: Specify what can be uploaded to a bucket. Policy documents allow greater control over size, content type, and other upload characteristics than signed URLs, and can be used by website owners to allow visitors to upload files to Google Cloud Storage.

These options are not mutually exclusive. For example, you can use ACLs to generally give private access to a bucket, but then create a signed URL or policy document that allows anyone you choose to access a resource within the bucket, bypassing the ACL mechanism.
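
To make the signed URL option concrete, here is a minimal sketch. It assumes the google-cloud-storage Python library and credentials that are able to sign (for example a service account key). The helper name make_read_url and the 15-minute default are illustrative choices, not part of any API.

```python
from datetime import timedelta


def make_read_url(blob, minutes=15):
    """Ask a blob object for a V4 signed URL that allows GET for a limited time."""
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=minutes),
        method="GET",
    )


# Usage with the real client library (assumed to be installed and configured):
#   from google.cloud import storage
#   blob = storage.Client().bucket("<bucket Name>").blob("Filename.csv")
#   url = make_read_url(blob)
```

A URL produced this way can be opened by plain HTTP code until it expires, without a Google sign-in.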

For examples of sharing and collaboration scenarios that involve setting bucket and object ACLs, see Sharing and Collaboration.
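
Separately from permissions, note that Python's built-in open() only understands local file system paths, so a gs:// path fails with "No such file or directory" regardless of how the bucket is shared. Once access is in place, the object has to be fetched through a storage client. The following is a sketch only, assuming the google-cloud-storage library and default credentials on the instance; split_gs_uri and read_csv_rows are hypothetical helper names.

```python
import csv
import io


def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: %r" % uri)
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError("URI needs both a bucket and an object name: %r" % uri)
    return bucket, name


def read_csv_rows(uri, client=None):
    """Download the object behind a gs:// URI and parse it as CSV rows."""
    if client is None:
        # Imported lazily so split_gs_uri works without the library installed.
        from google.cloud import storage
        client = storage.Client()  # assumes default credentials are available
    bucket_name, object_name = split_gs_uri(uri)
    data = client.bucket(bucket_name).blob(object_name).download_as_bytes()
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))


# Example call (requires access to the bucket):
#   rows = read_csv_rows("gs://python_test_hm/train.csv")
```

This reads the whole object into memory, which is fine for a small CSV file; a large file is better copied to local disk first.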


I hope this helps.


Sincerely,

George



anindya basu

May 18, 2017, 9:19:52 AM
to gce-discussion
Hi George,

I am having a similar problem.

I have verified my code on a local trainer and it works fine.

Scenario:
- big tar file on GCS at gs://<bucket_name>/<filename>.tar. It is within the same project as my jobs
- I load the tar file in python with tarfile.open('gs://<bucket_name>/<filename>.tar', 'r') as f:
- I get error No such file or directory: 'gs://<bucket_name>/<filename>.tar'

I assume that since I am the owner of the project and have submitted the job from the same project, there should not be any permission issue.
Is it an issue with the tar file? What other options do I have to access the tar file from my code? If it is an option, how do I copy the file to a local disk before loading?

Any help will be much appreciated.

Regards,
Anindya

Carlos (Cloud Platform Support)

May 18, 2017, 4:46:47 PM
to gce-discussion
Hi Anindya,

Can you provide additional information on which components you are using? For example, are you using GCE, GAE, or BQ? Where is your code running? Are you using any kind of authentication? Let me know what you are trying to achieve.
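
In the meantime, on the question of copying the file to local disk first: tarfile.open(), like open(), expects a local path or a file object, so a gs:// path will not resolve no matter what the permissions are. A minimal sketch of the copy-then-open pattern follows. It assumes the google-cloud-storage library is installed where the job runs; default_download and list_remote_tar are hypothetical names, and the download step is passed in as a parameter so it can be swapped for whatever your environment provides.

```python
import os
import tarfile
import tempfile


def default_download(bucket_name, object_name, dest_path):
    """Copy one object to a local path using google-cloud-storage (assumed installed)."""
    from google.cloud import storage  # imported lazily
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.download_to_filename(dest_path)


def list_remote_tar(bucket_name, object_name, download=default_download):
    """Copy the tar to a temporary local file, open it, and return member names."""
    fd, local_path = tempfile.mkstemp(suffix=".tar")
    os.close(fd)
    try:
        download(bucket_name, object_name, local_path)
        with tarfile.open(local_path, "r") as tar:
            return tar.getnames()
    finally:
        # Remove the temporary copy whether or not the tar could be read.
        os.remove(local_path)
```

For a big tar file, make sure the temporary directory has enough free space; returning the member names here is just a stand-in for whatever processing the job needs.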