How do I push different file types in Google Cloud Storage to the client browser?


user2744119 via StackOverflow

Nov 9, 2015, 2:54:08 PM
to google-appengin...@googlegroups.com

I'm trying to create a document management system using Google Cloud Storage (GCS), Python 2.7 and the Ferris framework. I can upload many types of files to Cloud Storage, and I can programmatically push CSV and TXT files to the client's browser for download with no problem. But if the file is a Microsoft Word document, a PDF, or any other MIME type, I keep getting the following error:

'ascii' codec can't decode byte 0xe2 in position X

The following example works if the user is trying to download a CSV file:

    @route
    def test_get_csv_file(self):
        # the file in google cloud storage
        thefilename = '/mydomain.appspot.com/my_csv_file.csv'
        try:
            with gcs.open(thefilename, "r") as the_file:
                self.response.headers["Content-Disposition"] = "'attachment'; filename=my_csv_file.csv"
            return the_file.read(32*1024*1024).decode("utf-8")
        except gcs.NotFoundError:
            return "it failed" 

The following is an example of trying to push a Word doc which fails with the aforementioned error:

@route
def test_get_word_file(self):
    # the file in google cloud storage
    thefilename = '/mydomain.appspot.com/my_word_file.doc'
    try:
        with gcs.open(thefilename, "r") as the_file:
            self.response.headers["Content-Disposition"] = "'attachment'; filename=my_word_file.doc"
            return the_file.read(32*1024*1024).decode("utf-8")
    except gcs.NotFoundError:
        return "it failed" 

Access to the files has to be restricted to the domain account, so I can't set the default ACL of the bucket to public-read; otherwise I would just use the storage.googleapis.com/yadda/yadda URL as the serving URL and be done with it. I also tried changing the decode value to Latin-1, but that just rendered a blank file. I don't understand why this works with CSV files but not anything else. I appreciate any assistance. Thanks



Please DO NOT REPLY directly to this email but go to StackOverflow:
http://stackoverflow.com/questions/33617044/how-do-i-push-different-file-types-in-google-cloud-storage-to-the-client-browser

Fábio Uechi via StackOverflow

Nov 13, 2015, 9:24:04 AM
to google-appengin...@googlegroups.com

This doesn't actually solve your problem, but an alternative approach is to use signed URLs. The files would then be served directly from Cloud Storage, and the generated URL would be valid for a limited time.

I use the Python module below. It has some utility functions and classes for URL signing.

import datetime
import json
import time
import urllib
from base64 import b64encode
from urlparse import urlparse

from google.appengine.api import app_identity

__author__ = 'fabio'
__all__ = ['sign', 'PolicyDocument', 'CloudStorageURLSigner']


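def sign(string_to_sign):
    """Signs the string with the app's service account key; returns the signature base64-encoded."""
    # sign_blob returns a (signing_key_name, signature) tuple; only the signature is needed here.
    signing_key_name, signature = app_identity.sign_blob(string_to_sign)
    return b64encode(signature)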


class PolicyDocument:
    """Represents a policy.

    Attributes:
        content_type:
        success_action_redirect:
        key:
        bucket:
        expiration:
        acl:
        success_action_status:
    """
    ACL = "acl"
    SUCCESS_ACTION_REDIRECT = "success_action_redirect"
    SUCCESS_ACTION_STATUS = "success_action_status"
    KEY = "key"
    BUCKET = "bucket"
    CONTENT_TYPE = "content-type"
    ACL_PUBLIC_READ = "public-read"
    ACL_PROJECT_PRIVATE = "project-private"

    def __init__(self, content_type=None, success_action_redirect=None, key=None, bucket=None, expiration=None,
                 success_action_status=201, acl=ACL_PROJECT_PRIVATE):
        self.content_type = content_type
        self.success_action_redirect = success_action_redirect
        self.key = key
        self.bucket = bucket
        self.expiration = expiration
        self.acl = acl
        self.success_action_status = success_action_status

    def as_dict(self):
        conditions = [{self.ACL: self.acl},
                      {self.BUCKET: self.bucket},
                      {self.KEY: self.key},
                      {self.CONTENT_TYPE: self.content_type},
                      ["starts-with", "$content-type", 'image/'],
        ]

        # TODO: investigate why it's not working
        if self.success_action_redirect:
            conditions.append({self.SUCCESS_ACTION_REDIRECT: self.success_action_redirect})
        else:
            conditions.append({self.SUCCESS_ACTION_STATUS: str(self.success_action_status)})

        return dict(expiration=self.expiration, conditions=conditions)

    def as_json_b64encode(self):
        return b64encode(self.as_json())

    def as_json(self):
        return json.dumps(self.as_dict())


class CloudStorageURLSigner(object):
    """Contains methods for generating signed URLs for Google Cloud Storage."""

    DEFAULT_GCS_API_ENDPOINT = 'https://storage.googleapis.com'

    def __init__(self, gcs_api_endpoint=None, expiration=None):
        """Creates a CloudStorageURLSigner that can be used to generate signed URLs.

        Args:
          gcs_api_endpoint: Base URL for GCS API. Default is 'https://storage.googleapis.com'.
          expiration: An instance of datetime.datetime containing the time when the
                      signed URL should expire. Defaults to one day from now.
        """
        self.gcs_api_endpoint = gcs_api_endpoint or self.DEFAULT_GCS_API_ENDPOINT
        self.expiration = expiration or (datetime.datetime.now() +
                                         datetime.timedelta(days=1))
        self.expiration = int(time.mktime(self.expiration.timetuple()))
        self.client_id_email = app_identity.get_service_account_name()

    def __make_signature_string(self, verb, path, content_md5, content_type):
        """Creates the signature string for signing according to GCS docs."""
        signature_string = ('{verb}\n'
                            '{content_md5}\n'
                            '{content_type}\n'
                            '{expiration}\n'
                            '{resource}')
        return signature_string.format(verb=verb,
                                       content_md5=content_md5,
                                       content_type=content_type,
                                       expiration=self.expiration,
                                       resource=path)

    def signed_url(self, verb, path, content_type='', content_md5=''):
        """Forms and returns the full signed URL to access GCS."""
        base_url = '%s%s' % (self.gcs_api_endpoint, path)
        signature_string = self.__make_signature_string(verb, path, content_md5,
                                                        content_type)
        signature = urllib.quote_plus(sign(signature_string))
        return "{}?GoogleAccessId={}&Expires={}&Signature={}".format(base_url, self.client_id_email,
                                                                     str(self.expiration), signature)

    def signed_download_url(self, url):
        if self.is_stored_on_google_cloud_storage(url):
            parsed_url = urlparse(url)
            return self.signed_url('GET', parsed_url.path)
        return url

    @staticmethod
    def is_stored_on_google_cloud_storage(url):
        return "storage.googleapis.com" in url
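
For illustration, here is a minimal sketch of how the signer above assembles a download URL, written so it can run outside App Engine. It mirrors the field order of `__make_signature_string` and the query parameters of `signed_url`. The names `build_string_to_sign`, `build_signed_url` and `fake_sign`, the timestamp, and the account email are assumptions made up for this example and are not part of the module. `fake_sign` is only a stand-in for `app_identity.sign_blob`, so the resulting URL would not be accepted by Cloud Storage.

```python
import base64
import hashlib
import hmac

try:
    from urllib import quote_plus  # Python 2, as used by the module above
except ImportError:
    from urllib.parse import quote_plus  # Python 3


def build_string_to_sign(verb, path, expiration, content_md5='', content_type=''):
    """Joins the five fields in the same order as __make_signature_string."""
    return '\n'.join([verb, content_md5, content_type, str(expiration), path])


def build_signed_url(path, expiration, access_id, sign_fn,
                     endpoint='https://storage.googleapis.com'):
    """Assembles a GET URL with the same query parameters as signed_url()."""
    string_to_sign = build_string_to_sign('GET', path, expiration)
    raw_signature = sign_fn(string_to_sign.encode('utf-8'))
    signature = quote_plus(base64.b64encode(raw_signature))
    return '{}{}?GoogleAccessId={}&Expires={}&Signature={}'.format(
        endpoint, path, access_id, expiration, signature)


def fake_sign(data):
    """Hypothetical stand-in for app_identity.sign_blob; NOT valid for real GCS requests."""
    return hmac.new(b'not-a-real-key', data, hashlib.sha256).digest()


url = build_signed_url('/mydomain.appspot.com/my_word_file.doc',
                       1447426800,  # arbitrary Unix timestamp chosen for the example
                       'service-account@example.com',  # placeholder account name
                       fake_sign)
print(url)
```

In a real handler, the equivalent call would be `CloudStorageURLSigner().signed_url('GET', path)`, and the client would be sent to that URL instead of receiving the file body from the application. Because the bytes then come straight from Cloud Storage, the handler never has to decode them.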


Please DO NOT REPLY directly to this email but go to StackOverflow:
http://stackoverflow.com/questions/33617044/how-do-i-push-different-file-types-in-google-cloud-storage-to-the-client-browser/33694772#33694772