Uploading large image files and video to Google Cloud Storage

Richard Cheesmar

Feb 20, 2017, 10:47:26 AM
to Google App Engine
I am using the standard Python App Engine environment and am looking at how to upload multiple large media files to Google Cloud Storage (publicly readable), either through App Engine or directly from the client (preferred).

I currently send a batch of smaller images (20 at most, averaging between 30 and 100 KB each) in a single POST to the server. These images are provided by the client and stored in my project's default bucket. I handle the request's images in a separate thread, write them to Cloud Storage one at a time, and then associate them with an ndb object. This is all fine and dandy when the images are small and do not cause the request to run out of memory or raise a DeadlineExceededError.

But what is the best approach for large image files of 20 MB or more apiece, or video files of up to 1 GB? Are there efficient ways to do this directly from the client? Would it be possible via the JSON API, with a resumable upload, for example? If so, are there any clear examples of how to do this purely in JavaScript on the client? I have looked at the docs, but it's not intuitively obvious, at least to me.

I have been looking at the possibilities for a day or two, but nothing gives a clear, linear description of the approach. I notice in the Google docs that there is a way to upload via a POST directly from the client using PHP (https://cloud.google.com/appengine/docs/php/googlestorage/user_upload). Is this relevant only to PHP on App Engine, or is there an equivalent to createUploadUrl for Python or JavaScript?


Anyway, I'll keep exploring but any pointers would be greatly appreciated.


Cheers


Adam (Cloud Platform Support)

Feb 20, 2017, 6:41:32 PM
to Google App Engine
The simplest upload option is the POST Object call in the XML API, which accepts form data and lets you upload a file without any JavaScript, using just a <form> tag with the 'action=' attribute set to the Cloud Storage API endpoint.


You can of course use an XHR in JavaScript to post the form as well. Authentication for this type of upload is done using a policy document, which is described in the Usage and Examples section of the documentation along with Python code examples and an example HTML form.

A cleaner approach is to use the Google API JavaScript Client Library to upload the file using the JSON API. There's a JavaScript example on the googlearchive GitHub page called storage-getting-started-javascript which I still use as a reference, and is still updated. This will show you how to do a multipart upload, which is sufficient for most files under 5MB. Authentication for this is done using OAuth and is handled by the GAPI client.

If you need resumable uploads, it's not too much work to adapt the concepts in Performing a Resumable Upload, which gives raw HTTP examples, to the above code example. If you need to do some kind of authentication outside of Google OAuth, you can look into Signed URLs. The documentation also provides Python code samples for generating them. Uploading to a signed URL works mostly the same way as for regular GCS URLs, with some differences that are covered in the docs.
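
As a rough illustration of what the raw HTTP examples for resumable uploads involve, the sketch below computes the Content-Range header for each chunk of a file. It assumes the documented rule that every chunk except the last must be a multiple of 256 KiB; the function name and the default chunk size are made up for this example.

```python
CHUNK_GRANULARITY = 256 * 1024  # chunks (except the last) must be multiples of 256 KiB


def content_ranges(total_size, chunk_size=8 * CHUNK_GRANULARITY):
    """Yield (start, end, header_value) for each chunk of a resumable upload.

    `end` is inclusive, matching the Content-Range syntax "bytes start-end/total".
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if chunk_size <= 0 or chunk_size % CHUNK_GRANULARITY != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end, "bytes %d-%d/%d" % (start, end, total_size)
        start = end + 1
```

Each yielded header value would go on one PUT to the session URL, with bytes start through end of the file as the body.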

Richard Cheesmar

Feb 21, 2017, 5:55:15 AM
to Google App Engine
Thanks, Adam. Yes, I have seen the POST Object call in the XML API, but I had issues testing it from localhost. Is that possible? If so, I will look into it again.

The issue for me with the JSON API is that I have to use a resumable upload, so it gets a little complicated there. Is there a good JavaScript example of this?
Also, I noted somewhere that the Google docs suggest using the Cloud Storage client libraries over the JSON API. Does that mean the JSON API will be deprecated at some point?

Can you tell me why there is no equivalent to https://cloud.google.com/appengine/docs/php/googlestorage/user_upload for other languages?

Cheers

Simon Green

Feb 21, 2017, 12:21:18 PM
to Google App Engine
Just to add my vote for the resumable upload approach.

If you have large files, you don't want to use POST, because after any failure you need to restart the entire thing.

With the resumable upload, your App Engine instance signs the request and gives you a URL that you can then upload to. Your file then goes to GCS directly, and in the event of a failure you can restart from where it got up to.

It's more efficient than doing a form post (IMO) and much simpler to code if you're on any semi-modern browser.

- Simon
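
To make "restart from where it got up to" concrete: when an interrupted resumable session is queried, GCS answers with status 308 and, if any bytes were stored, a Range header such as "bytes=0-42". The sketch below turns that header into the offset of the next byte to send. The function name is hypothetical, and the header format is taken from the resumable upload documentation rather than from this thread.

```python
import re


def next_offset(range_header):
    """Return the index of the first byte the server has not yet stored.

    A missing Range header on a 308 response means nothing was persisted,
    so the upload restarts from byte 0.
    """
    if not range_header:
        return 0
    match = re.match(r"^bytes=0-(\d+)$", range_header.strip())
    if match is None:
        raise ValueError("unexpected Range header: %r" % range_header)
    return int(match.group(1)) + 1
```

The client would then continue with a PUT whose Content-Range starts at the returned offset.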

Richard Cheesmar

Feb 23, 2017, 6:54:45 AM
to Google App Engine
Simon, can you point me to a good JavaScript example of coding the resumable upload?


Simon Green

Feb 23, 2017, 10:15:37 AM
to Google App Engine
I'm pretty sure the YouTube uploading uses this (my upload code ended up very similar), so this example should give you something you can adapt:


(a newer version could be a bit simpler with newer web APIs like fetch, promises, etc.)

Richard Cheesmar

Feb 23, 2017, 12:12:52 PM
to Google App Engine
Thanks, Simon, I'll check it out.


Richard Cheesmar

Feb 24, 2017, 11:03:49 AM
to Google App Engine
It looks like the JSON API for JavaScript requires some form of client OAuth for uploads, and this is exactly what I don't want. I want any user of my system to be able to upload directly to my GCS bucket without needing a Google account.

Is this possible?


Simon Green

Feb 24, 2017, 11:09:48 AM
to Google App Engine
Yes, your app creates the signed URL which can be based on whatever authentication you want or already have.

That's the point of the signed URLs - the app is giving authorization to upload a file, the user doesn't need to authenticate to the storage system at all.

Richard Cheesmar

Feb 24, 2017, 11:15:38 AM
to Google App Engine
Ok, looks like you can only do this with signed requests


Adam (Cloud Platform Support)

Feb 24, 2017, 1:57:36 PM
to Google App Engine
To answer your other questions:

1) Does that mean the JSON API will be deprecated at some point?
The documentation mentions using the Google Cloud Storage Client libraries, which themselves use the JSON API, so no.

2) Why there is not an equivalent to the CloudStorageTools file upload for other languages?
The best answer to this is that the PHP standard runtime was developed last, after GCS had a mature API, so better support for GCS was baked in. It was never back-ported to the other runtimes.

The docs I linked to originally also mentioned signed URLs, and yes, this is the only way to upload to GCS directly using an auth scheme that is not Google OAuth (unless you want everything to be publicly writable).

Richard Cheesmar

Feb 25, 2017, 2:06:59 AM
to Google App Engine
Ok, thank you guys, time to just get on with it.




Richard Cheesmar

Feb 28, 2017, 11:40:33 AM
to Google App Engine
Update:

OK, so after much deliberation about how to simplify the approach, I finally have a method that I think I can live with, but there is still one problem.

What I am doing to upload a video from the client direct (via localhost) is:

1. I generate an access token and build a POST request for the initial upload request to the JSON API.
2. I make the POST call, which successfully returns a location.
3. I return the location to the client in a JSON object.
4. I make the PUT call to the returned location, which starts the upload. This is done using standard AJAX via jQuery.
5. The video actually arrives at the bucket, but the AJAX call returns with an error at the end of the upload:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=resumable&name=test-vid&upload_id=generate-id. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing).


Seems strange to me as the video actually gets to the bucket...

For the time being I have added * to the CORS settings on the bucket, using the Google Cloud Shell and some instructions from here: https://bitmovin.com/faq/how-do-i-set-up-cors-for-my-google-cloud-storage-bucket/
This all seems fine. It must be, because otherwise the PUT wouldn't store the video if the CORS headers were not in place.

I have no idea why this is, can someone enlighten me?
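
For reference, a bucket CORS configuration of the kind applied with gsutil is a JSON list of rules. The sketch below generates one restricted to specific origins rather than "*". The key names (origin, method, responseHeader, maxAgeSeconds) follow the gsutil CORS file format; the function name, the chosen methods and response headers, and the example origin are assumptions for illustration only.

```python
import json


def build_cors_config(origins, methods=("GET", "POST", "PUT"), max_age_seconds=3600):
    """Return a bucket CORS configuration as a list with a single rule."""
    return [{
        "origin": list(origins),
        "method": list(methods),
        # Response headers the browser is allowed to read (assumed set).
        "responseHeader": ["Content-Type", "Location"],
        "maxAgeSeconds": int(max_age_seconds),
    }]


def to_cors_json(origins):
    """Serialize the configuration in the form expected in a cors.json file."""
    return json.dumps(build_cors_config(origins), indent=2)
```

Writing the output of to_cors_json(["http://localhost:8080"]) to cors.json and running `gsutil cors set cors.json gs://my-bucket` would apply it to a bucket named my-bucket.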

Adam (Cloud Platform Support)

Feb 28, 2017, 2:04:33 PM
to google-a...@googlegroups.com
Have a look at this Stack Overflow answer for more details. This is an issue specific to the JSON API. In your case, does the 'Origin: ' header differ from the initial request and subsequent PUT requests?

Richard Cheesmar

Mar 1, 2017, 2:14:27 AM
to Google App Engine
You have to be kidding me. Why isn't this plastered all over the page for the JSON API? Or better still fixed if it's an issue.

I have spent valuable time and energy on this and now I have to start again. This is not amusing; in fact, it is near downright negligence on Google's part.

Adam, you wrote this in your first reply... "A cleaner approach is to use the Google API JavaScript Client Library to upload the file using the JSON API."

This makes me want to shout and scream. Can't a company that makes driverless cars get an itsy-bitsy API that touts resumable uploads to work with CORS?



Richard Cheesmar

Mar 1, 2017, 2:24:30 AM
to Google App Engine
Oh, and look at the date on the Stack Overflow post. It's not like you guys haven't had time to fix the issue.



Kaan Soral

Mar 1, 2017, 2:35:13 AM
to Google App Engine
I'm not commenting on the issue in general, as things got a bit edgy, however, my .02:

I use App Engine and other cloud products because they make things easy for me. For multi-part uploads, my expectation is wrapper libraries that make them dead simple (JS, mobile: iOS, Java, Cordova, possibly others).

In fact, while you are at it, it could even be one large, inclusive library for all high-level things combined into one package, and possibly one could hand-pick certain things and get a smaller library if size is a concern.

One example was the old Channel JS library. While I haven't used it myself, design-wise it was a nice library that tackled a complex system in a simple way.

From a cloud user's perspective, multi-part uploads are indeed a bit complex, and icky challenges like these consume a lot of time. In the end, when you look back, you realise you spent a considerable amount of time on these kinds of integrations and challenges, and on a larger scale it probably has a high impact.

Richard Cheesmar

Mar 1, 2017, 3:36:12 AM
to Google App Engine
Kaan,

You're spot on regarding impact. You spend most of your time trying to get over hurdles like these, when you should be spending time on the core of the product functionality. I personally think that the web and the cloud have become way too complex in this regard, and I feel it's up to companies like Google to make functionality like this way simpler.

I agree that cloud products such as Google's, especially App Engine and ndb, have made my project simpler in many ways. However, if what you gain is then taken away in other ways, you have to consider the time spent as a whole, not to mention the stress that developers are under for timelines, costs, etc. This particular problem is taking way too much of my time at the moment.


Simon Green

Mar 1, 2017, 12:22:53 PM
to Google App Engine
As with any platform, experience and knowledge are extremely important and have a huge impact on how long it takes to do things. It can be frustrating to run into issues, and stressful if we've promised a deliverable based on a tech stack or service we might not be completely familiar with.

I think it's unfair to be quite so critical given the prompt help that has been provided first for the question about general approach and then for this specific technicality. The docs do mention it and there are a few answers that come up when you search (another one here: http://stackoverflow.com/questions/27281825/google-storage-api-resumable-upload-ajax-cors-error). The solution is very simple so IMO it's no show-stopper.

Resumable uploads do have some significant advantages and always seem much simpler than the chunked upload and reassembly that was the previous solution to large file uploading.

Just my 0.02c ...
Message has been deleted

Richard Cheesmar

Mar 1, 2017, 3:28:08 PM
to Google App Engine
Simon and Adam, I deleted my initial reply; I kinda got tetchy. Yes, you are correct, I did get rapid responses, and this forum has improved dramatically over time. I also accept that I did get pretty tetchy; however, I'm going to write that off as tech stress.

I am sure that Google support staff are all trying their hardest, but the docs and the lack of concrete examples exacerbate problems, and one doesn't always have the time to spend hours on Stack Overflow day after day. Given that, I concur that not being as familiar as one would like with all the different aspects of Google's Cloud offerings and APIs, which are numerous to say the least, is a drawback when it comes to getting things done within certain time frames. Hence my request for more recipe examples. It's so much easier to understand how things work with coded examples. Yes, you can find them here and there, but invariably they're not what you're looking for, or they're incomplete, etc.

Anyway, I have since changed the code to use the XML API. I am still doing the POST request from the server, as the docs indicate that doing this and then passing the location to the client removes the need for a signed URL, since the location returned by the POST is effectively signed anyway. This works inasmuch as the preflight OPTIONS request and the POST request both return 200. However, the latter, for some reason, still returns an error:

CORS header Access-Control-Allow-Origin missing

So that is not working. My next step is to sign a POST request on the server, pass it to the client, and send it from there. That will have to wait until the morning; it's late and the wine and Xanax are calling me.



Simon Green

Mar 1, 2017, 4:05:54 PM
to Google App Engine
If you had everything working other than that JSON + CORS issue (i.e. signing a URL on the server and doing a PUT for the resumable upload) then all you need to do to make it work should be to pass the origin header from the client in that initial signing request.

As a general principle, I'd try to avoid the "thrashing" approach of changing the whole system instead of solving the last problem. Don't get me wrong, sometimes an approach is just "wrong" and it's better not to invest any more time in it, but I've found that jumping about between solutions does burn a lot of time. It's easy to end up re-implementing sections that were otherwise "done" rather than building on the previous effort (and learning).

I can't remember if you posted what language you're using, if it's Go then I probably have some example code that will help. I might have some Python too but not 100% sure.

Simon Green

Mar 1, 2017, 4:12:12 PM
to Google App Engine
Ha, then I remembered I could scroll up (duh!)

So for Python, you'll have something like this:

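# Fragment from inside a request handler. Assumes a Flask-style `request` and
# `jsonify`, `urlfetch` imported from google.appengine.api, and that `url`,
# `content_type`, `size` and `access_token` were set earlier in the handler.

# Get the Origin header that the client sends to your API when it wants to initiate an upload:
origin = request.headers.get('Origin')

# Include it as a header in the GCS signing request:
headers = {
    'Origin': origin,
    'X-Upload-Content-Type': content_type,  # renamed from `type`, which shadows the builtin
    'X-Upload-Content-Length': str(size),   # header values must be strings
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + access_token
}

# Call GCS
response = urlfetch.fetch(
    url=url,
    method=urlfetch.POST,
    headers=headers
)

# Return the signed URL (the Location header of the response) to the client
location = response.headers['location']
return jsonify({'url': location})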

The client then uses the url that it got from the API call and does a PUT of the file. If you're using the newer promise-based client APIs like fetch, this part can be pretty simple.
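
The snippet above leaves `url` undefined. A minimal sketch of how the initiation URL and headers could be assembled is below, using only the Python 3 standard library. The endpoint pattern is copied from the error message earlier in the thread; the function name and its parameters are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint pattern as it appears in the CORS error quoted earlier in the thread.
UPLOAD_ENDPOINT = "https://www.googleapis.com/upload/storage/v1/b/%s/o"


def build_initiation_request(bucket, object_name, origin, content_type, size, access_token):
    """Return (url, headers) for the POST that starts a resumable session."""
    query = urlencode({"uploadType": "resumable", "name": object_name})
    url = (UPLOAD_ENDPOINT % bucket) + "?" + query
    headers = {
        # Forward the browser's Origin so the later PUT from that origin passes CORS.
        "Origin": origin,
        "X-Upload-Content-Type": content_type,
        "X-Upload-Content-Length": str(size),
        "Content-Type": "application/json",
        "Authorization": "Bearer " + access_token,
    }
    return url, headers
```

The returned pair can be passed straight to urlfetch.fetch(url=..., method=urlfetch.POST, headers=...) as in the post above.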

Richard Cheesmar

Mar 1, 2017, 4:46:49 PM
to Google App Engine
Was just about to turn the computer off after winding down reading some non tech stuff.

I had everything but the origin idea. Wow, see what blind, crazy, fuzzy stress does to the beautiful mind... Messes it right up.

Thanks, appreciated. It works. phew.


Adam (Cloud Platform Support)

Mar 4, 2017, 3:15:39 PM
to google-a...@googlegroups.com
Glad to hear you got it working. Regarding my last response, it's not an issue in the sense that it's a bug that needs to be fixed; I meant that it's an issue people encounter when doing resumable uploads with the JSON API. It isn't actually supposed to work when the Origin: header differs between the preflight and subsequent requests. The behavior is mentioned in the callout in the CORS documentation as well as in point 8 in the troubleshooting section, though perhaps it could be clearer, for example by mentioning that it expects an OPTIONS preflight request, which normally gets sent by the browser for the initial upload, and that for this reason (and the fact that the origins will be different) it isn't possible to make the initial request on the server and pass the resulting location back to the client.

Attila-Mihaly Balazs

Apr 26, 2017, 2:34:47 AM
to Google App Engine
Just a quick note: we're using FileStack for the uploads (https://www.filestack.com/) and it works pretty well - they have a Google Storage integration and I do believe they also have a mobile solution (though we're not using that ATM). My only gripe is that they require "storage admin" access to your project for the Google Storage integration to work. http://stackoverflow.com/questions/40802157/what-are-the-minimum-permissions-needed-by-filepicker-io-when-integrating-with-g/40810336#40810336

Attila

jayshekar harkar

May 23, 2017, 2:14:14 PM
to Google App Engine
Re: Resumable Uploads,

2. To get a resumable session URI, I sent a POST request to the signed URL (without the Authorization header), and it responded with an HTTP 401 error,
   but when I include the Authorization header with a valid auth token, it works.

So I wonder what's the point of the signed URL in this particular case? Signing the URL and also providing the authorization token seems redundant to me unless my workflow is incorrect. Please shed some light on this.

Simon Green

May 23, 2017, 2:32:16 PM
to Google App Engine
The signed URL that you pass back to the client provides a way for an unauthenticated user (to GCS) to upload files directly to a GCS bucket. Without this, every user would need an account and permissions setup or all the files would have to go through your own service instead.

jayshekar harkar

May 25, 2017, 2:09:17 PM
to Google App Engine
Thanks for the clarification. I am trying to prepare a signed URL on the server side so that the client can initiate the request. 
I have come up with the following Java code but when the client makes the request to the signed URL, it fails. Could you/anyone please take a quick look and tell me if I missed anything obvious?

String stringToSign = "POST" + "\n" + 
  "" + "\n" + 
  "" + "\n" + 
  expiration + "\n" + 
  "x-goog-resumable:start" + "\n" +
  "/upload/storage/v1/b/{myBucket}/o?uploadType=resumable&name="+{myObject};
  
byte[] rawSignature = SHA256RSA.signSHA256RSA(stringToSign, myPrivateKey);
String encodedSignedString = new String(Base64.encodeBase64(rawSignature, false), "UTF-8");

//myObject is sent from the client that he/she wants to upload
String googleAccessStorageId = {myGoogleServiceAccount};

String queryParams =  "&GoogleAccessId=" + googleAccessStorageId +
 "&Expires=" + expiration + 
 "&Signature=" + URLEncoder.encode(encodedSignedString, "UTF-8"); 

String fullUrl = baseUrl + queryParams;

return fullUrl;
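
For comparison, here is a hedged Python sketch of the same string-to-sign layout, but with the resource written in the XML API form /bucket/object on storage.googleapis.com, which is the endpoint the signed URL documentation describes. Whether the JSON API path is what makes the request above fail is an assumption, not something confirmed in this thread. The `sign` callable stands in for an RSA-SHA256 signer using the service account key and is hypothetical, as are the function and parameter names.

```python
import base64
from urllib.parse import quote, urlencode


def build_resumable_signed_url(bucket, object_name, expiration, access_id, sign):
    """Return (string_to_sign, url) for starting a resumable upload.

    `sign` takes the UTF-8 bytes of the string to sign and returns raw
    signature bytes (for example RSA-SHA256 with the service account key).
    """
    resource = "/%s/%s" % (bucket, quote(object_name, safe="/"))
    string_to_sign = "\n".join([
        "POST",
        "",                        # Content-MD5, left empty
        "",                        # Content-Type, left empty
        str(expiration),           # Unix timestamp at which the URL expires
        "x-goog-resumable:start",  # the client must send this header too
        resource,
    ])
    signature = base64.b64encode(sign(string_to_sign.encode("utf-8"))).decode("ascii")
    query = urlencode({
        "GoogleAccessId": access_id,
        "Expires": str(expiration),
        "Signature": signature,    # urlencode percent-escapes +, / and =
    })
    return string_to_sign, "https://storage.googleapis.com" + resource + "?" + query
```

Because x-goog-resumable:start is part of the signed string, the client's POST has to carry that header with the same value.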

jayshekar harkar

May 25, 2017, 2:19:42 PM
to Google App Engine
btw here is the client side code to give you a full picture.

  var xhr = new XMLHttpRequest();

  xhr.open("POST", {url_obtained_from_the_server}, true);
  
  xhr.onload = function(e) {
    if (e.target.status < 400) {
      var location = e.target.getResponseHeader('Location');
      this.url = location;
      this.sendFile_();
    } else {
      this.onUploadError_(e);
    }
  }.bind(this);
  xhr.onerror = this.onUploadError_.bind(this);
  xhr.send();
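
One thing that may be worth checking in the request above: if a header such as x-goog-resumable:start is included in the string that was signed, the client has to send that header as well, and the XHR shown does not set it. The Python sketch below builds (but does not send) an equivalent POST for testing outside the browser. The function name and the optional origin parameter are assumptions.

```python
from urllib.request import Request


def build_session_request(signed_url, origin=None):
    """Build the POST that asks a signed URL for a resumable session URI."""
    headers = {"x-goog-resumable": "start"}  # must match the signed header
    if origin:
        headers["Origin"] = origin           # mimic the browser's origin if needed
    # An empty body: the file itself is sent later with PUT to the session URI.
    return Request(signed_url, data=b"", headers=headers, method="POST")
```

Passing the result to urllib.request.urlopen should return a response whose Location header holds the session URI, which is the value the JavaScript above reads with getResponseHeader('Location').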