Is it possible to pipeline a file upload with a download in Google Cloud Storage?


Hector Montaner

Apr 19, 2016, 1:11:41 PM
to Google App Engine
What I want to achieve is this: one client uploads a large file (several GB) to Google Cloud Storage, and another client starts downloading it before the upload has finished, in a pipelined way.
Is it possible? If not, what alternatives do I have?

Nick (Cloud Platform Support)

Apr 21, 2016, 5:53:24 PM
to Google App Engine
Hey Hector,

This isn't possible exactly as you describe, since an object can only be fetched once it has been uploaded successfully. There is, however, a solution that accomplishes essentially the same thing, although it requires some sophistication in the uploading client:

Have the uploading client send chunks of the file to the Cloud Storage API as a multipart upload while, at the same time, sending the same chunks to the other client (or to a location where they will be available to the other client on request).

The complexity arises because you need to run two multipart upload processes, with independent retry / back-off logic for each stream.
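As a rough illustration of sending each chunk to two destinations with retry / back-off, here is a minimal Python sketch. The names `send_with_retry` and `tee_chunks` are hypothetical, and each "sink" is assumed to be a function you supply (for example, one that uploads a chunk to Cloud Storage and one that forwards it to the other client). For simplicity the sinks are called one after the other, so a stalled sink delays the other; fully independent streams would need a worker per sink.

```python
import time
from typing import Callable, Iterable, Sequence

Sink = Callable[[int, bytes], None]  # hypothetical: sends chunk number `index`


def send_with_retry(send: Sink, index: int, chunk: bytes,
                    max_attempts: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Call send(index, chunk), retrying with exponential back-off on IOError."""
    for attempt in range(max_attempts):
        try:
            send(index, chunk)
            return
        except IOError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)


def tee_chunks(chunks: Iterable[bytes], sinks: Sequence[Sink],
               **retry_opts) -> None:
    """Send every chunk to every sink; retries are tracked per sink and chunk."""
    for index, chunk in enumerate(chunks):
        for send in sinks:
            send_with_retry(send, index, chunk, **retry_opts)
```

Passing the chunk index along with the data lets the receiving side detect gaps or repeats if a retry ends up delivering a chunk twice.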

I hope this explanation is clear enough; let me know if you have any questions.

Cheers,

Nick
Cloud Platform Community Support

Nick (Cloud Platform Support)

Apr 25, 2016, 9:33:07 AM
to Google App Engine
Hey Hector,

Thinking a bit more about your situation: depending on how this pattern needs to scale, the best approach might be to upload to an instance that processes the upload chunks while publishing them to a Cloud Pub/Sub topic. The Pub/Sub infrastructure will then ensure that any subscribers to that topic receive each chunk message at least once. This is a different system from the one described in my last comment, with a different approach to building reliability and retry / recovery logic. You can read about Pub/Sub in the documentation.
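Since delivery is "at least once", a subscriber may see the same chunk more than once. A minimal Python sketch of how the receiving side could cope is below. It assumes each message carries a chunk index chosen by the publisher (that field, and the `ChunkReassembler` class, are illustrative assumptions, not part of the Pub/Sub API), and it also tolerates chunks arriving out of order.

```python
from typing import Dict, List


class ChunkReassembler:
    """Drop duplicate chunks and release the rest in index order."""

    def __init__(self) -> None:
        self._next_index = 0                 # next chunk index to release
        self._pending: Dict[int, bytes] = {}  # chunks that arrived early

    def receive(self, index: int, data: bytes) -> List[bytes]:
        """Accept one delivered chunk; return the chunks now ready, in order."""
        if index < self._next_index or index in self._pending:
            return []  # duplicate delivery, already seen
        self._pending[index] = data
        ready: List[bytes] = []
        # Release every consecutive chunk starting at the expected index.
        while self._next_index in self._pending:
            ready.append(self._pending.pop(self._next_index))
            self._next_index += 1
        return ready
```

Each list returned by `receive` can be written straight to the output, so only chunks that arrived ahead of a missing one are held in memory.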

Cheers!


Nick
Cloud Platform Community Support 


Hector Montaner

Apr 26, 2016, 9:01:31 AM
to Google App Engine
Hi Nick,

Thanks for answering! Your solution has one drawback for me: the downloading client also needs a bit of sophistication, not only the uploading client.

If I understood correctly, the downloading client will download one chunk after another. If this is the case, the browser would have to download all chunks, keep them in memory, assemble them into one blob, and then trigger the "download" from memory to the user file system. This is a problem if the file size is several GBs.

I know some browsers, like Chrome, allow writing files to an internal filesystem so that the assembly can be done on disk. However, I need this process especially for browsers that do not support FileSystem writing :)

The best solution I can think of is to have an instance running, for example, Apache + PHP that accepts an HTTP GET request for the download. The PHP script would iterate over the chunks and output them to the client in the body of that single HTTP GET response, waiting whenever the remaining chunks have not yet been uploaded. I think this will work, but it would not be cheap, as I would have to pay for the instance hours and for the bandwidth consumed sending chunks from GAE to the client.
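To make the idea concrete, here is a minimal sketch of that serving loop, written in Python rather than PHP. `fetch_chunk` and `is_upload_complete` are hypothetical callbacks: the first returns chunk number `index`, or `None` if it has not been stored yet, and the second reports whether the uploader has finished. The sketch assumes the uploader stores its last chunk before marking the upload complete.

```python
import time
from typing import Callable, Iterator, Optional


def stream_chunks(fetch_chunk: Callable[[int], Optional[bytes]],
                  is_upload_complete: Callable[[], bool],
                  poll_interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield chunks in order, waiting for those that are not uploaded yet."""
    index = 0
    while True:
        # Read the completion flag before looking for the chunk, so a final
        # chunk stored just before completion is never missed.
        done = is_upload_complete()
        chunk = fetch_chunk(index)
        if chunk is not None:
            yield chunk
            index += 1
        elif done:
            return  # upload finished and there are no more chunks
        else:
            sleep(poll_interval)  # chunk not there yet, wait and retry
```

A web handler could write each yielded chunk to the response body as it arrives, so the browser sees one ordinary download over a single request.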

What is your opinion?

Nick (Cloud Platform Support)

Apr 26, 2016, 12:04:37 PM
to Google App Engine
Hey Hector,

Ah, yes, I was operating with the assumption that the "client" machine was a server machine itself, although if it's a user's machine and a browser, the ability to download files in a chunked manner is somewhat different. The PHP scheme envisioned would work if the PHP machine were a Compute Engine instance, although be aware that the PHP App Engine runtime, due to the App Engine serving infrastructure, buffers whole responses until the request-handler terminates, so a chunked download from a PHP App Engine module wouldn't function here. 

As for network costs, I think it's impossible to avoid network transfer of data from the upload machine to the machine that the user's browser will connect to for a chunked download. One stream of data is going to Cloud Storage, and another is going to a location from which the chunks can be forwarded to the user.

As to the question of filesystem writing, really the only cross-browser way to do this is with file downloads.

At this point, if you don't want to stream to both Cloud Storage and the machine the client connects to before the upload is completed, it seems possible that you could just cut losses and wait for the upload to complete before initiating a download from the client? Is there any particular reason that's undesirable, other than the time offset? Perhaps the uploads and downloads will be very large files? Maybe there's a way to structure the program so that the upload-wait period isn't damaging to the user experience / system behaviour?

If you'd like, you could make a Feature Request in the Public Issue Tracker for the Cloud Platform to have some means of chunked-downloading an object while it's being chunked-uploaded. We always encourage people to make Feature Requests; it helps improve the platform!

I hope these thoughts have been helpful; let me know your thoughts.

Sincerely,


Nick
Cloud Platform Community Support

Hector Montaner

Apr 28, 2016, 9:53:54 AM
to Google App Engine
Hi Nick,

Yes, I would have to use a Compute Engine instance, mainly because an App Engine endpoint has a time limit that would be exceeded when uploading or downloading large files. Also, I didn't know about App Engine buffering the whole response; that is good to know, thanks!

Waiting for the upload process to finish before starting the download process is what I was trying to avoid; that's the whole point :)

Thanks for the tip, I will send a Feature Request.

You have been very helpful. Thanks again!

Nick (Cloud Platform Support)

Apr 28, 2016, 11:43:18 AM
to Google App Engine
Hey Hector,

Glad to hear I've been of help! We'll keep an eye out for the feature request and it should be acknowledged quickly.

Best of luck!


Nick
Cloud Platform Community Support
