On Fri, Mar 23, 2012 at 1:18 PM, Evan Worley <ev...@dnanexus.com> wrote:
> Very cool stuff, all. I'm wondering about the multi-threaded support. All
> the documentation leads me to believe that it only applies to a large number
> of files, and not a single large file. I've got some 100GB+ files to upload
> to Google Cloud Storage, and the transfer rate is far below the available
> bandwidth. Have you considered parallelism on the single file level? We've
> run some tests, and have found that we can achieve much better throughput
> with concurrent connections on a decent network (I've observed up to 6-8X
> throughput).
FWIW, I did some work along the same lines, but the need for dealing
with the large files vanished and I never put the work into
production. IIRC, I found a significantly greater than 6-8x
performance increase, but I am working within a production data-center
with pretty good connectivity. As I recall, I was getting a 10G file
up (and also down) in around 30 seconds, where any other form of
single-stream transfer (rsync, scp, etc.) would take many times
that, particularly for a transcontinental transfer.
The approach I took was to split the file and store it in parts in
Google Storage, adhering to a convention I defined. The download
part of the code knew how to interpret the convention to fork
downloads and re-assemble the file. This is somewhat cumbersome of
course. It would be great if there were some way to build support for
such an operation into the Google Storage infrastructure itself.
Thanks,
- Tom
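For illustration only, here is a minimal sketch of the kind of split-and-reassemble convention described above. It is not Tom's code: the part naming scheme (`<name>.part-<index>-of-<count>`) and the function names `plan_parts`, `split_file`, and `reassemble` are assumptions made up for this example, and it works on local files only. In practice each part would be uploaded (and later downloaded) by its own worker, which is where the parallelism would come from.

```python
import os
import shutil


def plan_parts(name, total_size, part_size):
    """Return (part_name, offset, length) tuples covering total_size bytes.

    The naming scheme is a hypothetical convention; zero-padded indices
    keep the parts in order when sorted by name.
    """
    count = max(1, -(-total_size // part_size))  # ceiling division
    parts = []
    for i in range(count):
        offset = i * part_size
        length = min(part_size, total_size - offset)
        parts.append((f"{name}.part-{i:05d}-of-{count:05d}", offset, length))
    return parts


def split_file(path, out_dir, part_size):
    """Write each planned part of `path` into out_dir; return the part paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    written = []
    with open(path, "rb") as src:
        for part_name, offset, length in plan_parts(
            base, os.path.getsize(path), part_size
        ):
            src.seek(offset)
            part_path = os.path.join(out_dir, part_name)
            with open(part_path, "wb") as dst:
                # Reads one whole part into memory; choose part_size accordingly.
                dst.write(src.read(length))
            written.append(part_path)
    return written


def reassemble(part_paths, dest_path):
    """Concatenate parts in name order to rebuild the original file."""
    with open(dest_path, "wb") as dst:
        for part_path in sorted(part_paths):
            with open(part_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Because the part count is embedded in every name, a downloader that sees any one part can tell how many others to expect before it starts forking downloads.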
Google Cloud Storage does support parallel *reads* within a single file, in the form of range GETs. But we don't support parallel *writes* within a single file.
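As a rough sketch of what parallel reads via range GETs could look like from the client side: the code below splits an object into byte ranges and fetches them concurrently with standard HTTP `Range` headers. The function names, the chunk size, and the worker count are assumptions for illustration, the URL is whatever object URL you have access to, and authentication is left out entirely.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size, chunk_size):
    """Return HTTP Range header values that together cover total_size bytes."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range end is inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


def fetch_range(url, range_value):
    """Fetch one byte range of the object at `url` (no auth; an assumption)."""
    request = urllib.request.Request(url, headers={"Range": range_value})
    with urllib.request.urlopen(request) as response:
        return response.read()


def parallel_download(url, total_size, chunk_size, workers=8):
    """Fetch all ranges concurrently and join them in order.

    Holds the whole object in memory, so this is only a sketch; a real
    tool would write each chunk to its offset in a file instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda r: fetch_range(url, r), byte_ranges(total_size, chunk_size)
        )
        return b"".join(chunks)
```

For example, `byte_ranges(10, 4)` gives `["bytes=0-3", "bytes=4-7", "bytes=8-9"]`. `pool.map` returns results in input order, so the chunks can be joined directly.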