new gsutil features: streaming & parallel transfers

Mike Schwartz (Google Storage Team)

Jul 22, 2011, 5:15:37 PM
to gs-an...@googlegroups.com, gsutil-...@googlegroups.com, Google Storage Team
Hi,

We're pleased to announce a couple of nice feature additions to the gsutil command:
  • Streaming transfers (contributed by Google Storage user Vineeth Pillai). These are useful, for example, for uploading the output of a computational pipeline without first buffering it on local disk.
  • Parallel transfers (implemented by Google). These are useful when you have a fast network connection and want to upload or download a large number of objects; in that situation, parallel transfers can improve throughput.
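Streaming means reading the producer's output a piece at a time and passing each piece on as it arrives, so the data never has to fit in memory or land on local disk first. The minimal Ruby sketch below illustrates only that idea. It is not gsutil's implementation, and the output object stands in for a hypothetical upload connection (anything with a write method).

```ruby
require 'stringio'

# Copy everything from `input` to `output` in fixed-size chunks. Only one chunk
# is held in memory at a time, and nothing is written to local disk.
# `output` stands in for a hypothetical upload stream: anything with #write.
def stream_copy(input, output, chunk_size = 64 * 1024)
  total = 0
  # IO#read(n) and StringIO#read(n) return nil once the input is exhausted.
  while (chunk = input.read(chunk_size))
    output.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Usage: in a real pipeline `input` would be $stdin, e.g. `producer | ruby upload.rb`.
source = StringIO.new("pipeline output\n" * 1000)
sink = StringIO.new
puts stream_copy(source, sink, 4096) # prints 16000
```

For real IO objects, Ruby's standard library already offers the same chunked copy as IO.copy_stream.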
The latest version of gsutil can be downloaded here.

Please see the gsutil reference documentation for further details, or just try:
gsutil cp - gs://yourbucket/yourobject < somefile
gsutil cp -m dir_containing_many_files/* gs://yourbucket
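Parallel transfers help because each single-object transfer typically spends much of its time waiting on the network, so keeping several in flight at once raises total throughput. Below is a minimal Ruby sketch of the usual worker-pool pattern: a Queue of pending items drained by a fixed number of threads. It illustrates the general technique and is not gsutil's code. The per-object transfer is a hypothetical block supplied by the caller.

```ruby
# Apply `transfer` to every item using a fixed pool of worker threads and
# return a Hash of item => result. `transfer` is a hypothetical stand-in for
# a single-object upload or download.
def parallel_map(items, num_threads: 4, &transfer)
  pending = Queue.new
  items.each { |item| pending << item }
  results = {}
  lock = Mutex.new

  workers = Array.new(num_threads) do
    Thread.new do
      loop do
        begin
          item = pending.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break                    # queue drained, so this worker exits
        end
        value = transfer.call(item)
        lock.synchronize { results[item] = value }
      end
    end
  end
  workers.each(&:join) # wait for all workers; re-raises any worker's exception
  results
end

names = %w[a.jpg bb.jpg ccc.jpg dddd.jpg]
sizes = parallel_map(names, num_threads: 2) do |name|
  sleep 0.05 # pretend each object takes 50 ms of network time
  name.length
end
p sizes.sort # prints [["a.jpg", 5], ["bb.jpg", 6], ["ccc.jpg", 7], ["dddd.jpg", 8]]
```

All items are enqueued before the workers start. An empty queue therefore means there is no work left, and a non-blocking pop is enough to stop each thread.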

Thanks,

Mike Schwartz and the Google Storage team

Mike Schwartz (Google Storage Team)

Jul 28, 2011, 7:44:02 PM
to gsutil-...@googlegroups.com
Hi,

We released a new version of gsutil (http://gsutil.googlecode.com/files/gsutil_07-27-2011.tar.gz) today that adds support for multi-threaded object removal. Since more gsutil commands are likely to gain multi-threading support over time, we moved the "-m" option from the individual commands to gsutil itself. So where previously you'd do:

gsutil cp -m dir/* gs://bucket

now you would do:

gsutil -m cp dir/* gs://bucket

And similarly:

gsutil -m rm gs://bucket/*
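With -m placed before the subcommand, the top-level tool parses its own options first. It stops at the first non-option word, treats that word as the subcommand, and leaves everything after it to the subcommand. The hypothetical Ruby sketch below shows that split using the standard library's OptionParser#order!, which stops at the first non-option argument. It illustrates the pattern and is not gsutil's actual argument parser.

```ruby
require 'optparse'

# Split argv into [global options, subcommand, subcommand args].
# OptionParser#order! consumes options only up to the first non-option
# argument, so anything after the subcommand name (including a stray -m)
# is left untouched for the subcommand to handle.
def parse_global(argv)
  opts = { multithreaded: false }
  rest = argv.dup
  OptionParser.new do |o|
    o.on('-m', 'Run the subcommand multi-threaded') { opts[:multithreaded] = true }
  end.order!(rest)
  command = rest.shift
  [opts, command, rest]
end

p parse_global(%w[-m cp dir/a dir/b gs://bucket]) # new style: -m is global
p parse_global(%w[cp -m dir/a gs://bucket])       # old style: -m is passed to cp
```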

Thanks,

Mike Schwartz and the Google Storage Team

KevinC

Aug 11, 2011, 10:07:22 PM
to gsutil-discuss
Mike, can you please look at this thread? This has happened again in the last 24 hours, and I'm wondering whether something changed. My image streaming suddenly no longer works:

http://groups.google.com/group/gs-discussion/browse_thread/thread/f2b5f63846ac6c63/225c88ef2c574e23?lnk=gst&q=broke#225c88ef2c574e23

Thanks,
Kevin

Mike Schwartz (Google Storage Team)

Aug 11, 2011, 10:40:01 PM
to gsutil-...@googlegroups.com, Google Storage Team
Kevin,

Can you please provide the bucket and object name of an image object that no longer serves correctly, and a date+time when the failure happened?

If you prefer not to post this info on the public list, you can send it to gs-...@google.com.

Thanks,

Mike

Mike Schwartz (Google Storage Team)

Aug 11, 2011, 10:45:28 PM
to gsutil-...@googlegroups.com, Google Storage Team
Also, ideally you would reproduce this problem using gsutil -D, making sure to strip the Authorization header out of the log you send; please see http://code.google.com/apis/storage/docs/pricingandterms.html#support

Thanks,

Mike

KevinC

Aug 12, 2011, 9:05:14 AM
to gsutil-discuss
Hi Mike-

A sample item is in bucket 'baffle', with object name 'd979a2caf4827dd15e4e99b72c39b31729473d0a.jpg'.

I can't pinpoint exactly when it stopped working, but I know it worked on Monday the 8th and was not working at some point yesterday (the 11th).

I am working on getting gsutil working, but the instructions no longer match the user interface.

Thanks for your help,
Kevin

P.S. Just to reiterate: I can get to the object with no problem, but I can no longer read it from my app and stream it to the user. The data is somehow being corrupted.


Mike Schwartz (Google Storage Team)

Aug 12, 2011, 12:57:30 PM
to gsutil-...@googlegroups.com, Google Storage Team
Kevin,

Can you please either grant read access to this object (e.g., by making it publicly readable, or making a READ grant to gs-...@google.com); or else could you provide the output of gsutil -D, showing the failure?

One other thing that would help: since you're saying you can access the object directly but not from your app, can you modify your app to log the complete HTTP request (including verb, URI, and all headers) and send that to us (stripping out the Authorization header)?
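Logging a request safely means writing out the verb, the URI, and every header, while masking the Authorization value before the log leaves your machine. The Ruby sketch below is minimal and hypothetical: the method name is made up, and the host and header values in the usage are placeholders.

```ruby
# Render an HTTP request (verb, URI, all headers) as text for a debug log,
# masking the Authorization header so no credentials are shared.
# Header names are matched case-insensitively, as HTTP requires.
def loggable_request(verb, uri, headers)
  lines = ["#{verb} #{uri}"]
  headers.each do |name, value|
    value = '[REDACTED]' if name.casecmp?('Authorization')
    lines << "#{name}: #{value}"
  end
  lines.join("\n")
end

puts loggable_request('GET', '/baffle/example.jpg', {
  'Host' => 'storage.example.com',            # placeholder host
  'Date' => 'Fri, 12 Aug 2011 12:00:00 GMT',  # placeholder date
  'Authorization' => 'secret-credentials'     # never appears in the output
})
```

The same masking applies to response logging. Mike asks for the full conversation, meaning both the request and the response with all their headers.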

Thanks,

Mike

Google Storage Team

Aug 12, 2011, 1:40:07 PM
to gsutil-...@googlegroups.com, gs-...@google.com
Kevin,

While you're gathering the info we asked for earlier, I had one other
thought: Is the content-type set correctly on this object to display as
jpg data?

Mike

KevinC

Aug 12, 2011, 7:31:47 PM
to gsutil-discuss
Hi Mike-

I'm definitely sending it as JPEG data. Here's my code (Ruby on Rails):

(data is the object data returned from Google Storage)

send_data data,
          :type => "image/jpeg",
          :disposition => 'inline'

And again, to reiterate, I haven't touched this code in months. Something just stopped being sent properly from GS.

I will work on getting the debug info. I believe this image is already publicly available.

Thanks again,
Kevin

Mike Schwartz (Google Storage Team)

Aug 12, 2011, 7:46:57 PM
to gsutil-...@googlegroups.com, Google Storage Team
Kevin,

You're right, the image is publicly readable. I can see that the MIME type set on the object is text/plain:

gsutil ls -L gs://baffle/d979a2caf4827dd15e4e99b72c39b31729473d0a.jpg 
gs://baffle/d979a2caf4827dd15e4e99b72c39b31729473d0a.jpg:
Object size: 70189
Last mod: Fri, 09 Jul 2010 02:01:49 GMT
Cache control: public, max-age=3600
MIME type: text/plain
Etag: 1597ef55a828eecf5c5af97988684d76
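The problem visible in the listing above can also be checked mechanically: an object named .jpg whose stored MIME type is text/plain. The Ruby sketch below parses gsutil ls -L output in the format shown and compares the MIME type with what the object's extension suggests. It rests on two assumptions. First, the field layout matches this 2011 listing; other gsutil versions may print it differently. Second, the extension table is a hypothetical and deliberately incomplete example.

```ruby
# Hypothetical, deliberately incomplete table: extension => expected MIME type.
EXPECTED_TYPES = { '.jpg' => 'image/jpeg', '.jpeg' => 'image/jpeg', '.png' => 'image/png' }.freeze

# Given the text printed by `gsutil ls -L` for one object (format assumed to
# match the listing above), return a Hash describing a MIME type mismatch, or
# nil if the type looks right or the extension is not in the table.
def mime_mismatch(listing)
  uri = listing.lines.first.strip.chomp(':')
  actual = listing[/^\s*MIME type:\s*(\S+)/, 1]
  expected = EXPECTED_TYPES[File.extname(uri).downcase]
  return nil if expected.nil? || actual == expected
  { uri: uri, actual: actual, expected: expected }
end

listing = <<~LS
  gs://baffle/d979a2caf4827dd15e4e99b72c39b31729473d0a.jpg:
      Object size: 70189
      Last mod: Fri, 09 Jul 2010 02:01:49 GMT
      Cache control: public, max-age=3600
      MIME type: text/plain
      Etag: 1597ef55a828eecf5c5af97988684d76
LS
p mime_mismatch(listing) # reports text/plain where image/jpeg was expected
```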

Also, when I download it using gsutil, I'm able to view the image (it looks like the peel of an apple), and its MD5 agrees with the Etag.
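The integrity check described above (the downloaded file's MD5 agrees with the Etag) can be scripted. Below is a minimal Ruby sketch. It assumes, as held for this object, that the Etag is the hex MD5 of the object's bytes; treat that as an assumption rather than a rule for every object or service. The file name in the usage comment is hypothetical.

```ruby
require 'digest/md5'

# True if the file's hex MD5 equals the Etag. Etags are sometimes shown
# wrapped in double quotes, so those are stripped before comparing.
# Assumption: the Etag is the MD5 of the content, as it was for this object.
def md5_matches_etag?(path, etag)
  Digest::MD5.file(path).hexdigest == etag.delete('"').downcase
end

# Usage with the Etag from the listing above (file name is hypothetical):
# md5_matches_etag?('apple.jpg', '1597ef55a828eecf5c5af97988684d76')
```

A match shows that the bytes stored in GS are intact. Any corruption would then have to come from how the data is fetched or served, which is where the thread goes next.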

I'm not a Ruby programmer so I'm not going to be much help looking at your code. But given that the MIME type listed on the object is not correct for displaying image data it would be helpful to see a trace of the request/response to confirm you really are setting the MIME type as needed. One other possibility worth investigating is whether there's some behavior that has changed on our end with respect to content-disposition.

The only way I can see to troubleshoot from here is to see the HTTP request/response conversation. If you know how to use tcpdump (and can filter out other traffic, so you don't send us anything besides this one conversation), that would work (send me a binary tcpdump and I can investigate from there). Or if you have a way to log the conversation from your app, that would also work. I really do need to see the full HTTP conversation though - the request and all the headers, and the response and all the headers.

Thanks,

Mike

Kevin Chugh

Aug 12, 2011, 7:50:41 PM
to gsutil-...@googlegroups.com
Mike, if you're seeing the peeled apple, then something else is going wrong, and I incorrectly assumed it was the same problem as before. Before you spend any more time on this, let me investigate some other possibilities. Thanks so much for your fast response; I will follow up with either the header info (I'm working on it) or what I find if it turns out to be something else.

Thanks again,
Kevin

KevinC

Aug 24, 2011, 6:58:41 AM
to gsutil-discuss
Mike, thanks for your help. Just so others may benefit: it turns out my VPS crashed and ntp wasn't started, so there was a time disparity between my server and GS, and GS was throwing that error.

Kevin

Mike Schwartz (Google Storage Team)

Aug 24, 2011, 10:38:51 AM
to gsutil-...@googlegroups.com, Google Storage Team
Thanks for the follow-up, Kevin.

I believe you're talking about the fact that with HMAC signatures the timestamp on the request can't be more than 15 minutes old (with respect to the GS server's clock). Please note that if you use OAuth2 to authenticate (the default setup if you create a new config file with a recent version of gsutil), you won't see this failure mode (OAuth2 access tokens are opaque to the client, and the client's clock is irrelevant as far as their expiry is concerned).
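HMAC-signed requests fail once the client clock drifts more than 15 minutes from the server's. A quick sanity check is therefore to compare the local clock with the Date header that HTTP responses carry. Below is a minimal Ruby sketch. How you obtain the Date header value depends on your HTTP client and is not shown. The 15-minute limit is the one described above. Keeping ntp running, as Kevin found, is the actual fix.

```ruby
require 'time'

MAX_SKEW_SECONDS = 15 * 60 # HMAC request timestamps must be within 15 minutes

# Returns [skew_in_seconds, within_limit?], where skew is local time minus the
# server time reported in an HTTP Date header (RFC 1123 format).
def clock_skew(server_date_header, now = Time.now)
  skew = now - Time.httpdate(server_date_header)
  [skew, skew.abs <= MAX_SKEW_SECONDS]
end

now = Time.utc(2011, 8, 24, 10, 0, 0)
p clock_skew('Wed, 24 Aug 2011 09:50:00 GMT', now) # prints [600.0, true]
p clock_skew('Wed, 24 Aug 2011 09:30:00 GMT', now) # prints [1800.0, false]
```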

Mike