ENHANCEMENT: Callback in boto.s3.key.Key.send_file and get_file

Daniel Goodman

Apr 30, 2007, 3:29:03 PM
to boto-users
After uploading and downloading large files, I realized that it would
really help to have a progress callback for both operations. If we
build a GUI around boto, we can then show the user real progress data.

Let's assume the callback has the form callback(progress, file_size).
We could then add Key.setCallback(fn) and make a change like this
(horrible pseudocode, adapted from Key.send_file):

    cnt = 0
    if self.cb is not None:
        self.cb(cnt, self.size)  # report 0% before sending anything
    try:
        l = fp.read(4096)
        while len(l) > 0:
            http_conn.send(l)
            if self.cb is not None:
                cnt = cnt + len(l)
                self.cb(cnt, self.size)  # report progress after each block
            l = fp.read(4096)
        response = http_conn.getresponse()
        body = response.read()
    except Exception, e:
        self.bucket.connection.make_http_connection()
        print 'Caught an unexpected exception'
        raise e

Mitchell Garnaat

May 1, 2007, 8:05:01 AM
to boto-...@googlegroups.com
Good suggestion.  I'll try to incorporate something like this this week.

Thanks,

Mitch

mitch

Jun 4, 2007, 5:08:25 PM
to boto-users
I kind of lost track of this one. This is now entered as an issue so
hopefully I won't lose it again.

http://code.google.com/p/boto/issues/detail?id=66&can=2&q=

Mitch

Alexey Melchakov

Jun 5, 2007, 4:39:46 AM
to boto-users

I suggest it's better to pass the callback function directly to
send_file and get_file, rather than setting it on the Key object.

Mitchell Garnaat

Jun 5, 2007, 7:18:33 AM
to boto-...@googlegroups.com
I agree. Adding the callback is easy. Coming up with a sane approach for when the callback gets called is a little trickier. I don't think you want to call it each time through the loop, because for large files that's just way too much information. For small files, you probably don't want it called at all.

Mitch

mitch

Jun 5, 2007, 11:34:41 AM
to boto-users
Here's what I've implemented in my dev directory. It seems to work
fine so I'll probably check it in today.

I added a 'cb' parameter to send_file and get_file and all of the
higher-level methods that end up calling those two. The parameter has
a default value of None so if you don't specify anything it will act
just like it used to. If you do pass a value for cb it should be a
function like this:

    def mycb(bytes_so_far, total_bytes):
        <do whatever you need to do>

And then you would call a relevant method like this:

>>> k.set_contents_from_filename('foo.txt', cb=mycb)

To calculate when to call this callback, I use this formula:

file_size / buffer_size / 10

buffer_size is currently 4096, so if the file is smaller than
buffer_size*10, your callback will never be called. If it's larger,
it will be called a total of 10 times, basically whenever another
1/10th of the file has been transferred. I have actually
parameterized the number of callbacks on the lower-level methods, but
I'm not sure whether it's useful to bubble that up to all of the
high-level methods. Would you ever want your callback called more
than 10 times in a transfer? I suppose for larger files you might.
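
As a rough illustration of the formula above, here is a minimal sketch in current Python. It is not boto's actual code: copy_with_callback and its arguments are hypothetical names, and it assumes the formula gives the number of buffer reads between callbacks, using integer division.

```python
import io

BUFFER_SIZE = 4096  # buffer size mentioned in the post


def copy_with_callback(src, dst, total_bytes, cb=None, num_cb=10):
    """Copy src to dst, calling cb about num_cb times (hypothetical helper)."""
    # Assumed reading of the formula: buffer reads between callbacks.
    # Integer division makes this 0 when total_bytes < BUFFER_SIZE * num_cb,
    # in which case the callback is never called.
    reads_per_cb = total_bytes // BUFFER_SIZE // num_cb
    bytes_so_far = 0
    reads = 0
    chunk = src.read(BUFFER_SIZE)
    while chunk:
        dst.write(chunk)
        bytes_so_far += len(chunk)
        reads += 1
        if cb is not None and reads_per_cb > 0 and reads % reads_per_cb == 0:
            cb(bytes_so_far, total_bytes)
        chunk = src.read(BUFFER_SIZE)
    return bytes_so_far


# Usage: a 100-buffer payload triggers the callback every 10 reads.
calls = []
size = BUFFER_SIZE * 100
copy_with_callback(io.BytesIO(b"x" * size), io.BytesIO(), size,
                   cb=lambda done, total: calls.append((done, total)))
```

In this sketch the exact number of calls depends on rounding, so a file whose size is not a multiple of buffer_size*10 can see slightly more than num_cb calls.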

Comments?

Mitch

Daniel Goodman

Jun 5, 2007, 3:04:22 PM
to boto-...@googlegroups.com
Here is another way that may be simpler logic.

Why not call the cb before you start transferring and again after you finish? That way you are always guaranteed to receive a start (0%) and an end (100%) cb. Then send an update as each block is sent. This is independent of the block size and still works when the file is smaller than the block size. If I were implementing a GUI, this would make my life much easier. With the logic below, I might receive a 70% and then a 100% message for a small file, but no start message.

It makes no difference how often the cb is called. If you only want it 10 times, you can handle that in your own cb and call another function from there. If you want fine-grained updates, you get those too.

The more I think about it, we would definitely want more than 10 updates. If I am uploading a 5GB file, receiving an update only every 500MB would make it look like the transfer has frozen. I vote to send updates with the finest granularity possible and let us decide when to act on them.
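
A minimal sketch of this idea in current Python, to make it concrete. The names report_all_blocks and throttle are hypothetical and not part of boto: the transfer loop reports at the start and after every block, and the caller wraps its own callback to thin out the updates.

```python
import io


def report_all_blocks(src, dst, total_bytes, cb, block_size=4096):
    """Copy src to dst, calling cb at the start and after every block."""
    cb(0, total_bytes)  # guaranteed start (0%) notification
    done = 0
    chunk = src.read(block_size)
    while chunk:
        dst.write(chunk)
        done += len(chunk)
        cb(done, total_bytes)  # the last block reports done == total_bytes
        chunk = src.read(block_size)
    return done


def throttle(inner, percent_step=10):
    """Wrap inner so it fires at the start, the end, and every percent_step."""
    last_bucket = None

    def wrapper(done, total):
        nonlocal last_bucket
        pct = 100 if total == 0 else done * 100 // total
        bucket = pct // percent_step
        if done == total or bucket != last_bucket:
            last_bucket = bucket
            inner(done, total)

    return wrapper


# Usage: 100 blocks are reported, but the GUI callback sees only 11 updates.
seen = []
total = 4096 * 100
report_all_blocks(io.BytesIO(b"x" * total), io.BytesIO(), total,
                  throttle(lambda done, size: seen.append(done)))
```

With this split, a file smaller than one block still produces a 0% and a 100% update, and the policy for how often to redraw stays in the caller's code.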

regards

Daniel

Mitchell Garnaat

Jun 5, 2007, 3:39:07 PM
to boto-...@googlegroups.com
I like your idea of always calling at the start and finish.  That makes sense.

But do you really want your callback called every 4096 bytes for a 5GB file?  That's like, 1.2 million function calls or something.  I'm struggling to see how that could be useful.  Way too much information... 8^)
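
The estimate can be checked with a quick calculation. This sketch assumes one callback per 4096-byte block; block_count is a hypothetical helper, and both readings of "5GB" are shown.

```python
def block_count(file_size, block_size=4096):
    """Number of blocks needed for file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


calls_decimal = block_count(5 * 1000 ** 3)  # 5 GB as 5,000,000,000 bytes
calls_binary = block_count(5 * 1024 ** 3)   # 5 GB as 5 * 2**30 bytes
print(calls_decimal)  # 1220704
print(calls_binary)   # 1310720
```

Either way, the count lands between 1.2 and 1.3 million calls.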

Mitch

Daniel Goodman

Jun 5, 2007, 4:02:50 PM
to boto-...@googlegroups.com
It depends on how you look at it. In my case I would be using it to trigger animation to show the user the transfer is continuing, updating the average transfer rate and updating the progress bar.

The issue here is the update frequency, not the total amount of data. By this I mean that whether I upload one 1GB file, ten 100MB files, or a thousand 1MB files, I should get the same total number of callbacks, because the user expects the same GUI responsiveness regardless of the current file size.

If we assume that my upload/download rate is consistent, then I would expect to get callbacks consistently regardless of my file size. This is not going to cause undue load on the app as the number of callbacks/sec remains constant regardless of file size.

If this still worries you, may I suggest that you add an additional parameter that lets the user specify either:
1) how many bytes must be transferred between updates, or
2) the total number of updates, e.g. 100 would mean an update every 1%, 1000 every 0.1%, and 0/None every block
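
One way to picture these two options is a small helper that turns either setting into a byte interval between updates. This is only a sketch: bytes_between_updates and its parameter names (min_bytes, num_updates) are hypothetical, and it assumes updates cannot be finer than one block.

```python
def bytes_between_updates(total_bytes, block_size=4096,
                          min_bytes=None, num_updates=None):
    """Return how many bytes to transfer between callbacks."""
    if min_bytes:
        # Option 1: the caller states the byte interval directly.
        return max(min_bytes, block_size)
    if num_updates:
        # Option 2: the caller states the total number of updates.
        return max(total_bytes // num_updates, block_size)
    # 0 or None: report after every block.
    return block_size


# Usage with a 1,000,000-byte file.
every_percent = bytes_between_updates(1000000, num_updates=100)
every_block = bytes_between_updates(1000000)
every_64k = bytes_between_updates(1000000, min_bytes=65536)
```

For a 1,000,000-byte file, asking for 1000 updates would fall back to one update per 4096-byte block, since 0.1% of the file is smaller than a block.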

regards

Daniel G
--
regards

Daniel Goodman, C.I.S.S.P
goo...@gmail.com

Mitchell Garnaat

Jun 5, 2007, 4:53:53 PM
to boto-...@googlegroups.com
Your option #2 is what I have already implemented although I didn't bubble the num_cb parameter all the way out to the get_contents_* and set_contents_* methods.  I just did that and I'm testing it out a bit more.  I'll check this version in later tonight and then, perhaps, you and others could give it a try and see if it does what you want.

Thanks for all of the input.  I appreciate it.

Alexey Melchakov

Jun 6, 2007, 7:18:04 AM
to boto-users

If you want to control the number of callback calls, let's change the
callback interface:

n = callback(bytes_so_far, total_bytes)

The callback may return None or an int. If it returns None, the
callback is called again on the next transferred block. If it returns
an int, the callback is called again once
total_amount_of_transferred_bytes > counter_returned_from_callback.

The callback can calculate the counter based on the total file size
and the download speed.

Or it can return None, in which case it is called as usual.
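
A minimal sketch of how a transfer loop could honor this interface, in current Python. transfer_with_hint and quarter_steps are hypothetical names used only for illustration.

```python
import io


def transfer_with_hint(src, dst, total_bytes, cb, block_size=4096):
    """Copy src to dst; cb's return value picks the next reporting point."""
    done = 0
    next_at = None  # None means "call again on the next transferred block"
    chunk = src.read(block_size)
    while chunk:
        dst.write(chunk)
        done += len(chunk)
        if next_at is None or done > next_at:
            next_at = cb(done, total_bytes)
        chunk = src.read(block_size)
    return done


# Usage: this callback asks to be called again after another quarter of the file.
reported = []


def quarter_steps(bytes_so_far, total_bytes):
    reported.append(bytes_so_far)
    return bytes_so_far + total_bytes // 4


total = 4096 * 100
transfer_with_hint(io.BytesIO(b"x" * total), io.BytesIO(), total, quarter_steps)
```

Note that in this sketch a final 100% call is not guaranteed, because the callback alone decides the next threshold.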
