How to send Content MD5 with multipart uploads

Paul Wiseman

May 23, 2012, 12:46:33 PM
to boto-...@googlegroups.com
I have a wrapper around every file object I give to boto so I can control the global upload speed. The wrapper overrides the file object's read method and checks the size of each read against a token bucket; if the bucket has no tokens to spare, the wrapper sleeps for a while so the upload speed limit isn't exceeded.

To get this working properly I need to pass in the md5 parameter. When boto reads through a file to compute the checksum, it throws off my upload speeds, because I can't tell the difference between data being read for a checksum and data being read for uploading.
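
For anyone following along, a wrapper like the one described above might look roughly like the sketch below. It is not the poster's actual code: the names TokenBucket and ThrottledFile, and the rate and capacity values, are made up for illustration. It also shows why a checksum pass is a problem, since every read(), whatever its purpose, is charged against the bucket.

```python
import io
import time


class TokenBucket:
    """Refills at `rate` bytes per second and holds at most `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def consume(self, nbytes):
        """Charge nbytes to the bucket, sleeping if that overdraws it."""
        now = self.clock()
        # Top up with the tokens earned since the last call, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens < 0:
            # Sleep long enough for the refill to pay off the overdraft.
            self.sleep(-self.tokens / self.rate)


class ThrottledFile:
    """File-like wrapper whose read() is metered by a shared TokenBucket."""

    def __init__(self, fp, bucket):
        self.fp = fp
        self.bucket = bucket

    def read(self, size=-1):
        data = self.fp.read(size)
        # Every read is charged, including reads made only to checksum.
        self.bucket.consume(len(data))
        return data

    def __getattr__(self, name):
        # Pass seek(), tell(), close() and so on through to the real file.
        return getattr(self.fp, name)


# One bucket shared by all uploads gives a global limit (values are arbitrary).
bucket = TokenBucket(rate=1024, capacity=4096)
wrapped = ThrottledFile(io.BytesIO(b"x" * 2048), bucket)
first_chunk = wrapped.read(1024)  # within capacity, so no sleep here
```

Hashing the underlying file object rather than the wrapper, and handing the result to boto through md5, keeps the checksum pass out of the bucket.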

My issue is that I can't work out how to specify the md5 for multipart uploads and get S3 to accept it. I presume you need to do it per part, as at no point does boto know about the whole file, but when I've tried I get the following:

S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>pMpaS87PIEvv8cZiv+1nvQ==</ExpectedDigest><CalculatedDigest>1B2M2Y8AsgTpgAmY7PhCfg==</CalculatedDigest>

I was passing the md5 and the base64 digest through to upload_part_from_file. What am I doing wrong? The same approach works for a normal set_contents_from_file.

Paul Wiseman

May 23, 2012, 4:59:40 PM
to boto-...@googlegroups.com
I'm not too sure what I changed, but I've fixed this now. I seek back to the start of the file after I get the md5, and not doing that might have been why it wasn't working.

Thomas O'Dowd

May 23, 2012, 8:21:45 PM
to boto-...@googlegroups.com
Hi Paul,

What version of boto are you using? The reason I ask is that recent
versions of boto use the file pointer differently in that they don't try
to auto-rewind it for you. That said though, the behavior is better in
that it does what you tell it.

The file pointer you pass in to upload_part_from_file() should be
pointing exactly where you want it to start reading your data from. If
you are calculating the MD5 yourself, it may be that the file pointer is
still pointing at the end of the data or something, which is why the
data doesn't match the md5sum the server calculates for your upload.
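
One way to check that theory against the BadDigest error earlier in the thread: the CalculatedDigest value there is the base64 MD5 of zero bytes, which is what the server would compute if the upload body were empty. A small sketch using only the standard library (the payload is a stand-in, not data from the thread):

```python
import base64
import hashlib
import io


def content_md5(data):
    """Return the base64-encoded MD5 digest, the form used for Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# The CalculatedDigest reported in the BadDigest error earlier in the thread.
calculated_by_s3 = "1B2M2Y8AsgTpgAmY7PhCfg=="

# It matches the digest of an empty body.
assert content_md5(b"") == calculated_by_s3

# A file pointer left at end-of-file after a checksum pass gives exactly that.
fp = io.BytesIO(b"example part data")  # stand-in payload
fp.read()                # e.g. a hand-rolled checksum pass over the file
leftover = fp.read()     # what an upload would read next: nothing
fp.seek(0)               # rewinding makes the real payload readable again
rewound = fp.read()
```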

That said, most of the tools such as compute_md5() should do the right
thing for you and leave the fp exactly as you gave it to them. So the
following should be something like what you need.

import boto.utils

# compute_md5() returns (hex digest, base64 digest, size in bytes)
hd, cm, cl = boto.utils.compute_md5(fp)
md5 = (hd, cm)
mpu.upload_part_from_file(fp, part_num=mypart, md5=md5)
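
If you would rather compute the digest yourself, for example to hash only one part's worth of bytes from a larger file, something along these lines should give the same two-element shape as the md5 tuple above. This is a sketch using only the standard library; md5_for_part is a made-up helper name, not a boto function, and the commented-out upload call assumes your boto version accepts a size argument.

```python
import base64
import hashlib
import io


def md5_for_part(fp, size, buf_size=8192):
    """Hash the next `size` bytes of fp, then put fp back where it started.

    Returns (hex_digest, base64_digest).
    """
    start = fp.tell()
    digest = hashlib.md5()
    remaining = size
    while remaining > 0:
        chunk = fp.read(min(buf_size, remaining))
        if not chunk:
            break  # reached end-of-file before `size` bytes
        digest.update(chunk)
        remaining -= len(chunk)
    # Rewind so the upload reads the same bytes that were just hashed.
    fp.seek(start)
    b64 = base64.b64encode(digest.digest()).decode("ascii")
    return digest.hexdigest(), b64


# Stand-in for a file split into two 10-byte parts.
fp = io.BytesIO(b"a" * 10 + b"b" * 10)
fp.seek(10)  # start of the second part
hex_md5, b64_md5 = md5_for_part(fp, size=10)
position_after = fp.tell()  # back at the start of the second part

# Hypothetical upload of that part (size= is an assumption about your boto):
# mpu.upload_part_from_file(fp, part_num=2, md5=(hex_md5, b64_md5), size=10)
```

With a throttling wrapper in play, pass the underlying file object to the helper so the checksum reads are not metered.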

Tom.

--
Gemini Mobile Technologies - http://geminimobile.com/
S3 REST API Compliant Cloud Storage with Cloudian®
