Python SDK: providing MD5 checksum for chunks in bulk put request

GergelyM

Oct 31, 2019, 6:01:18 AM
to Spectra Logic S3 SDKs, APIs, and RioBroker
Hi,

We're implementing a solution to send data to our Black Pearl via the DS3 Python 3 SDK.
The issue is with writing (bulk) data to buckets that have the end-to-end CRC option enabled (*required).

We're using the (slightly customised) sample solution published on your GitHub page at https://github.com/SpectraLogic/ds3_python_sdk/blob/master/samples/bulk_with_prefix.py
But we receive errors in the following cases:

- Sending bulk data without supplying an MD5 for the chunks:

Traceback (most recent call last):
...
ds3.ds3network.RequestFailed: Return Code: Expected [200] - Received 400
Code=BadRequest
HttpError=400
BadRequest[400]: End-to-end CRC is required for this bucket. You are required to calculate and transmit a MD5 to ensure data integrity.


- The API code/docs don't give any clue as to where and how the MD5 hash should be supplied with the request. I assume the API expects the MD5 hash of the current chunk, sent in the Content-MD5 header. When I send the data that way (regardless of whether the batch contains one file or more):

Traceback (most recent call last):
...
ds3.ds3network.RequestFailed: Return Code: Expected [200] - Received 403
Code=InvalidSecurity
HttpError=403
InvalidSecurity[403]: Authorization signature is invalid.
Caused by IllegalArgumentException: Authorization digest from client was incorrect. Valid string to sign was:PUT\n\napplication/octet-stream\nWed, 30 Oct 2019 15:00:09 -0000\nx-amz-meta-content-md5:kkEfdcoYfPtXbWMRpKwXwA==\n/e2-**********/Data/**********%20data/Final_BSCardBackup1/FILE_AA.BSW

The credentials for BlackPearl are correct and we can read files and metadata; in fact, we use the same client, created to read the content of the whole library, before attempting to write.

The function to get the MD5 for a chunk:

def get_MD5_of_chunk(data_chunk):
    hasher = hashlib.md5()
    hasher.update(repr(data_chunk).encode('utf-8'))  # to overcome: TypeError: object supporting the buffer API required
    return base64.b64encode(hasher.digest()).decode("utf-8")
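As a side note, one thing worth checking: hashing repr(data_chunk) digests the string representation (e.g. "b'...'") rather than the raw bytes that go over the wire, so the resulting MD5 will not match what the server computes. A minimal sketch of hashing the raw bytes directly, assuming data_chunk is already a bytes-like object (function name is illustrative, not from the SDK):

```python
import base64
import hashlib

def md5_of_chunk(data_chunk: bytes) -> str:
    """Base64-encoded MD5 of the raw chunk bytes.

    Passing the bytes straight to hashlib.md5 avoids the repr() detour,
    which would digest the string "b'...'" instead of the payload itself.
    """
    return base64.b64encode(hashlib.md5(data_chunk).digest()).decode("ascii")
```

If data_chunk is not bytes to begin with (hence the original TypeError), the fix is to obtain the actual bytes for the chunk rather than to wrap the object in repr().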

and the part of the customised bulk-put sample (linked above) where the Content-MD5 header is added to the request:

...
    ds3_put_object_request = ds3.PutObjectRequest(
        bucket_name=ds3_target_bucket,
        object_name=ds3_obj['Name'],
        length=ds3_obj['Length'],
        stream=object_data_stream,
        offset=int(ds3_obj['Offset']),
        job=bulk_put_result.result['JobId'])
    # as an attempt to sort out the end-to-end CRC issue with the DS3 API,
    # here we generate the MD5 checksum of the file chunk and append it to the HTTP header
    chunk_MD5 = get_MD5_of_chunk(ds3_put_object_request.body)
    ds3_put_object_request.headers['Content-MD5'] = chunk_MD5
    client_handle.put_object(ds3_put_object_request)
...
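For what it's worth, a generic way to make sure the digest covers exactly the bytes the SDK will upload (rather than whatever type request.body happens to be) is to hash the chunk's byte range from the stream and seek back before issuing the put. A sketch, assuming object_data_stream is a seekable file object and the chunk's offset and length are known; the helper name is hypothetical, not part of the ds3 SDK:

```python
import base64
import hashlib

def md5_of_stream_range(stream, offset: int, length: int,
                        block_size: int = 1024 * 1024) -> str:
    """Hash `length` bytes of `stream` starting at `offset`, then rewind.

    Reads in blocks so large chunks do not have to fit in memory, and
    seeks back to `offset` so the subsequent put re-reads the same bytes.
    """
    hasher = hashlib.md5()
    stream.seek(offset)
    remaining = length
    while remaining > 0:
        block = stream.read(min(block_size, remaining))
        if not block:
            break
        hasher.update(block)
        remaining -= len(block)
    stream.seek(offset)  # rewind for the actual upload
    return base64.b64encode(hasher.digest()).decode("ascii")
```

This guarantees the Content-MD5 value and the transmitted payload are computed from the same bytes, whatever representation the request object stores internally.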

I have the feeling that the solution is simple but it's not obvious to us.
Please advise.

jeffbr

Nov 1, 2019, 8:32:44 PM
to Spectra Logic S3 SDKs, APIs, and RioBroker
It appears that you are doing everything right. Could you please log a ticket with our Support Team (https://support.spectralogic.com)? They can look at the logs from the BlackPearl to better assess what is going wrong. Thanks.