Hi golang-dev,
My team is building an application where clients upload content to a digest-addressed store. Clients upload the data in chunks over multiple HTTP requests, then finalize the upload by telling the server what the SHA2 digest of the content should be. The server verifies this, then places the content in the store, keyed by that SHA2 digest. Our problem is that final verification step: the server has to read all of the content back from the backend and recompute the digest to compare it with what the client computed. This is very slow for large uploads.
A performance optimization that we've experimented with is for the server to use a resumable version of SHA2. At the end of each chunk upload request, the server snapshots the current digest state and stores it, so that a hash object can be restored from that state on the next chunk upload. When finalizing the upload, the server only has to restore the digest state, call `Sum(nil)`, and compare the result with the digest from the client. This saves us the slow and expensive read from our storage backend (AWS S3).
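To make the flow above concrete, here is a minimal sketch of it. It is not the forked code: it assumes a Go release (1.10 or later) in which the hash returned by `sha256.New()` also implements `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler`, and it uses that in place of a custom snapshot method. The helper names (`restore`, `handleChunk`, `finalize`, `simulateUpload`) are hypothetical, and the state that a real server would persist between requests is simply passed around in memory here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
	"fmt"
	"hash"
)

// restore returns a SHA-256 hash positioned at a previously saved state.
// A nil state means that no bytes have been written yet.
func restore(state []byte) (hash.Hash, error) {
	h := sha256.New()
	if state == nil {
		return h, nil
	}
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return nil, errors.New("hash state cannot be restored")
	}
	if err := u.UnmarshalBinary(state); err != nil {
		return nil, err
	}
	return h, nil
}

// handleChunk models one chunk upload request: restore the hash, feed it
// the chunk, and return the new state to store alongside the upload.
func handleChunk(prevState, chunk []byte) ([]byte, error) {
	h, err := restore(prevState)
	if err != nil {
		return nil, err
	}
	h.Write(chunk) // hash.Hash documents that Write never returns an error
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hash state cannot be saved")
	}
	return m.MarshalBinary()
}

// finalize models the last request: restore the state, call Sum(nil), and
// compare with the client's digest, without re-reading the stored content.
func finalize(state, clientDigest []byte) (bool, error) {
	h, err := restore(state)
	if err != nil {
		return false, err
	}
	return bytes.Equal(h.Sum(nil), clientDigest), nil
}

// simulateUpload runs the whole flow for the given chunks, using a one-shot
// digest of the joined chunks as the client's claimed digest.
func simulateUpload(chunks [][]byte) (bool, error) {
	var state []byte
	for _, c := range chunks {
		var err error
		if state, err = handleChunk(state, c); err != nil {
			return false, err
		}
	}
	want := sha256.Sum256(bytes.Join(chunks, nil))
	return finalize(state, want[:])
}

func main() {
	chunks := [][]byte{[]byte("first chunk, "), []byte("second chunk, "), []byte("last chunk")}
	ok, err := simulateUpload(chunks)
	fmt.Println("digest verified:", ok, "error:", err)
}
```

The serialized state is small and fixed in size for a given hash, so it can be stored next to the upload's other metadata between requests.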
I was able to do this by forking `crypto/sha256` and `crypto/sha512` and adding three new methods to implement a new ResumableHash interface:
type ResumableHash interface {
	// ResumableHash is a superset of hash.Hash
	hash.Hash
	// Len returns the number of bytes written to the Hash so far.
	Len() uint64
	// State returns a snapshot of the state of the Hash.
	State() ([]byte, error)
	// Restore resets the Hash to the given state.
	Restore(state []byte) error
}
Of course, I didn't modify any of the SHA2 logic. All of my changes are in a new file [1], which uses `encoding/gob` to marshal/unmarshal the `digest` struct, and a new test file [2], which compares the result with the stdlib Hash to ensure that State() and Restore() work correctly. The only other changes are to imports and return types.
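For illustration only, the interface and the kind of comparison test described above could be sketched without a fork, as a wrapper around the stdlib hash. This is not the code in [1] or [2]: the `resumableSHA256` type and the `splitMatches` helper are hypothetical, the state encoding (an 8-byte length followed by the stdlib's marshaled state) is an assumption made for this example rather than the gob encoding, and it relies on a Go release (1.10 or later) in which the hash returned by `sha256.New()` implements `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"encoding/binary"
	"errors"
	"fmt"
	"hash"
)

// ResumableHash mirrors the interface proposed in this message.
type ResumableHash interface {
	hash.Hash
	Len() uint64
	State() ([]byte, error)
	Restore(state []byte) error
}

// resumableSHA256 is a hypothetical wrapper around the stdlib SHA-256 hash.
type resumableSHA256 struct {
	hash.Hash
	n uint64 // number of bytes written so far
}

func newResumableSHA256() ResumableHash {
	return &resumableSHA256{Hash: sha256.New()}
}

// Write forwards to the wrapped hash and tracks the byte count for Len.
func (r *resumableSHA256) Write(p []byte) (int, error) {
	n, err := r.Hash.Write(p)
	r.n += uint64(n)
	return n, err
}

// Reset clears both the wrapped hash and the byte count.
func (r *resumableSHA256) Reset() {
	r.Hash.Reset()
	r.n = 0
}

func (r *resumableSHA256) Len() uint64 { return r.n }

// State returns an 8-byte big-endian length followed by the stdlib state.
func (r *resumableSHA256) State() ([]byte, error) {
	m, ok := r.Hash.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("underlying hash cannot be marshaled")
	}
	inner, err := m.MarshalBinary()
	if err != nil {
		return nil, err
	}
	out := make([]byte, 8, 8+len(inner))
	binary.BigEndian.PutUint64(out, r.n)
	return append(out, inner...), nil
}

// Restore resets the hash to a state previously returned by State.
func (r *resumableSHA256) Restore(state []byte) error {
	if len(state) < 8 {
		return errors.New("state too short")
	}
	u, ok := r.Hash.(encoding.BinaryUnmarshaler)
	if !ok {
		return errors.New("underlying hash cannot be unmarshaled")
	}
	if err := u.UnmarshalBinary(state[8:]); err != nil {
		return err
	}
	r.n = binary.BigEndian.Uint64(state)
	return nil
}

// splitMatches snapshots after data[:i], resumes in a fresh hash, and reports
// whether the length and digest match an uninterrupted stdlib hash of data.
func splitMatches(data []byte, i int) (bool, error) {
	first := newResumableSHA256()
	first.Write(data[:i])
	state, err := first.State()
	if err != nil {
		return false, err
	}
	second := newResumableSHA256()
	if err := second.Restore(state); err != nil {
		return false, err
	}
	second.Write(data[i:])
	want := sha256.Sum256(data)
	return second.Len() == uint64(len(data)) && bytes.Equal(second.Sum(nil), want[:]), nil
}

func main() {
	// Longer than one 64-byte SHA-256 block, so some splits fall mid-block.
	data := []byte("resumable hashing across chunk boundaries, longer than one 64-byte SHA-256 block")
	for i := 0; i <= len(data); i++ {
		if ok, err := splitMatches(data, i); err != nil || !ok {
			fmt.Println("mismatch at split", i, err)
			return
		}
	}
	fmt.Println("all split points match")
}
```

Checking every split point, including ones in the middle of a block, exercises the case where the snapshot has to carry buffered input as well as the chaining value.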
I know that there are very few use cases for this, but it is a huge performance improvement for our application. Is there any chance of getting something like this implemented upstream in the Go standard library?
- Josh