Hi Andrew,
Great discussion. See comments inline below.
On 10/26/18 10:12 AM, Andrew Hankinson wrote:
> Thanks for reading and giving feedback, Stefano!
>
> The initial choice of SHA512 was not intended to limit the use of newer or better algorithms, but to limit the use of older and insecure (but still widely adopted) ones. We intended to set a 'baseline' with the selection of SHA512 as the hashing function of choice.
>
> The hashing functionality forms a core part of the content addressability function and versioning with OCFL. Limiting it to a known algorithm was intended primarily to support the implementation of OCFL versioning (that is, to support the assignment of a content hash with a path). In that sense it is 'just' an implementation detail. That it can also function as fixity is (almost) just a side-effect, albeit a highly useful one.
I agree. Of course, if implementers were free to choose an algorithm,
they should use the same one consistently across the system.
>
> It's true that we don't know what future problems may be discovered in SHA-512, but the same could be said with any (or all! [1]) choices. I think we have to make the best choice we can, given the available information.
Right, but why not leave that decision to the implementer? How about
wording along the lines of: "OCFL Objects SHOULD use a cryptographic
hashing algorithm that meets modern security standards [reference to
non-normative list of suggestions]" (and similarly for the following
MUST sentence)?
One more thing to consider is interoperability: SHA-2 is extremely
popular; BLAKE2 is less so, but it still has implementations in many
programming languages. More niche algorithms with fewer bindings may
limit the technology choices available for retrieving and manipulating
an OCFL archive (but may still be fine for an institution which controls
its technology and wants to take advantage of a particular algorithm).
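
As a rough illustration of the interoperability point, here is a minimal
Python sketch (standard library only) that checks which candidate
algorithms a given runtime can actually compute. The candidate names and
the `supported` helper are examples I made up for this message, not
anything taken from the spec.

```python
import hashlib


def supported(names):
    """Map each algorithm name to whether this Python runtime provides it."""
    return {name: name in hashlib.algorithms_available for name in names}


# Example candidates only; "not-a-real-hash" stands in for a niche
# algorithm that has no binding in this runtime.
candidates = ["sha512", "blake2b", "sha3_512", "not-a-real-hash"]

for name, ok in supported(candidates).items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

A client that finds the archive's algorithm missing cannot validate or
address its content, which is the practical cost of a niche choice.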
>
> In addition, we also have the 'fixity' section, which allows for additional non-functional hashes (that is, storing fixity values, but not part of the mechanism of versioning within OCFL). I think it would be fine to add other well-supported algorithms to our list of 'approved' ones as new ones get adopted. We would need to be careful with BLAKE2b, however, since it seems it can vary in size which may introduce some ambiguity in what gets stored vs. what gets checked (we wouldn't want to confuse a BLAKE2-256 with a SHA-256 hash).
>
> "BLAKE2b (or just BLAKE2) is optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes." [2]
This is disambiguated by the `digestAlgorithm` property [1] and the
inventory digest file naming convention [2], right?
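
To make the concern and the disambiguation concrete, here is a small
Python sketch (standard library only). It shows that a 32-byte BLAKE2b
digest and a SHA-256 digest have the same length, so the value alone
cannot tell you which algorithm produced it, while an explicit algorithm
label removes the ambiguity. The label strings and the `digest`/`verify`
helpers are hypothetical and only for illustration; they are not taken
from the OCFL spec.

```python
import hashlib

# Hypothetical labels mapped to hash constructors (an assumption for this
# sketch, not names defined by the spec).
ALGORITHMS = {
    "sha256": lambda: hashlib.sha256(),
    "sha512": lambda: hashlib.sha512(),
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
    "blake2b-512": lambda: hashlib.blake2b(digest_size=64),
}


def digest(algorithm: str, data: bytes) -> str:
    """Return the hex digest of data under the labelled algorithm."""
    h = ALGORITHMS[algorithm]()
    h.update(data)
    return h.hexdigest()


def verify(algorithm: str, data: bytes, expected_hex: str) -> bool:
    """Check data against a stored digest using the stored algorithm label."""
    return digest(algorithm, data) == expected_hex


data = b"example content"
sha_hex = digest("sha256", data)
b2_hex = digest("blake2b-256", data)

# Same length (64 hex characters), different values.
print(len(sha_hex), len(b2_hex), sha_hex == b2_hex)

# With the label stored alongside the value, the check is unambiguous.
print(verify("sha256", data, sha_hex))
print(verify("blake2b-256", data, sha_hex))
```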
[1]
https://ocfl.io/0.1/spec/#inventory-structure
[2]
https://ocfl.io/0.1/spec/#inventory-digest