OCFL specs feedback

Stefano Cossu

Oct 25, 2018, 5:59:19 PM
to ocfl-co...@googlegroups.com
Hello there,
I started reading through the OCFL alpha1 draft specs and have a comment
about the digest algorithm section [1]:

> OCFL Objects SHOULD use sha512 by default.

And, more strongly:

> For content addressability OCFL Objects MUST use either sha256 or
> sha512, to reduce the likelihood of digest collisions.

I'm uncomfortable using strong terms to suggest or even mandate a
specific digest algorithm, unless there is a reason why SHA-512 is the
absolute best fit for the job. Some people may argue that there are
algorithms out there that are better than the SHA family. For example,
BLAKE2 [2] comes to mind as an alternative worth mentioning.

SHA-512 may be indicated as the default choice given its popularity, but
a broader choice of recommendations would be beneficial.

Also, using strong requirements for SHA-512 ties the spec to the
reliability of SHA-512, which contrasts with the technology-agnostic
nature of it.


Thoughts are welcome.
Stefano

[1] https://ocfl.io/0.1/spec/#digests
[2] https://blake2.net/

--
Stefano Cossu
Director of Application Services, Collections

The Art Institute of Chicago
116 S. Michigan Ave.
Chicago, IL 60603
312-499-4026

Metz, Rosalyn

Oct 26, 2018, 9:39:26 AM
to Stefano Cossu, ocfl-co...@googlegroups.com
Thanks for the feedback Stefano.

The strong language around using SHA256 or SHA512 is because they minimize the risk of collisions. It's my (limited) understanding that BLAKE2 also limits the likelihood of collisions. Is that your understanding as well?

Thanks,
Rosalyn


------------------------------------
Rosalyn Metz
Director, Library Technology and Digital Strategies
Libraries & Information Technology Services
Emory University
(o) 404.727.4680
(c) 404.831.8448
--
You received this message because you are subscribed to the Google Groups "Oxford Common File Layout Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ocfl-communit...@googlegroups.com.
To post to this group, send email to ocfl-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ocfl-community/03fef925-db23-afe0-f49e-c163d7788a1f%40artic.edu.
For more options, visit https://groups.google.com/d/optout.




Stefano Cossu

Oct 26, 2018, 10:46:06 AM
to Metz, Rosalyn, ocfl-co...@googlegroups.com
Hi Rosy,
Yes, while SHA512 is an excellent choice in general, there are other
hashing algorithms (BLAKE2, Keccak, etc.) that are equal or better in
terms of strength and performance (which is very important for archiving
large files).

My point here, however, is not which algorithm is best, but about the
flexibility of the spec. I think it would be better if the spec were
quiet about the algorithm of choice, and added a non-normative section
where non-binding suggestions can be made.

Stefano

[1] I haven't tested personally, just read the documentation.

Andrew Hankinson

Oct 26, 2018, 11:27:50 AM
to Stefano Cossu, Metz, Rosalyn, ocfl-co...@googlegroups.com
Thanks for reading and giving feedback, Stefano!

The initial choice of SHA512 was not intended to limit the use of newer or better algorithms, but to limit the use of older and insecure (but still widely adopted) ones. We intended to set a 'baseline' with the selection of SHA512 as the hashing function of choice.

The hashing functionality forms a core part of the content addressability function and versioning with OCFL. Limiting it to a known algorithm was intended primarily to support the implementation of OCFL versioning (that is, to support the assignment of a content hash with a path). In that sense it is 'just' an implementation detail. That it can also function as fixity is (almost) just a side-effect, albeit a highly useful one.
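
As an illustration of the content-addressing idea described above (pairing a content hash with a path), here is a minimal Python sketch. It is not taken from the OCFL spec: the `build_manifest` helper, the in-memory `files` dictionary, and the example paths are all hypothetical, and a real implementation would read files from disk.

```python
import hashlib
from collections import defaultdict


def build_manifest(files, algorithm="sha512"):
    """Map each content digest to the list of paths holding that content.

    `files` is a hypothetical {path: bytes} mapping used for illustration.
    """
    manifest = defaultdict(list)
    for path, content in files.items():
        # The digest of the bytes is the key; the path is only a label for it.
        digest = hashlib.new(algorithm, content).hexdigest()
        manifest[digest].append(path)
    return dict(manifest)


# Two of the three hypothetical paths hold identical bytes.
files = {
    "v1/content/a.txt": b"hello",
    "v1/content/b.txt": b"world",
    "v2/content/a-copy.txt": b"hello",
}
manifest = build_manifest(files)
```

Because identical bytes always produce the same digest, the two copies of `hello` collapse into a single manifest entry with two paths. The same digest can later be recomputed to check fixity, which is the side effect mentioned above.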

It's true that we don't know what future problems may be discovered in SHA-512, but the same could be said with any (or all! [1]) choices. I think we have to make the best choice we can, given the available information.

In addition, we also have the 'fixity' section, which allows for additional non-functional hashes (that is, storing fixity values, but not part of the mechanism of versioning within OCFL). I think it would be fine to add other well-supported algorithms to our list of 'approved' ones as new ones get adopted. We would need to be careful with BLAKE2b, however, since it seems it can vary in size which may introduce some ambiguity in what gets stored vs. what gets checked (we wouldn't want to confuse a BLAKE2-256 with a SHA-256 hash).

"BLAKE2b (or just BLAKE2) is optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes." [2]
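
The ambiguity can be seen in a short Python sketch using the standard library's `hashlib`, where `digest_size` is given in bytes. The input bytes are an arbitrary example, and the variable names are only for illustration.

```python
import hashlib

data = b"example content"

# SHA-256 always yields 32 bytes (64 hex characters).
sha256_hex = hashlib.sha256(data).hexdigest()

# BLAKE2b output length is a parameter: 32 bytes gives a 256-bit digest.
blake2b_256_hex = hashlib.blake2b(data, digest_size=32).hexdigest()

# With no digest_size, hashlib's BLAKE2b defaults to the 64-byte maximum.
blake2b_512_hex = hashlib.blake2b(data).hexdigest()

# The two 256-bit digests have the same shape but different values.
same_length = len(sha256_hex) == len(blake2b_256_hex)
same_value = sha256_hex == blake2b_256_hex
```

A 256-bit BLAKE2b digest and a SHA-256 digest are both 64 hex characters, so the string alone does not reveal which algorithm produced it. Any stored digest therefore needs an explicit algorithm label (including the length variant) alongside it.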

That said, I think you have highlighted a place where the documentation can be much clearer in our guidance for this. I will file an issue on our spec.

Happy to hear any further comments,
-Andrew


[1] I can imagine a bug found in a widely-used random number generator, or a chip-specific bug in calculating hashes, would have widespread effects across all hashing algorithms.
[2] https://tools.ietf.org/html/rfc7693#section-1

Stefano Cossu

Oct 26, 2018, 11:59:34 AM
to Andrew Hankinson, Metz, Rosalyn, ocfl-co...@googlegroups.com
Hi Andrew,
Great discussion. See comments inline below.

On 10/26/18 10:12 AM, Andrew Hankinson wrote:
> Thanks for reading and giving feedback, Stefano!
>
> The initial choice of SHA512 was not intended to limit the use of newer or better algorithms, but to limit the use of older and insecure (but still widely adopted) ones. We intended to set a 'baseline' with the selection of SHA512 as the hashing function of choice.
>
> The hashing functionality forms a core part of the content addressability function and versioning with OCFL. Limiting it to a known algorithm was intended primarily to support the implementation of OCFL versioning (that is, to support the assignment of a content hash with a path). In that sense it is 'just' an implementation detail. That it can also function as fixity is (almost) just a side-effect, albeit a highly useful one.

I agree. Of course, if the implementer were free to use an algorithm of
their choice, they should use the same one across the whole system.


>
> It's true that we don't know what future problems may be discovered in SHA-512, but the same could be said with any (or all! [1]) choices. I think we have to make the best choice we can, given the available information.

Right, but why not leave that decision to the implementer? How about
wording along the lines of: "OCFL Objects SHOULD use a cryptographic
hashing algorithm that meets modern security standards [reference to
non-normative list of suggestions]" (and similarly for the following
MUST sentence)?

One more thing to consider is interoperability: SHA2 is extremely
popular, BLAKE2 not as much but still has many programming language
implementations; more niche solutions with fewer bindings may limit the
technology choices to retrieve and manipulate an OCFL archive (but may
still be fine for an institution which controls its technology and wants
to take advantage of a particular algorithm).


>
> In addition, we also have the 'fixity' section, which allows for additional non-functional hashes (that is, storing fixity values, but not part of the mechanism of versioning within OCFL). I think it would be fine to add other well-supported algorithms to our list of 'approved' ones as new ones get adopted. We would need to be careful with BLAKE2b, however, since it seems it can vary in size which may introduce some ambiguity in what gets stored vs. what gets checked (we wouldn't want to confuse a BLAKE2-256 with a SHA-256 hash).
>
> "BLAKE2b (or just BLAKE2) is optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes." [2]

This is disambiguated by the `digestAlgorithm` property [1] and the
inventory digest file naming convention [2], right?
[1] https://ocfl.io/0.1/spec/#inventory-structure
[2] https://ocfl.io/0.1/spec/#inventory-digest
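
A minimal Python sketch of that disambiguation, under stated assumptions: only the `digestAlgorithm` property name comes from the spec; the `CONSTRUCTORS` table, the `verify` helper, and the `blake2b-256` name string are hypothetical and are not defined by OCFL.

```python
import hashlib

# Hypothetical name strings mapped to hash constructors.
# "blake2b-256" is an illustrative label, not an OCFL-approved name.
CONSTRUCTORS = {
    "sha256": lambda: hashlib.sha256(),
    "sha512": lambda: hashlib.sha512(),
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def verify(inventory, content, expected_digest):
    """Recompute the digest with the algorithm the inventory declares."""
    hasher = CONSTRUCTORS[inventory["digestAlgorithm"]]()
    hasher.update(content)
    return hasher.hexdigest() == expected_digest


content = b"example"
stored = hashlib.sha256(content).hexdigest()

# The same 64-character digest is only valid under the declared algorithm.
ok_sha = verify({"digestAlgorithm": "sha256"}, content, stored)
ok_blake = verify({"digestAlgorithm": "blake2b-256"}, content, stored)
```

The check never guesses the algorithm from the digest's length. It always follows the declared label, so a 256-bit BLAKE2b value cannot be mistaken for a SHA-256 one.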

Libor Coufal

Oct 26, 2018, 4:47:00 PM
to sco...@artic.edu, rosaly...@emory.edu, ocfl-co...@googlegroups.com
Hi all,

I agree with Stefano that the spec should be implementation neutral. Also, while it is true that the "insecure" algorithms were proved to be theoretically breakable, the practical implications for most applications, including very likely for what you are trying to achieve, are insignificant. For most people, other considerations such as cost of implementation or performance will be of more concern.

We need to be wary of being overly and unnecessarily prescriptive and keep in mind that everything has opportunity costs. Risk management might be a better approach.

Libor Coufal
Assistant Director, Digital Preservation
National Library of Australia

Simeon Warner

Oct 26, 2018, 5:32:25 PM
to ocfl-co...@googlegroups.com
Thank you for your input Libor and Stefano.

I think that if we want to decouple the spec from particular algorithms
then we will end up needing to point to a canonical registry of digest
algorithms that provides the same sort of detail as the table in the
digests section [1]. Specifically, we need agreed name strings tied to
details of the particular implementations (algorithm, length variant if
any, encoding details). When we started this work I looked for such a
registry but didn't find something with appropriate content and
governance for our purposes (see [2]), but perhaps there are other
options we don't know of. Otherwise I think we either have to maintain a
separate document to serve as a registry, or else we have to update the
spec to update the list of known digests.
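
To make the shape of such a registry concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `DigestEntry` fields, the `REGISTRY` contents, the `lookup` helper, and the `blake2b-256` name string are hypothetical and do not reflect any existing registry or the OCFL table.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DigestEntry:
    name: str          # agreed name string used in inventories
    hashlib_name: str  # underlying algorithm as known to Python's hashlib
    digest_size: int   # output length in bytes (the length variant)
    encoding: str      # serialisation of the digest; only "hex" is modelled

    def compute(self, data: bytes) -> str:
        # BLAKE2b takes its output length as a constructor parameter.
        if self.hashlib_name == "blake2b":
            hasher = hashlib.blake2b(data, digest_size=self.digest_size)
        else:
            hasher = hashlib.new(self.hashlib_name, data)
        return hasher.hexdigest()


REGISTRY = {
    entry.name: entry
    for entry in (
        DigestEntry("sha256", "sha256", 32, "hex"),
        DigestEntry("sha512", "sha512", 64, "hex"),
        DigestEntry("blake2b-256", "blake2b", 32, "hex"),  # hypothetical name
    )
}


def lookup(name: str) -> DigestEntry:
    """Return the registry entry for a name, rejecting unknown names."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown digest algorithm: {name!r}") from None
```

Each name pins down the algorithm, its length variant, and its encoding in one place. Adding a newly adopted algorithm then means adding a registry entry, whether that registry lives in the spec or in a separate document.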

Cheers,
Simeon

[1] https://ocfl.io/0.1/spec/#digests
[2] https://github.com/OCFL/spec/issues/21#issuecomment-398186521

Andrew Hankinson

Oct 29, 2018, 9:08:32 AM
to Libor Coufal, Stefano Cossu, Metz, Rosalyn, ocfl-co...@googlegroups.com
Thanks, Libor, Stefano,

I think the appropriate course of action would be to raise this issue on our next community call, which is scheduled for 14 November.

Could you please add an item to the agenda, and we will be sure to discuss it?

https://github.com/OCFL/spec/wiki/2018.11.14-Community-Meeting

Thanks,
-Andrew

Stefano Cossu

Oct 29, 2018, 10:53:15 AM
to Andrew Hankinson, Libor Coufal, Metz, Rosalyn, ocfl-co...@googlegroups.com
Ah... I'll be at a conference on 11/14. I may be able to attend, but
can't guarantee at 100% yet.

In any case, I think it's an excellent idea to add the topic.

Stefano