Dilithium (ML-DSA) draft standard comments

Vadim Lyubashevsky

Sep 8, 2023, 8:15:24 AM
to pqc-...@list.nist.gov

Dear NIST, dear all,

At this week's Post-Quantum Cryptography Summit in Oxford (thank you to the organizers!), a group of us went through the exercise of implementing the Dilithium draft standard from https://csrc.nist.gov/pubs/fips/204/ipd in Python and then comparing it to our C reference implementation from https://github.com/pq-crystals/dilithium/tree/standard. This reference implementation is a modification of the pre-standard Dilithium that includes the proposed changes from the draft standard document. The result was that, modulo one small difference (and some inconsistencies in the standard algorithms which could be easily fixed), the algorithms appear to match. Since we implemented the algorithms from https://csrc.nist.gov/pubs/fips/204/ipd as literally as possible, they are extremely slow, and so we could not perform many tests. We are therefore only mildly confident that everything matches.

Below, we explain the minor difference and give other suggestions for the standard.

Best,
Vadim


- The ExpandMask function (Algorithm 28) already uses a different input for SHAKE for every polynomial. It should thus use the output from the beginning of the hash stream and not start at an offset (i.e. the r is not needed in the expansion in line 4). This currently differs from our reference C implementation and leads to different test vectors. Removing the r (or setting it to 1) makes things more efficient. This is the version that we tested against our reference code.

- As in the ML-KEM and SLH-DSA standards, we strongly recommend using
  byte arrays as inputs and outputs, and not bit strings. This should not
  result in any changes to the test vectors. As things stand, there are often many
  consecutive BytesToBits and BitsToBytes calls that cancel each other out, especially when submitting inputs to hash functions, which take and return bytes.

- Please consider including the whole table of powers of zeta for
  the NTT. Then inside the NTT description simply use \zeta_j or
  \zeta[j]. Of course add an explanation of how the entries are
  computed.

- Move discussions about how to implement certain math operations
  (like rounding) to a dedicated "Notes for Implementors" section

- The majority of us thought that the entire variable \tilde{c} should be
  used in the SampleInBall function rather than just \tilde{c_1}.
  This would remove the need for the variables \tilde{c_1} and
  \tilde{c_2}, where \tilde{c_2} is never actually used. The
  performance impact of the change would essentially be zero, and it would possibly make things less confusing.
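To make the ExpandMask point above concrete, here is a minimal Python sketch of the two ways of reading the SHAKE256 stream for polynomial r. It is not the draft's Algorithm 28 verbatim: the function names are hypothetical, the ML-DSA-44 value of gamma_1 is assumed, and the two-byte little-endian encoding of mu + r is an assumption about how the 16-bit counter is serialized. Only the raw bytes that would be fed to the unpacking step are produced.

```python
import hashlib

GAMMA1 = 1 << 17                     # assumed: gamma_1 of ML-DSA-44
C = 1 + (GAMMA1 - 1).bit_length()    # bits per packed coefficient
NBYTES = 32 * C                      # 256 coefficients * C bits / 8

def _seed(rho_prime: bytes, mu: int, r: int) -> bytes:
    # Assumed encoding: 16-bit counter mu + r as two little-endian bytes.
    return rho_prime + (mu + r).to_bytes(2, "little")

def mask_bytes_with_offset(rho_prime: bytes, mu: int, r: int) -> bytes:
    """Reading at an offset: skip 32*r*C bytes, then take 32*C bytes."""
    stream = hashlib.shake_256(_seed(rho_prime, mu, r)).digest(NBYTES * (r + 1))
    return stream[NBYTES * r:]

def mask_bytes_from_start(rho_prime: bytes, mu: int, r: int) -> bytes:
    """Reading from the start: the seed already differs for every r."""
    return hashlib.shake_256(_seed(rho_prime, mu, r)).digest(NBYTES)
```

Both functions return 32*C bytes, but the offset version has to squeeze r + 1 times as much output from SHAKE256 to get there, which is the inefficiency mentioned above. For r = 0 the two agree; for r > 0 they yield different bytes and hence different test vectors.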

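The remark about bit strings versus byte arrays can be illustrated with a small round trip. The sketch below assumes the little-endian bit order (least significant bit of each byte first) and uses hypothetical Python names for the two conversion functions; it is meant only to show that back-to-back conversions are the identity on byte strings.

```python
def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack bits into bytes, least significant bit of each byte first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        out[i // 8] |= b << (i % 8)
    return bytes(out)

def bytes_to_bits(data: bytes) -> list[int]:
    """Unpack bytes into bits, least significant bit of each byte first."""
    return [(byte >> j) & 1 for byte in data for j in range(8)]

def hash_input_via_bits(message: bytes) -> bytes:
    """A bytes -> bits -> bytes detour, as happens before a hash call."""
    return bits_to_bytes(bytes_to_bits(message))
```

Since `hash_input_via_bits(m)` returns `m` for any byte string `m`, an interface specified directly on byte arrays would feed exactly the same data to the hash functions.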
Typos and inconsistencies:

Algorithm 1: Power2Round should not take d as an input

Algorithm 5: Should take the bit length as an input, because other functions
(Algorithms 12 and 13) call it with one.

Algorithm 7: d is used as the length of z, but d is a public value

Algorithms 21 and 22: For consistency with previous algorithm descriptions, the k and l in lines 1 and 3
should be in the "double exponent"

Algorithm 25: eta should not be a parameter in calling the function
in lines 5 and 6

Algorithms 23, 35, 36: k is used as a local loop variable, but it was previously defined as global

Algorithms 35 and 36: the brv function should be written out as a separate algorithm

Line 436: The symbol for transposing matrices is inconsistent with the
one in the Kyber standard. We suggest removing it from the Dilithium standard,
since it is not used in this document.
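Two of the suggestions above, a full table of powers of zeta and a stand-alone brv algorithm, fit in a few lines. The following Python sketch assumes the usual Dilithium constants q = 8380417 and zeta = 1753 and an 8-bit bit reversal; the names `brv` and `ZETAS` are chosen here for illustration, and the values are plain residues (no Montgomery form as in the C reference code).

```python
Q = 8380417    # the Dilithium modulus q
ZETA = 1753    # a primitive 512th root of unity modulo q

def brv(k: int, bits: int = 8) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

# Entry k holds zeta^brv(k) mod q, so the NTT can refer to ZETAS[k] directly.
ZETAS = [pow(ZETA, brv(k), Q) for k in range(256)]
```

With such a table in the document, the NTT and inverse NTT descriptions would need neither the exponentiation nor the bit reversal inline.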

No floating point arithmetic: A comment similar to the one in FIPS 203 (line 703) is applicable to FIPS 204.
Implementations of ML-DSA do not require and should not use floating-point arithmetic.
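
As an illustration of both the floating-point remark and the comment on Algorithm 1, here is a hedged Python sketch of Power2Round that uses only integer shifts and treats d as a global constant instead of an input. The function name and the shift-based formula are a sketch in the style of the C reference code, not text from the draft; q = 8380417 and d = 13 are assumed.

```python
Q = 8380417   # the Dilithium modulus q
D = 13        # number of dropped bits d, a global parameter

def power2round(r: int) -> tuple[int, int]:
    """Split r mod q into (r1, r0) with r mod q == r1 * 2^D + r0
    and -2^(D-1) < r0 <= 2^(D-1), using integer arithmetic only."""
    r_plus = r % Q
    r1 = (r_plus + (1 << (D - 1)) - 1) >> D   # rounding by add-and-shift
    r0 = r_plus - (r1 << D)
    return r1, r0
```

For example, `power2round(4097)` gives a high part of 1 and a negative low part, without any call to a rounding function on floats.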