Dear forum,

Here is an overview of the edge cases covered by my ML-DSA test vectors:
- Baseline (signing a "Hello world" message).
- Keys and signatures of the wrong length.
- Signatures with bit flips.
- Signature hints with a non-canonical encoding:
- hints that aren't sorted,
- hints that aren't strictly sorted (i.e. the same hint is repeated),
- too many hints (potentially causing a buffer overflow),
- non-zero padding after the hints.
- Secret keys with the `s1` or `s2` vectors out of the `[-eta, eta]` range.
- Public key with the `t1` component set to zero (allowing trivial forgeries, but the verification algorithm should still accept signatures for this key).
- Boundary conditions in the signature rejection loop (aiming to detect incorrect comparisons):
- `z_max` equals `gamma1 - beta - 1` or `gamma1 - beta`,
- `r0_max` equals `gamma2 - beta - 1` or `gamma2 - beta`,
- `h_ones` equals `omega` or `omega + 1`,
- in the case of ML-DSA-44 (a.k.a. Dilithium 2), `|ct0|_max` equals `gamma2 - 1` or `gamma2` (*).
- A "large" number of SHAKE bytes and blocks generated by the `expand_a`, `expand_s`, `rej_ntt_poly` and `rej_bounded_poly` functions.
- Boundary conditions in arithmetic functions:
- `power_2_round` function: when the remainder (found in `t0`) is equal to 4096 or -4095,
- `decompose` (via `high_bits` or `low_bits`): when the special case `r_plus - r_0 = q - 1` occurs.
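To make these arithmetic corner cases concrete, here is a minimal Python sketch of the two functions (following the FIPS 204 definitions, with `d = 13` and the ML-DSA-44 value `gamma2 = (q - 1)/88`), together with inputs that hit the boundaries listed above:

```python
Q, D = 8380417, 13
GAMMA2 = (Q - 1) // 88  # ML-DSA-44

def power_2_round(r):
    """Split r into r1 * 2^d + r0 with r0 in the centered range [-4095, 4096]."""
    r0 = r % (1 << D)
    if r0 > 1 << (D - 1):
        r0 -= 1 << D                      # centered representative
    return (r - r0) >> D, r0

def decompose(r):
    """Split r mod q into r1 * (2*gamma2) + r0, with a special case near q - 1."""
    r_plus = r % Q
    r0 = r_plus % (2 * GAMMA2)
    if r0 > GAMMA2:
        r0 -= 2 * GAMMA2                  # centered representative
    if r_plus - r0 == Q - 1:              # the boundary case from the list above
        return 0, r0 - 1
    return (r_plus - r0) // (2 * GAMMA2), r0

print(power_2_round(4096))  # (0, 4096): largest possible remainder
print(power_2_round(4097))  # (1, -4095): smallest possible remainder
print(decompose(Q - 1))     # (0, -1): triggers r_plus - r0 == q - 1
```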
(*) For the `|ct0|_max` condition, the probability of obtaining a signature that exercises it by brute force is very low (see g/pqc-forum/c/eLaw4fgjaMg and g/pqc-forum/c/G8Zf0hC-uu0), so I used the following method, which "cheats" in the key-generation procedure to craft test vectors that trigger the condition.
1. Starting with all-zero `s1` and `s2` vectors and an arbitrary matrix A, generate random changes in `s1[0]` so that the resulting `t0[0]` has a maximal L2 norm. Note that in comparison to g/pqc-forum/c/G8Zf0hC-uu0, I used the L2 norm rather than the L1 norm because, by the central limit theorem, when tau is large the coefficients of `ct0` tend to follow a normal distribution that depends only on the mean (here zero) and the variance (i.e. the squared L2 norm).
After this step, `s1 = [x, 0, 0, 0]`, `s2 = [0, 0, 0, 0]` and `t0 = [a, b, c, d]`, where `a` has a large L2 norm. I repeated this procedure on 50 matrices and kept the best one (largest `L2(t0[0])`) to increase the probability of success.
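Step 1 can be illustrated with a toy-scaled hill climb. This is a sketch, not the actual search code: the ring dimension is reduced to N = 64 for speed, only a single random row of A is used, multiplication is schoolbook instead of NTT-based, and the loop runs for just 100 mutation steps:

```python
import random

Q, D, ETA = 8380417, 13, 2
N, L = 64, 4          # toy ring dimension for speed (the real N is 256)

def power_2_round(r):
    # Centered remainder r0 in (-2^(d-1), 2^(d-1)] and quotient r1.
    r0 = r % (1 << D)
    if r0 > 1 << (D - 1):
        r0 -= 1 << D
    return (r - r0) >> D, r0

def t0_row(A_row, s1):
    # First component of t = A*s1 (with s2 = 0) in Z_q[X]/(X^N + 1),
    # keeping only the low rounding part t0 of each coefficient.
    t = [0] * N
    for a, s in zip(A_row, s1):
        for j, sj in enumerate(s):
            if sj == 0:
                continue
            for i, ai in enumerate(a):
                k, sgn = (i + j, 1) if i + j < N else (i + j - N, -1)
                t[k] = (t[k] + sgn * ai * sj) % Q
    return [power_2_round(c)[1] for c in t]

def l2_sq(poly):
    return sum(c * c for c in poly)

rng = random.Random(0)
A_row = [[rng.randrange(Q) for _ in range(N)] for _ in range(L)]
s1 = [[0] * N for _ in range(L)]
best = l2_sq(t0_row(A_row, s1))           # starts at zero
for _ in range(100):                      # random hill climbing on s1[0]
    cand = [p[:] for p in s1]
    cand[0][rng.randrange(N)] = rng.randint(-ETA, ETA)
    score = l2_sq(t0_row(A_row, cand))
    if score > best:                      # keep the change if L2(t0[0])^2 grew
        best, s1 = score, cand
print(best > 0)  # True: the climb increased the norm away from zero
```

The coefficients of `s1[0]` stay in `[-eta, eta]` by construction, and, as described above, the real search repeats this over many matrices and keeps the best one.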
2. Sign messages in a loop until finding one where `|ct0|_max` is close to `gamma2`, then adjust `s2` to obtain the desired value of `|ct0|_max`. More precisely:
- Fail if `|ct0|_max` isn't between `gamma2 - tau*eta` and `gamma2 + tau*eta - 1`.
- In case of success, adjust the coefficients of `s2[0]` to obtain the desired `|ct0|_max` (both `gamma2 - 1` and `gamma2`). This works in the vast majority of cases because `c` contains `tau` coefficients equal to +1 or -1, and `s2` can have coefficients anywhere in `[-eta, eta]`.
- Finish the signing procedure, accepting the message only if the other conditions succeed (on `z_max`, `r0_max` and `h_ones`), so that the test is useful.
Choosing `s2` after the fact works because the public key doesn't change (as long as the rounded value `t1` stays the same). Of course, the generated key pair isn't honest, as there is no known random seed that generates it, but it works for the purpose of functionally testing the signing procedure, since the keys are serialized correctly.
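The adjustment of step 2 can be sketched as follows. To keep the sketch self-contained, I draw a stand-in `t0` polynomial at random and nudge the largest coefficient of `c*t0` by 5 rather than pushing it all the way to `gamma2` (which is what the real procedure does once the signing loop lands within the `±tau*eta` window); each individual tweak stays within `[-eta, eta]`, mirroring what a change of a single `s2[0]` coefficient can absorb:

```python
import random

N, TAU, ETA = 256, 39, 2   # ML-DSA-44: c has TAU coefficients in {-1, +1}

def negacyclic_mul(a, b):
    # Multiplication in Z[X]/(X^N + 1) (schoolbook, for illustration).
    out = [0] * N
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

rng = random.Random(1)
c = [0] * N
for pos in rng.sample(range(N), TAU):    # challenge with TAU signs
    c[pos] = rng.choice([-1, 1])
t0 = [rng.randint(-4095, 4096) for _ in range(N)]   # stand-in for the real t0[0]

ct0 = negacyclic_mul(c, t0)
j = max(range(N), key=lambda m: abs(ct0[m]))        # index of |ct0|_max
sign = 1 if ct0[j] >= 0 else -1
delta = sign * 5          # push |ct0[j]| up by 5 (any |delta| <= TAU*ETA works)

# Changing t0[(j - i) mod N] by d changes ct0[j] by w*d, with w = +-c[i]
# depending on the negacyclic wrap-around.
remaining = delta
for i in range(N):
    if remaining == 0:
        break
    if c[i] == 0:
        continue
    w = c[i] if i <= j else -c[i]                   # wrap-around sign
    step = max(-ETA, min(ETA, remaining))           # contribution to ct0[j]
    t0[(j - i) % N] += w * step                     # per-coefficient tweak <= eta
    remaining -= step

new_ct0 = negacyclic_mul(c, t0)
print(abs(new_ct0[j]) - abs(ct0[j]))   # 5
```

Because each nonzero coefficient of `c` contributes at most `eta` per tweak, the reachable window around the observed `|ct0|_max` has width about `2*tau*eta`, which is why the check in step 2 accepts any value in `[gamma2 - tau*eta, gamma2 + tau*eta - 1]`.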
With all that, I needed to generate on the order of 10^7 messages to find a match, i.e. about 20 minutes of CPU time with some AVX2 optimizations (just a few minutes with several threads).
The main speed-up compared to g/pqc-forum/c/G8Zf0hC-uu0 is that `|ct0|_max` doesn't need to fall exactly on the desired value, because `s2` is adjusted after the fact. With ML-DSA-44 parameters (tau = 39, eta = 2), this gives a 155x speed-up. Some computation is also saved by not running the full signing procedure on each message (only a simplified one that tests `|ct0|_max` upfront).
Best,
Guillaume