I had mentioned that there are divisions in NCC-Sign with secret inputs.
Compilers will sometimes replace these with multiplications since the
denominator is constant, but this depends on compiler options and on the
target platform; presumably the divisions will produce exploitable
timing leaks in some cases, as in KyberSlash.
Some of the SUPERCOP machines are running a patched version of TIMECOP
that scans for divisions, and in particular they catch some cases where
compilers are using division instructions for the NCC-Sign code:
https://bench.cr.yp.to/web-impl/amd64-rome0-crypto_sign-nccsign1.html
For the upcoming SUPERCOP release, I rewrote the decompose() function to
avoid variable divisions as shown below. This appears to resolve the
TIMECOP complaints, and makes only a small percentage difference in
speed. An alternative approach is to replace the whole sequence of
reduction operations mod 2*GAMMA2 (including the subsequent GAMMA2
comparison) with a multiplication, addition, and shift using suitable
constants. As far as I can tell, the input is always reduced mod Q in
context, so the reduction mod Q in the original code is unnecessary.
---D. J. Bernstein
int32_t decompose(int32_t *a0, int32_t a) {
crypto_int32 q,r,mask,bit;
r = a; q = 0;
for (bit = 32;bit >= 1;bit >>= 1) {
mask = crypto_int32_leq_mask(bit*2*GAMMA2,r);
r -= (bit*2*GAMMA2)&mask;
q += bit&mask;
}
mask = crypto_int32_smaller_mask(GAMMA2,r);
r -= (2*GAMMA2)&mask;
q -= mask;
mask = crypto_int32_equal_mask(q,(Q-1)/(2*GAMMA2));
*a0 = r+mask;
return q & ~mask;
}