This attack appeared in
https://eprint.iacr.org/2014/784. It's covered
in the NTRU Prime paper (including the modulus switch) and in earlier
NTRU Prime postings. It's much less effective against NTRU Prime than
the lattice attacks used to select NTRU Prime parameters. A closer look
shows that NTRU Prime is easier to review for security against this
attack strategy than cyclotomics are.
Take, e.g., a secret s in the ring (\Z/4591)[x]/(x^761-x-1), with weight
250 and all coefficients in {-1,0,1}. Imagine that an attacker is given
two random-looking elements A_1 and A_2 of the ring and noisy multiples
B_1 = A_1 s + e_1 and B_2 = A_2 s + e_2, where e_1 and e_2 are secrets
with coefficients in {-1,0,1}. (The cryptosystem actually has a much
wider range of e_2, but let's be generous to the attacker here.)
Let's switch modulus to, say, 5801. The attacker computes
A'_1 = round(5801 A_1/4591), B'_1 = round(5801 B_1/4591),
A'_2 = round(5801 A_2/4591), B'_2 = round(5801 B_2/4591)
and considers the differences
e'_1 = B'_1 - A'_1 s,
e'_2 = B'_2 - A'_2 s,
in the ring (\Z/5801)[x]/(x^761-x-1). Compared to the original small
differences e_1 = B_1 - A_1 s and e_2 = B_2 - A_2 s, we've
* multiplied by 5801/4591,
* added B'_i-B_i, and
* added (A'_i-A_i)s.
Let's look more closely at the third contribution:
* Each coefficient of A'_i-A_i is a random-looking number in the
real interval [-1/2,1/2] known to the attacker---typically around
1/4 in absolute value, sometimes smaller, sometimes larger.
* These coefficients are then multiplied by the 250 nonzero
coefficients of the secret s, random +-1 coefficients unknown to
the attacker.
* The resulting 190250 products in [-1/2,1/2] are distributed across
the 761 output coefficients, adding on average 250 of these
products to each output coefficient. (Plus extra weight about half
the time because the modulus is x^761-x-1 rather than x^761-1, but
let's not bother counting this.)
The sum of 250 random elements of [-1/2,1/2] has standard deviation
around 4.56, several times larger than the original noise. The Sage
script below prints out a random example of an actual e'_1 such as
... - 4*x^9 - 2*x^8 + 6*x^7 + 7*x^6 - 8*x^4 - 12*x^3 - x^2 - x
immediately illustrating how much the noise has increased compared to
the original {-1,0,1} noise.
This extra noise is what the NTRU Prime paper is referring to in the
attack description when it says "an attacker switching from q to another
modulus will noticeably increase noise". This also illustrates the
advantage of selecting an inert q to start with: attackers trying to use
factorizations are immediately faced with the extra noise from a modulus
switch.
Of course it's still conceivable that some severe weakness in *LWE is
able to handle such large errors for polynomial x^761-x-1 and modulus
5801. So let's move on to looking at the algebraic part of the attack.
The polynomial x^761-x-1 has, modulo 5801, a root 3738 of order "only"
25. The fact that 3738 is a root means that the equations
e'_1 = B'_1 - A'_1 s,
e'_2 = B'_2 - A'_2 s
in the ring (\Z/5801)[x]/(x^761-x-1) imply equations
e'_1(3738) = B'_1(3738) - A'_1(3738) s(3738),
e'_2(3738) = B'_2(3738) - A'_2(3738) s(3738)
in \Z/5801. The fact that 3738 has order 25 means that the noise
e'_1(3738) is a weighted sum of the 25 terms
t_0 = e'_1[0] + e'_1[25] + e'_1[50] + ...,
t_1 = e'_1[1] + e'_1[26] + e'_1[51] + ...,
...
t_24 = e'_1[24] + e'_1[49] + e'_1[74] + ...
with weights 1, 3738, 3738^2=3836, ..., 3738^24=5188 respectively. The
computation concluding that e'_1[i] has standard deviation around 4.56
(counting only the third contribution and ignoring the extra -x) also
says that t_j has standard deviation above 20. To confirm this, the Sage
script also prints out the polynomial t_0 + t_1 x + t_2 x^2 + ....
Example output:
32*x^24 + 35*x^23 + 45*x^22 + 26*x^21 + 49*x^20 - 9*x^19 + 17*x^18 -
59*x^17 - 2*x^16 - 26*x^14 - 49*x^13 - 35*x^12 + 28*x^11 + 39*x^10 +
46*x^9 - 14*x^8 - 25*x^7 + 13*x^6 - 22*x^5 - 47*x^4 - 45*x^3 + 12*x^2
+ 25*x + 6
Evidently t_0 is still far from uniform, but adding 3738 t_1 gives a
much better spread of possibilities, and then adding 3836 t_2 gets
closer to uniform, and so on. As a simplified model, let's imagine that
each t_i is chosen independently and uniformly at random from
{-32,-31,...,0,1,...,32}. A short calculation of the exact distribution
of the sum t_0 + 3738 t_1 + ... + 5188 t_{24} then shows that each
possibility for the sum has probability very close to 1/q, the maximum
difference being slightly below 2^(-135). One would expect to use about
2^270 samples to observe this with good probability.
For a better estimate, one can run more experiments to see the actual
distribution of each 3738^i t_i, and then convolve those distributions,
but the point is that there's obviously far too much randomness in each
t_i to be able to see anything from the weighted sum until the number of
samples is gigantic. The Arora--Ge attack uses far fewer samples and
much less time. For comparison, the cryptosystem gives the attacker a
grand total of 2 samples.
There were two major reasons for the sample blowup here: first, the
modulus switch; second, the large order---25 is large in this context
because, basically, it ends up in the exponent of the attack cost. This
is why the papers on this topic
* emphasize much smaller orders (the 2014 paper focuses on order 1
and only briefly considers larger orders), and
* select q's where the polynomial already has a factorization, rather
than paying for a modulus switch.
These are the best cases for the attack. Given the first point, let's
look at which primes q exist for order 1, 2, 3, 4:
* Can there be a prime q for which x^p-x-1 has 1 as a root modulo q?
No, there can't: no prime q divides 1^p-1-1 = -1.
* How about order 2, i.e., -1 as a root modulo q? Nope: no prime q
divides (-1)^p-(-1)-1 = -1.
* How about order 3, i.e., b^p-b-1 = 0 mod q with b^2+b+1 = 0 mod q?
Plug b^3 = 1 mod q into b^p-b-1 = 0 mod q to see that b^2-b-1 = 0
if p mod 3 is 2, forcing b = -1, contradiction; or b-b-1 = 0 if p
mod 3 is 1, again a contradiction.
* How about order 4, i.e., b^p-b-1 = 0 mod q with b^2+1 = 0 mod q?
The same calculation allows q=5 if p is 3 mod 4, but q=5 is far too
small to be useful compared to the noise. There's no q if p is 1
mod 4.
More generally, for any order, one can simply factor the resultant of
the polynomial with the appropriate cyclotomic polynomial to see all of
the possible primes q: e.g., for x^761-x-1, the only possibilities are
q=199 for order 11, q=61 for order 15, q=257 for order 16, q=307 for
order 17, q=19 for order 18, q=43 for order 21, q=139 for order 23,
q=461 for order 23, q=5801 for order 25 (my example above), etc.
To summarize: What the attack wants, namely a large q with a tiny order,
simply doesn't exist for these polynomials. This is conceptually the
same as a previously published calculation concluding that the attack
also doesn't affect cyclotomics.
Even more generally, given the polynomial f, one can try to find a prime
q and a divisor d of f mod q for which the reduced error e mod d is
visibly non-uniform. Compared to cyclotomics, NTRU Prime gives security
reviewers extra tools to analyze this generalized attack:
* For NTRU Prime, one can enumerate all divisors d for a huge range
of q and, for each d, compute various numerical features of e mod d
to see if these features detect deviations from uniform.
* For cyclotomics, one quickly bumps into q with an exponential
number of divisors d, and then it's unclear how to even get started
on figuring out whether there's some exploitable d. Security here
has _not_ been proven. The cases that have been analyzed here are
only for very small degrees of d.
Of course one _hopes_ that the generalized attack doesn't work beyond
the cases identified in the literature, but it's hard to justify any
confidence in this, so it's good to be able to investigate _all_
divisors d for a huge range of q.
> the additional error from rounding in each coefficient can [when
> switching to certain moduli, anyway] be upper bounded at significantly
> less than 1/2
How can this be true, given that many coefficients in [-1/2,1/2] from
A'_i-A_i are multiplied by many coefficients of s and added up in each
position as additional noise?
If the intention is to say that this large noise isn't necessarily
dominant compared to the scaling of the original ternary noise: Sure,
switching to a _much_ larger modulus would limit the impact of rounding,
but then what's the next step of the attack supposed to be?
> if the proper q and a can be found (assuming they do exist), it would
> quite possibly yield a distinguishing attack
The 2014 attack is inapplicable to NTRU Prime, for very much the same
reasons that it's inapplicable to cyclotomics. See above.
The generalized attack could conceivably be applicable to cyclotomics or
to NTRU Prime. The generalized attack applied to NTRU Prime has a more
restricted parameter space---and is easier for security reviewers to
check---than the generalized attack applied to cyclotomics. See above.
> if many such q and a can be found, it might even potentially yield a
> key recovery attack.
This is why the attack papers on this topic, beyond the basic objective
of finding examples of number fields with *LWE distinguishers, have set
the more ambitious objective of finding _Galois_ examples (degree-n
number fields with Galois groups of size only n). There are *LWE cases
broken more efficiently by this line of attacks than by any other known
attacks. For cryptanalysts interested in understanding the state of the
art, I recommend reading the original sequence of attack papers:
https://eprint.iacr.org/2014/784
https://eprint.iacr.org/2015/106
https://eprint.iacr.org/2015/971
https://eprint.iacr.org/2016/193
Of course, Galois number fields---such as cyclotomic fields---are
extremely rare among all number fields. The NTRU Prime fields were
designed from the outset to be as far as possible from being Galois.
---Dan (speaking for myself)
p = 761
w = 250
q = 4591
q2 = 5801
ZZx.<x> = ZZ[]
R.<y> = ZZx.quotient(x^p-x-1)
def qreduce(c):
return ((c+(q//2))%q)-(q//2)
def q2reduce(c):
return ((c+(q2//2))%q2)-(q2//2)
s = [0]*p
while sum(si**2 for si in s) < w:
s[randrange(p)] = randrange(2)*2-1
s = R(s)
e1 = R([randrange(3)-1 for i in range(p)])
A1 = R([qreduce(randrange(q)) for i in range(p)])
B1 = A1*s+e1
B1 = R([qreduce(B1[i]) for i in range(p)])
A1prime = R([round(A1[i]*q2/q) for i in range(p)])
B1prime = R([round(B1[i]*q2/q) for i in range(p)])
e1prime = B1prime-A1prime*s
e1prime = R([q2reduce(e1prime[i]) for i in range(p)])
print(e1prime)
print(lift(e1prime) % (x^25-1))