NTT doesn't require n as a power of 2, but that b | q - 1

737 views
Skip to first unread message

Markku-Juhani O. Saarinen

unread,
Apr 13, 2018, 11:15:57 AM4/13/18
to pqc-forum
Hi,

Some speakers have claimed that NTT requires n to be a power of two. The actual requirement is that n divides q-1. 

In case of NewHope, HILA5 and some other proposals that use q = 3 * 2^12 + 1 = 12289, we have 3 |  q-1 and hence "cube roots of unity" are available. From elementary number theory we know that this is a necessary and sufficient condition.

From elementary signal processing we know that one can therefore do a n=768 transform with over this field by first doing one radix-3 split and then continuing with normal radix-2 splits. The negacyclic Nussbaumer transform (required by X^n+1 basis) is not affected (but limits n to (q-1)/2 = 6144, I think).

The 3-way split is actually rather analogous to some "Module LWE" proposals.

Cheers,
- markku

Mike Hamburg

unread,
Apr 13, 2018, 11:44:39 AM4/13/18
to Markku-Juhani O. Saarinen, pqc-forum
Hi Markku,

I agree that n doesn’t have to be a power of 2, but there is another reason that NTT algorithms tend to use the power-of-2 negacyclic ring (x^(2^k)+1).  The concern is that you can attack RLWE in a subring, at least if the polynomial defining the subring is sparse and low-weight.

So for example, if the ring is (Z/qZ)[x]/(x^768 + 1), then it factors as (x^512 - x^256 + 1)(x^256+1), so you could attempt the attack in a 256-dimensional ring or a 512-dimensional ring instead of the 768-dimensional one.

The result is that most submitters have preferred cyclotomic polynomials.  But for a prime-order cyclotomic polynomial you can’t efficiently use native NTT (but you could always use it over a larger domain and then reduce).  For a composite-order cyclotomic, the degree phi(k) is still a power of 2 unless k is divisible by at least 7 — this would correspond to the cyclotomic polynomial of order 7*256, namely

x^768 - x^640 + x^512 - x^384 + x^256 - x^128 + 1

You could use q = 7*256*6+1 = 10753 for this NTT.  Probably you would also want a “clarifier” (in ThreeBears’ terminology) of x^256+1 to decrease the noise amplification caused by the many coefficients of the cyclotomic.

While such a ring would presumably work, the radix-7 steps will slow down your NTT, and it adds complexity.  So the advantage over modules is pretty small: maybe cheaper sampling of the matrix, and more coefficients to pack error correction information.

Cheers,
— Mike

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
Visit this group at https://groups.google.com/a/list.nist.gov/group/pqc-forum/.

Gregor Seiler

unread,
Apr 13, 2018, 12:11:53 PM4/13/18
to Mike Hamburg, Markku-Juhani O. Saarinen, pqc-forum
Hi Mike, Markuu,

for fast NTT-based multiplication modulo a non-power-of-two cyclotomic
polynomial you can for example take the 2304-th cyclotomic polynomial
X^768 - X^384 + 1 of degree 768. It factors as (X^384 - zeta)(X^384 -
zeta^5) modulo q when zeta is a primitive 6th root of unity. From here
you can do standard radix-2 FFTs until polynomials of degree less than 3
followed by one step of radix-3 FFT.

Regards,
Gregor
>> <mjos....@gmail.com <mailto:mjos....@gmail.com>> wrote:
>>
>> Hi,
>>
>> Some speakers have claimed that NTT requires n to be a power of two.
>> The actual requirement is that n divides q-1.
>>
>> In case of NewHope, HILA5 and some other proposals that use q = 3 *
>> 2^12 + 1 = 12289, we have 3 | q-1 and hence "cube roots of unity" are
>> available. From elementary number theory we know that this is a
>> necessary and sufficient condition.
>>
>> From elementary signal processing we know that one can therefore do a
>> n=768 transform with over this field by first doing one radix-3 split
>> and then continuing with normal radix-2 splits. The negacyclic
>> Nussbaumer transform (required by X^n+1 basis) is not affected (but
>> limits n to (q-1)/2 = 6144, I think).
>>
>> The 3-way split is actually rather analogous to some "Module LWE"
>> proposals.
>>
>> Cheers,
>> - markku
>>
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "pqc-forum" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to pqc-forum+...@list.nist.gov
>> <mailto:pqc-forum+...@list.nist.gov>.
> --
> You received this message because you are subscribed to the Google
> Groups "pqc-forum" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to pqc-forum+...@list.nist.gov
> <mailto:pqc-forum+...@list.nist.gov>.

Mike Hamburg

unread,
Apr 13, 2018, 12:15:57 PM4/13/18
to Gregor Seiler, Markku-Juhani O. Saarinen, pqc-forum
Oh, right.  3^2 leaves a power of 3 in phi.  My mistake.  — Mike

Christopher J Peikert

unread,
Apr 13, 2018, 12:16:49 PM4/13/18
to Mike Hamburg, Markku-Juhani O. Saarinen, pqc-forum
Markku, Mike,

On Fri, Apr 13, 2018 at 11:44 AM Mike Hamburg <mi...@shiftleft.org> wrote:
I agree that n doesn’t have to be a power of 2, but there is another reason that NTT algorithms tend to use the power-of-2 negacyclic ring (x^(2^k)+1).  The concern is that you can attack RLWE in a subring, at least if the polynomial defining the subring is sparse and low-weight.

So for example, if the ring is (Z/qZ)[x]/(x^768 + 1), then it factors as (x^512 - x^256 + 1)(x^256+1), so you could attempt the attack in a 256-dimensional ring or a 512-dimensional ring instead of the 768-dimensional one.

Here (x^768+1) is reducible over the integers (not just mod q); the usual formulation of RLWE asks for a polynomial that is irreducible over the integers.

 For a composite-order cyclotomic, the degree phi(k) is still a power of 2 unless k is divisible by at least 7

Not quite — take, e.g., k=2^e 3^f for f > 1 and you get degree (dimension) n=phi(k)=2^e 3^{f-1}, which is 768 for e=8 and f=2. It’s nicest to view this cyclotomic ring as

  Z[X,Y]/(Phi_{2^e}(X), Phi_{3^f}(Y))

rather than its univariate form.

There’s a simple mixed-{2,3}-radix NTT for these parameters which stays entirely in dimension n, as we showed in Lyubashevsky-Peikert-Regev EUROCRYPT’13 (see Section 3) 

(I’m not claiming that this is better than rank-3 MLWE over (x^256+1), only that it can be done.)

Sincerely yours in cryptography,
Chris

Leo Ducas

unread,
Apr 14, 2018, 12:46:21 PM4/14/18
to pqc-forum, mi...@shiftleft.org, mjos....@gmail.com, cpei...@alum.mit.edu
There’s a simple mixed-{2,3}-radix NTT for these parameters which stays entirely in dimension n, as we showed in Lyubashevsky-Peikert-Regev EUROCRYPT’13 (see Section 3) 
 
An implementation of which may be available in Falcon, which does use such settings ?

D. J. Bernstein

unread,
Apr 14, 2018, 4:37:17 PM4/14/18
to pqc-...@list.nist.gov
This is purely a historical note, but the general theme might be
relevant to some patent landmines.

Christopher J Peikert writes:
> There’s a simple mixed-{2,3}-radix NTT for these parameters which stays
> entirely in dimension n, as we showed in Lyubashevsky-Peikert-Regev
> EUROCRYPT’13 (see Section 3) 
> https://eprint.iacr.org/2013/293.pdf .

On a quick skim, I'd say that this is identical to the "Schoenhage
trick" in radix 2 and radix 3 as presented in Section 9 of my 2001
fast-multiplication survey paper

https://cr.yp.to/papers/m3.pdf

where the main credit is to a classic 1977 paper [87] by Schoenhage.
An alternative is the "Nussbaumer trick", which often provides slightly
better performance; this one goes back to 1980.

---Dan
signature.asc

Christopher J Peikert

unread,
Apr 14, 2018, 10:40:19 PM4/14/18
to pqc-...@list.nist.gov
> Christopher J Peikert writes:
>> There’s a simple mixed-{2,3}-radix NTT for these parameters which stays
>> entirely in dimension n, as we showed in Lyubashevsky-Peikert-Regev
>> EUROCRYPT’13 (see Section 3)
>> https://eprint.iacr.org/2013/293.pdf .
>
> On a quick skim, I'd say that this is identical to the "Schoenhage
> trick" in radix 2 and radix 3 as presented in Section 9 of my 2001
> fast-multiplication survey paper
>
> https://cr.yp.to/papers/m3.pdf

Can you elaborate? It does not look the same to me. As described in
Section 9, the Schoenhage trick (whether in radix 2 or 3) uses a
larger intermediate dimension. E.g., for radix 2,

"The polynomials in R[x][y]/(y^n+1) have x-degree at most m-1, so
their product can be recovered from its image modulo x^{nk}+1 if nk >
2m-2."

So the intermediate product polynomial in R[x,y]/(x^{nk}+1, y^n+1) has
n^2 k > n(2m-2) coefficients, which is almost twice as many as each
input polynomial has (mn).

By comparison, Section 3 of https://eprint.iacr.org/2013/293.pdf gives
a mixed-radix NTT (or more precisely, CRT -- which evaluates at only
the *primitive* m'th roots of unity) and cyclotomic ring
multiplication that *preserve the dimension throughout*. For this it
needs an appropriate root of unity to exist; Shoenhage looks less
restrictive in that respect. So the methods appear incomparable.

> An alternative is the "Nussbaumer trick", which often provides slightly
> better performance; this one goes back to 1980.

Nussbaumer also uses a larger intermediate dimension. This can
potentially result in a net win when everything else is taken into
account; see, e.g., this BS thesis by van der Lubbe:
http://www.cs.ru.nl/bachelorscripties/2016/Gerben_van_der_Lubbe___4389026___A_New_Hope_for_Nussbaumer.pdf

D. J. Bernstein

unread,
Apr 15, 2018, 2:29:21 AM4/15/18
to pqc-...@list.nist.gov
Christopher J Peikert writes:
> Can you elaborate?

Certainly. The aforementioned 2001 survey includes a description of the
radix-2 FFT trick as

remainder arithmetic modulo x^n-b and x^n+b in R[x], i.e., mapping
R[x]/(x^{2n}-b^2) \to (R[x]/(x^n-b)) \times (R[x]/(x^n+b))

and a description of the FFT (including a size-4 example) as

the FFT trick applied recursively from x^{2^k}-1 all the way down to
linear polynomials.

More generally, the survey mentions "radix-3 and higher-radix versions
of the FFT", in particular spelling out the "radix-3 FFT trick" as

remainder arithmetic modulo x^n-b, x^n-\omega b, and x^n-\omega^2 b,
where 1+\omega+\omega^2=0.

After hearing that the (radix-2) FFT is the (radix-2) FFT trick applied
recursively, the reader is expected to be able to figure out that the
"radix-3 FFT" mentioned later is the radix-3 FFT trick applied
recursively. Here's a size-9 example of a radix-3 FFT with b^3 = omega:

* factor y^9-1 into y^3-1, y^3-omega, y^3-omega^2;
* factor y^3-1 into y-1, y-omega, y-omega^2;
* factor y^3-omega into y-b, y-b omega, y-b omega^2;
* factor y^3-omega^2 into y-b^2, y-b^2 omega, y-b^2 omega^2.

All of the examples so far have binomial moduli (y^n-constant). The
reason I pointed to the "radix-3 Schoenhage trick" paragraph of the 2001
survey is that it instead explicitly uses modulus y^2n+y^n+1 and factors
the modulus as (y^n-A)(y^n-B) where both A and B are 3rd roots of 1. For
example, starting with Phi_9 = (y^9-1)/(y^3-1) = y^6+y^3+1:

* factor y^6+y^3+1 into y^3-omega, y^3-omega^2;
* continue as above.

A note later in the survey comments on the extra cost of applying "the
radix-3 FFT to y^{3n}-1 rather than y^{2n}+y^n+1". I see no difference
in arithmetic operations between this radix-3 FFT for y^2n+y^n+1 and the
radix-3 algorithm that you credit to your 2013 paper, and I see no
difference between the savings stated in the 2001 survey and the savings
that you credit to your 2013 paper. The two-dimensional 2^e 3^f case
follows immediately from Good's trick, which dates back to the 1950s and
has been widely applied in the FFT literature.

> So the intermediate product polynomial in R[x,y]/(x^{nk}+1, y^n+1) has
> n^2 k > n(2m-2) coefficients, which is almost twice as many as each
> input polynomial has (mn).

You're getting distracted by a particular application, and ignoring the
general flexibility explained in the same section of the 2001 survey:

The idea of the techniques in this section is to extend the base ring
to contain appropriate roots of 1 for an FFT. A smaller extension
suffices if R already contains some roots of 1. ... In extreme cases
R already supports the desired FFT and no extension is necessary.

The easy extreme cases, with no need for the extension techniques, are
the only cases covered by your 2013 paper.

---Dan
signature.asc

Leo Ducas

unread,
Apr 15, 2018, 5:01:56 AM4/15/18
to pqc-...@list.nist.gov
My apologies for the confusion:
- I did not meant to confirm that multi-radix NTTs were unknown before 2013,
- I only meant to point at a relevant implementation in our context. Certainly many other exists.

-- L


--
You received this message because you are subscribed to a topic in the Google Groups "pqc-forum" group.
To unsubscribe from this topic, visit https://groups.google.com/a/list.nist.gov/d/topic/pqc-forum/r9R7OJT6x_c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to pqc-forum+unsubscribe@list.nist.gov.

Christopher J Peikert

unread,
Apr 20, 2018, 12:15:32 PM4/20/18
to pqc-forum
To avoid potential confusion, let me use these terms from Section 3 of
LPR'13 https://eprint.iacr.org/2013/293.pdf :

* The dimension-m DFT (and FFT algorithm) evaluates a polynomial at
all the m'th roots of unity.

* The index-m CRT (short for the admittedly generic name "Chinese
Remainder Transform") evaluates at just the phi(m) *primitive* m'th
roots of unity. This is sufficient for multiplication in the m'th
cyclotomic ring, and saves at least a factor of m/phi(m) in space and
time, and some annoying algorithmic complications, versus using the
FFT.

(It's worth pointing out that all prior treatments of Ring-LWE crypto
I'm aware of used the univariate representation Z[X]/Phi_m(X) of the
m'th cyclotomic ring, which for non-prime-power m is rather more
complicated than the "multivariate" or "tensored" representation
associated with the CRT.)

Certainly, everything in Section 3 is either well known in the FFT
literature going back to at least the 1960s, or a slight tweak of
something well known from that era that became known at some point.
Specifically, the decomposition of DFT/CRT to the prime-power case is
attributed to Good-Thomas, and the power-of-p DFT decomposition
(equivalently, radix-p FFT algorithm) is attributed to Cooley-Tukey.

The radix-p CRT decomposition is a slight variant of the DFT one (and
easy to obtain once you know you want it), but its provenance is
murkier. In particular, Schoenhage's 1976 paper
https://link.springer.com/article/10.1007/BF00289470 works in
power-of-3 cyclotomics

S[x]/(1+x^N+x^{2N})

where N=3^k, but uses the radix-3 FFT to evaluate at *all* the 3N'th
roots of unity. This carries with it the complications and
inefficiencies that would be avoided by using the CRT.

Indeed, as mentioned in the Section 9 "Notes" of your 2001 survey:
"the details in [Schoenhage'76] are unnecessarily inefficient; [it]
applies the radix-3 FFT to x^{3N}-1 rather than x^{2N}+x^N+1." I
agree that the latter, expanded out fully, matches the CRT
decomposition presented in LPR'13.

So, to me it seems a bit of a stretch to attribute the CRT
decomposition/algorithm itself (and its advantages over FFT for
cyclotomic arithmetic) to [Schoenhage'76]. But it's also clear that
its equivalent was published by 2001, and I wouldn't be at all
surprised if it was folklore or even written down somewhere before
that.

(And, for the record, we have made no patent claims on anything in LPR'13.)
Reply all
Reply to author
Forward
0 new messages