580 views

Skip to first unread message

Jan 17, 2019, 6:09:22 AM1/17/19

to pqc-...@list.nist.gov, Vadim Lyubashevsky

Dear all,

To the best of our knowledge, there are no lattice-based KEM submissions

that use NTTs over rings Z[X]/(f) where the degree of f is not a power

of 2. Since it has sort of been accepted by now that for security we

want the degree of f to be between 700 and 800, this rules out using NTT

for “optimally-set” parameters. Schemes based on generalized-LWE (like

Kyber) get around this issue by working with a random matrix of

256-degree polynomials, but NTRU schemes cannot do this without

increasing their public key size. Since NTRU-like schemes don’t have

many parts to them other than polynomial multiplication / division

(unlike LWE-based schemes that also require sampling uniform polynomials

in the ring which takes more time than actual NTT multiplication),

optimizing this part - especially the division - could lead to a really

efficient KEM.

So to test the potential of an NTT-based NTRU, we have implemented an

NTT in dimension 768 and built a complete CCA-secure constant-time KEM

around it. Our AVX2-optimized KEM (which we called NTTRU) is faster than

the AVX2-optimized NTRU-HRSS implementation by a factor of 30 in key

generation (incorporating the newest speed-ups announced by Dan), 5 in

encapsulation, and 8 in decapsulation. Concretely it only needs about

6.4K cycles for key generation, 6.1K cycles for encapsulation and 7.9K

cycles for decapsulation. All numbers were measured on a Skylake laptop

(decapsulation is not yet as optimized as the rest). The NTT itself over

the ring Z_q[X]/(X^768-X^384+1) for q=7681 only needs 810 cycles.

Our NTT in dimension 768 can also easily be used do define versions of

other NTT-based Ring-LWE encryption schemes (e.g. NewHope, LIMA, etc.)

to work over non-power-of-2 rings, and some of our internal

optimizations should also be applicable to schemes that only need to use

power-of-2 NTTs (e.g. Kyber).

So our conclusion is that if you’re OK with using cyclotomic rings with

splitting moduli (e.g. NTRU Prime purposefully avoids them, while

NTRU-HRSS is over such a cyclotomic ring), then we believe that doing

multiplication via NTT is absolutely the way to go – especially for

NTRU-like schemes because they don’t have many other expensive parts and

their key generation requires division, which is much more expensive

without NTT.

For more details, the source code is at

https://github.com/gregorseiler/NTTRU and the paper should be on eprint

in the coming days.

Regards,

Gregor and Vadim

To the best of our knowledge, there are no lattice-based KEM submissions

that use NTTs over rings Z[X]/(f) where the degree of f is not a power

of 2. Since it has sort of been accepted by now that for security we

want the degree of f to be between 700 and 800, this rules out using NTT

for “optimally-set” parameters. Schemes based on generalized-LWE (like

Kyber) get around this issue by working with a random matrix of

256-degree polynomials, but NTRU schemes cannot do this without

increasing their public key size. Since NTRU-like schemes don’t have

many parts to them other than polynomial multiplication / division

(unlike LWE-based schemes that also require sampling uniform polynomials

in the ring which takes more time than actual NTT multiplication),

optimizing this part - especially the division - could lead to a really

efficient KEM.

So to test the potential of an NTT-based NTRU, we have implemented an

NTT in dimension 768 and built a complete CCA-secure constant-time KEM

around it. Our AVX2-optimized KEM (which we called NTTRU) is faster than

the AVX2-optimized NTRU-HRSS implementation by a factor of 30 in key

generation (incorporating the newest speed-ups announced by Dan), 5 in

encapsulation, and 8 in decapsulation. Concretely it only needs about

6.4K cycles for key generation, 6.1K cycles for encapsulation and 7.9K

cycles for decapsulation. All numbers were measured on a Skylake laptop

(decapsulation is not yet as optimized as the rest). The NTT itself over

the ring Z_q[X]/(X^768-X^384+1) for q=7681 only needs 810 cycles.

Our NTT in dimension 768 can also easily be used do define versions of

other NTT-based Ring-LWE encryption schemes (e.g. NewHope, LIMA, etc.)

to work over non-power-of-2 rings, and some of our internal

optimizations should also be applicable to schemes that only need to use

power-of-2 NTTs (e.g. Kyber).

So our conclusion is that if you’re OK with using cyclotomic rings with

splitting moduli (e.g. NTRU Prime purposefully avoids them, while

NTRU-HRSS is over such a cyclotomic ring), then we believe that doing

multiplication via NTT is absolutely the way to go – especially for

NTRU-like schemes because they don’t have many other expensive parts and

their key generation requires division, which is much more expensive

without NTT.

For more details, the source code is at

https://github.com/gregorseiler/NTTRU and the paper should be on eprint

in the coming days.

Regards,

Gregor and Vadim

Jan 17, 2019, 8:18:33 AM1/17/19

to pqc-forum, vadim....@gmail.com, gse...@inf.ethz.ch

Isn't there a similar implementation of an NTT in dimension 768 in Falcon ?

-- L

Jan 17, 2019, 9:04:11 AM1/17/19

to Leo Ducas, pqc-forum, vadim....@gmail.com, gse...@inf.ethz.ch

Yes, there is an implementation of the NTT in almost the same ring as mentioned by Gregor (see https://falcon-sign.info/impl/falcon-vrfy.c.html, function mq_NTT_ternary).

Thomas

--

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

Visit this group at https://groups.google.com/a/list.nist.gov/group/pqc-forum/.

Jan 17, 2019, 9:54:35 AM1/17/19

to pqc-forum

Hi all,

@Leo, Thomas: Yes, our Falcon signature scheme (not a KEM) uses a very

similar NTT over a larger prime, but there is no optimized AVX2

implementation (yet). In fact, the NTT implementation in NTTRU can now

be almost directly ported to FALCON.

@Nigel: Sorry for the oversight of not mentioning the LIMA safe-prime

version. But the algorithms are different. When we say NTT, we mean an

algorithm that is in a sense "native" to the ring. Our NTT is different

from the Bluestein algorithm in LIMA in that it really does not do more

work than you would do in an optimal power-of-two NTT, apart from a few

additions. We do exactly the same 8x384 multiplications to reduce a

polynomial of degree under 768 to polynomials of degree under 3 as you

would do if 768 were a power of two.

And just to avoid any misunderstandings about our claims -- we are not

saying that we invented a new NTT technique. Our main objective was to

create an as-fast-as-possible KEM using AVX2 optimizations, and a few

small tricks particular to the kind of ring we use. And the result is

that NTRU can be made very fast when it is instantiated over an

NTT-friendly ring.

Regards,

Gregor and Vadim

>> <mailto:pqc-forum+...@list.nist.gov>.

@Leo, Thomas: Yes, our Falcon signature scheme (not a KEM) uses a very

similar NTT over a larger prime, but there is no optimized AVX2

implementation (yet). In fact, the NTT implementation in NTTRU can now

be almost directly ported to FALCON.

@Nigel: Sorry for the oversight of not mentioning the LIMA safe-prime

version. But the algorithms are different. When we say NTT, we mean an

algorithm that is in a sense "native" to the ring. Our NTT is different

from the Bluestein algorithm in LIMA in that it really does not do more

work than you would do in an optimal power-of-two NTT, apart from a few

additions. We do exactly the same 8x384 multiplications to reduce a

polynomial of degree under 768 to polynomials of degree under 3 as you

would do if 768 were a power of two.

And just to avoid any misunderstandings about our claims -- we are not

saying that we invented a new NTT technique. Our main objective was to

create an as-fast-as-possible KEM using AVX2 optimizations, and a few

small tricks particular to the kind of ring we use. And the result is

that NTRU can be made very fast when it is instantiated over an

NTT-friendly ring.

Regards,

Gregor and Vadim

Jan 17, 2019, 11:05:19 PM1/17/19

to pqc-forum, gse...@inf.ethz.ch

And just to avoid any misunderstandings about our claims -- we are not

saying that we invented a new NTT technique.

That's not how it read, I was merely pointing at a point of reference to compare to, and a potential application of you very fast code.

Best

-- Leo

-- Leo

Jan 19, 2019, 9:57:31 AM1/19/19

to pqc-...@list.nist.gov

Gregor Seiler writes:

> The NTT itself over the ring Z_q[X]/(X^768-X^384+1)

> for q=7681 only needs 810 cycles.

This is fast enough that it should also save time in implementations of
> The NTT itself over the ring Z_q[X]/(X^768-X^384+1)

> for q=7681 only needs 810 cycles.

NIST submissions that use more conservative rings. Multiplication uses 3

FFTs (and fewer if some FFTs can be cached); the usual reduction from

arbitrary polynomials to FFT-friendly polynomials sacrifices a factor of

2 (e.g., multiply modulo x^761-x-1 by multiplying modulo an FFT-friendly

polynomial of degree 1536); the usual reduction from arbitrary primes to

FFT-friendly primes sacrifices a factor of 2 when one input is small (as

it is in NTRU), or 3 in general. The FFT-friendly polynomials don't have

to be irreducible in this context: e.g., one can use the "truncated FFT"

polynomial (x^1024+1)(x^512-1) modulo 12289 and 18433. This will all

adapt straightforwardly to other polynomial lengths and coefficient

sizes, and to big-integer multiplication.

Comments below are regarding your new cryptosystem.

> a really efficient KEM

But your paper says 1248-byte ciphertexts, compared to (e.g.) 1152 bytes

for kyber768, 1088 bytes for saber, 1047 bytes for sntrup4591761.

Typically sending or receiving a byte costs at least three orders of

magnitude more than a clock cycle. Taking bytes+cycles/1000 for

sntrup4591761 gives 1047+45 = 1092 for the sender, 1047+94 = 1141 for

the receiver, which is better than 1248 no matter how few cycles you

use. Of course a full analysis would also have to study the security

level, which leads to many open questions, but presumably your security

loss from a larger q outweighs the slight difference in dimension etc.

Have you considered Rounded NTRU as a smaller alternative to Noisy NTRU?

How about a smaller q, and fixed weight instead of variable weight?

One of the reasons for upwards pressure on your q is that you're using

the old idea of taking f as 1+3*small instead of small. Your paper says

that (\Z/3)[x]/(x^768-x^384+1) doesn't support an NTT, but there are

many fast multiplication methods---for example, as in the HRSS software,

you could simply reuse your (\Z/q)[x]/(...) multiplier. As for division,

that's only an issue for keygen, and it's going to be quite fast anyway

since x^768-x^384+1 factors mod 3 as (x^64+x^32-1)^6 (x^64-x^32-1)^6.

> if you’re OK with using cyclotomic rings with

> splitting moduli (e.g. NTRU Prime purposefully avoids them, while NTRU-HRSS

> is over such a cyclotomic ring), then we believe that doing multiplication

> via NTT is absolutely the way to go

divisors (and more non-monic divisors) modulo the prime 7681 that you're

using. Have you at least analyzed the impact that this has upon security

against _known_ NTRU attacks? For example:

* Do you know how small the coefficients can be in divisors of

degree, say, 384? Sufficiently small coefficients would seem to

open up the ancient NTRU attack strategy from, e.g., 2001 Gentry.

* A multi-target attack can scan for keys that reduce to 0 modulo any

of your cubic factors. Doesn't this happen more than once in a

billion keys, allowing a slightly faster lattice attack since g is

then constrained to a lattice of slightly smaller dimension?

You draw an analogy to NTRU-HRSS, but NTRU-HRSS uses a polynomial

(x^701-1)/(x-1) that's irreducible mod 2. The actual modulus there is a

larger power of 2, so there are _some_ quotient rings to worry about,

but not your exponential collection of quotient rings. (Of course you

can point to, e.g., New Hope also having an exponential collection of

quotient rings, but this doesn't answer the security-analysis question.)

I agree that NTTRU is analogous to NTRU-HRSS in the use of cyclotomics:

there are many automorphisms (and quite a few subfields) to worry about.

Here's a larger NTTRU/HRSS/Prime comparison table regarding five NTRU

Prime features designed to reduce attack surface:

NTTRU HRSS Prime feature

----- ----- ----- -------------------------------------------

no yes yes avoid decryption failures

yes yes yes irreducible polynomial (over \Z)

no no yes avoid nontrivial number-field automorphisms

yes no yes prime modulus q

no no yes the ring is a field

The last two "no" entries for NTTRU are inherent in this use of NTT,

but the first "no" would be very easy to fix.

---Dan

Jan 19, 2019, 5:30:45 PM1/19/19

to pqc-...@list.nist.gov

Hi Dan, all,

Multiplication uses 3

FFTs (and fewer if some FFTs can be cached); the usual reduction from

arbitrary polynomials to FFT-friendly polynomials sacrifices a factor of

2 (e.g., multiply modulo x^761-x-1 by multiplying modulo an FFT-friendly

polynomial of degree 1536); the usual reduction from arbitrary primes to

FFT-friendly primes sacrifices a factor of 2 when one input is small (as

it is in NTRU), or 3 in general.

When you say "the usual reduction", do you mean setting the FFT-friendly prime q large enough so that no reduction modulo q takes place? If that's the case and your original prime is p and your original degree is d, then you need q to be at least > pd (and that's if you're multiplying by -1/0/1 polynomials). Gregor can mention much more about this, but I believe that for such large q ( > 16 bits), some of his AVX2 tricks used in NTTRU don't directly carry over. But anyway, it would certainly be nice if this helps others.

> a really efficient KEM

Of course a full analysis would also have to study the security

level, which leads to many open questions, but presumably your security

loss from a larger q outweighs the slight difference in dimension etc.

We were just comparing to NTRU-HRSS. Ignoring the ring structure, our dimension is higher and modulus is lower, so it should be at least as hard.

Have you considered Rounded NTRU as a smaller alternative to Noisy NTRU?

We wanted to include as few extra assumptions as possible and make things as close as possible to NTRU-HRSS. But yes, that should reduce the ciphertext size.

How about a smaller q, and fixed weight instead of variable weight?

In order to be able to do NTT, the only smaller compatible q that I see below 2^12 is 3457 ... it's certainly an option. Using fixed weight is harder to sample and harder to calculate the decryption error.

One of the reasons for upwards pressure on your q is that you're using

the old idea of taking f as 1+3*small instead of small. Your paper says

that (\Z/3)[x]/(x^768-x^384+1) doesn't support an NTT, but there are

many fast multiplication methods---for example, as in the HRSS software,

you could simply reuse your (\Z/q)[x]/(...) multiplier. As for division,

that's only an issue for keygen, and it's going to be quite fast anyway

since x^768-x^384+1 factors mod 3 as (x^64+x^32-1)^6 (x^64-x^32-1)^6.

This would slow things down, but it is something that could be a viable trade-off if we reduce q to 3457 and thus save 768 bits in the public key / ciphertext.

Now is a good place to discuss decryption errors (since you bring it up at the very end as a way to differentiate schemes). I am really confused as to why you draw this solid line between 0 decryption error and >0 decryption error. There are actual reductions e.g. [HHK17] that take small decryption errors into account (though admittedly, [HHK17] is not too convenient to accurately use with NTRU - but is perfectly fine for Regev-type schemes). And if your decryption error e is such that e<< 1/|M|, where M is the message space, then information theoretically, only a negligible fraction of keys will have any decryption errors in your CCA scheme (assuming you believe that SHA acts as a random function). So you are just as secure as without decryption errors.

In the NTTRU paper, we also give a simple transformation that reduces the message space of the scheme (since in NTRU schemes, the message space is larger than 256 bits) so you can apply the information theoretic argument. You mentioned in a previous email that decryption errors may be hard to calculate as a reason to completely avoid them -- but in NTRU schemes where each coefficient is distributed independently, this is not the case at all (and if we made a mistake, it should be easy to catch, so this is not a reason to avoid more efficient schemes). So I would even suggest that other schemes, like NTRU-HRSS, consider reducing their modulus to get decryption errors and (possibly) use the transformation in NTTRU to provably handle decryption errors. This both reduces the ciphertext / public key and increases security due to the smaller modulus. I can sort of understand if you complain if schemes having 2^-80 decryption error claim 128-bit security, but I don't understand why the mere presence of any decryption error is automatically a strike against the scheme.

Back to q=3457 ... I calculate that if we use f=3f'+1, then the decryption error will be around 2^-260. I could personally live with that, but our security proof doesn't work anymore. We mention as an open problem that there might be a way to get a better bound for our transformed scheme using ideas from [HHK17] - but I am not completely sure about this. If it's true, then one should almost definitely go for the smaller prime. Otherwise, an option would be, as you said, to not use f=3f'+1 and do divisions as in NTRU-HRSS. Without the extra multiplication by 3, the decryption error goes down to around 2^-460, so would be provably fine to use.

Your polynomial x^768-x^384+1 is irreducible over \Z but has 2^256 monic

divisors (and more non-monic divisors) modulo the prime 7681 that you're

using. Have you at least analyzed the impact that this has upon security

against _known_ NTRU attacks? For example:

* Do you know how small the coefficients can be in divisors of

degree, say, 384? Sufficiently small coefficients would seem to

open up the ancient NTRU attack strategy from, e.g., 2001 Gentry.

I am not discounting your concerns, but I do not want to get into a discussion about attacks that don't exist. My personal view, for what it's worth, is that all splittable cyclotomic rings are fine, if one uses a large-enough secret / noise. If the noise is small (i.e. 0/1 as in all NTRU schemes), then I would feel better if more people studied the particular rings -- but this is both a function of the small noise and the extra algebraic structure, and I am not sure which part individually should be of more concern (this is one of the reasons why we use larger errors in Kyber). The reason that I am a fan of rings with algebraic structure is that, in addition to added efficiency and elegance, they allow more interesting advanced primitives. You can't do useful FHE in some rings and we're working on much improved zero-knowledge proof in which complete splitting is crucial. So it would be a huge shame if such rings are actually fine to use, but are discounted without much further study because of "paranoia".

To have larger secret / error in NTRU schemes, I would even propose to consider working with a noise that is larger than -1,0,1, but still reduce mod 3. (So the problem is to recover the message mod 3 rather than the message itself). If you generate the errors as a binomial, then this new problem is provably at least as hard as the former, but there is almost certainly more practical security since the lattice problem is now harder.

* A multi-target attack can scan for keys that reduce to 0 modulo any

of your cubic factors. Doesn't this happen more than once in a

billion keys, allowing a slightly faster lattice attack since g is

then constrained to a lattice of slightly smaller dimension?

I don't see the improved attack. If f is 0 in some NTT coefficient, then you only get knowledge of one *NTT coefficient* of g. Unless you're doing exhaustive search, I don't know of any way that knowing an NTT coefficient can help in lattice reductions.

NTTRU HRSS Prime feature

----- ----- ----- -------------------------------------------

no yes yes avoid decryption failures

yes yes yes irreducible polynomial (over \Z)

no no yes avoid nontrivial number-field automorphisms

yes no yes prime modulus q

no no yes the ring is a field

The last two "no" entries for NTTRU are inherent in this use of NTT,

but the first "no" would be very easy to fix.

No fix needed :)

-Vadim

---Dan

--

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

Jan 20, 2019, 1:08:16 AM1/20/19

to pqc-...@list.nist.gov

Hi all,

On 19.01.19 23:30, Vadim Lyubashevsky wrote:

>

> Hi Dan, all,

>

> Multiplication uses 3

> FFTs (and fewer if some FFTs can be cached); the usual reduction from

> arbitrary polynomials to FFT-friendly polynomials sacrifices a factor of

> 2 (e.g., multiply modulo x^761-x-1 by multiplying modulo an FFT-friendly

> polynomial of degree 1536); the usual reduction from arbitrary primes to

> FFT-friendly primes sacrifices a factor of 2 when one input is small (as

> it is in NTRU), or 3 in general.

>

>

> When you say "the usual reduction", do you mean setting the FFT-friendly

> prime q large enough so that no reduction modulo q takes place? If

> that's the case and your original prime is p and your original degree is

> d, then you need q to be at least > pd (and that's if you're multiplying

> by -1/0/1 polynomials). Gregor can mention much more about this, but I

> believe that for such large q ( > 16 bits), some of his AVX2 tricks used

> in NTTRU don't directly carry over. But anyway, it would certainly be

> nice if this helps others.

I think what Dan means is doing Chinese remaindering modulo 2 or 3

different primes.

Cheers,

Gregor

On 19.01.19 23:30, Vadim Lyubashevsky wrote:

>

> Hi Dan, all,

>

> Multiplication uses 3

> FFTs (and fewer if some FFTs can be cached); the usual reduction from

> arbitrary polynomials to FFT-friendly polynomials sacrifices a factor of

> 2 (e.g., multiply modulo x^761-x-1 by multiplying modulo an FFT-friendly

> polynomial of degree 1536); the usual reduction from arbitrary primes to

> FFT-friendly primes sacrifices a factor of 2 when one input is small (as

> it is in NTRU), or 3 in general.

>

>

> When you say "the usual reduction", do you mean setting the FFT-friendly

> prime q large enough so that no reduction modulo q takes place? If

> that's the case and your original prime is p and your original degree is

> d, then you need q to be at least > pd (and that's if you're multiplying

> by -1/0/1 polynomials). Gregor can mention much more about this, but I

> believe that for such large q ( > 16 bits), some of his AVX2 tricks used

> in NTTRU don't directly carry over. But anyway, it would certainly be

> nice if this helps others.

different primes.

Cheers,

Gregor

Jan 20, 2019, 1:28:46 AM1/20/19

to Gregor Seiler, pqc-...@list.nist.gov

I think what Dan means is doing Chinese remaindering modulo 2 or 3

different primes.

Ah, that makes sense.

And about Dan's "Rounded NTRU" idea to reduce ciphertext size, I guess one could incorporate it even without making the extra assumption - i.e. one could add a random message as per the usual NTRU and also drop bits from the ciphertext. The hard problem is then to recover the message + extra dropped bits mod 3, but that should be ok security-wise and should reduce the ciphertext a bit.

Cheers,

Vadim

Cheers,

Gregor

Jan 20, 2019, 1:06:26 PM1/20/19

to pqc-...@list.nist.gov

Vadim Lyubashevsky writes:

> When you say "the usual reduction", do you mean setting the FFT-friendly

> prime q large enough so that no reduction modulo q takes place?

https://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-0301966-0/S0025-5718-1971-0301966-0.pdf
> When you say "the usual reduction", do you mean setting the FFT-friendly

> prime q large enough so that no reduction modulo q takes place?

The multiplication method needs a large enough modulus but doesn't need

the modulus to be irreducible. Pollard writes "It is possible to replace

the one possibly very large prime p required by a number of smaller

primes", and gives an example where he chooses three word-size primes.

Two word-size primes are enough in the typical NTRU context. I already

gave the example of M = (x^1024+1)(x^512-1) modulo 12289 and 18433. Each

multiplication in (\Z/4591)[x]/(x^761-x-1) reduces efficiently to an

NTT-friendly multiplication in (\Z/12289)[x]/M and an NTT-friendly

multiplication in (\Z/18433)[x]/M, since one input is small.

Consequently, I expect your work to end up saving more cycles for

sntrup4591761 and ntrulpr4591761 than it has saved for similarly sized

NTT-friendly rings. This should make the NTRU-Prime-vs.-NTT cycle gap

even less noticeable than it was before, compared to the primary cost

incurred by all these systems, namely bytes transmitted.

> Using fixed weight is harder to sample and harder to calculate the

> decryption error.

trivial to choose the weight so that there are no decryption failures.

> I am really confused as to why you draw this solid line between

> 0 decryption error and >0 decryption error.

attacked with complexity below the claimed security level". Recall that

ss-ntru-pke is part of the original NTRUEncrypt submission; it was

claimed to be "extremely conservative" and provably secure. The attack

exploits decryption failures.

How many lines of failure-probability calculations has this competition

been throwing at security reviewers? And how many lines of alleged

proofs that the failures are too rare to cause trouble? How many

mistakes and security losses do we need to see before deciding that this

minor optimization within lattice-based cryptography is a bad idea?

The important metric here is the complexity of a comprehensive review of

security. This varies considerably from one system to another, and it's

directly connected to the risk of a breakable system being standardized

and/or deployed. See my email dated 8 Dec 2018 04:17:14 -0000 for

analysis and more examples, such as OCB2.

[ quotient rings ]

> I am not discounting your concerns, but I do not want to get into a

> discussion about attacks that don't exist.

Reducing the NTRU key equation modulo a divisor of the polynomial, and
> discussion about attacks that don't exist.

then applying lattice attacks to the reduced equation, is a well-known

attack strategy that breaks some rings. See, e.g., 2001 Gentry.

The quantitative impact depends on the shape of the divisor (at least

the positions and sizes of nonzero coefficients). I don't see how you

can claim security against _known_ attacks if you haven't analyzed the

shape of divisors of x^768-x^384+1 modulo 7681.

How effective are lattice attacks after reduction mod 45x^384+56, for

example? The coefficients here look projectively quite a bit smaller

(compared to 7681) than the coefficients that Jacob was trying for LAC

(compared to 251). How about reduction mod 5x^192+13x^96-5?

Of course there's also a much more recent line of attacks exploiting

automorphisms, subfields, etc., most recently (2018) breaking through

the 2^sqrt approximation-factor barrier. It would hardly be a surprise

if the next few years of developments also exploit divisors. I don't

understand why anyone would recommend giving the attacker an arsenal of

attack tools that have produced attacks at SODA 2016, Eurocrypt 2017,

etc., and that we know how to eliminate at low cost. But, even if you

don't agree, surely you have to analyze whether your system survives the

specific attack strategies that have broken some earlier systems.

> You can't do useful FHE in some rings and we're working on much

> improved zero-knowledge proof in which complete splitting is crucial.

lattice-based system. How do you imagine that we're going to get from

the current situation, with security flaws being announced in 2019 in

"provably secure" lattice systems, to a situation where we can actually

be confident in the security of what NIST standardizes? Do you think

that adding extra desiderata such as FHE is going to help this process?

---Dan

Jan 20, 2019, 3:02:41 PM1/20/19

to pqc-...@list.nist.gov

On 1/20/19 10:06 AM, D. J. Bernstein wrote:

> Vadim Lyubashevsky writes:

>> I am really confused as to why you draw this solid line between

>> 0 decryption error and >0 decryption error.

> eprint 2019/043 appeared a few days ago, saying that ss-ntru-pke "is

> attacked with complexity below the claimed security level". Recall that

> ss-ntru-pke is part of the original NTRUEncrypt submission; it was

> claimed to be "extremely conservative" and provably secure. The attack

> exploits decryption failures.

>

> How many lines of failure-probability calculations has this competition

> been throwing at security reviewers? And how many lines of alleged

> proofs that the failures are too rare to cause trouble? How many

> mistakes and security losses do we need to see before deciding that this

> minor optimization within lattice-based cryptography is a bad idea?

>

> The important metric here is the complexity of a comprehensive review of

> security. This varies considerably from one system to another, and it's

> directly connected to the risk of a breakable system being standardized

> and/or deployed. See my email dated 8 Dec 2018 04:17:14 -0000 for

> analysis and more examples, such as OCB2.

Hi Dan,
>> I am really confused as to why you draw this solid line between

>> 0 decryption error and >0 decryption error.

> eprint 2019/043 appeared a few days ago, saying that ss-ntru-pke "is

> attacked with complexity below the claimed security level". Recall that

> ss-ntru-pke is part of the original NTRUEncrypt submission; it was

> claimed to be "extremely conservative" and provably secure. The attack

> exploits decryption failures.

>

> How many lines of failure-probability calculations has this competition

> been throwing at security reviewers? And how many lines of alleged

> proofs that the failures are too rare to cause trouble? How many

> mistakes and security losses do we need to see before deciding that this

> minor optimization within lattice-based cryptography is a bad idea?

>

> The important metric here is the complexity of a comprehensive review of

> security. This varies considerably from one system to another, and it's

> directly connected to the risk of a breakable system being standardized

> and/or deployed. See my email dated 8 Dec 2018 04:17:14 -0000 for

> analysis and more examples, such as OCB2.

I'm not sure how ePrint 2019/043 supports your point, except in that

writing and reading such a paper takes work. Perhaps I am misreading

this paper, but as I understand it, it proposes using 2^127

chosen-ciphertext queries (i.e. almost the square of NIST's evaluation

point) to 2^64 public keys, plus 2^217 work, to recover one key in a

system that has 2^-80 failure probability. While the paper proposes a

concrete attack, most authors have been assuming that attacks of this

form can be much stronger, and have set their failure probabilities much

lower than 2^-80. So the actual content of the paper does not detract

from Vadim's point that sufficiently smaller failure probabilities are safe.

Overall, my position is somewhere between Dan's and Vadim's. In sum:

* If conservative parameters are chosen (say 2^-160 and proper domain

separation, a la Kyber), known CCA reaction attacks plus likely

improvements will not lead to a certificational weakness in the form of

an attack that works with high probability.

* There is some question on whether an attack counts as a

certificational weakness if it takes little effort but is extremely

unlikely to work. Such attacks are harder to rule out.

* If conservative parameters are chosen, even significant advances in

CCA reaction attacks are unlikely to make those attacks feasible.

* For a given scheme without error correction, it is not very difficult

to quantify the above factors, but it is time-consuming to do it for all

schemes.

* Exactly how conservative the parameters should be is still a matter of

debate, and that debate is a time sink. The most conservative failure

probability, and the easiest to evaluate, is zero. It may or may not be

worth the efficiency cost.

* Schemes with error correction, such as my own, are more efficient but

compound the difficulty of analysis.

Details:

As far as I am aware, all the proposed reaction attacks on FO-protected

systems work as follows:

* Compute lots of ciphertexts, probably encrypted to many different keys.

** If no domain separation, you can compute encryptions to all keys at once.

* Evaluate, somehow, those which are most failure-prone. Send them to

would-be victims.

** No attack has been proposed which can detect weak keys, or which can

use information from the public key to estimate failure probability of

the ciphertext. Rather they estimate failure against an unknown secret

drawn from the distribution of private key noise.

* If you see a decryption failure, somehow use that information to

attack that private key.

** Probably this involves repeating the above steps, but you have new

information about the key, and it may be a weak key, so the next

iterations may be faster.

While recent papers have attempted to quantify the practical complexity

of an attack of this form, none of them has improved on this procedure

for the effort required to get the first failure. For a scheme without

error correction, it is not difficult to rigorously compute both the

failure probability and the amount of work required to find the first

failure with a given query limit. However, the first failure can happen

in a few queries with tiny probability, which makes it hard to rule out

a certificational weakness if you consider attacks that usually don't

work. Also it does take appreciably more work than proving a lack of

decryption failures, so I agree that perfect correctness is a useful

analysis feature.

There is also a risk that someone will discover an advanced attack that

detects weak keys, and/or can estimate failure probability using the

public key. In the ROM, such an attack still cannot be more efficient

than a search for a failure, which gives advantage at most:

Expectation over collections of K honestly-generated keys of [max over

all messages m of [pr(fail given K,m)]] * (attacker's work)

where the inner "max over all messages m" is generally irrelevant, and

the inner pr(fail) is taken over the coins of the ciphertext. Again,

this value is not difficult to compute or at least to bound for most

schemes, but it is significant work to examine every scheme to compute

it. If the probability of failure is set at, eg, 2^-160, then such an

advanced attack could become a certificational weakness if the scheme is

designed for 2^256 security, especially with Grover's algorithm. The

weakest key of a collection of 2^64 keys might have 2^-140 failure

probability in expectation, which would then entail 2^76 quantum work

with maxdepth 2^64 to find a failure. Note however that this assumes

large cryptanalytic advances and is still probably infeasible for at

least 40 years, since maxdepth 2^64 is highly optimistic and there is a

large constant out front for reversibility, refrigeration etc.

Cheers,

-- Mike Hamburg

Jan 20, 2019, 3:46:08 PM1/20/19

to pqc-...@list.nist.gov

Consequently, I expect your work to end up saving more cycles for

sntrup4591761 and ntrulpr4591761 than it has saved for similarly sized

NTT-friendly rings. This should make the NTRU-Prime-vs.-NTT cycle gap

even less noticeable than it was before, compared to the primary cost

incurred by all these systems, namely bytes transmitted.

Also, maybe rings that don't require inert primes that NTRU Prime does, but still use a non-NTT-friendly cyclotomic of the form (X^p -1)/(x-1), can natively use the NTT-friendly modulus that matches the one for the bigger NTT-friendly ring in which they want to do the multiplication? This way, they can just do multiplication in the larger ring without worrying about the modulus. Basically, as far as I understand, the only reason that NTRU-HRSS uses a power-of-2 modulus is to speed up multiplication. But if multiplication can be made even faster with this NTT over a larger ring when the same prime is used, then maybe there is no reason to use a power-of-2 (except maybe key generation could be affected ... but I have no idea here).

Sorting int32[768] takes 6512 cycles with verified code, and it's

trivial to choose the weight so that there are no decryption failures.

That's the number of cycles in the entire key generation / encapsulation algorithms of NTTRU ... and I don't see any advantage of doing this.

> I am really confused as to why you draw this solid line between

> 0 decryption error and >0 decryption error.

eprint 2019/043 appeared a few days ago, saying that ss-ntru-pke "is

attacked with complexity below the claimed security level". Recall that

ss-ntru-pke is part of the original NTRUEncrypt submission; it was

claimed to be "extremely conservative" and provably secure. The attack

exploits decryption failures.

(My writing of this email crossed with Mike's and our conclusion about this eprint paper appears to be the same). The paper states "A result of the attack is that it is questionable whether it is possible to
achieve high security (say 256 bits in the classic sense) for proposals with midlevel decryption error probability (say 2
−100)". I am not sure if the NTRUEncrypt paper actually claimed this (or why anyone would need to claim more than 128 bits of security), but this paper is not contradicting any sort of proof.

There is a big difference between statements of the form "Our error is 2^-100 and we don't think it can be exploited" vs. "here is the security proof that formally takes decryption errors into account". I understand if someone doesn't like the former, but I cannot understand how anyone could object to the latter.... and yet in the next paragraph you do.

How many lines of failure-probability calculations has this competition

been throwing at security reviewers? And how many lines of alleged

proofs that the failures are too rare to cause trouble? How many

mistakes and security losses do we need to see before deciding that this

minor optimization within lattice-based cryptography is a bad idea?

For schemes like NTTRU, Kyber, etc, it's very few lines. All the errors are independent and the error probability is easy to calculate and verify via simple polynomial multiplication. If there is a mistake, it will be caught (especially if the scheme becomes some sort of a standard). So maybe you're complaining about not trusting [HHK '17] because it's a long paper. I obviously object to that, but ok ... how about the 1 line information theory argument in my previous email? (And regarding Mike's point about weak keys -- I am even saying that you can set your decryption error so that there are no weak keys at all with very high probability and it would still be beneficial over having no decryption errors.)

What bothers me about this "0 decryption errors" movement is that it's actually making schemes less secure and less efficient and getting *nothing* in return. For example, just extrapolating the numbers from NTTRU, I am reasonably sure that NTRU-HRSS could have divided their modulus by 2 and still had something like 2^-400 decryption error. I haven't done the calculation, but supposing that this is true, then this would be *information theoretically* secure if they go through the transformation in NTTRU. This leads to a more secure scheme (modulus is halved, which is especially important for NTRU due to previously shown overstretched modulus attacks) and saves around 87 bytes in the pk and 87-32 bytes in the ciphertext. And there is 0 penalty. Or alternatively, they could use secrets that are not restricted to -1/0/1 and again be more secure.

Or are you making a general statement of: "optimization can lead to errors, so let's not optimize?" I don't think that you actually believe this; and decryption errors is really one of the simplest things that one can verify in these schemes. And if you are going to use papers that claim something that later turns out to be not true as evidence that anything in this line of research should be abandoned, then we would not have much crypto left -- and I think that this is basically the QKD marketing argument.

The important metric here is the complexity of a comprehensive review of

security. This varies considerably from one system to another, and it's

directly connected to the risk of a breakable system being standardized

and/or deployed.

Then I think that everything should be deployed in MATLAB -- much more readable code that minimizes implementation errors.

Of course there's also a much more recent line of attacks exploiting

automorphisms, subfields, etc., most recently (2018) breaking through

the 2^sqrt approximation-factor barrier. It would hardly be a surprise

if the next few years of developments also exploit divisors. I don't

understand why anyone would recommend giving the attacker an arsenal of

attack tools that have produced attacks at SODA 2016, Eurocrypt 2017,

etc., and that we know how to eliminate at low cost. But, even if you

don't agree, surely you have to analyze whether your system survives the

specific attack strategies that have broken some earlier systems.

We don't disagree as much as you think on this issue. I think I made the statement at the 1st NIST conference that we should pick one ring and concentrate our analysis on it. This was also the point I was making about more advanced crypto - no matter what gets standardized for basic encryption / signatures, people are most likely going to still use structured rings there ... so it would be better if we really throw all our effort onto these useful rings right now.

As far as the specific cryptanalysis NTTRU, it's not a NIST entrant so we did not think a whole lot about maximizing security. But if some high frequency trader said to me that he really really needs the absolutely fastest scheme, then I would suggest that they still use NTTRU, but take as wide a distribution as possible (i.e. not -1/0/1) that still results in ~ 2^-350 decryption error.

One (additional) reason that the attack of reducing modulo small factors is concerning is because reducing a -1/0/1 polynomial modulo some smaller-degree factor doesn't necessarily create a polynomial with many different coefficients. e.g. say the d-degree f(x) has a factor (x^d/2-r), for a large r. Reducing any -1/0/1 polynomial modulo x^d/2-r will just result in coefficients close to 0, r, and -r. If there is some element somewhat close to r whose inverse is small, then we multiply by this inverse and get a polynomial with small coefficients. Not sure if this is possible, but the -1/0/1 coefficients are a concern to me. This is why I am not a fan of proposals which restrict to such small errors ... (incidentally, the NTRU schemes don't have to have small errors if they permit some decryption error :).)

Cheers,

-Vadim

> You can't do useful FHE in some rings and we're working on much

> improved zero-knowledge proof in which complete splitting is crucial.

Assume that performance constraints force NIST to standardize a small

lattice-based system. How do you imagine that we're going to get from

the current situation, with security flaws being announced in 2019 in

"provably secure" lattice systems, to a situation where we can actually

be confident in the security of what NIST standardizes? Do you think

that adding extra desiderata such as FHE is going to help this process?

---Dan

Jan 20, 2019, 7:18:56 PM1/20/19

to Vadim Lyubashevsky, Gregor Seiler, pqc-...@list.nist.gov

You know of course how famous the Lyubashevsky name is, don’t you? Are you related to the famous fellow?

Get Outlook for iOS

Jan 22, 2019, 9:55:21 AM1/22/19

to pqc-...@list.nist.gov

Vadim Lyubashevsky writes:

> Also, maybe rings that don't require inert primes that NTRU Prime

> does, but still use a non-NTT-friendly cyclotomic of the form

> (X^p -1)/(x-1), can natively use the NTT-friendly modulus that matches

> the one for the bigger NTT-friendly ring in which they want to do the

> multiplication?

Right. Sometimes the standard reduction across polynomials is enough and
> Also, maybe rings that don't require inert primes that NTRU Prime

> does, but still use a non-NTT-friendly cyclotomic of the form

> (X^p -1)/(x-1), can natively use the NTT-friendly modulus that matches

> the one for the bigger NTT-friendly ring in which they want to do the

> multiplication?

it isn't necessary to also pay for the standard reduction across primes.

Whether the prime was inert at the beginning isn't what matters for this

implementation strategy; e.g., Streamlined NTRU Prime 4591^761 could use

size-54 FFTs and even size-270 FFTs, although back-of-the-envelope

calculations suggest that these still won't be competitive with the

reduction from 4591 to primes that allow power-of-2 FFTs.

> I am reasonably sure that NTRU-HRSS could have divided their modulus

> by 2 and still had something like 2^-400 decryption error.

explains an easy way to divide the HRSS modulus by 2 in the original

dimension while guaranteeing _zero_ decryption failures. Of course

sntrup4591761 follows the same curve in dimension closer to 768.

> What bothers me about this "0 decryption errors" movement is that it's actually

> making schemes less secure and less efficient and getting *nothing* in return.

eliminating decryption failures in NTRU etc. This goes all the way back

to https://web.securityinnovation.com/hubfs/files/ntru-orig.pdf, page

19, "drawbacks". There's an even longer tradition of overconfidence in

security analyses. A proper comparison has to quantify the costs, and

has to analyze the load placed upon security reviewers.

> So maybe you're complaining about not trusting [HHK '17] because it's

> a long paper.

claimed in HHK17. See https://cr.yp.to/papers.html#tightkem, Appendix A.

Even by now, has anyone carefully checked the proofs? We know how to add

an extra assumption to rule out our counterexamples, but this doesn't

mean that these assumptions are enough to rescue the theorems.

Computer verification would drastically reduce the risks of proof

errors. With verification in mind, Persichetti and I wrote up a very

carefully modularized proof of a particularly attractive ROM IND-CCA2

conversion, applicable to _some_ NIST submissions without decryption

failures. But verification is still a big research project beyond this.

Even when we have correct proofs, there's a critical question of whether

the assumptions in the proofs are correct. Here are three aspects of

this:

* Many NIST submissions (I'm not saying all) use proofs that start

from IND-CPA and that lose tightness starting from OW-CPA. Why

should we think that IND-CPA reaches the claimed security levels?

There's much less evidence of cryptanalysts studying IND-CPA than

studying OW-CPA.

* For the QROM, we don't have _any_ tight proofs starting from

OW-CPA. There's a DS assumption that doesn't _sound_ much stronger

than OW-CPA, but how much have cryptanalysts looked at DS?

Persichetti and I have identified a QROM IND-Hash assumption that

seems strictly closer to OW-CPA than DS is, and I hope we'll have

QROM IND-CCA2 from QROM IND-Hash fully written up soon, but this

still needs cryptanalysis.

* By now quite a few people, including you, have noticed that the

attacker is allowed to inspect the secret key in the definition of

failure probability in HHK17. (I mean the final version of HHK17;

the initial version claimed theorems using a different definition,

and as far as I know nobody has proven any of those theorems.)

Often being able to see the secret key makes failing messages

easier to find. How much have cryptanalysts looked at this?

Even if NIST submissions are calculating failure probabilities

correctly, they're usually (I'm not saying always) doing it with

the wrong definition, so we usually don't have IND-CCA2 security

proofs, even assuming IND-CPA.

OCB2 had a security proof at Asiacrypt, was standardized, and 14 years

later turned out to be horribly broken. In retrospect, one piece of the

OCB2 proof was a probability calculation under a definition that turned

out to be inadequate for another piece of the proof. Do we dismiss this

as an isolated mistake? Or do we start explicitly recognizing that

security reviewers are overloaded, and that we have to simplify the task

of security review to get cryptographic risks under control?

---Dan

Jan 22, 2019, 1:27:32 PM1/22/19

to D. J. Bernstein, pqc-...@list.nist.gov

On Jan 22, 2019 6:55 AM, "D. J. Bernstein" <d...@cr.yp.to> wrote:

* By now quite a few people, including you, have noticed that the

attacker is allowed to inspect the secret key in the definition of

failure probability in HHK17. (I mean the final version of HHK17;

the initial version claimed theorems using a different definition,

and as far as I know nobody has proven any of those theorems.)

Often being able to see the secret key makes failing messages

easier to find. How much have cryptanalysts looked at this?

Even if NIST submissions are calculating failure probabilities

correctly, they're usually (I'm not saying always) doing it with

the wrong definition, so we usually don't have IND-CCA2 security

proofs, even assuming IND-CPA.

Hi Dan,

I'm a little confused on this point. Is this an NTRU issue? For LPR10 and related schemes, the message has a small (< 1 bit) or zero effect on whether decryption fails. It's all in the coins, and if you're FO transforming, those are chosen by the hash. So in the ROM, the adversary requires 1/pfail queries (or at most one bit less) in expectation to find a failure even with the private key. Therefore pfail is the right thing to measure for provable security, and this is what most submissions calculated.

The recent "impact of decryption failures" papers have studied the other problem: how hard is it to find a decryption failure with limited decryption queries, without the private key? This is probably much harder but I don't think anyone has used this in a "provable security" claim.

Or is your objection about weak keys and multi-target security? That could multiply the adversary's advantage by up to numkeys, and though I expect in reality the factor is less than sqrt(numkeys), it is probably not negligible.

-- Mike

Jan 22, 2019, 2:23:28 PM1/22/19

to D. J. Bernstein, pqc-...@list.nist.gov

Once more in plain text instead of HTML-only, sorry for the spam.

Jan 22, 2019, 7:09:33 PM1/22/19

to D. J. Bernstein, pqc-...@list.nist.gov

> On Jan 22, 2019, at 6:55 AM, D. J. Bernstein <d...@cr.yp.to> wrote:

>

> Vadim Lyubashevsky writes:

>> What bothers me about this "0 decryption errors" movement is that it's actually

>> making schemes less secure and less efficient and getting *nothing* in return.

>

> There's a long tradition of unquantified hyping of the supposed costs of

> eliminating decryption failures in NTRU etc. This goes all the way back

> to https://web.securityinnovation.com/hubfs/files/ntru-orig.pdf, page

> 19, "drawbacks". There's an even longer tradition of overconfidence in

> security analyses. A proper comparison has to quantify the costs, and

> has to analyze the load placed upon security reviewers.

I made a rough estimate by comparing existing systems (ThreeBears, Saber, Kyber, NTRU LPrime) of the cost/benefit of security features. By “efficiency” I mean min(core-sieve, hybrid) security / bytes transmitted, which is an imperfect measure but good enough for this purpose. I used a combination of Schanck’s script and “estimate all the LWE”. I mainly considered class-(1-2) systems under these estimates, including LightSaber, the Round2 candidates near LightSaber, NTRU HRSS 701, KyberLight, BabyBear, and NTRU Prime. I estimated improvements in % more (bytes sent per bit of core-sieve security), abbreviated “costs about n% more”. For MLW(E,R) you can’t tune the parameters arbitrarily, so the efficiency metric might not exactly reflect the cost of the change, but again, maybe good enough for a first pass. So all these are very rough numbers and subject to exact tradeoffs and optimization.
> Vadim Lyubashevsky writes:

>> What bothers me about this "0 decryption errors" movement is that it's actually

>> making schemes less secure and less efficient and getting *nothing* in return.

>

> There's a long tradition of unquantified hyping of the supposed costs of

> eliminating decryption failures in NTRU etc. This goes all the way back

> to https://web.securityinnovation.com/hubfs/files/ntru-orig.pdf, page

> 19, "drawbacks". There's an even longer tradition of overconfidence in

> security analyses. A proper comparison has to quantify the costs, and

> has to analyze the load placed upon security reviewers.

Error correction:

Without error correction, ThreeBears costs 6-9% more.

Very low failure probability:

For ThreeBears, generic attacks (of the type discussed in my previous email) probably won't find any decryption failures using <= 2^64 queries and <= (NIST class level) work. Let’s call this failure level “probably low enough”. If ThreeBears’ error probability is decreased to be 2^-(lattice security level + 32 bits), so that its security impact is likely to be negligible even in a weak key + improved cryptanalysis scenario — call this “very low" failure probability — then this costs 3-6%.

Ring shape:

ThreeBears’ ring shape costs about 6% more than cyclotomic due to noise amplification. NTRU LPrime’s ring shape costs about 11% more than cyclotomic due to noise amplification.

Rounding:

With errors instead of rounding, NTRU LPRime would cost about 15% more. Kyber costs 21% more than Saber, due in large part to more rounding. ThreeBears costs about 9% more than it would if half its noise came from rounding (could use full rounding to save more but the parameters are more annoying).

Perfect correctness:

With perfect correctness, ThreeBears costs 44% more, which is to say 35% more than it would with no error correction. NTRU LPrime costs 33% more than ThreeBears, which squares pretty well the above feature costs (*1.44 for perfect correctness, /1.15 for rounding, *1.05 for ring shape). Likewise NTRU-HRSS estimates 175 bytes cost each in ciphertext and public key, plus 20 bits of security cost, which works out to about 34% more.

Binomial vs +-1 noise: I didn’t consider this too carefully, but LightSaber costs about 15% more than Round2. There are probably other important differences though.

Overall, it appears to me that perfect correctness costs about 33% more than “probably low enough" failure probability, and maybe 27% more than “very low" failure probability. This makes it a significantly greater cost than LWE vs LWR, than no-ECC vs 2-ECC, or than an inert modulus. Perhaps it is still worthwhile, but this doesn’t seem as obvious to me as it does to Dan.

I did this work pretty roughly, but a more detailed study like this could be useful guidance for second round tweaks. For example, it might well be worthwhile for ThreeBears to remove error correction and/or set “very low" failure probability.

Cheers,

— Mike

Jan 23, 2019, 8:11:10 AM1/23/19

to pqc-...@list.nist.gov

Let's take simple rounding with no decryption failures as the baseline.

Mike estimates that aggressive size optimization---take noise instead of

rounding, use an error-correcting code, and aim for "probably low

enough" failure probability---reduces total key+ciphertext size by a

factor between 1.19 and 1.25 compared to the baseline. (See below.)

Nobody has pointed to an application where this minor change in size

suddenly has users saying "Okay, _now_ that's small enough to deploy."

Meanwhile these failures are a pain for security reviewers. There's a

huge pile of complicated decryption-failure analyses in lattice

submissions (not to mention code-based submissions with decryption

failures), and now we're seeing various claims that decryption failures

are actually more frequent than previously claimed, that attackers can

pump up the probability further, and that exploiting decryption failures

disproves the security levels claimed for some lattice systems.

Two years from now, are we going to be able to look at the remaining

lattice submissions and confidently say that any nonzero failure rates

aren't a problem? How much more research is this going to take? Wouldn't

it be smarter to simply eliminate decryption failures and focus the same

research effort on trying to reduce other worrisome lattice risks?

For comparison, the Ed25519 design team opted for 64-byte signatures

rather than 48-byte signatures---a 1.33x ratio in signature size, and a

1.20x ratio in key+signature size---to simplify the security analysis.

Of course, years later, someone overconfidently claimed that one could

use 48-byte signatures "without losing security", and of course exposing

errors in that claim required a more complicated security analysis,

which was time taken away from more important issues.

The rest of this message has some comments on Mike's size estimates.

The size changes we're talking about generally accommodate changes in

security level, so the problem of reviewing size comparisons includes

all the problems of reviewing security comparisons. Recall

* one estimate saying that EMBLEM-611 is hundreds of times easier

to break than uRound2.KEM-500 (2^76 vs. 2^84) and

* another estimate saying that, no, uRound2.KEM-500 is thousands of

times easier to break than EMBLEM-611 (2^126 vs. 2^142),

where all four of these wildly inconsistent estimates appear on the same

"Estimate all the LWE/NTRU schemes" page. This is a 1.25x swing in

relative security level. This could be the aggregate result of even

larger swings that the estimates end up assigning to specific design

elements. How are we supposed to extract anything meaningful from this?

What's the scientific justification for the claim that any of these

estimates are "good enough for this purpose"?

One answer would be to say that if a design element has a noticeable

benefit in _every_ estimate then surely it's beneficial. But what if

we're simply aggregating a huge pile of sloppy estimates that are all

missing exactly the point that we're trying to observe?

Here's a concrete example. Attack papers such as

https://www.iacr.org/archive/crypto2007/46220150/46220150.pdf

https://eprint.iacr.org/2011/458.pdf

https://eprint.iacr.org/2014/880.pdf

https://eprint.iacr.org/2015/823.pdf

https://eprint.iacr.org/2016/177.pdf

show how to exploit rotation (multiplication by x) to speed up various

attacks---but the same speedup idea seems somewhat less effective

against NTRU Prime. Fully quantifying this requires more algorithm

analysis. Qualitatively, this effect is in the opposite direction of the

more obvious effect that NTRU Prime has to choose slightly fewer error

positions to handle the extra coefficient of x^n-x-1.

As far as I know, the rotation impact is taken into account in _none_ of

the estimates from the "Estimate all the LWE/NTRU schemes" page. This

isn't surprising given the history of these estimates, pursuing simple

formulas at the expense of accuracy. Presumably Mike's new "combination"

of estimates, producing his conclusion that switching to the NTRU Prime

ring produces a 1.11x size increase, also fails to take the rotation

impact into account.

Another source of noise is the switch from a proper two-dimensional

size-and-security graph (as in eprint 2018/1174) to a one-dimensional

size-per-bit-of-security ratio. Size shouldn't be _exactly_ linear in

bits of security, and we have no way to see what effect this has on tiny

claimed ratios such as 1.06 (Mike's "costs about 6% more") since we

haven't been told any of the security numbers.

Finally, let me comment on the "factor between 1.19 and 1.25" mentioned

above. This comes from dividing the following two estimates provided by

Mike, disregarding all of the above caveats:

* Starting from the baseline, switching from rounding to noise is

estimated to cost something between 1.15x and 1.21x. ("With errors

* Using an error-correcting code to aim for "probably low enough"

failure probability is then estimated to save 1.44x. ("With perfect

correctness, ThreeBears costs 44% more".)

I'm unable to figure out why Mike claims 1.33x "overall", rather than

something between 1.19x and 1.25x, as the cost for "perfect

correctness". He estimates 1.33x for NTRU LPRime vs. ThreeBears, but

this includes a cost for the ring switch. He also estimates 1.34x for

NTRU-HRSS vs. ThreeBears, but

https://cr.yp.to/talks/2018.06.27-2/slides-djb-20180627-lattice-a4.pdf

and eprint 2018/1174 already showed NTRU-HRSS parameter improvements

that maintain perfect correctness.

---Dan

Mike estimates that aggressive size optimization---take noise instead of

rounding, use an error-correcting code, and aim for "probably low

enough" failure probability---reduces total key+ciphertext size by a

factor between 1.19 and 1.25 compared to the baseline. (See below.)

Nobody has pointed to an application where this minor change in size

suddenly has users saying "Okay, _now_ that's small enough to deploy."

Meanwhile these failures are a pain for security reviewers. There's a

huge pile of complicated decryption-failure analyses in lattice

submissions (not to mention code-based submissions with decryption

failures), and now we're seeing various claims that decryption failures

are actually more frequent than previously claimed, that attackers can

pump up the probability further, and that exploiting decryption failures

disproves the security levels claimed for some lattice systems.

Two years from now, are we going to be able to look at the remaining

lattice submissions and confidently say that any nonzero failure rates

aren't a problem? How much more research is this going to take? Wouldn't

it be smarter to simply eliminate decryption failures and focus the same

research effort on trying to reduce other worrisome lattice risks?

For comparison, the Ed25519 design team opted for 64-byte signatures

rather than 48-byte signatures---a 1.33x ratio in signature size, and a

1.20x ratio in key+signature size---to simplify the security analysis.

Of course, years later, someone overconfidently claimed that one could

use 48-byte signatures "without losing security", and of course exposing

errors in that claim required a more complicated security analysis,

which was time taken away from more important issues.

The rest of this message has some comments on Mike's size estimates.

The size changes we're talking about generally accommodate changes in

security level, so the problem of reviewing size comparisons includes

all the problems of reviewing security comparisons. Recall

* one estimate saying that EMBLEM-611 is hundreds of times easier

to break than uRound2.KEM-500 (2^76 vs. 2^84) and

* another estimate saying that, no, uRound2.KEM-500 is thousands of

times easier to break than EMBLEM-611 (2^126 vs. 2^142),

where all four of these wildly inconsistent estimates appear on the same

"Estimate all the LWE/NTRU schemes" page. This is a 1.25x swing in

relative security level. This could be the aggregate result of even

larger swings that the estimates end up assigning to specific design

elements. How are we supposed to extract anything meaningful from this?

What's the scientific justification for the claim that any of these

estimates are "good enough for this purpose"?

One answer would be to say that if a design element has a noticeable

benefit in _every_ estimate then surely it's beneficial. But what if

we're simply aggregating a huge pile of sloppy estimates that are all

missing exactly the point that we're trying to observe?

Here's a concrete example. Attack papers such as

https://www.iacr.org/archive/crypto2007/46220150/46220150.pdf

https://eprint.iacr.org/2011/458.pdf

https://eprint.iacr.org/2014/880.pdf

https://eprint.iacr.org/2015/823.pdf

https://eprint.iacr.org/2016/177.pdf

show how to exploit rotation (multiplication by x) to speed up various

attacks---but the same speedup idea seems somewhat less effective

against NTRU Prime. Fully quantifying this requires more algorithm

analysis. Qualitatively, this effect is in the opposite direction of the

more obvious effect that NTRU Prime has to choose slightly fewer error

positions to handle the extra coefficient of x^n-x-1.

As far as I know, the rotation impact is taken into account in _none_ of

the estimates from the "Estimate all the LWE/NTRU schemes" page. This

isn't surprising given the history of these estimates, pursuing simple

formulas at the expense of accuracy. Presumably Mike's new "combination"

of estimates, producing his conclusion that switching to the NTRU Prime

ring produces a 1.11x size increase, also fails to take the rotation

impact into account.

Another source of noise is the switch from a proper two-dimensional

size-and-security graph (as in eprint 2018/1174) to a one-dimensional

size-per-bit-of-security ratio. Size shouldn't be _exactly_ linear in

bits of security, and we have no way to see what effect this has on tiny

claimed ratios such as 1.06 (Mike's "costs about 6% more") since we

haven't been told any of the security numbers.

Finally, let me comment on the "factor between 1.19 and 1.25" mentioned

above. This comes from dividing the following two estimates provided by

Mike, disregarding all of the above caveats:

* Starting from the baseline, switching from rounding to noise is

estimated to cost something between 1.15x and 1.21x. ("With errors

instead of rounding, NTRU LPRime would cost about 15% more. Kyber

costs 21% more than Saber, due in large part to more rounding.")
* Using an error-correcting code to aim for "probably low enough"

failure probability is then estimated to save 1.44x. ("With perfect

correctness, ThreeBears costs 44% more".)

I'm unable to figure out why Mike claims 1.33x "overall", rather than

something between 1.19x and 1.25x, as the cost for "perfect

correctness". He estimates 1.33x for NTRU LPRime vs. ThreeBears, but

this includes a cost for the ring switch. He also estimates 1.34x for

NTRU-HRSS vs. ThreeBears, but

https://cr.yp.to/talks/2018.06.27-2/slides-djb-20180627-lattice-a4.pdf

and eprint 2018/1174 already showed NTRU-HRSS parameter improvements

that maintain perfect correctness.

---Dan

Jan 23, 2019, 9:15:48 AM1/23/19

to pqc-...@list.nist.gov

On Wed, Jan 23, 2019 at 2:11 PM D. J. Bernstein <d...@cr.yp.to> wrote:

Let's take simple rounding with no decryption failures as the baseline.

Mike estimates that aggressive size optimization---take noise instead of

rounding, use an error-correcting code, and aim for "probably low

enough" failure probability---reduces total key+ciphertext size by a

factor between 1.19 and 1.25 compared to the baseline. (See below.)

Nobody has pointed to an application where this minor change in size

suddenly has users saying "Okay, _now_ that's small enough to deploy."

In a previous email, you stated that "Typically sending or receiving a byte costs at least three orders of

magnitude more than a clock cycle." Your djbsort saves about 25000 cycles over a sorting function from 2001 -- https://sorting.cr.yp.to/refs.html. Let's presume that the correctness of the 2001 version has already been verified many times and people are comfortable implementing it correctly (I don't know anything about this topic, but I presume there must be some version of relatively efficient sorting that people are comfortable with.) While I am sure that you implemented your sorting algorithm correctly, it is not simple, fairly new, and will probably lead to mistakes when people try to implemented themselves in various environments. According to your metric, saving 25000 cycles could also be done by reducing the ciphertext by 25 bytes. So now, what will lead to fewer mistakes in, for example, NTRU Prime? Here are the two options:1. Reducing the size of the ciphertext in a *provable* way by introducing decryption errors. The proof needs to be verified, but *only done once*. I am sure that for a NIST standard, we can get the attention of enough top people that we can be quite confident that the math is correct. There are no discussions about security models here or anything -- it's just pure math. I hope you trust that people can verify math proofs.

2. Implement djbsort. People still need to verify that it's correct and constant time (maybe there are automatic ways to do this ... but then someone has to verify the tools are written properly). Then every time it's implemented, one has to verify the implementation itself. Mistakes will surely be made there because it has to be tested in every implementation.

So which approach is more likely to lead to mistakes?

Incidentally, I am not arguing against djbsort, I am just pointing out that the argument that calculating decryption errors correctly is the most likely point of failure in lattice-based schemes is pretty absurd.

-Vadim

Jan 23, 2019, 1:05:55 PM1/23/19

to D. J. Bernstein, pqc-...@list.nist.gov

Hi Dan,

You seem to have misunderstood my writeup. Sorry if it was too sloppy.

On Jan 23, 2019, at 5:10 AM, D. J. Bernstein <d...@cr.yp.to> wrote:Let's take simple rounding with no decryption failures as the baseline.

Mike estimates that aggressive size optimization---take noise instead of

rounding, use an error-correcting code, and aim for "probably low

enough" failure probability---reduces total key+ciphertext size by a

factor between 1.19 and 1.25 compared to the baseline. (See below.)

Using noise instead of rounding is the second-largest size

*pessimization*, after perfect correctness. Calling it “aggressive size

optimization” is simply dishonest.

Two years from now, are we going to be able to look at the remaining

lattice submissions and confidently say that any nonzero failure rates

aren't a problem? How much more research is this going to take? Wouldn't

it be smarter to simply eliminate decryption failures and focus the same

research effort on trying to reduce other worrisome lattice risks?

Indeed, that’s the topic of this thread.

Also, you seem to be ignoring Vadim’s and my point in this thread. Yes,

if the failure rate is 2^-120, it will take significantly more research to be

confident that no attack is feasible. To a lesser extent this is true for 2^-160.

If the failure rate is 2^-224 (for say a class III system), then even knowing the

private key, and even if it’s 2^32 times weaker than normal, you’d need a

192-bit brute-force search to find a failing message. If the failure rate is

2^-300, then additionally for almost all keys that will ever be honestly

generated, there are no failures.

That’s not a CCA2 QROM proof by itself, and I’m aware that writing one

takes significant effort, but your hand-wringing is a little excessive.

The rest of this message has some comments on Mike's size estimates.

The size changes we're talking about generally accommodate changes in

security level, so the problem of reviewing size comparisons includes

all the problems of reviewing security comparisons. Recall

* one estimate saying that EMBLEM-611 is hundreds of times easier

to break than uRound2.KEM-500 (2^76 vs. 2^84) and

* another estimate saying that, no, uRound2.KEM-500 is thousands of

times easier to break than EMBLEM-611 (2^126 vs. 2^142),

where all four of these wildly inconsistent estimates appear on the same

"Estimate all the LWE/NTRU schemes" page. This is a 1.25x swing in

relative security level. This could be the aggregate result of even

larger swings that the estimates end up assigning to specific design

elements. How are we supposed to extract anything meaningful from this?

What's the scientific justification for the claim that any of these

estimates are "good enough for this purpose”?

Take a look at the graphs of security vs security at

(These are with an outdated version of “estimate all the LWE”; sorry.)

You will see that Round2 and RLizard are the only off-curve systems.

You will also see that Round2 is the estimate I qualified with “I didn’t

consider this too carefully”.

Here's a concrete example. Attack papers such as

https://www.iacr.org/archive/crypto2007/46220150/46220150.pdf

https://eprint.iacr.org/2011/458.pdf

https://eprint.iacr.org/2014/880.pdf

https://eprint.iacr.org/2015/823.pdf

https://eprint.iacr.org/2016/177.pdf

show how to exploit rotation (multiplication by x) to speed up various

attacks---but the same speedup idea seems somewhat less effective

against NTRU Prime. Fully quantifying this requires more algorithm

analysis. Qualitatively, this effect is in the opposite direction of the

more obvious effect that NTRU Prime has to choose slightly fewer error

positions to handle the extra coefficient of x^n-x-1.

As far as I know, the rotation impact is taken into account in _none_ of

the estimates from the "Estimate all the LWE/NTRU schemes" page. This

isn't surprising given the history of these estimates, pursuing simple

formulas at the expense of accuracy. Presumably Mike's new "combination"

of estimates, producing his conclusion that switching to the NTRU Prime

ring produces a 1.11x size increase, also fails to take the rotation

impact into account.

Sorry, I haven’t read that section of those five papers yet. Can you quantify

the effect this has on, eg, NTRU-HRSS?

Another source of noise is the switch from a proper two-dimensional

size-and-security graph (as in eprint 2018/1174) to a one-dimensional

size-per-bit-of-security ratio. Size shouldn't be _exactly_ linear in

bits of security, and we have no way to see what effect this has on tiny

claimed ratios such as 1.06 (Mike's "costs about 6% more") since we

haven't been told any of the security numbers.

Fair. But this effect probably changes things by at most a couple %. It is good

enough to roughly quantify the costs at play. I don’t think the cost of perfect

correctness actually being 31% or 37% really changes the discussion here.

Finally, let me comment on the "factor between 1.19 and 1.25" mentioned

above. This comes from dividing the following two estimates provided by

Mike, disregarding all of the above caveats:

* Starting from the baseline, switching from rounding to noise is

estimated to cost something between 1.15x and 1.21x. ("With errors

instead of rounding, NTRU LPRime would cost about 15% more. Kyber

costs 21% more than Saber, due in large part to more rounding.")

* Using an error-correcting code to aim for "probably low enough"

failure probability is then estimated to save 1.44x. ("With perfect

correctness, ThreeBears costs 44% more".)

I'm unable to figure out why Mike claims 1.33x "overall", rather than

something between 1.19x and 1.25x, as the cost for "perfect

correctness". He estimates 1.33x for NTRU LPRime vs. ThreeBears, but

this includes a cost for the ring switch. He also estimates 1.34x for

NTRU-HRSS vs. ThreeBears, but

https://cr.yp.to/talks/2018.06.27-2/slides-djb-20180627-lattice-a4.pdf

and eprint 2018/1174 already showed NTRU-HRSS parameter improvements

that maintain perfect correctness.

—Dan

Sorry to be unclear.

For ring shapes, I estimated ThreeBears’ security with additional noise, which

would be possible if it were not amplified by the ring shape. I also estimated

ThreeBears’ security with NTRU LPRime’s ring shape. Likewise for “very low”

failure probability. I did this using the same script (Schanck) that I had used

to estimate security in the first place, since it’s faster than “Estimate all the

LWE” and I’m lazy, and compared the results from that script to each other.

For perfect correctness:

For ThreeBears, since I am most familiar with that system, I calculated

several parameter sets sufficient for perfect correctness. I used the same

script that I had used to estimate security in the first place. For the same

core-sieve security level, the best new parameters cost about 44% more

than ThreeBears as submitted. (It happens that such parameters existed that

put the security within a few bits of BabyBear; this makes the “costs more”

a more accurate representation of size ratio in that case.) They also cost

about 35% more than ThreeBears with no error correction, which I had already

calculated parameters for earlier, to decide whether to use error correction at

all. I used this same technique for “very low failure probability”.

For NTRU-HRSS, I went by the submitting team’s estimates of the cost of

perfect correctness. See https://cryptojedi.org/papers/ntrukemnist-20171130.pdf

page 24.

NTRU LPRime has a similar security level to BabyBear (as rated by

Estimate All the LWE), so also I compared those two, and found 1.33 cost

ratio. The chief differences are IMLW[ER] vs RLW[ER] which has only a

small effect on size, plus error correction, perfect correctness, ring shape, and

rounding. We already have 1.44 for error correction vs perfect correctness

in ThreeBears; 1.05 ~ 1.11/1.05 for the relative cost of their ring shapes;

and 1.15 as an estimate for NTRU LPRime with noise vs rounding. You

see that 1.44/1.15*1.05 ~ 1.31, which is pretty close to 1.33.

I take this as very weak validation that my costing methodology wasn’t terrible.

I simply don’t have time to write a 20-page survey paper on the topic complete

with charts and figures, but I think the above technique contributes to the

discussion.

If you have time, what was the cost of perfect correctness in NTRU LPRime,

vs some other possibilities (eg 2^-133 (BabyBear claimed), 2^-161 (Kyber),

or 2^-288 (my “very low” for your class V estimate))?

Cheers,

— Mike

Jan 24, 2019, 4:56:07 AM1/24/19

to pqc-...@list.nist.gov

Vadim Lyubashevsky writes:

> I hope you trust that people can verify math proofs.

Isn't this the same thread where, a few messages ago, I mentioned that
> I hope you trust that people can verify math proofs.

two of the six core "theorems" claimed in HHK17 are incorrect? And

surely you've heard about

https://eprint.iacr.org/2018/1040

https://eprint.iacr.org/2018/1087

https://eprint.iacr.org/2018/1090

completely breaking the OCB2 standard, which had a security "proof" from

Rogaway. Perhaps you mean something special by "math proofs", but the

literature on pure mathematics has many bogus "proofs", as the following

sample illustrates:

https://en.wikipedia.org/wiki/List_of_incomplete_proofs

Of course formal logic has a long tradition of defining proofs in a

rigid way that allows trivial verification. In principle, converting

today's "proofs" into formal proofs will rule out the bogus "proofs".

There are serious efforts in this direction; see, e.g., mizar.org. But

you're kidding yourself if you think that all of the "proofs" cited in

lattice-based submissions will be converted into formal proofs within

the timeframe of the competition.

> the argument that calculating decryption errors correctly is the most

> likely point of failure in lattice-based schemes is pretty absurd.

systems submitted to NIST. The August 2018 Round5 proposal claimed NIST

security level 5, but turned out to be breakable via decryption failures

(under reasonable assumptions) with <2^64 ciphertexts and a feasible

calculation. The analysis of these issues is complicated, but clearly

one of the big sources of screwups is people thinking that calculating

decryption-failure probabilities is simpler than it actually is.

I don't understand how you can be so cavalier in dismissing the security

risks in this area. It also isn't helpful when you mischaracterize

attention to a risk as an irrelevant claim that the risk is the single

"most likely point of failure".

> Your djbsort saves about 25000 cycles over a sorting function from

> 2001 -- https://sorting.cr.yp.to/refs.html. Let's presume that the

> correctness of the 2001 version has already been verified many times

> and people are comfortable implementing it correctly

your 100% confidence in "math proofs", evidently you have in mind a

principle that speedups increase risks of software bugs. You're so

convinced of the universal applicability of this principle that you post

a message applying this principle to sorting software, where your

conclusions are simply wrong.

The actual situation is the following.

djbsort has been computer-verified to correctly sort all inputs for

every array size that we use in cryptography. (See the documentation for

discussion of the remaining risks.) It's also constant-time.

The 2001 software you mention doesn't have either of these features. If

I were forced to guess, I'd guess that the 2001 software is bug-free;

but guesses like this are sometimes disastrously wrong. Anyway, since

the 2001 software isn't constant-time, it can't even be considered for

these applications.

There was some work before djbsort on verifying sorting software (see,

e.g., https://link.springer.com/article/10.1007/s10817-017-9426-4), but

I haven't found any previous verified _constant-time_ software. Such

software wouldn't have been impossible to create with previous

techniques, but it would have been intolerably slow.

You claim that "mistakes will surely be made" in every implementation

for every new environment. The automated djbsort verification tools are

general-purpose tools working directly with machine code, precisely to

resolve the conventional tension between optimization and verification.

You say that "maybe there are automatic ways" to verify correctness of

sorting software. Well, yes, there are, and if you're going to comment

on risks of bugs then maybe you should first learn what's been done in

the area to control these risks, rather than spreading generic FUD.

> someone has to verify the tools are written properly

illustrated by the following "proof": https://github.com/clarus/falso

> According to your metric, saving 25000 cycles could also be done by

> reducing the ciphertext by 25 bytes. So now, what will lead to fewer

> mistakes in, for example, NTRU Prime?

let me briefly emphasize what this question is missing about NTRU Prime.

Within small lattice systems, the primary design goal of NTRU Prime is

to minimize attack surface, making the security reviewer's job as easy

as possible. If there's a way to save 25 bytes that makes the security

reviewer's job harder, NTRU Prime says no. If there's a way to save

25000 cycles that makes the security reviewer's job harder, NTRU Prime

says no. Asking whether "yes no" is more risky than "no yes" is

irrelevant; NTRU Prime says "no no" to both risks.

---Dan

Jan 24, 2019, 5:55:35 AM1/24/19

to pqc-...@list.nist.gov

Perhaps you mean something special by "math proofs", but the

literature on pure mathematics has many bogus "proofs", as the following

sample illustrates:

https://en.wikipedia.org/wiki/List_of_incomplete_proofs

You are now actually arguing for minimizing the reliance on math in real life. Let's apply this to other situations: do you fly on airplanes that rely on the least amount of math? And in the more fuel-efficient planes, is the trade-off between fuel economy and the amount of math worth it? Should we have (tall) buildings -- their design requires math and how many people really checked the calculations? I think that this argument has run its course, so this is my last response on this topic.

-Vadim

Reply all

Reply to author

Forward

0 new messages