Inaccurate security claims in NTRUprime

Leo Ducas

unread,

May 15, 2016, 10:22:10 AM5/15/16

to Cryptanalytic algorithms

I was recently pointed to the long awaited paper about NTRUPrime: https://ntruprime.cr.yp.to/ntruprime-20160511.pdf

I find the proposal intriguing, and worth consideration. Unfortunately, the current concrete security analysis is bogus and the security claims inaccurate. A quick counter-analysis suggests the security of the proposal is overestimated by about 75 bits.

The first mistake is equation (4) (top of page 19) in Estimate(β,n,p). The chosen model for the cost of enumeration doesn't match the asymptotic of the BKZ/HKZ types of algorithm. The given model is in 2^O(β^2), while the best known algorithm (Kannan's SVP/HKZ algorithms) runs in time 2^O(β log β). Provable yet reasonably efficient version of Kannan's algorithm were recently studied by Micciancio and Walter [http://eprint.iacr.org/2014/569]. More generally, even BKZ+Enum has (heuristically ?) cost 2^(β log β) when the blocksize is appropriately chosen: this is discussed in [https://eprint.iacr.org/2015/046.pdf].

The second mistake stems from the conventional misconception that sieving algorithms are only useful to solve SVP. The authors only consider the cost of Sieving on the full lattice (or maybe only half of it, since the lattice natural has dimension 2p, but the cost is take to be 2^{.292p} in Alg 1.). This is not true: sieving can be used in replacement for enumeration inside BKZ. This brings the cost of BKZ down to poly(n)*2^(.292 β) (where the poly may be make linear with various heuristic tricks as progressive-BKZ, I personally recommend considering this poly(n) factor to 1 for any security claim consideration.).

Tweaking the script PQsecurity.py of Newhope*, which do take account for sieving inside BKZ, one obtains that BKZ algorithm will find the secret key via the ``primal attack'', when ran with blocksize β=480 on the lattice constructed from m=600 equations mod q out of the 739. This translate to λ = .292β = 140 bits of security, or a slightly more if one is ready to make a wild guess** on the polynomial factors.

In conclusion, the schemes offers a fair amount of security according to current classical attacks, but far below the 215 bits claimed. In particular, it offers less security than ntruees787ep1 (but this is obvious since the dimension is smaller, the error density similar, and the modulus larger) and much less security than NewHope (it is more similar to JarJar).

I do understand the intent of minimizing the presence of algebraic handles, and definitely think there is some value of an alternative to cyclic / cyclotomic rings. But with respect to ``known attack'' rather than ``yet to be discovered attacks'', the NTRUprime proposal is significantly less conservative than what it is compared to, so I am not sure the comparison of Fig 1.1 is as significant as one would hope.

When re-parametrized properly to compare NTRUprime to other schemes, it shouldn't be surprising to have a gap of performances with cyclotomic. Considering how fast all this already is, a factor 2x on time could be considered perfectly acceptable. It could also be possible to optimize the NTRUprime scheme a bit: Theorem 2.1 is an overkill, and tighter bounds could certainly be established probabilistically.

-- Leo

* The currently available script and paper [https://eprint.iacr.org/2015/1092.pdf] are not accurate (fortunately it underestimate the cost of the dual attack), in 6.4 equation (2) should instead read

- 2 π^2 τ^2 ≥ ln (ε/4)

A repaired script is attached, and we are doing our best to push the final version of NewHope in the shortest delays.

** Part of the hidden factor of the cost of sieving could be amortized inside BKZ, by some recycling trickery. I advise against making wild guesses.

PQsecurity.py

D. J. Bernstein

unread,

May 15, 2016, 4:18:45 PM5/15/16

to Cryptanalytic algorithms

The explicit security target for NTRU Prime is >=2^128 post-quantum.
There are no published attack strategies that threaten this. Also, Leo's
current claim of a 2^140 attack is based on a clear error.

The paper's security estimate from _known_ attack algorithms is 2^215
pre-quantum, with the obvious caveats that

* there are many huge question marks in the literature's performance
estimates for large-dimension lattice attacks and

* quantum computers probably save a little bit.

Sieving is asymptotically faster than enumeration but is highly unlikely
to have a serious impact for the NTRU Prime parameters. Even without
accounting for memory consumption, we were underestimating the cost of
sieving as a direct attack (thanks to John Schanck for pointing this
out), and sieving is even less useful as a subroutine inside BKZ etc.
(since the dimensions there are even smaller).

The New Hope paper paints a different picture, with sieving as the most
important attack, because it starts from a (clear, indisputable) severe
underestimate of the cost of all known sieving algorithms. Confusing
this underestimate with the cost of an actual attack, as Leo now does,
is simply wrong. Leo also fails to note that the parameter-selection
strategy in the New Hope paper, namely asking for this underestimated
cost to exceed 2^128, also says that our NTRU Prime parameters are fine.

Leo Ducas writes:
> * The currently available script and paper [https://eprint.iacr.org/2015/
> 1092.pdf] are not accurate

Bogus, you say?

We all know that the performance of lattice attacks is still quite
poorly understood---but treating underestimates as reality doesn't help
move the analysis forward.

---Dan (speaking for myself)

Leo Ducas

unread,

May 16, 2016, 12:52:03 AM5/16/16

to Cryptanalytic algorithms, d...@cr.yp.to

Dan first points me to his draft. Then, Dan refutes the inconsistencies of his security analysis on the ground that no one really knows.

Yes, indeed, no one really knows, but the counter-analysis I propose is based on the model of sieving proposed in your paper. One is free to do other choices to model sieving. One is less free, as a scientist, to conclude on 215 bits of security from an inconsistent analysis, at least once he has been made aware of the inconsistencies.

On page 19, you and your co-authors claim:

"
The most recent work on lattice sieving (see [7,45]) has pushed the
heuristic complexity down to 2 0.292p+o(p) . A closer look at polynomial factors
indicates that the o(p) here is positive, and the analysis of [49] suggests that
these algorithms are not helpful for cryptographic sizes, but we nevertheless use
2^0.292p to estimate the speed of sieving attacks.
"

which is a precaution I definitely recommend. But with such an estimate on the speed of sieving, I have demonstrated above that the proper security claim under this model is 140 bits.

Dan B. writes:
>> Leo Ducas writes:
>> * The currently available script and paper [https://eprint.iacr.org/2015/
>> 1092.pdf] are not accurate

> Bogus, you say?

Bogus I say, and without shame. We all do mistakes, and this is why we review each-others work, and value constructive criticism from our peers.

Leo Ducas

unread,

May 16, 2016, 5:29:58 AM5/16/16

to Cryptanalytic algorithms, d...@cr.yp.to

Let me try to clarify this discussion. First let me define three approaches to security evaluation:

A/ Provide clean and simple underestimates of the cost each relevant algorithms, that is below both the measured runtime and the asymptotic runtime. Explore exhaustively all combinations and parametrization of those algorithms. Derive an accurate security claim under this model.

B/ Rely on a measured runtimes, and fit under the appropriate model matching the asymptotic runtime. Explore exhaustively all combinations and parametrization of those algorithms, and comment in the text if some seems irrelevant AFTER having considered them precisely. Derive an accurate security claim under this model.

C/ Rely on a measured runtimes, and fit under a model that does not match the asymptotic runtime. Explore some combinations and parametrization of those algorithms, but discard arbitrary some combination without making this point explicit nor giving text explanation of why this is discarded. Claim a number an hope that the various inaccuracies will cancel each other out.

I tend to prefer A. Approach A is I think the way to obtain, using your words ``Boring crypto''. But I wouldn't criticize B. As I read it, your current draft follows methodology C, and I am trying to provide you the necessary input to help you and your co-authors to upgrade it to methodology B. It is also annoying that bits of methodology locally follows (page 19) the principles of A, which makes it very unclear to the reader whether your analysis is tight or conservative.

In my first message, I assumed that your intent was methodology A, for which I suggested that an over-claim of 75 bits. Even if your intent to follow methodology B, your analysis remains bogus and the claim inaccurate: you are currently doing C. May it be a less dramatic inaccuracy, I am still offering my expertise to reach B, yet I'll leave it to you to find out how dramatic the claim should decrease. Please understand the word inaccuracy as a measure of the cumulated systematic bias in an analysis, not as a measure of the trueness of its result. Since you dismiss my counter-argument by claiming sieving is irrelevant inside BKZ, I will dispute this claim technically in PS (in short, this claims requires assuming that a single byte of memory cost more than 2^145 clock-cycles).

I am also pointing at the fact that your comparison to other schemes is not perfectly relevant (or unfair), since those schemes should achieve higher security when all of them are evaluated under a unique methodology. This comparison is done to demonstrate how fast your current proposal is. By doing so, you are presenting performances as a more important quality than security, which is something you typically voice against. How much should one over-shoot security is an independent debate, for which I won't feel qualified to answer, but it should not interfere with making fair comparisons.

I have also heard you voice criticism on the blurriness on estimating lattice attacks. I'm offering clarification, but rather than trying to understand my points, you actually hide behind this very same blurriness to dismiss any educated critique of your analysis.

-- Leo

PS: Sieving is relevant inside BKZ, even for your range of parameter !

From your 215-bit claim, I get that the hybrid attack of [HG] is instantiated so that both steps have the same cost hybridbkzcost = hybridmitmcost = 215 (up to the granularity of the parameters). If the hybridbkzcost were to decrease using a better algorithm, then the overall cost should also decrease by adjusting the parameters so that both steps match again -- by a smaller factor admittedly. Reverse-engineering equation (4), I get that you used blocksize somewhere between 310 to 330 (depending on the number of tours you used in bkz, data that is not given. This should be just 1 or 2 using progressive BKZ strategies, but that you may have taken a larger number of rounds.)

def enum(b):
   return 0.000784314*b**2 + 0.366078*b - 6.125 + 7

enum(310) = 189.7
enum(330) = 207.1

I will now consider the impact of replacing the enumeration step. I am considering the HashSieve algorithm [http://eprint.iacr.org/2014/744.pdf] with probing algorithm of T. Laarhoven, which I am sure you have heard about, which, in term of memory consumption is more reasonable than the other variants. I am extrapolating from his estimates, which for larger blocksize is an over-estimate since it includes as constant in the exponent factors that are supposed to be sub-exponential.

def sieve(b):
   return .45*b - 18.5 + 31

def mem_sieve(b):
   return .27*b

Where + 31 accounts for converting seconds into clock-cycles @2GHz. I obtain:

sieve(310) = 152.0, mem_sieve(310) = 83.7
sieve(330) = 161.0, mem_sieve(330) = 89.1

which is less costly in time than enumeration by 37 to 46 bits, but cost some memory of course. If you which to dismiss this because of memory, then you need to actually provides those estimates and explain why they should be irrelevant. If you claim that no attacks perform better than 2^215, you need to argue that 2^215 clock-cycle are somehow more accessible than 2^90 memory, that is, assuming that a single byte of storage is more expensive than 2^{215-70} = 2^145 operations !!
Otherwise, increasing the block-size just a bit will still make the first step cheaper, and will decrease the cost of the second step, it remains to determine by how much. (Also, maybe you should made explicit how much memory does the mitm step requires, maybe it is already higher than 2^90...)

Of course, we are here in the realm of science fiction in term of both computing power and memory capacity. But any claim of security above 128 bit belongs to that realm, and I see no reason to consider 2^215 clock-cycles more relevant than 2^90 memory. In the end, scientific quality requires us to take those question more seriously, in particular to provide fair comparison.

I have never claimed your scheme to be insecure in practice, but your analysis is in several places inconsistent, inaccurate and misses exhaustiveness. Once repaired and exhaustive, if some attacks are dismissed it should be written explicitly and argued for, making for example explicit what would be there storage requirement, and why it should be considered less feasible than the slower memoryless attack.

D. J. Bernstein

unread,

May 16, 2016, 7:56:41 AM5/16/16

to Cryptanalytic algorithms

The NTRU Prime paper estimates that Streamlined NTRU Prime 9829^739 has
pre-quantum security 2^215 against known attacks. The estimation method
is fully explained in the paper and covers many more attack details than
the New Hope paper. The resulting estimates are repeatedly labeled as
estimates.

The NTRU Prime paper is also completely explicit about its target
security level: "more than 2^128 post-quantum security." Why do we claim
merely >2^128, when our best attack estimate is ~2^215? A small part of
the answer is the impact of quantum computers on lattice attacks. A much
larger part of the answer is that the lattice-security literature is
very far from the clear, stable state desired in cryptography.

Leo continues to insist that a 2^140 attack would violate some
unidentified "security claims" in the paper. He refuses to acknowledge
the gap between the paper's explicit security _target_ and the paper's
explicit security _estimates_. He also refuses to acknowledge that his
claimed 2^140 attack cost is a severe underestimate.

I fully understand Leo's enthusiasm for sieving. Asymptotics clearly say
that sieving beats enumeration for _sufficiently large_ dimensions, and
advances in sieving over the last few years seem likely to have reduced
the cutoff. However, as far as I know, everyone who has tried sieving as
a BKZ subroutine in place of enumeration has concluded that sieving is
much too slow to be useful---the cutoff is beyond cryptographically
relevant sizes.

In the NTRU Prime paper, to see whether sieving could be of _any_
potential value as an attack tool, we focused on direct sieving attacks.
In these attacks, the dimensions are much larger than the dimensions
that appear inside BKZ, giving sieving an extra advantage. We also used
an explicitly labeled underestimate of the cost of these attacks (and
further underestimated the cost by using p instead of 2p, as pointed out
by John and now Leo).

Do the resulting calculations say that sieving is _completely_
uninteresting for the attacker? No. Does this justify claiming that
previous researchers are wrong about enumeration beating sieving inside
BKZ? Of course not. Does it justify taking explicit _underestimates_ of
attack costs as if they were _actual_ attack costs? Certainly not.

I think it's a feasible research project to thoroughly assess the actual
costs of sieving (for all sizes, not just experimental sizes), taking
all overheads into account. These costs---and the resulting cutoff
between enumeration and sieving---will certainly be much higher than our
explicit underestimates. Many people would be surprised for the cutoff
to be small enough to have any useful impact on BKZ. This analysis has
been on our to-do list for a while, but it's rather low priority
compared to exploring more plausible attack ideas.

Leo Ducas writes:
> inconsistent analysis

Any competent scientist will make intelligent use of a wide variety of
estimation tools---expensive tools when high accuracy is required;
cheaper tools otherwise. Insisting on "consistently" using the expensive
tools is a foolish waste of time. Insisting on "consistently" using the
cheaper tools is a recipe for errors, such as your 2^140 claim.

> a precaution I definitely recommend

You're of course free to recommend choosing parameters on the basis of
speculative improvements in sieving. You're not free to misrepresent
these speculations as actual attacks.

---Dan (speaking for myself)

Leo Ducas

unread,

May 16, 2016, 9:29:06 AM5/16/16

to Cryptanalytic algorithms, d...@cr.yp.to

I have no intent of discrediting your proposal, I am merely suggesting improvements in the quality/accuracy
of its analysis, and prevent misconceptions from spreading.

This seems hopeless, but I'll give it one last try:

A) The security claim of >128 bits post-quantum is not under question. I acknowledge that this is the main claim.
B) The analysis should represent more accurately the state of the art :

1) the cost model of enumeration you use does not reflect the asymptotics of the state of the art algorithms (eq. 4).
This model has been debunked before: https://eprint.iacr.org/2015/046.pdf . The current analysis is therefore
constitute a regression on the effort to understand concrete security.

2) extrapolation of sieving suggest cross-over enumeration (with eq. 4) at dim 190, despite them probably being
an over-approximation while eq 4 is supposed to be a under-approximation. If (eq. 4) is to be repaired to the right
cost model, this point may move again. It is therefore not so absurd to question whether BKZ-300ish should better
use enumeration or sieving. It would not require much effort to take account for it in your analysis, or to explain why
it is not deemed relevant (maybe quantifying memory requirement ?).

You may ridicule me for an alleged enthousiasm for sieving, this does solve point 1. You may even claim that any competent
scientist should not care much for accuracy. Those caveats of your paper contribute to the general misunderstanding of lattice
algorithms, and their optimal combinations: this is not helping the field.

I'm giving my best advices to improve these points. You are of course free to not reach my expectations of scientific quality.

Good bye
-- Leo

Martin R. Albrecht

unread,

May 16, 2016, 12:13:39 PM5/16/16

to Cryptanalytic algorithms

Leo Ducas writes:
> 1) the cost model of enumeration you use does not reflect the
> asymptotics of the state of the art algorithms (eq. 4). This model has
> been debunked before: https://eprint.iacr.org/2015/046.pdf . The
> current analysis is therefore constitute a regression on the effort to
> understand concrete security.

Hi all,

to follow up on this, maybe I’m overlooking something, but isn’t there a
discrepancy in the NTRUPrime paper itself on the cost of enumeration?

a) page 19 first claims it has complexity `2^O(β^2)`

b) page 19 then references Kannan’s paper which gave an algorithm in
complexity `β^O(β)` and refers to enumeration as “slightly
super-exponential” (for which `2^O(β^2)` wouldn’t qualitfy in my book?)

Cheers,
Martin

--

_pgp: https://keybase.io/martinralbrecht
_www: https://martinralbrecht.wordpress.com
_jab: martinr...@jabber.ccc.de
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF

signature.asc

John Schanck

unread,

May 16, 2016, 2:37:43 PM5/16/16

to Martin R. Albrecht, Cryptanalytic algorithms

Martin R. Albrecht wrote:

to follow up on this, maybe I’m overlooking something, but isn’t there a
discrepancy in the NTRUPrime paper itself on the cost of enumeration?

a) page 19 first claims it has complexity `2^O(β^2)`

b) page 19 then references Kannan’s paper which gave an algorithm in
complexity `β^O(β)` and refers to enumeration as “slightly
super-exponential” (for which `2^O(β^2)` wouldn’t qualitfy in my book?)

Equation 4 of the NTRU Prime paper is immediately prefaced by a disclaimer that

it is an "estimate of the work required to perform BKZ-2.0". It would be a

significant misreading to interpret Equation 4 as applying generically to

BKZ/HKZ-type algorithms.

Equation 4 is based on an interpolation of the data in Table 4 of the full

version of Chen and Nguyen's BKZ-2.0 paper [1]. I reported this interpolation

in [2]; it is intended to be used to fill in the gaps in Chen and Nguyen's

table so that their data can be used in a computer search for the optimal

BKZ-2.0 attack parameters. Chen and Nguyen's table reports an upper bound on

the logarithm of the number of nodes visited by their enumeration routine on

inputs that have been preprocessed with fixed block size aborted BKZ. My

understanding is (still) that a quadratic fit is appropriate for this data.

The table lists estimates for block size 100 through 250 in increments of 10.

In [2], and in the NTRU Prime paper, the data is extrapolated to the low 300s.

Even if the fit is of the wrong form, Equation 4 should be a very good

approximation to Chen and Nguyen's method in the relevant range.

Best,

John

[1] http://www.di.ens.fr/~ychen/research/Full_BKZ.pdf

[2] https://eprint.iacr.org/2015/708

Leo Ducas

unread,

May 16, 2016, 3:46:24 PM5/16/16

to Cryptanalytic algorithms, martinr...@googlemail.com

Hi John,

Equation 4 of the NTRU Prime paper is immediately prefaced by a disclaimer that
it is an "estimate of the work required to perform BKZ-2.0". It would be a
significant misreading to interpret Equation 4 as applying generically to
BKZ/HKZ-type algorithms.

I am not sure to understand why this justifies choosing the wrong model. It is enough
to recursively use block-size n - O(log(n)) to make BKZ 2.0 run in time 2^{n log n} and
not 2^{n^2}. Would Chen and Nguyen have missed that fact and ended up with a worse
parametrization ?

Best
-- Leo

John Schanck

unread,

May 16, 2016, 6:29:10 PM5/16/16

to Leo Ducas, Cryptanalytic algorithms, Martin Albrecht

I misspoke earlier when I said that they use fixed block size preprocessing. Looking at

Chen's thesis ([1], Table 5.2), it's clear they use progressive BKZ preprocessing.

The optimal preprocessing will not be recursive BKZ w/ block size k - O(log(k)) for all k. That's

an asymptotic result, and Chen and Nguyen tried to find optimal preprocessings in specific

dimensions. It looks like they considered recursive preprocessing ([1] section 4.4.3), but I

don't see much analysis of it.

Let's go back to the k^2 vs k log k fit question. If we only consider block sizes between

100 and 400 then the difference |f(k) - g(k)| between

f(k) = 0.270189 k log(k) − 1.0192 k + 16.10 (from [2]), and

g(k) = 0.000784314k^2 + 0.366078k − 6.125 (from [3])

is less than 10. This isn't surprising since they're fit to the same data between 100 and 250.

At k=400 we have f(400) = 256 and g(400) = 266.

That makes 400 the largest blocksize that matters in the context of BKZ-2.0 based attacks

on cryptosystems with < 256 bit security. So, if you're fitting to Chen and Nguyen's data, the

choice between a quadratic and a quasilinear fit scarcely matters.

I'm going to stand by my choice of a quadratic fit since I'm still not convinced that a klog k fit is

realistic for the relevant range of block sizes. It's true that for very large k you will want to switch

to a recursive preprocessing. Once you do, your scaling from that point will be like 2^O(k log k),

but I'm not convinced this is relevant for attacks of cost less than 2^256.

I'm very curious about explicit cost estimates for algorithms with known 2^O(k log k) asymptotics.

Does anyone claim a leading constant smaller than 0.27?

Cheers,

John

[1] http://www.di.ens.fr/~ychen/research/these.pdf

[2] https://eprint.iacr.org/2015/046

[3] https://eprint.iacr.org/2015/708

Leo Ducas

unread,

May 17, 2016, 1:12:14 AM5/17/16

to Cryptanalytic algorithms, leo.d...@gmail.com, martinr...@googlemail.com

Hi John,

I misspoke earlier when I said that they use fixed block size preprocessing. Looking at
Chen's thesis ([1], Table 5.2), it's clear they use progressive BKZ preprocessing.

The optimal preprocessing will not be recursive BKZ w/ block size k - O(log(k)) for all k. That's
an asymptotic result, and Chen and Nguyen tried to find optimal preprocessings in specific
dimensions.

Yes, yes, but I would assume that asymptotically, the concrete optimal choice is at least as good
as the asymptotically optimal one...

I'm very curious about explicit cost estimates for algorithms with known 2^O(k log k) asymptotics.
Does anyone claim a leading constant smaller than 0.27?

It depends which log you meant. The best provable complexity I know of is given in
http://perso.ens-lyon.fr/guillaume.hanrot/Papers/kannan.pdf which is n^(n/2e +o(n) ) = n^{.185n}.
A lot of things can be scrapped of this algorithm to improve it: https://eprint.iacr.org/2015/046.pdf,
while mainting some of its proofs. This last one looks quite similar to BKZ+enum with blocksize
n-O(log n).

Best
-- Leo

Leo Ducas

unread,

May 17, 2016, 1:46:12 AM5/17/16

to Cryptanalytic algorithms, leo.d...@gmail.com, martinr...@googlemail.com

Hi again John,

from the data we currently have:

def enum2(b):
return 0.270189 * b * log(b) - 1.0192 * b + 16.10
def HashSieve(b):

return .45*b - 18.5 + 31

def mem_HashSIeve(b):
return .27*b

I compute a cross-over point at dimension b=217, for a cost 2^110.4 and memory 2^58.6.
My conclusion is that sieving is not so irrelevant for cryptographic block sizes. This contradicts Dan's claim.

As for choosing the model 2^O(b log b) or 2^O(b^2), I must insist in switching to the former: even if this does not affect much the trueness of your claims, it is more accurate and reflects properly the state of the art.

Thanks for the constructive discussion John.

-- Leo

D. J. Bernstein

unread,

May 20, 2016, 11:50:53 AM5/20/16

to Cryptanalytic algorithms

Leo Ducas writes:
> I compute a cross-over point at dimension b=217,
> for a cost 2^110.4 and memory 2^58.6.

Now try to visualize 2^58.6 bits of storage. That's on the scale of
10000 hard drives, costing something on the scale of $1000000. (Even
more if the original units are something bigger than bits.)

For the same money you can run a huge GPU cluster, making enumeration
many, many, many thousands of times faster. See generally the CHES 2011
paper from Kuo--Schneider--Dagdelen--Reichelt--Buchmann--Cheng--Yang.
This huge speedup is not taken into account in your comparison.

Meanwhile the frequent RAM accesses inside sieving turn into frequent
hard-drive accesses, making sieving slower. This is something else not
taken into account in your comparison. Replacing the 10000 drives with
RAM would increase hardware costs even more and would beg the question
of how to efficiently access so much RAM (not that this is trivial for
disks, but the usual speed of RAM makes the issues here more obvious).

In a naive cost model, sieving is easy to parallelize: use your favorite
parallel sorting or routing algorithm to handle a large batch of memory
accesses. However, in any realistic cost model, the communication inside
sorting imposes a massive overhead, scaling as roughly the square root
of the amount of data being sorted. One introduction to the literature
on this topic is

https://cr.yp.to/papers.html#nonuniform

specifically A.3, in turn pointing to [29], [24], and [18]; also Q.6,
analyzing the actual records for the energy cost of large-scale sorting.

These fundamental problems with sieving scalability are artificially
suppressed from all of the public sieving experiments that I've seen.
Specifically, in these measurements,

* the largest dimensions are chosen so that everything fits into
nearby DRAM, so one can't see the slowdown of accessing anything
bigger; and

* there's no effort to target really small dimensions, so one can't
see the analogous _speedup_ of fitting into, e.g., GPU SRAM.

The resulting extrapolations are obviously highly inaccurate: they
ignore the fact that exponential increases in storage force exponential
increases in the cost of hardware and in the cost of random access.

> My conclusion is that sieving is not so irrelevant for cryptographic
> block sizes. This contradicts Dan's claim.

Which claim, precisely? You keep making assertions of errors without
pinpointing the exact statements that you believe to be erroneous.

My current guess is that you're referring to the following statement:
"As far as I know, everyone who has tried sieving as a BKZ subroutine in

place of enumeration has concluded that sieving is much too slow to be
useful---the cutoff is beyond cryptographically relevant sizes."

Perhaps you'd like to say that _you've_ tried sieving as a BKZ
subroutine and _not_ come to this conclusion---because, I presume,
you're making the highly inaccurate extrapolations discussed above.

If so, then I'm certainly happy to add this to my knowledge---I find it
very interesting to see how benchmarks can mislead people---but this
still doesn't contradict what I wrote. As an analogy, the statement "As
far as I know, nobody has ever screwed this up" is not the same as the
statement "Nobody has ever screwed this up"; someone saying "I screwed
this up!" is contradicting the second statement but not the first.

In general the level of sloppiness in this discussion is disappointing.

---Dan

Martin R. Albrecht

unread,

May 20, 2016, 12:44:20 PM5/20/16

to Cryptanalytic algorithms

D. J. Bernstein writes:
> Leo Ducas writes:
>> I compute a cross-over point at dimension b=217,
>> for a cost 2^110.4 and memory 2^58.6.
>
> Now try to visualize 2^58.6 bits of storage. That's on the scale of
> 10000 hard drives, costing something on the scale of $1000000. (Even
> more if the original units are something bigger than bits.)
>
> For the same money you can run a huge GPU cluster, making enumeration
> many, many, many thousands of times faster. See generally the CHES 2011
> paper from Kuo--Schneider--Dagdelen--Reichelt--Buchmann--Cheng--Yang.
> This huge speedup is not taken into account in your comparison.

Hi Dan,

I don’t quite follow the logic here. I get that you’re saying we should
consider the computation cost and the memory cost and should be careful
to scale those right, i.e. storing 2^x bits costs more than doing 2^x
bit operations. However, I don’t quite get the claim that adding a
certain sum of $ to the computational budget for enumeration would make
it faster by a big factor? I guess meant it for the concrete example,
not as a general statement, so here’s my attempt to crunch the numbers
(all pretty handwaving because units are pretty much ignored; that could
make a difference):

def enum2(b):
return 0.270189 * b * log(b,2.0) - 1.0192 * b + 16.10

def HashSieve(b):
return .45*b - 18.5 + 31

def HashSieve_mem(b):
return .27*b

def enum2_cost(b):
return enum2(b) -log(3600.0, 2.0) + log(44,2.0) + log(2.52, 2.0)

def HashSieve_mem_cost(b):
return log(1000000,2) - 58.6 + HashSieve_mem(b)

The first three functions are from an earlier e-mail by Leo,
`enum2_cost` is based on
http://www.iis.sinica.edu.tw/papers/byyang/12158-F.pdf (ignoring units!)

The last function is based on the rough estimate you gave above.

This gives me $2^245 cost for blocksize 217 for enumeration and $2^20
cost for memory for sieving. But /adding/ that budget of $2^20 to the existing
budget of $2^245 wouldn’t make much of a difference as far as I can
tell? Devil might be in the detail, though.

That said, your point about (random!) memory access not being/scaling
free is well taken. I’d still conservatively assume it was when picking
parameters, but establishing a more realistic cost model would
definitely be useful.

signature.asc

Martin R. Albrecht

unread,

May 20, 2016, 2:47:57 PM5/20/16

to Cryptanalytic algorithms

Hi all,

sorry, I screwed up my base for the logarithm, as was pointed out off
list. It should read

def enum2(b):
return 0.270189 * b * log(b) - 1.0192 * b + 16.10

This gives a cost of $2^105 for blocksize 217 for enumeration.

Cheers,
Martin

signature.asc

D. J. Bernstein

unread,

May 21, 2016, 5:43:58 AM5/21/16

to Cryptanalytic algorithms

Martin R. Albrecht writes:
> However, I don’t quite get the claim that adding a certain sum of $ to the
> computational budget for enumeration would make it faster by a big factor?

As I said, the same ~$1000000 lets the attacker purchase a huge GPU
cluster (I mean huge by academic standards---serious attack machines are
much bigger). The CHES 2011 paper convincingly shows that enumeration
(inside BKZ with extreme pruning) parallelizes efficiently across many
GPUs and across the "multiprocessors" inside GPUs.

The important point isn't the particular dollar figure in the paper
(which is heavily inflated by Amazon pricing; any serious attacker will
instead buy his own hardware). The important point is that enumeration
gains almost a factor of P from running on P small processors, for a
wide range of P. Enumeration isn't really bottlenecked by communication.

Sieving is claimed to have the same number of "operations"---but these
"operations" are bottlenecked by random accesses to something like 10000
hard drives. How are these random accesses imagined to be competitive in
speed with local computations inside GPUs? This isn't the same ballpark;
it isn't the same league; it isn't even the same sport!

> I guess meant it for the concrete example, not as a general statement

Larger attacks cause even more trouble for sieving.

The big picture is that the attacker can afford some amount of hardware
(e.g., millions of ASICs), typically measured by the total chip area A,
and can afford to run this hardware for some amount of real time T
(maybe a year, maybe even longer). The critical question is what the
attacker's success chance is within these physical cost limits.

Can the success of a typical academic experiment with tiny A and T, a
small problem on a single computer, be extrapolated to larger A and T
tackling much larger problems? Sometimes, yes, but the literature shows
tremendous variations from one algorithm to another---not simply because
of huge variations in the number of "operations" but because of huge
variations in the cost and parallelizability of those "operations".

The textbook notion of n lg n sorting scalability, for example, is
totally out of whack with the reality observed in heavily optimized
large-scale sorting computations. The cost metric used in textbooks is
indefensible. Papers such as

http://maths-people.anu.edu.au/~brent/pd/rpb055.pdf

showed 35 years ago that realistic cost metrics are compatible with
serious algorithm analysis, and that the textbook analyses don't even
get the asymptotic cost _exponents_ right. Papers such as

https://cr.yp.to/papers.html#nfscircuit
https://cr.yp.to/papers.html#nonuniform
https://cr.yp.to/papers.html#batchnfs

show analogous exponent gaps for various large-memory cryptanalytic
computations. For example, NFS parallelization is constrained by the
heavy communication costs inside sparse linear algebra. Coppersmith's
L^1.638... claim for factoring a single number after precomputation
relies critically on an unrealistic cost metric. The best AT algorithm
known (pre-quantum) is actually L^1.976... for factoring a single number
with or without precomputation, and L^1.704... per number for factoring
a large batch.

Lattice sieving looks like it will suffer significantly in a realistic
cost metric. It's more memory-hungry than NFS; the depth doesn't look
too bad but the overall amount of random memory access is disastrous.
Trying to reduce space seems to really hurt time. I haven't seen any
interesting batch options.

---Dan

Leo Ducas

unread,

May 21, 2016, 2:42:13 PM5/21/16

to Cryptanalytic algorithms, d...@cr.yp.to

Hi Dan,

I have numbered my points. If you wish to keep on with the mudfight, we can argue endlessly about 1-5 (but I think we've reached a fixedpoint, and I'll rest my case). I think points 6 and 7 could lead to a conversation that is actually of scientific interest.

1) Your argument is based on a ballpark estimate, that I still disagree with: I have detailed above why this claim implies a byte/cycle cost ratio of > 2^100 to estimate a best known attack at 2^215 cycles. I get that RAM is more annoying that computation, but by a factor >2^100, that sounds a bit surprising. Again, all this belong to a sci-fi world where classical computational power gets in the >2^200 ops, but somehow, 2^60 bits of memory is deemed too expensive...

2) in the paper, you give a lot of room the the meet-in-the middle attack and its hybrid version. Aren't those attack also very memory expensive ? Why not discarding it on the same ground ?

3) A quick remark on sieving, not sure if it re-qualify them to your eyes: in its simplest versions, which require less memory, sieve does not require random access memory. Memory is accessed sequentially. Even the more evolved one can obviously be adjusted to access memory in large sequential choices.

4) None of this discussion justify choosing the wrong model for the cost of enumeration ( 2^(b^2) vs 2^(b log b)), which inaccurately represent the state of the art (which has been my point since the beginning. Not trueness of the claim, but accuracy of the analysis). Why not simply addressing this minor issue rather than arguing about something else until exhaustion ?

5) My point is not to argue your scheme is insecure. The point is that in term of security, the proposed parameters does not compare to what it is compared to, even if your gut feeling is that the other proposal widely overshot security. Both the Newhope and NTRU papers contain other parameters that are more comparable to NTRUprime 9829^739. If this is not a willingful mistake, then it should be addressed. Otherwise, this is just an advertisement trick. We all can advertise better efficiency if we lower parameters, but making such a competition only ends up in making every new proposal less secure than the previous one. The obvious rule of that game is that only comparable things are compared, and the desired security level is debated and decided independently of the choice of the best proposal.

6) Rather than advertising for efficiency, I would find it more relevant to speak a bit more about the properties of that ring: A) give the discriminant of its associated number field, B) is this ring the maximal order of that numberfield. C) how twisted is the map from the coefficient domain to the complex embeddings. All those information are relevant to assess the possibility of algebraic attacks (especially C, in the light of [https://eprint.iacr.org/2016/239.pdf]).

7) I am a bit confused by the nature of the defenses, in particular against the attack Campbell-Groves-Shepherd. Once a good basis of the log-unit lattice is known, is the situation so much different than for the cyclotomics ? In the cyclotomic case, such a basis is known to everyone. But in a non-uniform attack model, shouldn't we just assume that an attacker simply knows this basis even for other rings (unless we change the ring for each instance ...) ? This is not a rhetorical question, I am not that familiar with non uniform attack settings. In particular, I am not challenging the fact that getting away from cyclotomics is relevant contribution, and that it is a good fallback option to have if cyclotomic start showing exploitable weaknesses (in the way they are used for reasonable crypto, not overtstreched variants with huge approximations factors).

D. J. Bernstein

unread,

May 21, 2016, 9:02:00 PM5/21/16

to Cryptanalytic algorithms

Leo Ducas writes:
> 1) Your argument is based on a ballpark estimate, that I still
> disagree with: I have detailed above why this claim implies a
> byte/cycle cost ratio of > 2^100 to estimate a best known attack at
> 2^215 cycles.

You have not pointed to any attacks that can be plausibly argued to have
actual cost below 2^215.

What you're doing is taking an explicitly labeled _underestimate_ of the
the cost of one attack---an underestimate that ignores huge polynomial
factors _and_ huge memory-access costs _and_ huge parallelization
difficulties---and demanding endorsement of your analogously severe
_underestimate_ of the cost of another attack, blatantly misrepresented
as the _actual_ cost of that attack.

> all this belong to a sci-fi world

Yes, we're talking about extremely large attacks, and the difficulty of
carrying out large experiments is yet another reason to be cautious in
making predictions; but you're making much more obvious mistakes.

> but somehow, 2^60 bits of memory is deemed too expensive...

Huh? Where did I ever say that 2^60 bits was "too expensive"? I can't
even figure out what you think "too expensive" might mean here.

What I actually said was that this (as hard drives) is on the scale of a
million dollars, and that using this amount of hardware for sieving
would be extremely slow (for reasons I explained), while using the same
amount of money for enumeration would be much faster (for reasons I also
explained), so your claim of parity is wildly inaccurate.

If you want people to believe that you care about accurate security
estimates, it might help to quote and respond to what I actually wrote
on this topic, and try to identify any remaining points of disagreement,
instead of fabricating strawman arguments.

> 2) in the paper, you give a lot of room the the meet-in-the middle
> attack and its hybrid version. Aren't those attack also very memory
> expensive ?

Yes, there is significant overhead, so the attack costs are going to be
even higher than our current estimates. Figuring out the details will
require adapting Sections 5.3 and 6.3 of

http://people.scs.carleton.ca/~paulv/papers/JoC97.pdf

to the context of https://eprint.iacr.org/2016/177.pdf.

> Why not discarding it on the same ground ?

Huh? What's the alternative to the hybrid attack supposed to be?

The reason for discarding sieving is the lack of evidence that sieving
can beat enumeration for cryptographic sizes. The memory consumption of
sieving is one of many contributing factors.

> 3) A quick remark on sieving, not sure if it re-qualify them to your
> eyes: in its simplest versions, which require less memory, sieve does
> not require random access memory. Memory is accessed sequentially.

Memory is also accessed sequentially in most multiplication algorithms,
and yet the Brent--Kung area-time theorem says that n-bit multiplication
requires time Omega(n/A^(1/2)). The fundamental issue is a lack of
locality---a communication graph that's hard to cut.

> 4) None of this discussion justify choosing the wrong model for the
> cost of enumeration ( 2^(b^2) vs 2^(b log b)), which inaccurately
> represent the state of the art

The NTRU Prime paper quotes the estimate from [36]. An author of [36]
has spoken up here to point out the justification for his estimates and
to question the accuracy of the changes that you "insist" upon. Everyone
agrees that b log b is asymptotically correct, but you're making an
unjustified leap from the asymptotic range to the cryptographic range.

> Why not simply addressing this minor issue rather than arguing about
> something else until exhaustion ?

As an analogy, imagine someone running around insisting that everyone
describe matrix-multiplication costs as multiples of n^2.37...---even
for normal matrix sizes where larger-exponent methods are faster---and
claiming that anyone who doesn't comply is "inaccurately representing
the state of the art". Is it really so hard to understand that switching
to the asymptotic formula can _lose_ accuracy for sizes of interest?

> 5) My point is not to argue your scheme is insecure. The point is that
> in term of security, the proposed parameters does not compare to what
> it is compared to, even if your gut feeling is that the other proposal
> widely overshot security.

I have no idea what you think you're objecting to. The NTRU Prime paper
explicitly targets the quite reasonable goal of >2^128 post-quantum
security, and explicitly compares performance to other proposals that
target the same goal.

> Both the Newhope and NTRU papers contain other parameters that are
> more comparable to NTRUprime 9829^739.

Really? Where? For your new "JarJar" proposal you make a "claim of 94
bits of post quantum security", which doesn't qualify. For NTRU we might
take 743 instead of 787, but this is still around 100000 cycles for
encryption (mostly multiplication) without timing-attack protection.

For JarJar you also assert that "It is quite likely that a more
aggressive analysis would lead to a claim of 128 bits of security
against the best known attacks". This is a fascinating superposition of
wanting people to think you have 2^128 security and at the same time
wanting to be able to say "We never claimed 2^128 security!" if anyone
comes up with a faster-than-2^128 attack.

> 6) Rather than advertising for efficiency, I would find it more
> relevant to speak a bit more about the properties of that ring: A)

> give the discriminant of its associated number field B) is this ring

> the maximal order of that numberfield

Sage will immediately compute the discriminant of the polynomial, around
2^7042 in absolute value. There aren't any small squared prime factors
so this is practically guaranteed to be the discriminant of the number
field, also telling you the maximal order; I haven't thought about
whether this is provable. Anyway, it's not particularly difficult to
work with orders that are non-maximal, or not known to be maximal.

> C) how twisted is the map from
> the coefficient domain to the complex embeddings.

The roots of x^739-x-1 aren't far from the roots of x^739-1. Feel free
to work out whatever numerical features you're interested in.

> All those information are relevant to assess the possibility of
> algebraic attacks (especially C, in the light of [https://
> eprint.iacr.org/2016/239.pdf]).

That paper is attacking RLWE parameters for which the noise is always 0
in the first position, so each RLWE sample leaks a linear equation for
the secret. (Actually even more than the first position, but this is
overkill.) We focus on NTRU---which doesn't simply give away samples in
the first place---and of course we add noise at every position.

I'm not sure why you're calling this attack "algebraic": it's all about
the limited noise, not about any algebraic features of the ring. Our
recommendations regarding the choice of ring don't magically take away
higher-level cryptosystem issues such as adding enough noise.

> 7) I am a bit confused by the nature of the defenses, in particular
> against the attack Campbell-Groves-Shepherd. Once a good basis of the
> log-unit lattice is known

Computing _any_ basis of the log-unit lattice is a famous problem in
computational algebraic number theory. The NTRU Prime paper quotes
Cohen's classic book [23, page 217] calling this one of the five "main
computational tasks of algebraic number theory". For _almost all number
fields_ the fastest solutions known are generalizations of NFS, which is
subexponential but not very fast. Even worse, the basis produced by NFS
isn't short; reducing to a _short_ basis takes exponential time.

The only fast methods known for finding a short basis, or merely _some_
basis, or merely a basis of a small-index sublattice, are exploiting
extremely special algebraic features of number fields with small Galois
groups---producing cyclotomic units, for example, and elliptic units.
The NTRU Prime paper recommends against fields with small Galois groups.

> But in a non-uniform attack model, shouldn't we just assume that an
> attacker simply knows this basis even for other rings

The real-world attacker certainly doesn't care about oversimplified
theoretical models, so why should the competent cryptanalyst care?

Sure, there are theoreticians who incorrectly equate knowledge with
existence, and who therefore conclude that we know collisions in
SHA-512, and that all of the deployed protocols relying on collision
resistance of SHA-512 are broken. In the real world, however, we _don't_
know collisions in SHA-512; these protocols _aren't_ broken; and I'm
glad that the people engaging in serious scientific study of collision
resistance of scaled-down SHA-512, SHA-1, etc. aren't listening to
confused theoreticians.

For many years people thought that this modeling problem was limited to
collision resistance. However, Koblitz and Menezes noticed that the
model also wasn't justified for ciphers such as AES, and then came up
with an example making clear (under standard conjectures) that the model
was flat-out wrong for low-probability attacks against ciphers. Lange
and I came up with many more examples, going beyond low probability and
going beyond ciphers:

https://cr.yp.to/papers.html#nonuniform

Do you yell at people who claim that AES-128, NIST P-256, DSA-3072,
RSA-3072, and SHA-256 take 2^128 "operations" to break by known
pre-quantum attacks? This paper shows beyond a reasonable doubt that all
of these claims are wrong in non-uniform models. The problem is that the
models are flawed; it's obvious that they deviate from reality, and what
this paper shows is that this gap can't simply be ignored.

---Dan

Leo Ducas

unread,

May 22, 2016, 5:28:03 AM5/22/16

to Cryptanalytic algorithms, d...@cr.yp.to

If you want people to believe that you care about accurate security
estimates, it might help to quote and respond to what I actually wrote
on this topic, and try to identify any remaining points of disagreement,
instead of fabricating strawman arguments.

I showed the cross-over point is at dim ~200, for memory 2^60 and time 2^100, and
you discarded it. I have given precise estimates of the time and memory cost of
hash-sieve earlier and compared it to enumeration, for ``the cryptanalytic relevant
dimensions'' used in the cryptanalysis of your parameters. You've ignored them.
I have to repeat myself:

enum(310) = 189.7
enum(330) = 207.1

sieve(310) = 152.0, mem_sieve(310) = 83.7
sieve(330) = 161.0, mem_sieve(330) = 89.1

Those numbers extrapolated from [http://eprint.iacr.org/2014/744.pdf], and the implementation
not optimized. The numbers looks even better in [http://eprint.iacr.org/2015/041.pdf].

There is a gap of more than 2^100 between time (cycles) and memory (bytes), but you still
found too irrelevant to even respond to it. But sure, I am the one responsible for stalling the
discussion.

> 2) in the paper, you give a lot of room the the meet-in-the middle
> attack and its hybrid version. Aren't those attack also very memory

> expensive ? Why not discarding it on the same ground ?

Huh? What's the alternative to the hybrid attack supposed to be?

Pruned Enumeration ? Duh ?
http://link.springer.com/chapter/10.1007/978-3-642-36095-4_19

The only fast methods known for finding a short basis, or merely _some_
basis, or merely a basis of a small-index sublattice, are exploiting
extremely special algebraic features of number fields with small Galois
groups---producing cyclotomic units, for example, and elliptic units.
The NTRU Prime paper recommends against fields with small Galois groups.

I got that, yes, thanks. In this light, do you see any deep algebraic reason why X is
always a unit in those rings, or is it just about re-writing X^p = X - 1 (again, a non
rhetorical question, I'm hoping to learn something, not to argue about claims here).

--Leo

Leo Ducas

unread,

May 22, 2016, 6:17:57 AM5/22/16

to Cryptanalytic algorithms, d...@cr.yp.to

PS: It is the 5-th time I receive apologies from your qsecretary program. It starts
feeling like spam.

`` In the meantime, we're all suffering because of a few inconsiderate people.

Sincerely,
The qsecretary program

P.S. Professor Bernstein has asked me to convey his own apologies to you
if you're someone he knows. I'm sure he'll tell me to accept subsequent
messages from you without confirmation. ''

Christopher J Peikert

unread,

May 22, 2016, 10:30:56 AM5/22/16

to Leo Ducas, Cryptanalytic algorithms, D. J. Bernstein

The only fast methods known for finding a short basis, or merely _some_
basis, or merely a basis of a small-index sublattice, are exploiting
extremely special algebraic features of number fields with small Galois
groups---producing cyclotomic units, for example, and elliptic units.
The NTRU Prime paper recommends against fields with small Galois groups.

I got that, yes, thanks. In this light, do you see any deep algebraic reason why X is
always a unit in those rings, or is it just about re-writing X^p = X - 1 (again, a non
rhetorical question, I'm hoping to learn something, not to argue about claims here).

A small observation on this point: we have

1 = x^p - x = x(x^{p-1} - 1) = x \prod_{d | (p-1}} \Phi_d(x),

where the product runs over all divisors d of (p-1), and \Phi_d(x) denotes the d'th cyclotomic polynomial.

Therefore, we have at least a handful of units x, \Phi_1(x)=x-1, \Phi_2(x)=x+1, etc. -- and the smoother p-1 is, the more units we have. A quick experiment with Sage confirms that their log-embeddings are all "short," and almost all of them are linearly independent (i.e., the units themselves are multiplicatively independent).

Of course, this is far short of a full generating set of the unit group (equivalently, a basis of the log-unit lattice) or a small-index subgroup, so by itself this is nowhere near enough to run a CGS-style attack. But when we encounter statements like

Computing _any_ basis of the log-unit lattice is a famous problem in
computational algebraic number theory. The NTRU Prime paper quotes
Cohen's classic book [23, page 217] calling this one of the five "main
computational tasks of algebraic number theory"

we should remember that we are not dealing with the *general* problem of unit group computation (i.e., on arbitrary number rings), but rather a very specific one in rings of the form Z[x]/(x^p-x-1) for prime p. As seen above, it is easy to find at least a few short units in such rings; I don't know whether that's the case for arbitrary number rings or not.

We should also remember that CGS-style attacks -- which are why we are discussing the unit group in the first place -- rely quite crucially on having a principal ideal of the ring that is also *highly atypical*, in that it is promised to have an "unusually short" generator (relative to the ideal's algebraic norm). (This atypicality is shown in [Section 6, http://eprint.iacr.org/2015/313 ].) None of Ring-LWE, NTRU, or NTRU Prime give out principal ideals at all, much less ones with short generators. So while the question of computing short units is certainly interesting in its own right, it's currently not known whether it means anything for the security of these systems.

Sincerely yours in cryptography, Chris

D. J. Bernstein

unread,

May 22, 2016, 11:39:48 AM5/22/16

to Cryptanalytic algorithms

Leo Ducas writes:
> > > 2) in the paper, you give a lot of room the the meet-in-the middle
> > > attack and its hybrid version. Aren't those attack also very memory
> > > expensive ? Why not discarding it on the same ground ?
> > Huh? What's the alternative to the hybrid attack supposed to be?
> Pruned Enumeration ? Duh ?

The hybrid attack already includes a user-tunable amount of enumeration
inside basis reduction. The literature on the attack makes clear that
balancing basis reduction with a mitm stage is much more effective than
skipping the mitm stage.

I _do_ expect a thorough analysis of large-scale parallelization to
change the balance between the stages, increasing total costs somewhat,
but I don't expect it to shrink the mitm stage to zero. As an analogy,
the memory-hungry linear-algebra stage in NFS gets smaller when all
costs are accounted for, but it certainly doesn't disappear.

> enum(310) = 189.7

> sieve(310) = 152.0, mem_sieve(310) = 83.7

You are incorrectly equating expensive operations with cheap operations.
Waiting for 2^152 accesses to 2^83.7 faraway bits of memory takes more
physical time _and_ more hardware _and_ more energy than carrying out
2^129.7 enumeration steps on each of 2^60 small parallel processors.

> There is a gap of more than 2^100 between time (cycles) and memory
> (bytes), but you still found too irrelevant to even respond to it.

Actually, I've patiently explained your mistake in considerable detail.

Maybe brevity would help. You need to learn the fundamental concept of
price-performance ratio---specifically AT for computer hardware; try the
Brent--Kung and van Oorschot--Wiener papers. You also need to learn to
treat different units (in the physics sense) as a warning sign to help
you avoid nonsensical additions and subtractions, such as subtracting
price from performance.

---Dan

Leo Ducas

unread,

May 22, 2016, 3:09:13 PM5/22/16

to Cryptanalytic algorithms, d...@cr.yp.to

You are incorrectly equating expensive operations with cheap operations.
Waiting for 2^152 accesses to 2^83.7 faraway bits of memory takes more
physical time _and_ more hardware _and_ more energy than carrying out
2^129.7 enumeration steps on each of 2^60 small parallel processors.

You are (once again) incorrectly assuming sieving requires Random Memory Access, and
seems ignorant of the state of the art in parallelizing sieving [http://eprint.iacr.org/2014/880.pdf].
In this very specific algorithm, communication between nodes grows proportionally to
the memory, not proportionally to the number of operations: no overhead factor is not
to be multiplied to the number of operations, but an additive one which is proportional
to memory :

Cost = Time + constant * memory.

where the constant comes from physics and is expressed in cycles/byte. Do you claim
this constant to be greater than 2^100 ?

-- Leo

D. J. Bernstein

unread,

May 22, 2016, 3:35:19 PM5/22/16

to Cryptanalytic algorithms

Christopher J Peikert writes:
> We should also remember that CGS-style attacks -- which are why we are
> discussing the unit group in the first place

No.

I believe that I'm entitled to an acknowledgment of the actual history
here. I think the facts also nicely illustrate the cryptographer's job
of protecting against _future_ attacks---as also illustrated by the 1996
Dobbertin--Bosselaers--Preneel recommendation to switch away from MD5,
the 2005 ECRYPT recommendation of prime fields for DL, etc.

https://cr.yp.to/talks.html#2013.07.18 briefly mentioned unit-group
computation etc. as a strategy to attack "NTRU, Ring-LWE, FHE", and said
"I think NTRU should switch to random prime-degree extensions with big
Galois groups". This was more than a year before the CGS paper.

https://blog.cr.yp.to/20140213-ideal.html said much more about these
computations, and certainly wasn't limited to NTRU---in fact, I took SV
as a case study. I issued a general warning about automorphisms:

Here's what's particularly troublesome: _The structures used in the
"proofs of security", such as automorphisms, are also some of the
structures exploited in this attack._ This is not an isolated
phenomenon: I see many examples where the single-minded pursuit of
"proofs of security" adds dangerous structures to cryptographic
systems and ends up compromising actual security. Any competent
cryptographer will _pay attention to the cryptanalytic algorithms_
and recommend that cryptographic users avoid these dangerous
structures.

At the end of the blog post I specifically suggested using x^p-x-1,
which (like almost all degree-p polynomials) has largest possible Galois
group S_p (i.e., automorphisms that are as hard as possible to perform
computations with). This was still several months before the CGS paper.

The new rings subsequently built up an excellent track record, blocking
new attacks against

* SV, Gentry, GGH, etc. by CGS et al., and
* "overstretched NTRU", YASHE, etc. by ABD/CFLMR.

This is far from coincidental---the recommendations against the old
rings were explicitly motivated by worrisome mathematical operations
that have a huge overlap with the core operations exploited by the
successful attacks.

I fully realize that, in Cryptography According To Chris, this track
record is _not_ a cryptographic success story. What matters for Chris is
the extreme question of what has already been broken. Specifically, he
believes that from a security perspective

* it was wrong to recommend changing rings for SV _before_ SV was
shown to be broken with the old rings,

* it was wrong for people in 1996 to recommend against MD5,

* it was wrong for people in 2005 to recommend prime fields for DL,

* it is wrong for people to recommend prime fields for ECDL, and

* it is wrong for us to recommend changing rings for NTRU.

Please feel free to correct me, Chris, if I've misunderstood your
position regarding these five recommendations.

> None of Ring-LWE, NTRU, or NTRU Prime give out principal ideals at
> all, much less ones with short generators.

This is a weird claim on multiple levels.

The whole reason that these SV attacks are looking for generators is
that the SV secret can be viewed as a generator of an ideal that's
publicly known---and of course the details of this view force the
generator to be short. The obvious strategy to adapt this to other
systems is to view other secrets as short generators of public ideals.
The conceptual gap between "principal ideal" and "principal ideal with
short generator" simply isn't relevant to any of these attacks.

Furthermore, for _most_ number fields, the class number h is small---
i.e., the chance 1/h of an ideal being principal is significant. Having
h=1 is quite common, and then _all_ ideals are principal. It would be
entirely unsurprising for x^739-x-1 to have h=1, for example. So if
there _are_ ideals popping up randomly somewhere then for NTRU Prime one
can reasonably expect them to be principal.

The big exceptional case is fields that are "CM" in the strict sense,
i.e., totally complex extensions of totally real fields---for example,
cyclotomics. This structure forces h to be large, separating ideals into
a huge number of different classes.

> we should remember that we are not dealing with the *general* problem
> of unit group computation (i.e., on arbitrary number rings), but
> rather a very specific one in rings of the form Z[x]/(x^p-x-1) for
> prime p.

I'm reminded of people saying "Oooh, this Curve25519 isn't the standard
NIST P-256 curve! Has it really been studied? And look at how small
those coefficients are!"

Yes, of course there are billions of ways to draw superficial lines
between Curve25519 and NIST P-256. What these people don't understand is
that ECC cryptanalysts are looking systematically at _all_ curves---
looking for _any_ attack that can possibly break even _one_ curve not
previously known to be broken. Some techniques end up working for all
curves; some techniques end up working only for quite unusual curves;
some techniques are in between.

Similarly, the extensive literature on unit-group computation includes
some techniques that work for all fields, some techniques that are
limited to quite unusual fields, and some techniques in between.

Cyclotomic units are an interesting example. I freely admit that I had
totally missed this literature before CGS pointed out its relevance, but
at the same time this illustrates (1) that people _do_ study techniques
beyond the generic case and (2) that the classic structures studied in
algebraic number theory seem to be closely tied to what attackers can
exploit. Notice that I was already issuing warnings against cyclotomics
even without realizing how far the automorphisms could be pushed.

> As seen above, it is easy to find at least a few short units in such
> rings; I don't know whether that's the case for arbitrary number rings
> or not.

Exercise: Learn something about the slightly more general theory of
S-units (start from, e.g., Chapter 7 of Cohen's followup book). Then
figure out how, given _any_ number ring, you can trivially write down a
few independent S-units for suitable S.

Writing down _a basis_ is much harder, and is a famous open problem. As
I said before, the only fast solutions known rely on extremely special
algebraic features of number fields with small Galois groups.

Automorphisms play a huge theoretical and computational role in the
massive literature on algebraic number theory---and in an increasing
number of successful attacks against ideal-lattice-based cryptography.
The distinction between units and S-units is minor from theoretical and
computational perspectives, and also doesn't seem relevant to attacks.

Is it conceivable that there's a break of, say, SV using \Z[x]/(x^p-x-1)
that wouldn't apply to other rings? Sure, it's conceivable, but what
exactly is the action plan motivated by this idle speculation? Use many
large coefficients, seriously damaging the tradeoff between efficiency
and security against all known attacks, in exchange for the _hope_ that
this somehow increases security? Switch to YASHE---well, oops, maybe not
YASHE---okay, some other cryptosystem whose security is unclear?

For pure public-key encryption at any particular level of efficiency,
switching from SV to NTRU seems to take away some attack tools, and I'm
sympathetic to the hope that this will save various rings that are
broken in SV. But taking away more attack tools---switching to NTRU
Prime---further reduces risk, and we've now shown that this switch is
eminently affordable.

---Dan

D. J. Bernstein

unread,

May 22, 2016, 4:18:21 PM5/22/16

to Cryptanalytic algorithms

Leo Ducas writes:
> ignorant of the state of the art in parallelizing sieving [http://
> eprint.iacr.org/2014/880.pdf].

Have you actually read the paper? The paper

* takes an old, much more computation-intensive, form of sieving,
estimated at 2^0.48n operations;

* runs it on a cluster that spent amazing amounts of money on
networking and that has vastly less raw computation power than a
similarly expensive GPU cluster; and

* _still_, despite this massively tilted setup, suffers noticeable
communication overhead for a moderate number of cores.

This is perfectly illustrating my point.

> Cost = Time + constant * memory.
> where the constant comes from physics and is expressed in cycles/byte.

Pseudo-scientific babble. Oh, sorry, polite version: I'm not aware of
any realistic cost metrics for parallel computation that would support
an equation of this form for any of the algorithms we're talking about.

---Dan

pole.k...@gmail.com

unread,

May 26, 2016, 1:50:44 PM5/26/16

to Cryptanalytic algorithms, d...@cr.yp.to

Hello,

I am quite baffled by comments such as :

"Asymptotics clearly say
that sieving beats enumeration for _sufficiently large_ dimensions, and
advances in sieving over the last few years seem likely to have reduced

the cutoff. However, as far as I know, everyone who has tried sieving as

a BKZ subroutine in place of enumeration has concluded that sieving is
much too slow to be useful---the cutoff is beyond cryptographically
relevant sizes. "

and also :

"As an analogy, imagine someone running around insisting that everyone
describe matrix-multiplication costs as multiples of n^2.37...---even
for normal matrix sizes where larger-exponent methods are faster---and
claiming that anyone who doesn't comply is "inaccurately representing

the state of the art". "

It seems to me that Dan is evaluating security by :
- taking the fastest implementation in practice (cf. "everyone who has tried", "normal matrix sizes")
- fit the running time using some model (which has possibly nothing to do with the asymptotical complexity)
- evaluate the fit at a point corresponding to the cryptosystem.

While the approach I would use is :
- fix a computational model
- compute the minimum over all known algorithms of their complexity for breaking the cryptosystem
(and in particular, if an algorithm has to multiply matrices of size 2^100, it makes perfectly sense to use really fast matrix multiplication and not Strassen)

I emphasize that fixing a computation model is important ; for example, claiming that sieving is too slow in the AT model (without any evidence*) but analysing the complexity of the mitm in the RAM model does not make much sense (in my opinion).

I am not completely opposed to the first approach (as long as it is amended by fixing a computational model), however the paper should make it clear what the authors mean by "128 bit post-quantum security", since they mean it in an (apparently) unusual way.

* : The basic algorithm has a cost in AT of about 2^(0.415*n). For n=310, this is 129, while enumeration has a cost of 190. Of course, I did not include here the polynomial factor of the sieve, while the constant of enumeration is included.
So ruling out a sieve algorithm for 215 bit of classical security might potentially be feasible, but certainly not by handwaving.

"Sure, it's conceivable, but what exactly is the action plan motivated by this idle speculation? "

Switch to Ring-LWE, which is (almost) as efficient as NTRU and strictly more (provably, practically) secure ?

Finally, about the widely used quadratic fit :

"My understanding is (still) that a quadratic fit is appropriate for this data."

"I'm going to stand by my choice of a quadratic fit since I'm still not convinced that a klog k fit is

realistic for the relevant range of block sizes. It's true that for very large k you will want to switch

to a recursive preprocessing. Once you do, your scaling from that point will be like 2^O(k log k),

but I'm not convinced this is relevant for attacks of cost less than 2^256."

Preprocessing stronger than LLL is used from block size ~ 60.
So unless you are interested in 25 bits of security, it is used.
The reason behind the choice of 2^O(k log k) is that :
- it is the proven asymptotical complexity
- it is the heuristic asymptotical complexity
The reason for not choosing a polynomial of degree 2 :
- The quadratic term your fit gives decreases when you increase the preprocessing
- It is exactly the proven behaviour of the algorithm
- It is exactly the heuristic behaviour of the algorithm
(All these claims are justified in the literature)
Now, the only reason I ever saw for justifying the degree 2 :
- Another (much slower) algorithm solving the same problem takes time 2^O(k^2)

So if you want to convince people that a fit in 2^O(k^2) has anything to do with reality, you will need very good arguments (the fact that it is widely used in the literature does not count).

Finally, a comment on : "We all know that the performance of lattice attacks is still quite poorly understood".
No, I don't. All my (and other's) experiments show that the Gaussian heuristic is extremely precise (for the relevant parameters), and that sieve/enumeration/BKZ algorithms give the predicted basis in the predicted time.
Sure, you need a polynomial algorithm to predict, but you can easily run it.
In the recent years, I have seen only one claim of undocumented behaviour in the literature (for the relevant parameters/algorithms) ; and it will be explained soon.

John Schanck

unread,

May 26, 2016, 9:03:43 PM5/26/16

to pole.k...@gmail.com, Cryptanalytic algorithms

On Thu, May 26, 2016 at 1:50 PM, <pole.k...@gmail.com> wrote:

Finally, about the widely used quadratic fit :
"My understanding is (still) that a quadratic fit is appropriate for this data."

"I'm going to stand by my choice of a quadratic fit since I'm still not convinced that a klog k fit is
realistic for the relevant range of block sizes. It's true that for very large k you will want to switch
to a recursive preprocessing. Once you do, your scaling from that point will be like 2^O(k log k),
but I'm not convinced this is relevant for attacks of cost less than 2^256."

Preprocessing stronger than LLL is used from block size ~ 60.
So unless you are interested in 25 bits of security, it is used.

The reason behind the choice of 2^O(k log k) is that :

- it is the proven asymptotical complexity

Of course, but you have to be precise about what you mean by "stronger."

If you mean preprocessing with BKZ using blocksize O(1), then a quadratic model

is appropriate. See [1] for experiments that justify this. The Asiacrypt 2011 version of

the BKZ-2.0 paper reports cost estimates for enumeration after BKZ-75 preprocessing.

If you want to extrapolate from this data you should use a quadratic fit.

If you mean preprocessing with BKZ of blocksize O(k), then [2] makes a

compelling case that something between quadratic and quasilinear is appropriate.

If you mean quasi-HKZ reduction, or the preprocessing of [1], then a quasilinear

model is appropriate. There's no argument about this.

- it is the heuristic asymptotical complexity

This depends on what kind of preprocessing that BKZ-2.0 does, and unfortunately

this depends on which version of the paper you look at. The progressive BKZ

preprocessing used in the full version of the BKZ-2.0 paper is difficult to analyze,

but after reading [2], I agree that the use of a quasilinear model is reasonable.

I suggest:

BKZ2(k) = 0.12081*k*log2(k) - 0.42860*k

The algorithm of [1] has much better justification for using a quasilinear model.

The authors give

MW15(k) = 0.0225*k*log2(k) + 0.4574*k

[1] https://eprint.iacr.org/2014/569

[2] https://eprint.iacr.org/2014/948

John Schanck

unread,

May 26, 2016, 9:09:09 PM5/26/16

to pole.k...@gmail.com, Cryptanalytic algorithms

Just to be clear, that last estimate from [1] is with BKZ-30 preprocessing.

They suggest that a pruning strategy might reduce the number of nodes enumerated

by a factor of 2^(k/4), so:

MW15Pruning(k) = 0.0225*k*log2(k) + 0.2074*k

They also note that this last formula gives a lower temporal cost than HashSieve until k=1895.

pole.k...@gmail.com

unread,

May 27, 2016, 5:46:23 AM5/27/16

to Cryptanalytic algorithms, pole.k...@gmail.com

"If you mean preprocessing with BKZ using blocksize O(1), then a quadratic model is appropriate."

When the supposed constant varies from 2 (LLL) to 30, it's completely inappropriate to call it constant.
In particular when you can see that the complexity varies a lot between the two, see [Figure 1,2].
What I meant was "relevant algorithms"vwith the "relevant parameters", and they use use blocksize Theta(k).

"The Asiacrypt 2011 version of

the BKZ-2.0 paper reports cost estimates for enumeration after BKZ-75 preprocessing.

If you want to extrapolate from this data you should use a quadratic fit."

But you don't compute short vectors in a lattice of dimension 200 with a BKZ-75 preprocessing… This is a huge loss of time.

"If you mean preprocessing with BKZ of blocksize O(k), then [2] makes a

compelling case that something between quadratic and quasilinear is appropriate."

No, [2] proves the running time is in beta^O(k^2/beta) which for beta=Theta(k) is k^O(k), so something in k^Omega(k) is inappropriate.
[Section 4,2] proposed a model in beta^(a*n*(n-1)/(beta-1)+bn)2^(cn) which is also k^O(k) for beta=Theta(k).

"They suggest that a pruning strategy might reduce the number of nodes enumerated

by a factor of 2^(k/4)"
Well, this rough estimate is valid only under GSA (see Lattice Enumeration Using Extreme Pruning), and GSA is incorrect for highly reduced basis (almost HKZ, see e.g. 2011/198 for the shape of highly reduced basis).
So it makes much more sense to select the values of Chen, who used the precise heuristic, and which imply sieving is faster (in the RAM model) than enumeration for dimension way below 700.

Noah Stephens-Davidowitz

unread,

May 28, 2016, 11:04:36 AM5/28/16

to Cryptanalytic algorithms, leo.d...@gmail.com, martinr...@googlemail.com

Hi John,

Maybe I'm being dumb, but I'm unable to replicate this:

Let's go back to the k^2 vs k log k fit question. If we only consider block sizes between
100 and 400 then the difference |f(k) - g(k)| between
f(k) = 0.270189 k log(k) − 1.0192 k + 16.10 (from [2]), and
g(k) = 0.000784314k^2 + 0.366078k − 6.125 (from [3])
is less than 10. This isn't surprising since they're fit to the same data between 100 and 250.
At k=400 we have f(400) = 256 and g(400) = 266.

In particular, when I plot these two functions in the proposed range, I get wildly different values.

Some of the other estimates that you gave in other posts are much closer together:

So, maybe you copy + pasted the wrong equation?

(I'm just trying to bring some clarity to what is currently a very confusing conversation.)

John Schanck

unread,

May 28, 2016, 11:27:13 AM5/28/16

to Noah Stephens-Davidowitz, Cryptanalytic algorithms, Leo Ducas, Martin Albrecht

On Sat, May 28, 2016 at 11:04 AM, Noah Stephens-Davidowitz <nsd...@gmail.com> wrote:

Hi John,
Maybe I'm being dumb, but I'm unable to replicate this:

Let's go back to the k^2 vs k log k fit question. If we only consider block sizes between
100 and 400 then the difference |f(k) - g(k)| between
f(k) = 0.270189 k log(k) − 1.0192 k + 16.10 (from [2]), and
g(k) = 0.000784314k^2 + 0.366078k − 6.125 (from [3])
is less than 10. This isn't surprising since they're fit to the same data between 100 and 250.
At k=400 we have f(400) = 256 and g(400) = 266.

In particular, when I plot these two functions in the proposed range, I get wildly different values.

The quasilinear formula from the cited paper uses the natural log, I should have noted

that in my email. Try

f(k) = 0.187281 * k * log2(k) - 1.0192 * k + 16.10

Best,

John

Noah Stephens-Davidowitz

unread,

May 28, 2016, 11:33:15 AM5/28/16

to John Schanck, Cryptanalytic algorithms, Leo Ducas, Martin Albrecht

I see. That works.

-Noah

Christopher J Peikert

unread,

Jul 20, 2016, 12:38:00 AM7/20/16

to cryptanalytic-algorithms

Forgive the delay in replying; I had some other homework to do.

On Sun, May 22, 2016 at 3:35 PM, D. J. Bernstein <d...@cr.yp.to> wrote:

Christopher J Peikert writes:
> We should also remember that CGS-style attacks -- which are why we are
> discussing the unit group in the first place

No.

I believe that I'm entitled to an acknowledgment of the actual history
here.

If the objection is to the name "CGS-style attacks," I'm happy with any alternative you'd like for "the folklore technique of using a short basis of the log-unit lattice in an attempt to recover a (hopefully) short generator of a principal ideal." I'll go with "log-unit attacks" for now.

(I think we all agree that it is folklore. As further evidence, I remember encountering and trying the technique in 2005-6, in a doomed attempt to break worst-case approx-Ideal-SVP for cyclotomics and other rings, when first starting in this area. The attempt of course failed, because the technique works very poorly in the worst case, delivering (as we now know) no better than $2^{\tilde{\Omega}(\sqrt{n})}$ approximation factors in prime-power cyclotomics.)

I fully realize that, in Cryptography According To Chris, this track
record is _not_ a cryptographic success story. What matters for Chris is
the extreme question of what has already been broken.

Thanks, but I'll speak for myself. As I wrote in Feb 2015, my view is:

“Lattice-based” cryptosystems are not based on “lattices” per se, but rather on certain computational problems on lattices. There are many kinds of lattice problems, not all of which appear to be equally hard—therefore, not all lattice-based cryptosystems offer the same qualitative level of security. For ideal lattices, the choice of problem matters at least as much as the choice of ring.

... Rings themselves are not “easy/weak” or “hard/strong”—problems and their instances are. The attack [on SV/Soliloquy] shows that specializing a problem too much, or relying on random instances whose distribution we don't really understand, can introduce unexpected weaknesses.

What every successfully attacked ring-based cryptosystem has in common is that it relies on an average-case problem that turns out to be easier than a more general problem the designers had in mind, and had hoped to capture the conjectured hardness of.

In the 18 months since I wrote the above quote, we've seen even more examples of this: slightly subexponential attacks on standard NTRU parameters, highly subexponential (or even polynomial) subfield attacks on "overstretched" NTRU parameters, and insecure Ring-LWE instantiations due to incongruous error distributions.

Meanwhile, cryptosystems formally supported by worst-case approx-Ideal-SVP (and that problem itself) have remained unscathed, in any number ring of sufficient dimension. This track record is why worst-case hardness is so important.

> None of Ring-LWE, NTRU, or NTRU Prime give out principal ideals at
> all, much less ones with short generators.

The whole reason that these SV attacks are looking for generators is
that the SV secret can be viewed as a generator of an ideal that's
publicly known---and of course the details of this view force the
generator to be short. The obvious strategy to adapt this to other

systems is to view other secrets as short generators of public ideals.

Does this "obvious strategy" actually have any hope of working? In a Feb 2015 post you sketched it for NTRU, but without any runtime/correctness analysis, experiments, etc. My subsequent experiments and analysis showed strong evidence of an enormous gap between the public ideal and the principal one generated by the secrets, and that dealing with the gap is much slower than other naive attacks, including brute-force key search. I haven't seen anything further since then.

There is yet another problem with the strategy. Recall that it defines a public-key-dependent, "random" quadratic extension $E$ of the original cyclotomic ring, and then performs a short-generator attack in $E$ . This involves computing a very good basis for the log-unit lattice of $E$ , which has twice the rank of the cyclotomic log-unit lattice. Isn't this the very problem you say is quite hard (except for very special fields like CM ones), and therefore constitutes a defense for NTRU Prime rings? Here $E$ is even more "unstructured" than the NTRU Prime rings -- it's defined by the public key -- so even if we could get ahold of the principal ideal generated by the secrets, how to mount a short-generator attack?

It seems like you want it both ways: "short bases of unit groups are hard to compute" is a defense for NTRU Prime, but is no obstacle when you say you think that NTRU is vulnerable to short-generator attacks (e.g., here, in your Feb 2015 post, and in the above quote).

To be clear, I have always said that I doubt any of Ring-LWE, NTRU, or NTRU Prime are actually vulnerable to short-generator attacks. This opinion could be wrong, but it is at least logically consistent.

> we should remember that we are not dealing with the *general* problem
> of unit group computation (i.e., on arbitrary number rings), but
> rather a very specific one in rings of the form Z[x]/(x^p-x-1) for
> prime p.

Similarly, the extensive literature on unit-group computation includes
some techniques that work for all fields, some techniques that are
limited to quite unusual fields, and some techniques in between.

This is entirely believable. But it still remains the case that: (1) Z[x]/(x^p-x-1) is a very specific class of rings, and (2) we can analytically find at least a few short independent units. This is very different behavior from "random" or "hard" lattices. It's also quite different from what we get with S-units (see below).

This state of affairs informs my position on unit groups:

(1) we should not view the conjectured general-case hardness of computing unit groups, or short vectors/bases thereof, as any meaningful evidence of security for a cryptosystem defined over fixed rings -- especially when their log-unit lattices behave differently from hard lattices;

(2) we should absolutely renounce any cryptosystem that's known/likely to be broken/weak if the attacker happens to have such a basis -- especially if such knowledge results in "all for the price of one" attacks, as appears to be the case for SV/Soliloquy over alternative rings.

> As seen above, it is easy to find at least a few short units in such
> rings; I don't know whether that's the case for arbitrary number rings
> or not.

Exercise: Learn something about the slightly more general theory of
S-units (start from, e.g., Chapter 7 of Cohen's followup book). Then
figure out how, given _any_ number ring, you can trivially write down a
few independent S-units for suitable S.

Neat stuff. (See https://en.wikipedia.org/wiki/S-unit for a quick overview; it's pretty simple.) The general but expensive algorithms in Chapter 7 produce, for a given S, a complete S-unit basis consisting of quite long elements (in the log embedding).

There is a trivial way to get as many short S-units as desired: just repeatedly choose a random short ring element and check if it generates a new prime ideal (it will often enough). The trick is that the set S, and the group of S-units, is then defined by the elements we chose; we therefore know s=|S| short S-units. But the full S-unit group has rank s+r, where r is the rank of the unit group.

The above can be seen as starting with some original lattice (the log-unit lattice), adding to it some short vectors of our own choosing (the logs of our chosen elements), and saying "look, I have some short vectors in the resulting lattice!" -- as you say, trivial. And also quite different from finding short vectors in the original log-unit lattice, which the attacker cannot modify.

For certain friendly S, like one having prime principal ideals of not-too-huge-norm, we can also pretty cheaply find short generators of their inverse ideals; these generators are S-units.

Of course, the main question is whether any of this is cryptanalytically useful. Dan says:

The distinction between units and S-units is minor from theoretical and
computational perspectives, and also doesn't seem relevant to attacks.

Really? The analogous "log-S-unit attack" wouldn't even give you an element of the input ideal, but instead one in some other fractional ideal with arbitrary extra (positive or negative) powers of the ideals in S. This is pretty relevant, to say the least: the non-unit S-units can only make things worse, so we're better off just throwing them out to start with, thus putting us right back in the plain-unit regime.

Can you give some evidence to support the assertion that "the distinction between units and S-units doesn't seem relevant to attacks"? (Not that there's any historical reason to doubt your casually tossed-off comments like these...)

Christopher J Peikert

unread,

Jul 21, 2016, 1:29:40 AM7/21/16

to cryptanalytic-algorithms

Brief addendum:

The distinction between units and S-units is minor from theoretical and
computational perspectives, and also doesn't seem relevant to attacks.

Really? The analogous "log-S-unit attack" wouldn't even give you an element of the input ideal, but instead one in some other fractional ideal with arbitrary extra (positive or negative) powers of the ideals in S. This is pretty relevant, to say the least: the non-unit S-units can only make things worse, so we're better off just throwing them out to start with, thus putting us right back in the plain-unit regime.

The situation is even worse: when $|S| \geq 2$ , the log-embedding of the S-units may not even be discrete, i.e., it is not a lattice (cf. the log-unit lattice).

The simplest example is for the ring $\mathbb{Z}$ with $S = \{2,3\}$ , with generators 2,3 of the S-unit group. Then the log-embedding of this group is given by all integer combinations of log(2) and log(3). This is not a lattice: there are combinations arbitrarily close to zero. This non-discreteness extends to the general case, because the rank of the S-unit group exceeds the dimension of the log-embedding space.

When the original log-unit attack involves decoding a lattice, and moving to S-units leaves you without even a lattice to decode, I'd say that's a pretty relevant distinction.

One can recover some kind of discreteness by adding extra "coordinates" to the log-embedding, corresponding to the non-Archimedian "places" for the prime ideals in S. (This is all standard.) But this new space is non-Archimedian, so it has little if any connection with the geometry of the ring, where we are trying to find a short element of a given ideal.

Ian M

unread,

Feb 12, 2017, 6:48:03 AM2/12/17

to Cryptanalytic algorithms

After reading the discussion in this thread, along with reviewing the NTRU Prime paper I wanted to offer my perspective.

My general response to the paper revolves around the KEM proposed, along with the comparison to New Hope and focus on low-latency performance optimizations. I spent a fair amount of time reviewing the New Hope paper, and had a few concerns based on their low-latency focus along with the unauthenticated key exchange. My main contention with the New Hope implementation of Chris' KEM from "Lattice Cryptography for the Internet" is that New Hope seems to deviate from Chris' KEM by using a cross-rounding function that might produce a determinancy in the output. I think NTRU Prime introduces a useful feature in the ability to layer the KEM to offer forward secrecy, though.

Regarding the conversations in this thread, a few comments "stuck out" to me.

Chris wrote: "None of Ring-LWE, NTRU, or NTRU Prime give out principal ideals at all, much less ones with short generators. So while the question of computing short units is certainly interesting in its own right, it's currently not known whether it means anything for the security of these systems."

I agree that this question is interesting. I also think it presents a worthwhile direction for research.

Dan wrote: "The whole reason that these SV attacks are looking for generators is that the SV secret can be viewed as a generator of an ideal that's publicly known---and of course the details of this view force the generator to be short. The obvious strategy to adapt this to other systems is to view other secrets as short generators of public ideals. The conceptual gap between "principal ideal" and "principal ideal with short generator" simply isn't relevant to any of these attacks."

I'm a bit confused here, to be honest. Are you suggesting that these SV attacks could compute the SV secret, given the secret is itself the generator of the known ideal? If so, are you also saying whether the principal ideal has a short generator or not is inconsequential? The second portion of this question might be in line with Chris' co-authored article with Joel Alwen from June, 2009 on "Generating Shorter Bases for Hard Random Lattices" which states that using a Hermite Normal Form leads to a lattice in a 'least revealing' form. Chris, of course, is better equipped to address that than I am (though he seems to be saying that in the reply I've quoted above). If I'm reading his article accurately, the results of applications using a short basis is actually independent of a particular generating algorithm. My interpretation is that the results apply to short bases, as opposed to relying on specific generators. Given that the article is written w.r.t. short generators, it seems as though their results demonstrate the associated hardness of randomized lattices using shorter bases. It doesn't seem likely then, that compromising a principal ideal or a principal ideal with short generator are distinct (in terms of this attack), but neither seem susceptible to the attack I think you are proposing.

Dan Wrote: "Automorphisms play a huge theoretical and computational role in the massive literature on algebraic number theory---and in an increasing number of successful attacks against ideal-lattice-based cryptography. The distinction between units and S-units is minor from theoretical and computational perspectives, and also doesn't seem relevant to attacks."

I agree, more or less, with what I see as the concept behind your statements on automorphisms. I'm very curious as to whether an algebraic attack using some form of mapping could compromise a scheme that relies on secure automorphisms.

Chris Wrote: "One can recover some kind of discreteness by adding extra "coordinates" to the log-embedding, corresponding to the non-Archimedian "places" for the prime ideals in S. (This is all standard.) But this new space is non-Archimedian, so it has little if any connection with the geometry of the ring, where we are trying to find a short element of a given ideal."

I'm wondering if a neighborhood embedding, implied in Nearest Codeword Problems and Bounded Distance Decoding would allow for a more straightforward attack. A variation of coordinate shifts to grant a stronger attack seems plausible, or at least somewhat intuitive. I believe this is something Zhenfei covers in the article posted, titled "Analyzing subrings for some R-LWE and NTRU instances." As they've said though, the attacks in this paper fail (I'm specifically referring to Section 6, Eqs. 15 and 16).

v/r,

-Ian

On Sunday, May 15, 2016 at 9:22:10 AM UTC-5, Leo Ducas wrote:

I was recently pointed to the long awaited paper about NTRUPrime: https://ntruprime.cr.yp.to/ntruprime-20160511.pdf

I find the proposal intriguing, and worth consideration. Unfortunately, the current concrete security analysis is bogus and the security claims inaccurate. A quick counter-analysis suggests the security of the proposal is overestimated by about 75 bits.

The first mistake is equation (4) (top of page 19) in Estimate(β,n,p). The chosen model for the cost of enumeration doesn't match the asymptotic of the BKZ/HKZ types of algorithm. The given model is in 2^O(β^2), while the best known algorithm (Kannan's SVP/HKZ algorithms) runs in time 2^O(β log β). Provable yet reasonably efficient version of Kannan's algorithm were recently studied by Micciancio and Walter [http://eprint.iacr.org/2014/569]. More generally, even BKZ+Enum has (heuristically ?) cost 2^(β log β) when the blocksize is appropriately chosen: this is discussed in [https://eprint.iacr.org/2015/046.pdf].

The second mistake stems from the conventional misconception that sieving algorithms are only useful to solve SVP. The authors only consider the cost of Sieving on the full lattice (or maybe only half of it, since the lattice natural has dimension 2p, but the cost is take to be 2^{.292p} in Alg 1.). This is not true: sieving can be used in replacement for enumeration inside BKZ. This brings the cost of BKZ down to poly(n)*2^(.292 β) (where the poly may be make linear with various heuristic tricks as progressive-BKZ, I personally recommend considering this poly(n) factor to 1 for any security claim consideration.).

Tweaking the script PQsecurity.py of Newhope*, which do take account for sieving inside BKZ, one obtains that BKZ algorithm will find the secret key via the ``primal attack'', when ran with blocksize β=480 on the lattice constructed from m=600 equations mod q out of the 739. This translate to λ = .292β = 140 bits of security, or a slightly more if one is ready to make a wild guess** on the polynomial factors.

In conclusion, the schemes offers a fair amount of security according to current classical attacks, but far below the 215 bits claimed. In particular, it offers less security than ntruees787ep1 (but this is obvious since the dimension is smaller, the error density similar, and the modulus larger) and much less security than NewHope (it is more similar to JarJar).

I do understand the intent of minimizing the presence of algebraic handles, and definitely think there is some value of an alternative to cyclic / cyclotomic rings. But with respect to ``known attack'' rather than ``yet to be discovered attacks'', the NTRUprime proposal is significantly less conservative than what it is compared to, so I am not sure the comparison of Fig 1.1 is as significant as one would hope.

When re-parametrized properly to compare NTRUprime to other schemes, it shouldn't be surprising to have a gap of performances with cyclotomic. Considering how fast all this already is, a factor 2x on time could be considered perfectly acceptable. It could also be possible to optimize the NTRUprime scheme a bit: Theorem 2.1 is an overkill, and tighter bounds could certainly be established probabilistically.

-- Leo

* The currently available script and paper [https://eprint.iacr.org/2015/1092.pdf] are not accurate (fortunately it underestimate the cost of the dual attack), in 6.4 equation (2) should instead read

- 2 π^2 τ^2 ≥ ln (ε/4)

A repaired script is attached, and we are doing our best to push the final version of NewHope in the shortest delays.

** Part of the hidden factor of the cost of sieving could be amortized inside BKZ, by some recycling trickery. I advise against making wild guesses.

Reply all

Reply to author

Forward