5650 views

Skip to first unread message

Apr 4, 2022, 12:38:57 PM4/4/22

to pqc-...@list.nist.gov

- בלמ"ס -

Dear PQC researchers,

The Center of Encryption and Information Security (MATZOV) of the IDF has conducted an internal audit of leading Post-Quantum cryptographic (PQC) schemes, focusing on the Learning With Errors and Rounding problems.

After consultations with NIST over the last few months – we have decided to release the audit as a Technical Report available for public review.

https://doi.org/10.5281/zenodo.6412487

Our report presents several improvements to the dual lattice attack, which induce a noticeable reduction in the security estimation for Kyber, Saber and Dilithium, bringing them below the required threshold.

The report does not intend to provide a complete analysis of all post-quantum candidates, nor to recommend usage of specific algorithms. Rather, this publication is meant to share advances in the cryptanalysis of lattices which we believe to be relevant to the academic research in the field.

We acknowledge the remarkable work done by NIST in the process and its impact – creating interest in the post-quantum field and promoting new cryptographic schemes.

A prudent approach for these schemes is recommended, as research in the field is constantly evolving and much remains unstudied. Therefore, as a contribution to the community, the report includes further research ideas which we deem interesting.

MATZOV, IDF

Apr 4, 2022, 1:05:08 PM4/4/22

to מצו״ב, pqc-...@list.nist.gov

Good afternoon (evening) MATZOV,

Thank you very kindly for releasing this paper! I'm sure it must have been a challenge. I'm looking forward to reading it in much detail.

In the paper, you highlight in multiple places that your analysis occurs in the RAM model. There are several variables involved in assessing the cost of an actual attack. Have you considered "more realistic" memory-costing in your analyses?

There is a long history of discussion on this forum this past summer and fall about what the proper way to model attacker memory costs are. Broadly, there is the RAM model as compared to a variety of so-called "local" models. Some prominent examples of these alternative/local models include the 2D nearest neighbor model (often called the Square-Root model) and the 3D nearest neighbor model (often called the Cube-Root model). As a further example, the NIST PQC call for proposals defined a version of the (quantum) circuit model involving a MAXDEPTH parameter for gate-operations in series.

Note that this question is particularly relevant as the defining cost-metric (the computational hardness vs. AES) involves essentially no memory costs, whereas lattice sieving historically involves high memory costs.

One approach to addressing alternative models of memory-costing would be to define a single 'trade-off value' between max memory and computational properties (width, depth, network topology of the cryptanalytic device, etc.) and simply add some number of bit operations to the bit-complexity of algorithms in the RAM model. Typically, jointly settling on such a value is a difficult task, with many unknowns. See, for example, the discussion in the Kyber Round 3 spec that gives ranges of bit-complexities (either positive or negative) based on various uncertainty-factors. Do you have a preferred view on how to do this generic analysis?

However, more importantly: Do you find that any of your new algorithmic approaches have a concrete cost that would differ from a generic model-to-model analysis? (I hope to read through the work and answer "No!" but perhaps you have an opinion you could share now.)

Thank you for your insightful work and significant contribution to the science.

Best regards,

--Daniel Apon

Cryptography Lead, the MITRE Corporation

da...@mitre.org

Thank you very kindly for releasing this paper! I'm sure it must have been a challenge. I'm looking forward to reading it in much detail.

In the paper, you highlight in multiple places that your analysis occurs in the RAM model. There are several variables involved in assessing the cost of an actual attack. Have you considered "more realistic" memory-costing in your analyses?

There is a long history of discussion on this forum this past summer and fall about what the proper way to model attacker memory costs are. Broadly, there is the RAM model as compared to a variety of so-called "local" models. Some prominent examples of these alternative/local models include the 2D nearest neighbor model (often called the Square-Root model) and the 3D nearest neighbor model (often called the Cube-Root model). As a further example, the NIST PQC call for proposals defined a version of the (quantum) circuit model involving a MAXDEPTH parameter for gate-operations in series.

Note that this question is particularly relevant as the defining cost-metric (the computational hardness vs. AES) involves essentially no memory costs, whereas lattice sieving historically involves high memory costs.

One approach to addressing alternative models of memory-costing would be to define a single 'trade-off value' between max memory and computational properties (width, depth, network topology of the cryptanalytic device, etc.) and simply add some number of bit operations to the bit-complexity of algorithms in the RAM model. Typically, jointly settling on such a value is a difficult task, with many unknowns. See, for example, the discussion in the Kyber Round 3 spec that gives ranges of bit-complexities (either positive or negative) based on various uncertainty-factors. Do you have a preferred view on how to do this generic analysis?

However, more importantly: Do you find that any of your new algorithmic approaches have a concrete cost that would differ from a generic model-to-model analysis? (I hope to read through the work and answer "No!" but perhaps you have an opinion you could share now.)

Thank you for your insightful work and significant contribution to the science.

Best regards,

--Daniel Apon

Cryptography Lead, the MITRE Corporation

da...@mitre.org

--

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/DM5PR14MB140491EED7763525AB6C67E2A1E59%40DM5PR14MB1404.namprd14.prod.outlook.com.

Apr 9, 2022, 7:54:17 AM4/9/22

to pqc-...@list.nist.gov

Daniel Apon writes:

> lattice sieving historically involves high memory costs

The best estimates available here aren't comforting for Kyber.
> lattice sieving historically involves high memory costs

https://ntruprime.cr.yp.to/nist/ntruprime-20201007.pdf, using energy

numbers published by Intel, estimates each access to a bit within N bits

of memory as matching the cost of N^0.5/2^5 bit operations (page 57),

and uses this to estimate how RAM costs affect concrete sieving costs.

The numerical examples on page 103 show the extra security ranging from

a 2^40 factor for Core-SVP 2^129 to a 2^90 factor for Core-SVP 2^271.

See Section 6 of the same document for many reasons that these numbers

can be underestimating or overestimating the actual attack costs. The

round-3 Kyber documentation estimates that Kyber-512 attacks cost

between 2^135.5 and 2^165.5 "gates", where the "floor" for NIST's lowest

security category is 2^143 "gates".

The new attack paper starts from the Kyber documentation's middle

estimate, namely 2^151.5, and says it's reducing the attack costs by a

factor 2^14, to 2^137.5, using better attack algorithms.

It's clear that at least some of the algorithm-analysis uncertainties

stated in the round-3 Kyber documentation are regarding speedups that

combine with the speedups in the new paper. The optimistic possibility

for the attacker is that the new paper is actually giving a 2^14 savings

from 2^135.5, i.e., 2^121.5 bit operations.

Does accounting for real RAM costs close the gap between 2^121.5 and

2^143? One might think that, sure, this is covered by the 2^40 mentioned

above: Kyber-512 previously had security 2^40*2^135.5 = 2^175.5, so a

32.5-bit security margin, and the new paper is reducing this to an

18.5-bit security margin: i.e., the new paper is merely cutting out 40%

of the Kyber security margin, rather than breaking Kyber outright.

But let's look more closely at the numbers. As a preliminary point,

round-3 Kyber-512 is starting from Core-SVP just 2^112 and

revised-Core-SVP just 2^118, with exponent 87% and 91% of 129

respectively, so the obvious estimate is about 2^36 instead of 2^40.

Furthermore, this 2^36 is accounting for the energy cost of accesses to

a giant RAM array, while it's clear that many of the bits of security

beyond Core-SVP claimed in the round-3 Kyber security analysis are

coming from accounting for the cost of local bit operations. These

effects don't multiply; they add!

Internally, Core-SVP is starting from estimates of the number of

"operations" inside sieving. It makes sense to say that the attacker

needs to pay for the large-scale memory access inside each "operation".

It also makes sense to say that the attacker needs to pay for all the

bit operations inside each "operation". But the local bit operations are

an asymptotically irrelevant extra cost on top of the memory access, and

the best bet is that they don't make much difference for Kyber-512. The

real cost of this type of algorithm is, at a large scale, driven

primarily by data motion, not by local computation.

(The new paper seems to have some local speedups to the sieving inner

loop, which similarly should be presumed to make little difference next

to the memory-access bottleneck, but my understanding is that this is

under half of the bits of security loss that the paper is reporting.)

So I don't see how current knowledge can justify suggesting that the

costs of RAM rescue Kyber-512 from the new attack. It seems entirely

possible that the real costs of this Kyber-512 attack are considerably

below the costs of a brute-force AES-128 attack. Deciding this one way

or the other will require much more serious analysis of attack costs.

Certainly it's disturbing to see Kyber-512 dropping from (1) supposedly

"conservative" to (2) bleeding-edge in bit operations and then to (3)

apparently broken in bit operations and bleeding-edge in real cost. The

Kyber documentation really should stop using the word "conservative".

It's also deeply concerning that the uncertainties in evaluating costs

of lattice attacks, such as the 2^30 uncertainty factor (2^135.5 through

2^165.5) in the round-3 Kyber submission, have been weaponized again and

again to suggest that we shouldn't worry about a paper speeding up

lattice attacks by a factor 2^5 or 2^10 or 2^15. The cumulative effect

of years of such speedups has clearly been far more than 2^30. (The

mascot for lattice-based cryptography should be a slow-boiled frog.)

As a historical matter, we've seen again and again in cryptography that

a series of public attack advances has culminated in a feasible attack.

The community recognizes and promotes progress by putting serious effort

into _quantifying_ the attack costs. Security evaluation is obviously

the most important input to NISTPQC, so the NISTPQC rules should have

been designed to prioritize and assist quantification of attack costs.

Back in 2016, NIST proposed NISTPQC evaluation rules that instead

prioritized fake confidence in staying above cutoffs such as AES-128 and

AES-192. I correctly predicted in

https://blog.cr.yp.to/20161030-pqnist.html

that "Quantitatively comparing post-quantum public-key security levels

is going to be a nightmare". I recommended throwing away the security

cutoffs and replacing them with the traditional focus on analyzing

algorithm costs as accurately as possible. NIST's subsequent arguments

for prioritizing cutoffs don't stand up to examination---for details see

Appendix B.5 of

https://ntruprime.cr.yp.to/latticerisks-20211031.pdf

---and, even worse, the cutoffs are in cost metrics that NIST _pretends_

to have defined but has never actually defined. (See below.)

Going forward, clearly NIST is going to include some lattice systems in

its first standards; supposedly we'll find out which ones any moment

now. Maybe NIST is going to recklessly include the smallest proposed

parameters---but apparently we won't find this out for a while; NIST

indicated, surprisingly, that it _isn't_ planning to name parameters

yet. Given how much supposed security lattices have lost over the years

and how large the remaining attack surface is, there's a worrisome level

of risk even for bigger parameters such as Kyber-1024. So there's an

ongoing need for clear quantification of the costs of lattice attacks.

> the NIST PQC call for proposals defined a version of the (quantum)

> circuit model involving a MAXDEPTH parameter for gate-operations in series.

never defined a model to be used for NISTPQC. In particular, NIST never

defined the set of allowed "gates", despite requests for clarification.

(The algorithms literature includes many different gate sets, often

giving wildly different algorithm costs; see, e.g., how Ambainis's

distinctness paper uses quantum RAM gates.) Section 5.4 of

https://cr.yp.to/papers/categories-20200918.pdf

gives quotes, references, and numerical examples to illustrate the lack

of definition.

We've seen repeatedly how the ambiguities in NIST's pseudo-definitions

have been exploited to downplay attacks against some systems---this

reached amazing heights with last month's excuses for not withdrawing

the claim that the dimension-256 parameters in the preliminary Frodo

design from Lindner--Peikert "appear to be at least as secure as

AES-128"---and at the same time to hype attacks against other systems.

NIST's failure to pick a metric has done far more damage to comparisons

than whatever damage would have been done from the selection of a metric

that turns out to not be perfectly realistic.

---D. J. Bernstein

Apr 10, 2022, 10:29:18 AM4/10/22

to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov

Good morning (or other state-of-day), Dan,

For the sake of focusing this discussion to the open scientific questions, how about I simply concede the points in your commentary ("NIST sucks," "Lattices are frogs," etc.). I think what's most interesting here is getting to a precise (non-napkin-math) calculation of the memory costs in MATZOV's new algorithms. (Although I've only read through so far for a high level understanding, the new algorithms initially appear correct and well-analyzed in the RAM model to me.)

As a starting point for what I'd like to get at, consider the example range of calculations you began with (along with the caveat):

"The numerical examples on page 103 show the extra security ranging from

a 2^40 factor for Core-SVP 2^129 to a 2^90 factor for Core-SVP 2^271.

See Section 6 of the same document for many reasons that these numbers

can be underestimating or overestimating the actual attack costs."

Let me try to replicate that by hand for Kyber-1024 (to show some insufficiencies with a napkin math approach, whether this one I'm cooking up now or any other). The Kyber-1024 Round 3 spec claims classical Core-SVP hardness of 256. Using what I'll informally call "Thijs's May 2019 heuristic" (see https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/VHw5xZOJP5Y/m/nklFHrY4AwAJ), we can set o(1)=0 in the exponent of list size estimates from sieving analyses. Let's arbitrarily pick the 3-sieve algorithm's minimal list size costs from the G6K paper as the memory size estimate, ignoring runtime overhead induced by the smaller list size. This is 2^{0.1788n} from Fig 2 / page 19 of https://eprint.iacr.org/2019/089.pdf. Let's use the NTRU Prime memory-costing heuristic of N^{.5} / (2^5).

Putting all of this together, we get log_2((2^{.1788 * 1024})^(1/2) / (2^5)) ~= 86.55.

(Note that doing the same calculation for Kyber-512 gets you log_2((2^{.1788 * 512})^(1/2) / (2^5)) ~= 40.77 in a straightforward way.)

Now I want to just 'scale this up' to a hypothetical Kyber (ignoring issues like powers of 2 in the dimension) by taking percentages of Core-SVP values as

log_2((2^{.1788 * (271/256) * 1024})^(1/2) / (2^5)) ~= 91.91. //and I'm already off by a couple bits from 2^90

-----

A more rigorous approach would begin by calculating the concrete list size of the new dual lattice attack algorithm (the paper gives a concrete way to calculate the number of samples, D, on page 39 -- even if it's a mess to unravel). Then, one should look at the precise movement of memory required by Algorithms 2 (page 15) and 3 (page 17). It's important here to consider the memory architecture and the actual steps of the algorithm. In order to arrive at a conservative lower bound for this algorithm, it's probably best to model the memory architecture in the most ideal way possible (simply a single, uniform 2D grid, perhaps).

That is, following the NTRU Prime 3rd Round spec, if 2^30 bits of DRAM at 22nm fit in a 5mm x 5mm square, and one needs 2^90 lattice vectors, then we're talking about a 2D grid of bits spanning at least 2^(60/2) * 5mm on each side if arranged in a square, or approximately 42% of the width of Planet Earth. This is approximately the width of 1.5 our moon Luna. Re-arranging as a roughly spherical shape (perhaps in layers of 2D grids) for efficiency of communication, one derives approximately a small-moon-sized cryptanalytic device: https://www.youtube.com/watch?v=8Nho44lGVV8

So it's clearly critical that parameterizations for the new dual lattice attack consider values of D that are sufficiently small to fit on a real-world cryptanalytic device that could be constructed without importing matter from other solar systems. But that issue aside, then it's important to consider the cost of memory movement (as you highlight). A precise and perspicuous analysis of those exact costs is still outstanding.

Taking D^(1/2) / (2^5) as the additional running-time cost-factor is a reasonable first approach (even if I believe that calculation is largely over-estimating the run-time costs associating with large memory..), but a more rigorous analysis should consider the concrete steps performed by the algorithm, as well as any improvements that might be gained by how lattice vectors are laid out in memory during sieving. (This is not an idle intellectual exercise, since insights here will be applicable even when re-tooling the algorithm to achieve smaller values of D that could be effective in the real world.)

Toward that end, "simple" optimizations like in MATZOV's paper, Section 5.4 (Efficient Updating of the FFT Input) will be very strong.

Best regards,

--Daniel Apon

For the sake of focusing this discussion to the open scientific questions, how about I simply concede the points in your commentary ("NIST sucks," "Lattices are frogs," etc.). I think what's most interesting here is getting to a precise (non-napkin-math) calculation of the memory costs in MATZOV's new algorithms. (Although I've only read through so far for a high level understanding, the new algorithms initially appear correct and well-analyzed in the RAM model to me.)

As a starting point for what I'd like to get at, consider the example range of calculations you began with (along with the caveat):

"The numerical examples on page 103 show the extra security ranging from

a 2^40 factor for Core-SVP 2^129 to a 2^90 factor for Core-SVP 2^271.

See Section 6 of the same document for many reasons that these numbers

can be underestimating or overestimating the actual attack costs."

Putting all of this together, we get log_2((2^{.1788 * 1024})^(1/2) / (2^5)) ~= 86.55.

(Note that doing the same calculation for Kyber-512 gets you log_2((2^{.1788 * 512})^(1/2) / (2^5)) ~= 40.77 in a straightforward way.)

Now I want to just 'scale this up' to a hypothetical Kyber (ignoring issues like powers of 2 in the dimension) by taking percentages of Core-SVP values as

log_2((2^{.1788 * (271/256) * 1024})^(1/2) / (2^5)) ~= 91.91. //and I'm already off by a couple bits from 2^90

-----

A more rigorous approach would begin by calculating the concrete list size of the new dual lattice attack algorithm (the paper gives a concrete way to calculate the number of samples, D, on page 39 -- even if it's a mess to unravel). Then, one should look at the precise movement of memory required by Algorithms 2 (page 15) and 3 (page 17). It's important here to consider the memory architecture and the actual steps of the algorithm. In order to arrive at a conservative lower bound for this algorithm, it's probably best to model the memory architecture in the most ideal way possible (simply a single, uniform 2D grid, perhaps).

That is, following the NTRU Prime 3rd Round spec, if 2^30 bits of DRAM at 22nm fit in a 5mm x 5mm square, and one needs 2^90 lattice vectors, then we're talking about a 2D grid of bits spanning at least 2^(60/2) * 5mm on each side if arranged in a square, or approximately 42% of the width of Planet Earth. This is approximately the width of 1.5 our moon Luna. Re-arranging as a roughly spherical shape (perhaps in layers of 2D grids) for efficiency of communication, one derives approximately a small-moon-sized cryptanalytic device: https://www.youtube.com/watch?v=8Nho44lGVV8

So it's clearly critical that parameterizations for the new dual lattice attack consider values of D that are sufficiently small to fit on a real-world cryptanalytic device that could be constructed without importing matter from other solar systems. But that issue aside, then it's important to consider the cost of memory movement (as you highlight). A precise and perspicuous analysis of those exact costs is still outstanding.

Taking D^(1/2) / (2^5) as the additional running-time cost-factor is a reasonable first approach (even if I believe that calculation is largely over-estimating the run-time costs associating with large memory..), but a more rigorous analysis should consider the concrete steps performed by the algorithm, as well as any improvements that might be gained by how lattice vectors are laid out in memory during sieving. (This is not an idle intellectual exercise, since insights here will be applicable even when re-tooling the algorithm to achieve smaller values of D that could be effective in the real world.)

Toward that end, "simple" optimizations like in MATZOV's paper, Section 5.4 (Efficient Updating of the FFT Input) will be very strong.

Best regards,

--Daniel Apon

Apr 12, 2022, 11:51:02 AM4/12/22

to pqc-...@list.nist.gov

Daniel Apon writes:

> That is, following the NTRU Prime 3rd Round spec, if 2^30 bits of DRAM at

> 22nm fit in a 5mm x 5mm square, and one needs 2^90 lattice vectors, then

> we're talking about a 2D grid of bits spanning at least 2^(60/2) * 5mm on

> each side if arranged in a square, or approximately 42% of the width of

> Planet Earth.

First, 22nm is very far from the latest chip technology. See, e.g.,
> That is, following the NTRU Prime 3rd Round spec, if 2^30 bits of DRAM at

> 22nm fit in a 5mm x 5mm square, and one needs 2^90 lattice vectors, then

> we're talking about a 2D grid of bits spanning at least 2^(60/2) * 5mm on

> each side if arranged in a square, or approximately 42% of the width of

> Planet Earth.

https://en.wikipedia.org/wiki/5_nm_process

and the links from there to upcoming technology nodes. The advance from

22nm to 5nm took just 8 years, and made transistors about 3x smaller in

each direction.

Second, dividing the width mentioned above by about 30 gets down to the

radius of existing tracts of uninhabited land owned by governments with

a history of carrying out attacks.

Third, the NISTPQC call for proposals says "The security provided by a

cryptographic scheme is the most important factor in the evaluation",

not "The security provided by a cryptographic scheme against 22nm

attackers is the most important factor in the evaluation". It would be

astonishing if a project trying to protect against the long-term quantum

threat were allowing such shortsighted security goals.

Fourth, the "Improved Dual Lattice Attack" that this thread is about

appears to considerably reduce the attack costs. The Kyber documentation

mentions various other reasons for a 2^30 uncertainty factor regarding

the attack costs. Given the context, the above mention of "2^90 lattice

vectors" needs to be accompanied by a warning that the costs could be

much lower.

Fifth, restricting attention to 22nm is not endorsed by the NTRU Prime

documentation. On the contrary, the documentation explicitly points to

the trend towards smaller technology---and stays away from selecting any

bleeding-edge parameters in the first place.

The reason 22nm shows up in the documentation is that, as a separate

question from how expensive computation is on an absolute scale, it's

important to understand the _relative_ costs of different operations

inside attacks. Intel was nice enough to publish detailed energy figures

for 22nm in 2015; the NTRU Prime documentation compares those to readily

available data regarding 22nm RAM, obtaining an estimated sqrt(N)/2^5

_ratio_ between the cost of accessing a bit in N bits of RAM and the

cost of a bit operation. The documentation then explains why it's

reasonable to guess that future technology will have similar ratios:

Smaller technology than 22nm reduces the cost of bit operations, as

noted above, while also packing memory more densely. It is reasonable

to guess that these effects will stay approximately balanced:

compared to performing an AND or XOR on two bits within a tiny

distance, moving a bit over a tiny distance uses the same basic

physical phenomena but uses those phenomena in a simpler way, and

having it cost a constant factor less is unsurprising. This guess is

not meant as a substitute for continuing to monitor technology

trends.

None of this is endorsing the idea that the security goal should be

security against 22nm attackers.

> consider values of D that are sufficiently small to fit on a

> real-world cryptanalytic device that could be constructed without importing

> matter from other solar systems

require "importing matter from other solar systems": the Earth weighs

2^92 grams, and silicon etc. are very common.

Is an attacker in the foreseeable future going to build 2^60 grams of

chips, and have the energy budget to run those chips? No, and no. See

my pqc-forum email dated 20 Nov 2016 05:14:07 +0000. But it would be

crazy for NISTPQC to set its minimum security level at just barely

stopping attacks with current technology.

The NISTPQC call for proposals sets a minimum security level

considerably above this. Specifically, it sets brute-force search for a

single AES-128 key as a "floor" for security. It's clear that attackers

aren't anywhere near carrying out 2^128 operations.

What happens if an attack falls in the gap: easier to break than AES-128

but still not feasible for attackers today? Unfortunately, we've seen

that the answer depends on the cryptosystem being attacked:

* For some cryptosystems, the infeasibility is hyped. So much

equipment needed to finish in a reasonable time! So much energy!

Look at how hard this would be!

* For other cryptosystems, we instead hear that anything below 2^128

operations, even with access to a massive memory array counted as

just one "operation", counts as a break.

Specifically, the pqc-forum comparisons of attack costs to Earth

resources have been encouraging consideration of larger-scale attacks

for cryptosystems in general (Perlner email dated 17 Aug 2020 17:41:27

+0000) and for LAC in particular (Hamburg email dated 12 Apr 2018

14:50:16 -0400). For most other specific attacks (including infeasible

attacks), a comparison to Earth resources isn't even mentioned. But, for

the latest Kyber-512 security loss, we're seeing an Earth comparison

being used to suggest that the attack should be ignored. See also

https://twitter.com/mjos_crypto/status/1511027605033652234

saying that dropping below AES-128 security is ok.

Some of the NIST statements have suggested that AES-128 isn't a hard

floor for security. What _is_ the floor, then? If an attack uses only

2^128 _bit_ operations, does that count as a break? What if it also

needs 2^70 bits of RAM? NIST keeps dodging concrete questions, and keeps

dodging the question of which "gates" it allows. The unclear boundaries

are then used to reward some cryptosystems and punish others.

The documented facts are that _some_ attack speedups are big asymptotic

changes (e.g., L(1) down to L(1/2)), but most attack speedups (and many

breaks) come from people looking more closely at attack costs. For these

people, the way that NIST promotes a yes/no cutoff question regarding

security, without actually defining the cutoff, is a big disincentive to

the necessary research. Instead of saying, wow, this makes an attack

1000x or 1000000x faster, one is faced with people asking whether this

speedup crosses NIST's cutoff. How is one supposed to answer this

question when NIST doesn't say which cutoff definitions it allows?

So, instead of a scientific process studying clearly defined questions,

there's a political process weaponizing a lack of clarity. At some point

observers are forced to ask whether the lack of clarity is deliberate.

---D. J. Bernstein

Apr 12, 2022, 12:21:32 PM4/12/22

to pqc-...@list.nist.gov

The "nm" process sizes are almost entirely marketing terms. TSMCs' 5nm SRAM bit cell size is 0.021µm^2 according to TSMC's marketing materials on semiWiki[1], which is closer to 22nm than 5nm.

That said, I agree that clarity of the metrics is important. Is RAM access considered? What is a "gate operation"? A single clearly-defined metric for security evaluation would ease comparisons, even if it's a non-physical one with planet-sized RAM arrays and instantaneous access at a distance. As long as the assumptions of the metric make attacks *easier* than in real-world systems any real attack should also meet (at least) the same target security level.

[1] https://semiwiki.com/semiconductor-manufacturers/tsmc/283487-tsmcs-5nm-0-021um2-sram-cell-using-euv-and-high-mobility-channel-with-write-assist-at-isscc2020/

—Carl Mitchell

My views herein are my own, not those of my employer (Motive).

That said, I agree that clarity of the metrics is important. Is RAM access considered? What is a "gate operation"? A single clearly-defined metric for security evaluation would ease comparisons, even if it's a non-physical one with planet-sized RAM arrays and instantaneous access at a distance. As long as the assumptions of the metric make attacks *easier* than in real-world systems any real attack should also meet (at least) the same target security level.

[1] https://semiwiki.com/semiconductor-manufacturers/tsmc/283487-tsmcs-5nm-0-021um2-sram-cell-using-euv-and-high-mobility-channel-with-write-assist-at-isscc2020/

—Carl Mitchell

My views herein are my own, not those of my employer (Motive).

--

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20220412155050.752181.qmail%40cr.yp.to.

Apr 12, 2022, 1:32:06 PM4/12/22

to pqc-...@list.nist.gov

'Carl Mitchell' via pqc-forum writes:

> The "nm" process sizes are almost entirely marketing terms.

Node "nm" is poorly related to actual nm, yes, but what I said was that
> The "nm" process sizes are almost entirely marketing terms.

the move from 22nm to 5nm "made transistors about 3x smaller in each

direction". Compare, e.g., the transistor densities reported on the

Intel slide in

https://spectrum.ieee.org/intel-now-packs-100-million-transistors-in-each-square-millimeter

to current figures for the Apple M1 and other 5nm chips, and take a

square root to get from 2-dimensional density to 1-dimensional length.

---D. J. Bernstein

Apr 12, 2022, 2:29:54 PM4/12/22

to pqc-forum, מצו״ב

Dear MATZOV researchers, dear all,

thank you for sharing your work, and in particular bringing attention to the cost model for BDGL sieve, where, you claim a few bits of security can be shaved. I am looking at the estimates from [AGPS20, page 47], and found the following model for [BDGL16] list decoding:

insert_cost = filters * C (d,T2) * COST_IP(d) * log2(d)

query_cost = filters * C (d,T1) * COST_IP(d) * log2(d)

(the d parameter is not explicit for ip_cost, but I'll need to tweak it below).

I agree with you that this model is not adequate; the original algorithm from [BDGL16] should instead have cost:

Z = filters^(d/m)

insert_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T2) * COST_TREE_ITER

query_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T1) * COST_TREE_ITER

(Some discussion on the choice of m is necessary, I am delaying to PS for readability. TLDR: there are some overheads that have not ben accounted by the literature, and they get worse as m increases).

1/ Can you confirm that this is how you modeled its cost ? And if so, how did you choose m ? If not, then what ?

2/ Even better, would you be kind enough to provide you modified scripts for obtaining your conclusions ?

Best regards

- Leo Ducas

PS: On the choice of m

The choice of m we propose in BDGL is m = O(log d), though the explicit constant is hard to choose. Why not take very large m to thwart the first terms ? After all, one could take Z=2 my carefully choosing m, but doing so essentially sends us back to Hyperplane LSH, whose complexity is exponentially worse.

Choosing m = O(log d), our Theorem 5.1 in [BDGL16] shows that the loss compared to the idealized model is at most sub-exponential 2^{~O(\sqrt n)}. Unfortunately, this overhead has never been quantified concretely in the literature, and has essentially been ignored in the NIST estimates.

If forced to guess, I note that in practice [DSvW21] m=3 for d=120 seems to be the optimal trade-off between probabity loss and speed of list decoding, so a reasonable choice for d~400 might be m = 5 or 6.

[BDGL16] https://eprint.iacr.org/2015/1128

[AGPS20] https://eprint.iacr.org/2019/1161

[DSvW21] https://eprint.iacr.org/2021/141

thank you for sharing your work, and in particular bringing attention to the cost model for BDGL sieve, where, you claim a few bits of security can be shaved. I am looking at the estimates from [AGPS20, page 47], and found the following model for [BDGL16] list decoding:

insert_cost = filters * C (d,T2) * COST_IP(d) * log2(d)

query_cost = filters * C (d,T1) * COST_IP(d) * log2(d)

(the d parameter is not explicit for ip_cost, but I'll need to tweak it below).

I agree with you that this model is not adequate; the original algorithm from [BDGL16] should instead have cost:

Z = filters^(d/m)

insert_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T2) * COST_TREE_ITER

query_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T1) * COST_TREE_ITER

(Some discussion on the choice of m is necessary, I am delaying to PS for readability. TLDR: there are some overheads that have not ben accounted by the literature, and they get worse as m increases).

1/ Can you confirm that this is how you modeled its cost ? And if so, how did you choose m ? If not, then what ?

2/ Even better, would you be kind enough to provide you modified scripts for obtaining your conclusions ?

Best regards

- Leo Ducas

PS: On the choice of m

The choice of m we propose in BDGL is m = O(log d), though the explicit constant is hard to choose. Why not take very large m to thwart the first terms ? After all, one could take Z=2 my carefully choosing m, but doing so essentially sends us back to Hyperplane LSH, whose complexity is exponentially worse.

Choosing m = O(log d), our Theorem 5.1 in [BDGL16] shows that the loss compared to the idealized model is at most sub-exponential 2^{~O(\sqrt n)}. Unfortunately, this overhead has never been quantified concretely in the literature, and has essentially been ignored in the NIST estimates.

If forced to guess, I note that in practice [DSvW21] m=3 for d=120 seems to be the optimal trade-off between probabity loss and speed of list decoding, so a reasonable choice for d~400 might be m = 5 or 6.

[BDGL16] https://eprint.iacr.org/2015/1128

[AGPS20] https://eprint.iacr.org/2019/1161

[DSvW21] https://eprint.iacr.org/2021/141

Apr 12, 2022, 2:55:35 PM4/12/22

to pqc-forum, Leo Ducas, מצו״ב

Typo:

Z = filters^(d/m) --> Z = filters^(1/m)

Z = filters^(d/m) --> Z = filters^(1/m)

Apr 26, 2022, 3:06:35 PM4/26/22

to Leo Ducas, pqc-forum

- בלמ"ס -

Dear PQC researchers,

Thank you for your comments.

1. Mr. Apon,

Please note the analysis in the RAM model in the report is largely based on previous works, such as Albrecht et al (https://ia.cr/2019/1161) and more, as described in sections 6 and 7.

The choice of this model enables proper comparison of the attack to other similar methods and to external results, and we expect its relative improvement to be similar in other computation and memory models. In fact, because the
sieve is performed on smaller dimensional lattices than comparable attacks (the main source of improvement), then the memory requirements should only be smaller, and the comparative improvement should possibly be even more significant in other models that
penalize large memory usage.

Beyond the memory cost of the sieve, there is the addition of the FFT step, similar to the work presented by Guo and Johansson at AsiaCrypt 2021 (https://doi.org/10.1007/978-3-030-92068-5_2). This step also requires accesses to
somewhat large memory, but its requirements are smaller than the sieve in most parameters, and even increasing its cost does not affect the model significantly. To give an example, we tried assuming the FFT costs were larger by a factor of 1,000 and re-optimized
the parameters, and the result was a change of less than a factor of 2 for the overall costs in most cases.

> However, more importantly: Do you find that any of your new algorithmic approaches have a concrete cost that would differ from a generic model-to-model analysis? (I hope to read through
the work and answer "No!" but perhaps you have an opinion you could share now.)

The short answer is indeed "no", as our work does not present increased memory usage compared to other similar attacks published for lattice candidates.

2. Mr. Ducas,

> I agree with you that this model is not adequate; the original algorithm from [BDGL16] should instead have cost:

> Z = filters^(1/m)

> insert_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T2) * COST_TREE_ITER

> query_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + m * filters * C(d , T1) * COST_TREE_ITER

These formulas do reflect the cost of algorithm from [BDGL16]. However, in Section 6.2.1 we describe a different decoding algorithm, which utilizes a somewhat different iteration tree. In this algorithm, the subcode lists are relabeled
in a way that allows for more efficient pruning. The formulas for the cost of this algorithm are

insert_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + filters * C(d , T2) * COST_TREE_ITER

query_cost = Z * COST_IP(d/m) * m + m * Z * log(Z) * COST_COMPARE_SWAP + filters * C(d , T1) * COST_TREE_ITER

The first two terms, which account for preprocessing, are the same as [BDGL16] (as the preprocessing is very similar). The cost of iterating over the tree, which is usually the significant part, is different. While the algorithm
in [BDGL16] costs O(m) operations per obtained filter, the algorithm in our report costs O(1) operations per obtained filter.

Note that this also means the choice of m has a far less significant effect on the runtime. For 400 < d < 1000, choosing m = 5 or 6 ensures that the cost of preprocessing is negligible relative to the cost of iteration.

3. We have received another comment from Mr. Ducas regarding the usage of the G6K model for estimations.

We acknowledge that extrapolating the model to higher sieving dimensions was adviced against in (https://doi.org/10.1007/978-3-030-17656-3_25), and was added mainly for comparison to previous results of Guo and Johannson.

Creating an adequate model for higher sieving dimensions other than the asymptotic model is a matter of further research.

Our report has been updated accordingly: https://doi.org/10.5281/zenodo.6493704

We thank Mr. Ducas for the clarification.

We are thankful for all the comments, and will be happy to answer questions from either the community or NIST's team regarding the presented algorithmic improvements.

Best regards,

MATZOV

--

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

You received this message because you are subscribed to the Google Groups "pqc-forum" group.

To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

To view this discussion on the web visit
https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/bceaaee6-50b0-45e7-b4f3-beba7c95043dn%40list.nist.gov.

Apr 26, 2022, 3:19:18 PM4/26/22

to מצו״ב, Leo Ducas, pqc-forum

Dear MATZOV,

I tend to agree with your comments on my questions. Thank you for sharing them.

Best regards,

--Daniel Apon

I tend to agree with your comments on my questions. Thank you for sharing them.

Best regards,

--Daniel Apon

To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/DM5PR14MB14045D3A0AB14C4D9831D352A1FB9%40DM5PR14MB1404.namprd14.prod.outlook.com.

May 2, 2022, 2:45:21 PM5/2/22

to pqc-...@list.nist.gov

Hi all,

We looked at the changes to sieving estimates in the "Report on the Security of LWE: Improved Dual Lattice Attack" by MATZOV available at https://doi.org/10.5281/zenodo.6412487 and agree

1. We made a mistake in costing visiting each node as an inner product. These costs can be amortised by preprocessing as already described in the original BDGL paper.

2. We agree that the enumeration described in Section 6.2.1 of the MATZOV paper reduces the cost per solution from O(m) to constant.

We have updated out scripts and cost estimates here:

https://github.com/jschanck/eprint-2019-1161/pull/3

We stress that our scripts and estimates were developed to assess the effect of quantum computing on sieving. For this reason, i.e. the seemingly prohibitive cost of QRAM, we explicitly ignore all memory access costs. Accounting for such memory access costs would affect other trade-offs.

We also note that our estimates are slightly lower than those reported by MATZOV. We assume this is down to some choice of magic constants somwhere. Thus, we would appreciate if MATZOV could publish their estimation scripts to allow us to reproduce and compare to them.

In addition, the lattice estimator <https://github.com/malb/lattice-estimator/> has been updated with these new costs

https://github.com/malb/lattice-estimator/pull/35

Note that these corrections and improvements to sieving are not restricted to the dual attack but also apply to the primal attack.

Best,

John, Eamonn and Martin

PS: We expect that the lattice estimator will be updated shortly with the dual attack model from the above mentioned report, too.

PPS: We thank Léo Ducas for helpful discussions on list decoding in BDGL.

On Mon, Apr 04 2022, מצו״ב wrote:

> - בלמ"ס -

>

>

> Dear PQC researchers,

>

>

>

> The Center of Encryption and Information Security (MATZOV) of the IDF has

> conducted an internal audit of leading Post-Quantum cryptographic (PQC) schemes,

> focusing on the Learning With Errors and Rounding problems.

>

> After consultations with NIST over the last few months – we have decided to

> release the audit as a Technical Report available for public review.

>

> https://doi.org/10.5281/zenodo.6412487<https://doi.org/10.5281/zenodo.6412487>

_pgp: https://keybase.io/martinralbrecht

_www: https://malb.io

_prn: he/him or they/them

We looked at the changes to sieving estimates in the "Report on the Security of LWE: Improved Dual Lattice Attack" by MATZOV available at https://doi.org/10.5281/zenodo.6412487 and agree

1. We made a mistake in costing visiting each node as an inner product. These costs can be amortised by preprocessing as already described in the original BDGL paper.

2. We agree that the enumeration described in Section 6.2.1 of the MATZOV paper reduces the cost per solution from O(m) to constant.

We have updated out scripts and cost estimates here:

https://github.com/jschanck/eprint-2019-1161/pull/3

We stress that our scripts and estimates were developed to assess the effect of quantum computing on sieving. For this reason, i.e. the seemingly prohibitive cost of QRAM, we explicitly ignore all memory access costs. Accounting for such memory access costs would affect other trade-offs.

We also note that our estimates are slightly lower than those reported by MATZOV. We assume this is down to some choice of magic constants somwhere. Thus, we would appreciate if MATZOV could publish their estimation scripts to allow us to reproduce and compare to them.

In addition, the lattice estimator <https://github.com/malb/lattice-estimator/> has been updated with these new costs

https://github.com/malb/lattice-estimator/pull/35

Note that these corrections and improvements to sieving are not restricted to the dual attack but also apply to the primal attack.

Best,

John, Eamonn and Martin

PS: We expect that the lattice estimator will be updated shortly with the dual attack model from the above mentioned report, too.

PPS: We thank Léo Ducas for helpful discussions on list decoding in BDGL.

On Mon, Apr 04 2022, מצו״ב wrote:

> - בלמ"ס -

>

>

> Dear PQC researchers,

>

>

>

> The Center of Encryption and Information Security (MATZOV) of the IDF has

> conducted an internal audit of leading Post-Quantum cryptographic (PQC) schemes,

> focusing on the Learning With Errors and Rounding problems.

>

> After consultations with NIST over the last few months – we have decided to

> release the audit as a Technical Report available for public review.

>

>

>

>

> Our report presents several improvements to the dual lattice attack, which

> induce a noticeable reduction in the security estimation for Kyber, Saber and

> Dilithium, bringing them below the required threshold.

>

> The report does not intend to provide a complete analysis of all post-quantum

> candidates, nor to recommend usage of specific algorithms. Rather, this

> publication is meant to share advances in the cryptanalysis of lattices which we

> believe to be relevant to the academic research in the field.

>

>

>

> We acknowledge the remarkable work done by NIST in the process and its impact –

> creating interest in the post-quantum field and promoting new cryptographic

> schemes.

>

> A prudent approach for these schemes is recommended, as research in the field is

> constantly evolving and much remains unstudied. Therefore, as a contribution to

> the community, the report includes further research ideas which we deem

> interesting.

>

>

>

> MATZOV, IDF

--
>

>

> Our report presents several improvements to the dual lattice attack, which

> induce a noticeable reduction in the security estimation for Kyber, Saber and

> Dilithium, bringing them below the required threshold.

>

> The report does not intend to provide a complete analysis of all post-quantum

> candidates, nor to recommend usage of specific algorithms. Rather, this

> publication is meant to share advances in the cryptanalysis of lattices which we

> believe to be relevant to the academic research in the field.

>

>

>

> We acknowledge the remarkable work done by NIST in the process and its impact –

> creating interest in the post-quantum field and promoting new cryptographic

> schemes.

>

> A prudent approach for these schemes is recommended, as research in the field is

> constantly evolving and much remains unstudied. Therefore, as a contribution to

> the community, the report includes further research ideas which we deem

> interesting.

>

>

>

> MATZOV, IDF

_pgp: https://keybase.io/martinralbrecht

_www: https://malb.io

_prn: he/him or they/them

May 2, 2022, 9:18:41 PM5/2/22

to pqc-forum, מצו״ב

Dear MATZOV researchers, dear all,

but are using progressive-BKZ to cost it. The GSA requires many tours

at the same blocksize to reach. Progressive BKZ lags behind a bit. At

the relevant dimension this is about 9 blocksize, so about 2.5 bits extra

on the cost of BKZ. See data and script at

various steps.

Its small, but we seem to be at that level of details. I'm pointing it out

for completeness, and for comparing apple to apple (our costing of the

primal attack uses the simulator).

Best regards

-- Léo

-- Léo

d=1024

beta, b0_simul, b0_gsa

...

378 73.277 68.534379 72.721 68.010

380 72.173

381 71.631 66.982

382 71.095 66.477

383 70.567 65.977

384 70.045 65.483

385 69.529 64.995

386 69.020 64.513

387 68.517 64.036

388 68.020 63.564

389

390 67.045 62.637

391 66.566 62.181

Le lundi 4 avril 2022 à 18:38:57 UTC+2, מצו״ב a écrit :

May 6, 2022, 11:33:22 AM5/6/22

to מצו״ב, pqc-...@list.nist.gov

Dear all: on the question of the memory cost of this attack, I'd like to highlight some concrete numbers.

Tables 3--6 of the MATZOV report show that for (say) Kyber-512, the attack uses sieving in dimensions close to 380 (+-3, depending on choice of models).

How much memory does this need? A fairly precise estimate is at least 2^90 bits for pair-sieving (which the MATZOV report uses for its runtime analysis), and at least 2^85 bits for triple-sieving (which is slower than pair-sieving).

(I derived these numbers using the algorithms' models and real-world experiments, which closely align. For example, the data in Table 1 of https://eprint.iacr.org/2021/141.pdf nicely fits the triple-sieving model of 2^{(0.1887+o(1))*d}. The pair-sieving model has 0.2075 in place of 0.1887.)

Sieving algorithms are highly memory-bound, so these large memory requirements would impose a significant real-world cost that is not counted in the RAM-model analysis (and would also affect the overall optimization of parameters). Of course, quantifying this precisely is an important research question.

Sincerely yours in cryptography,

Chris

May 7, 2022, 8:39:12 AM5/7/22

to pqc-...@list.nist.gov

'Martin R. Albrecht' via pqc-forum writes:

> Note that these corrections and improvements to sieving are not

> restricted to the dual attack but also apply to the primal attack.

Am I correctly understanding that your latest "lattice-estimator" cost
> Note that these corrections and improvements to sieving are not

> restricted to the dual attack but also apply to the primal attack.

estimate for breaking Kyber-512 is 2^140.3 bit operations: i.e., several

times fewer bit operations than AES-128 key search, and 2400x fewer bit

operations than the 2^151.5 from the round-3 Kyber documentation?

The round-3 Kyber documentation also mentions various other speedups

that could save "a factor of up to 2^16" without any new attack ideas.

Are any of these speedups covered by your cost estimate? If so, is there

a chart making clear which speedups are covered and which aren't? Thanks

in advance for any clarification you can provide.

---D. J. Bernstein

May 7, 2022, 11:41:20 AM5/7/22

to pqc-...@list.nist.gov

Hi Dan,

I assume you’re referring to the difference between “usvp” and “bdd” in the estimator, e.g. here: https://github.com/malb/lattice-estimator/

This corresponds to Q7 of

https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

It is worth reiterating that this improvement has a time-memory trade-off flavour to it since the final sieving step is over a larger dimension than the BKZ sieving steps. However, the cost of memory access is not costed (in the estimator nor usually the wider literature)

To me this supports a point that you’ve been making for many years: we should cost memory access, too, to get a better understanding of the true costs of these attacks.

I should also stress that the Kyber spec uses the CN11 simulator to predict the shape after lattice reduction while the estimator uses the GSA by default. The former is more precise, the latter is faster.

Here’s the effect of that difference:

sage: LWE.primal_usvp(Kyber512, red_shape_model="CN11")

rop: ≈2^146.8, red: ≈2^146.8, δ: 1.003869, β: 417, d: 994, tag: usvp

sage: LWE.primal_bdd(Kyber512, red_shape_model="CN11")

rop: ≈2^142.2, red: ≈2^141.1, svp: ≈2^141.3, β: 396, η: 430, d: 1013, tag: bdd

sage: LWE.primal_usvp(Kyber512, red_shape_model="GSA")

rop: ≈2^143.8, red: ≈2^143.8, δ: 1.003941, β: 406, d: 998, tag: usvp

sage: LWE.primal_bdd(Kyber512, red_shape_model="GSA")

rop: ≈2^140.3, red: ≈2^139.7, svp: ≈2^138.8, β: 391, η: 421, d: 1013, tag: bdd

Skimming through “5.3 Approximations, overheads, and foreseeable improvements” of

https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

nothing stood out as already covered by the estimator.

Cheers,

Martin

On Sat, May 07 2022, D. J. Bernstein wrote:

> [[PGP Signed Part:Undecided]]

I assume you’re referring to the difference between “usvp” and “bdd” in the estimator, e.g. here: https://github.com/malb/lattice-estimator/

This corresponds to Q7 of

https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

It is worth reiterating that this improvement has a time-memory trade-off flavour to it since the final sieving step is over a larger dimension than the BKZ sieving steps. However, the cost of memory access is not costed (in the estimator nor usually the wider literature)

To me this supports a point that you’ve been making for many years: we should cost memory access, too, to get a better understanding of the true costs of these attacks.

I should also stress that the Kyber spec uses the CN11 simulator to predict the shape after lattice reduction while the estimator uses the GSA by default. The former is more precise, the latter is faster.

Here’s the effect of that difference:

sage: LWE.primal_usvp(Kyber512, red_shape_model="CN11")

rop: ≈2^146.8, red: ≈2^146.8, δ: 1.003869, β: 417, d: 994, tag: usvp

sage: LWE.primal_bdd(Kyber512, red_shape_model="CN11")

rop: ≈2^142.2, red: ≈2^141.1, svp: ≈2^141.3, β: 396, η: 430, d: 1013, tag: bdd

sage: LWE.primal_usvp(Kyber512, red_shape_model="GSA")

rop: ≈2^143.8, red: ≈2^143.8, δ: 1.003941, β: 406, d: 998, tag: usvp

sage: LWE.primal_bdd(Kyber512, red_shape_model="GSA")

rop: ≈2^140.3, red: ≈2^139.7, svp: ≈2^138.8, β: 391, η: 421, d: 1013, tag: bdd

Skimming through “5.3 Approximations, overheads, and foreseeable improvements” of

https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

nothing stood out as already covered by the estimator.

Cheers,

Martin

On Sat, May 07 2022, D. J. Bernstein wrote:

> [[PGP Signed Part:Undecided]]

May 7, 2022, 10:08:54 PM5/7/22

to pqc-...@list.nist.gov

The call for NISTPQC proposals describes the security of AES-128 as a

"floor" for the lowest security "category" allowed in NISTPQC and says

In order for a cryptosystem to satisfy one of the above security

requirements, any attack must require computational resources

comparable to or greater than the stated threshold, with respect to

_all_ metrics that NIST deems to be potentially relevant to practical

security.

(Italics in original.) The call for proposals highlights "classical

gates" as a metric (along with variants having depth limits), and says

that attacking AES-128 costs "2^143 classical gates".

Does Kyber-512 cost >=2^143 "classical gates" to break? The round-3

Kyber documentation said yes (sort of), specifically 2^151.5 (with the

caveat below regarding 2^16), but today the answer seems to be no,

because of an attack paper a month ago. This raises further questions:

* Is the Kyber-512 proposal going to be withdrawn by the submission

team?

* Are the "category" claims for Kyber-768 and Kyber-1024 going to be

adjusted downwards to 2 and 4 respectively?

* NISTIR 8309 "strongly encourages the submitters to provide at least

one parameter set that meets category 5"; is there going to be a

proposal of a larger Kyber parameter set? (This isn't easy; see

https://ntruprime.cr.yp.to/latticerisks-20211031.pdf pages 89-90.)

I haven't seen an official comment from the Kyber team on the new

attack. Did I miss an announcement? Is Kyber still claiming 2^151.5?

Previously, in email dated 8 Dec 2020 13:11:49 -0800, NIST wrote "If

this analysis is correct, then Kyber clearly meets the security

categories defined in the CFP. If the analysis is found to be incorrect,

or if new attacks arise, then we will re-examine the situation".

Has NIST re-examined the security of Kyber? Where are the details for

public review? MATZOV referred to "consultations with NIST over the last

few months", but I haven't seen anything public from NIST about this.

The NISTPQC evaluation criteria also downgrade "schemes that were

designed by repeatedly patching older schemes that were shown vulnerable

to cryptanalysis". Given Kyber's security losses so far, how precisely

does NIST plan to address the risk of further security losses? (This is

a separate question from the "floor" question.)

I've complained before about NIST's multi-year failure to define the

gate set that it's referring to when it says "gates". The literature

defines many different gate sets, leading to different conclusions about

algorithm cost. Is NIST going to use this ambiguity to say that the

current attacks don't break Kyber-512? If so, will the public ever get

to see a commitment to the gate set NIST is using for this evaluation,

so that cryptanalysts have a clear target to break?

My understanding is that NIST plans to temporarily dodge concrete

parameter questions by announcing selections without parameters. This is

problematic. Concrete parameter choices are an essential component of

cryptographic standards, influencing both performance and security.

'Martin R. Albrecht' via pqc-forum writes:

> I should also stress that the Kyber spec uses the CN11 simulator to

> predict the shape after lattice reduction while the estimator uses the

> GSA by default. The former is more precise, the latter is faster.

[ ... ]

estimate is 2^142.2 bit operations to break Kyber-512, i.e., 600x fewer

bit operations than what the Kyber-512 round-3 documentation said

(namely 2^151.5)? And this doesn't account for the known speedups that

the documentation says could save "a factor of up to 2^16"?

> To me this supports a point that you’ve been making for many years: we

> should cost memory access, too, to get a better understanding of the

> true costs of these attacks.

For people who care about actual attack costs, definitely.

For NISTPQC, however, the call for proposals described cutoffs for the

number of "gates". NIST's pqc-forum email dated 23 Jun 2020 21:25:49

+0000 set impossible-to-meet "minimum" criteria for replacing "gates"

with something more realistic. We've repeatedly seen "gates" used as the

foundation for objections to other submissions, including submissions

that

* never had as much spec instability as Kyber,

* never had as much security-analysis instability as Kyber,

* never had as much security degradation as Kyber,

* never had bleeding-edge parameter proposals such as Kyber-512, and

* consistently pointed to the real costs of RAM from the outset.

These objections have had NIST's apparent approval and often explicit

endorsement. So it would look very bad for NIST to rescue Kyber-512 by

ignoring or changing the cutoffs at the last minute.

Furthermore, after NISTIR 8309 introduced its "strongly encourages ...

category 5" in the middle of NISTPQC and retroactively criticized some

submissions for not proposing parameters that reach "category 5" _when

RAM costs are ignored_, it would look even worse for NIST to allow

Kyber-1024 as "category 5" on the basis of RAM costs.

As another example, consider recent comments we've seen on pqc-forum

trying to downplay an attack as supposedly needing a "small-moon-sized

cryptanalytic device" (at 22nm, as if this were the latest technology),

and then imagine what it would look like for NIST to adopt this position

given, e.g., the following quote from NIST's Ray Perlner:

I would also add that math doesn’t seem to care very much about the

energy budget of an earth sized planet or how many atoms are in its

crust, so if your claim to have a large security margin relies

crucially on exceeding some such physical limit, you might not have

as much as you think.

Anyway, according to everything NIST has announced so far, attacks below

"2^143 classical gates" are breaks even if they use a ton of RAM. I

appreciate the enthusiasm of the people who have recently learned and

announced how important RAM cost is in the real world, but somehow the

sudden "RAM! RAM! RAM!" chorus seems to have interfered with reaching

clarity regarding the number of "classical gates" to break Kyber-512.

---D. J. Bernstein

"floor" for the lowest security "category" allowed in NISTPQC and says

In order for a cryptosystem to satisfy one of the above security

requirements, any attack must require computational resources

comparable to or greater than the stated threshold, with respect to

_all_ metrics that NIST deems to be potentially relevant to practical

security.

(Italics in original.) The call for proposals highlights "classical

gates" as a metric (along with variants having depth limits), and says

that attacking AES-128 costs "2^143 classical gates".

Does Kyber-512 cost >=2^143 "classical gates" to break? The round-3

Kyber documentation said yes (sort of), specifically 2^151.5 (with the

caveat below regarding 2^16), but today the answer seems to be no,

because of an attack paper a month ago. This raises further questions:

* Is the Kyber-512 proposal going to be withdrawn by the submission

team?

* Are the "category" claims for Kyber-768 and Kyber-1024 going to be

adjusted downwards to 2 and 4 respectively?

* NISTIR 8309 "strongly encourages the submitters to provide at least

one parameter set that meets category 5"; is there going to be a

proposal of a larger Kyber parameter set? (This isn't easy; see

https://ntruprime.cr.yp.to/latticerisks-20211031.pdf pages 89-90.)

I haven't seen an official comment from the Kyber team on the new

attack. Did I miss an announcement? Is Kyber still claiming 2^151.5?

Previously, in email dated 8 Dec 2020 13:11:49 -0800, NIST wrote "If

this analysis is correct, then Kyber clearly meets the security

categories defined in the CFP. If the analysis is found to be incorrect,

or if new attacks arise, then we will re-examine the situation".

Has NIST re-examined the security of Kyber? Where are the details for

public review? MATZOV referred to "consultations with NIST over the last

few months", but I haven't seen anything public from NIST about this.

The NISTPQC evaluation criteria also downgrade "schemes that were

designed by repeatedly patching older schemes that were shown vulnerable

to cryptanalysis". Given Kyber's security losses so far, how precisely

does NIST plan to address the risk of further security losses? (This is

a separate question from the "floor" question.)

I've complained before about NIST's multi-year failure to define the

gate set that it's referring to when it says "gates". The literature

defines many different gate sets, leading to different conclusions about

algorithm cost. Is NIST going to use this ambiguity to say that the

current attacks don't break Kyber-512? If so, will the public ever get

to see a commitment to the gate set NIST is using for this evaluation,

so that cryptanalysts have a clear target to break?

My understanding is that NIST plans to temporarily dodge concrete

parameter questions by announcing selections without parameters. This is

problematic. Concrete parameter choices are an essential component of

cryptographic standards, influencing both performance and security.

'Martin R. Albrecht' via pqc-forum writes:

> I should also stress that the Kyber spec uses the CN11 simulator to

> predict the shape after lattice reduction while the estimator uses the

> GSA by default. The former is more precise, the latter is faster.

> Skimming through “5.3 Approximations, overheads, and foreseeable improvements” of

> https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

> nothing stood out as already covered by the estimator.

So, just to make sure I'm clear about the conclusion: Your current
> https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

> nothing stood out as already covered by the estimator.

estimate is 2^142.2 bit operations to break Kyber-512, i.e., 600x fewer

bit operations than what the Kyber-512 round-3 documentation said

(namely 2^151.5)? And this doesn't account for the known speedups that

the documentation says could save "a factor of up to 2^16"?

> To me this supports a point that you’ve been making for many years: we

> should cost memory access, too, to get a better understanding of the

> true costs of these attacks.

For NISTPQC, however, the call for proposals described cutoffs for the

number of "gates". NIST's pqc-forum email dated 23 Jun 2020 21:25:49

+0000 set impossible-to-meet "minimum" criteria for replacing "gates"

with something more realistic. We've repeatedly seen "gates" used as the

foundation for objections to other submissions, including submissions

that

* never had as much spec instability as Kyber,

* never had as much security-analysis instability as Kyber,

* never had as much security degradation as Kyber,

* never had bleeding-edge parameter proposals such as Kyber-512, and

* consistently pointed to the real costs of RAM from the outset.

These objections have had NIST's apparent approval and often explicit

endorsement. So it would look very bad for NIST to rescue Kyber-512 by

ignoring or changing the cutoffs at the last minute.

Furthermore, after NISTIR 8309 introduced its "strongly encourages ...

category 5" in the middle of NISTPQC and retroactively criticized some

submissions for not proposing parameters that reach "category 5" _when

RAM costs are ignored_, it would look even worse for NIST to allow

Kyber-1024 as "category 5" on the basis of RAM costs.

As another example, consider recent comments we've seen on pqc-forum

trying to downplay an attack as supposedly needing a "small-moon-sized

cryptanalytic device" (at 22nm, as if this were the latest technology),

and then imagine what it would look like for NIST to adopt this position

given, e.g., the following quote from NIST's Ray Perlner:

I would also add that math doesn’t seem to care very much about the

energy budget of an earth sized planet or how many atoms are in its

crust, so if your claim to have a large security margin relies

crucially on exceeding some such physical limit, you might not have

as much as you think.

Anyway, according to everything NIST has announced so far, attacks below

"2^143 classical gates" are breaks even if they use a ton of RAM. I

appreciate the enthusiasm of the people who have recently learned and

announced how important RAM cost is in the real world, but somehow the

sudden "RAM! RAM! RAM!" chorus seems to have interfered with reaching

clarity regarding the number of "classical gates" to break Kyber-512.

---D. J. Bernstein

May 8, 2022, 6:14:28 AM5/8/22

to pqc-forum, pqc-...@list.nist.gov

Dear Dan, dear all,

> estimate is 2^142.2 bit operations to break Kyber-512, i.e., 600x fewer

> bit operations than what the Kyber-512 round-3 documentation said

> (namely 2^151.5)? And this doesn't account for the known speedups that

> the documentation says could save "a factor of up to 2^16"?

lines of Martin:

> This corresponds to Q7 of

> https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf
you are referring too. This Q7 accounted for 2^8 of this potential speed-up.

It is in fact a 2^4.4 speed-up. The rest comes from the mis-costing of BDGL

that Matzof pointed too and that we have been discussing earlier in this

thread and correcting in the estimator.

I'm also pointing to the fact that this list of open questions does not only

list potential speed-ups, but also unaccounted potential overheads.

Best regards.

Léo

May 8, 2022, 9:32:39 AM5/8/22

to pqc-forum

Leo Ducas writes:

> It is in fact a 2^4.4 speed-up.

So, to make sure I'm clear about your position regarding the overall
> It is in fact a 2^4.4 speed-up.

status of Kyber-512:

* Compared to the round-3 Kyber documentation estimating 2^151.5

"gates" to break Kyber-512, your current estimate after the latest

attack paper is 600x fewer "gates", i.e., 2^142.2? Is this also the

official Kyber position?

* Furthermore, within the known speedups that the documentation says

could save "a factor of up to 2^16", you're saying that 2^8 could

apply to this 2^142.2, i.e., that known Kyber-512 attacks could

cost just 2^134.2 "gates", well below the AES-128 attack cost?

I understand that you're also pointing to "potential overheads", but is

the Kyber team now claiming on this basis that known attacks require

"2^143 classical gates"?

The estimate of 2^151.5 "gates" also appears to be the basis for NIST's

2020 claim that "Kyber clearly meets the security categories defined in

the CFP". Is the Kyber team continuing to claim these categories?

> It *does* account for some of them. I am unsure how you misread those

> lines of Martin:

Martin right above my question, (2) actively substitute other lines, and

(3) on this basis claim a misreading. The lines that I actually quoted

fully justify the clarification question that I asked.

If Martin erred in writing "nothing" rather than "nothing except X" for

some specific X, then this error is something to attribute to him,

certainly not to the followup clarification question.

---D. J. Bernstein

May 8, 2022, 10:05:42 AM5/8/22

to pqc-...@list.nist.gov

Christopher J Peikert writes:

> How much memory does this need? A fairly precise estimate is at least

> 2^90 bits for pair-sieving (which the MATZOV report uses for its runtime

> analysis), and at least 2^85 bits for triple-sieving (which is slower than

> pair-sieving).

The numbers here are similar to numbers posted in the same thread a
> How much memory does this need? A fairly precise estimate is at least

> 2^90 bits for pair-sieving (which the MATZOV report uses for its runtime

> analysis), and at least 2^85 bits for triple-sieving (which is slower than

> pair-sieving).

month ago, but somehow they seem to have mutated from being rough

estimates a month ago into something that sounds reasonably confident

("fairly precise ... at least").

Would you describe 2^90 and 2^85 as "barriers"? Will there be an

admission of error if these attacks against Kyber-512 are shown to fit

into less memory? Or is the word "fairly" intended to allow subsequent

wiggle room, eliminating falsifiability? Thanks in advance for

clarifying the status of your claim.

> (I derived these numbers using the algorithms' models and real-world

> experiments, which closely align. For example, the data in Table 1 of

> https://eprint.iacr.org/2021/141.pdf nicely fits the triple-sieving model

> of 2^{(0.1887+o(1))*d}. The pair-sieving model has 0.2075 in place of

> 0.1887.)

concrete size, so the claims of fit and alignment must be based on

something else. Can you please spell out your calculations, to support

public assessment of the risks of the 90 and 85 being overestimates?

---D. J. Bernstein

May 8, 2022, 10:13:33 AM5/8/22

to pqc-...@list.nist.gov

On Sun, May 08 2022, D. J. Bernstein wrote:

> Hmmm. For some reason you (1) omit the lines that I actually quoted from

> Martin right above my question, (2) actively substitute other lines, and

> (3) on this basis claim a misreading. The lines that I actually quoted

> fully justify the clarification question that I asked.

>

> If Martin erred in writing "nothing" rather than "nothing except X" for

> some specific X, then this error is something to attribute to him,

> certainly not to the followup clarification question.

For the avoidance of doubt, I did mean “nothing except X” where “X” is the thing (Q7) I had mentioned as being in that list.
> Hmmm. For some reason you (1) omit the lines that I actually quoted from

> Martin right above my question, (2) actively substitute other lines, and

> (3) on this basis claim a misreading. The lines that I actually quoted

> fully justify the clarification question that I asked.

>

> If Martin erred in writing "nothing" rather than "nothing except X" for

> some specific X, then this error is something to attribute to him,

> certainly not to the followup clarification question.

May 8, 2022, 1:30:29 PM5/8/22

to pqc-...@list.nist.gov

'Martin R. Albrecht' via pqc-forum writes:

> For the avoidance of doubt, I did mean “nothing except X” where “X” is

> the thing (Q7) I had mentioned as being in that list.

Um, the message mentioned Q7 as corresponding to a usvp-bdd difference,
> the thing (Q7) I had mentioned as being in that list.

not as being in the 5.3 list (a list that the message cited separately).

Even the weaker notion that the message _hinted_ at Q7 being in 5.3

seems impossible to reconcile with the plain meaning of the word

"nothing" in the self-contained sentence that I quoted before:

> > > Skimming through “5.3 Approximations, overheads, and foreseeable improvements” of

> > > https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

> > > nothing stood out as already covered by the estimator.

in 5.3" isn't resolving ambiguity; it's retroactively switching to a

different statement with different consequences.

It's amazing that, when I quote a questionable sentence and politely ask

for confirmation of what the sentence is communicating, I'm accused of

misreading---by someone who omits the quote I gave and substitutes a

different quote!---and after two further messages there's still no

admission of error from the actual source of the error.

---D. J. Bernstein

May 12, 2022, 6:01:06 AM5/12/22

to D. J. Bernstein, pqc-...@list.nist.gov

Dear Prof. Bernstein and deal all in PQC community:

The recent advances of dual attacks might bring the worry the possibility of achieving the security goals set by NIST for lattice-based KEM schemes, particularly on dimension of 512. Our recent work shows it may still be possible, but with optimized constructions.

In our recent work: https://arxiv.org/abs/2205.05413 CNTR-512 can have 2^{170.4} gate complexity at 2^{107.4} memory complexity with error probability 2^{-94}. It is tested by running the script provided by Kyber. Assuming each secret key will not be used to decrypt for more than 2^{94} times in its lifttime, this parameter set may achieve security level II (2^{143} gates required by NIST) even if with the recent advances on dual attacks. The details are given in Appendix E in the mentioned paper.

It also appears that the technique used by CNTR may also be applied to NTRU-prime. As it is a new work, we sincerely look forward to your kind comments and critiques to further improve it.

All my best

Yours sincerely

Yunlei

> -----原始邮件-----

> 发件人: "D. J. Bernstein" <d...@cr.yp.to>

> 发送时间: 2022-05-09 01:29:56 (星期一)

> 收件人: pqc-...@list.nist.gov

> 抄送:

> 主题: Re: [pqc-forum] Improved Dual Lattice Attack

The recent advances of dual attacks might bring the worry the possibility of achieving the security goals set by NIST for lattice-based KEM schemes, particularly on dimension of 512. Our recent work shows it may still be possible, but with optimized constructions.

In our recent work: https://arxiv.org/abs/2205.05413 CNTR-512 can have 2^{170.4} gate complexity at 2^{107.4} memory complexity with error probability 2^{-94}. It is tested by running the script provided by Kyber. Assuming each secret key will not be used to decrypt for more than 2^{94} times in its lifttime, this parameter set may achieve security level II (2^{143} gates required by NIST) even if with the recent advances on dual attacks. The details are given in Appendix E in the mentioned paper.

It also appears that the technique used by CNTR may also be applied to NTRU-prime. As it is a new work, we sincerely look forward to your kind comments and critiques to further improve it.

All my best

Yours sincerely

Yunlei

> -----原始邮件-----

> 发件人: "D. J. Bernstein" <d...@cr.yp.to>

> 发送时间: 2022-05-09 01:29:56 (星期一)

> 收件人: pqc-...@list.nist.gov

> 抄送:

> 主题: Re: [pqc-forum] Improved Dual Lattice Attack

> --

> You received this message because you are subscribed to the Google Groups "pqc-forum" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

> To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20220508172956.158683.qmail%40cr.yp.to.
> You received this message because you are subscribed to the Google Groups "pqc-forum" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

May 12, 2022, 8:55:38 AM5/12/22

to pqc-...@list.nist.gov

'赵运磊' via pqc-forum writes:

> The recent advances of dual attacks might bring the worry the

> possibility of achieving the security goals set by NIST for

> lattice-based KEM schemes, particularly on dimension of 512. Our

> recent work shows it may still be possible, but with optimized

> constructions.

Can you please comment on what's covered by your patents related to this
> The recent advances of dual attacks might bring the worry the

> possibility of achieving the security goals set by NIST for

> lattice-based KEM schemes, particularly on dimension of 512. Our

> recent work shows it may still be possible, but with optimized

> constructions.

work? I noticed that your patents

https://patents.google.com/patent/CN107566121A/en

https://patents.google.com/patent/CN108173643B/en

were reported in the KCL/OKCN/AKCN/CNKE submission, which is very

similar to "NewHope without reconciliation". The patents were filed a

month before "NewHope without reconciliation" was published, and I

haven't seen any analysis of the patent coverage.

It would be useful to see public assurances as to your company's

position regarding usage of "NewHope without reconciliation" and its

variants, such as Kyber, SABER, and your latest proposals.

---D. J. Bernstein

May 12, 2022, 9:42:46 AM5/12/22