Structure of memory access in sieving

969 views
Skip to first unread message

D. J. Bernstein

unread,
Dec 9, 2023, 10:49:04 AM12/9/23
to pqc-...@list.nist.gov
NIST wrote:
> sieving techniques, which require a large amount of (apparently
> unstructured) access to a very large memory

A recent blog post from John Schanck explains in quite a bit of detail
how to exploit the structure of memory access in one sieving algorithm
to reduce "2^152 adjacent-rack bit moves" to "2^142 adjacent-rack bit
moves". See

https://web.archive.org/web/20231125213807/https://finiterealities.net/kyber512/

specifically the 2023-11-17 update.

The numbers 152 and 142 are based on heuristics, and are each surrounded
by uncertainty intervals (in particular, risks of much smaller numbers)
because of "known unknowns" (see, e.g., the P.P.S. below). As far as I
can see, none of the pre-quantum "known unknowns" would interfere with
the basic structure of the speedup.

NIST didn't cite a source for its "apparently unstructured" claim. This
claim appears to be directly contradicted by the "bucket" structure in
some previous sieving papers---a structure exploited in this speedup.

This claim was part of NIST's rationale for claiming that Kyber-512 is
"unlikely" to be easier to break than AES-128 by known attacks. This
raises two questions:

(1) What exactly is the source of NIST's "apparently unstructured"
claim?

(2) What does the 2023-11-17 update do to NIST's assessment of the
probability of Kyber-512 being easier to break than AES-128?

Regarding the second question, NIST did point to the NTRU submission as
raising the possibility of "a fully local implementation of the BGJ1
sieving algorithm" (in NIST's words), but NIST downplayed this as a
conjecture---and, as quoted above, described sieving as requiring
"apparently unstructured" memory access---whereas the recent blog post
is much more concrete in presenting a way to exploit the structure of
memory access in sieving. Surely this has an impact on risk assessment.

---D. J. Bernstein

P.S. https://finiterealities.net/kyber512 was claiming higher costs for
memory access before the 2023-11-17 update. I would expect further study
of concrete memory-access costs to produce further reductions. Lattices
have a huge attack surface to begin with; optimizing lattice attacks to
account for costs of large-scale memory access is a research challenge
that should not be taken lightly.

P.P.S. I have a new paper https://cr.yp.to/papers.html#hybrid showing
that hybrid attacks produce an exponential speedup in primal attacks
under existing heuristics. This also (1) reduces asymptotic memory use
and (2) allows further time-memory tradeoffs. Section 4 gives a long
list of "known unknowns" beyond what's in the Kyber documentation.
signature.asc

John Schanck

unread,
Dec 9, 2023, 3:12:00 PM12/9/23
to pqc-...@list.nist.gov
On Sat, Dec 9, 2023 at 7:49 AM D. J. Bernstein <d...@cr.yp.to> wrote:
> A recent blog post from John Schanck explains in quite a bit of detail
> how to exploit the structure of memory access in one sieving algorithm
> to reduce "2^152 adjacent-rack bit moves" to "2^142 adjacent-rack bit
> moves".

Please clarify what you mean by "reduce". The original blog post
contains a significant error and claims 2^152 adjacent-rack bit moves.
The updated blog post corrects the error and claims 2^142
adjacent-rack bit moves. The error was quickly pointed out by other
cryptanalysts. None of the NIST finalists made the same error. What
was reduced?

My post is about how cost increases under a memory access constraint.
It does not call the Kyber team's security analysis into question in
any way. And it certainly does not suggest that Kyber512 is less
secure than AES-128.

The post comes with software that can estimate the cost of (one
subroutine of) an attack on Kyber512 under varying memory access
constraints. That software estimates:

* 2^133 operations using 2^0 CPUs and 2^97 bits of RAM per CPU,
* 2^137 operations using 2^17 CPUs and 2^80 bits of RAM per CPU,
* 2^142 operations using 2^37 CPUs and 2^60 bits of RAM per CPU,
* 2^148 operations using 2^57 CPUs and 2^40 bits of RAM per CPU,
* 2^153 operations using 2^77 CPUs and 2^20 bits of RAM per CPU,
* 2^159 operations using 2^97 CPUs and 2^0 bits of RAM per CPU.

The software only costs memory access and data movement. It does not
cost computation, which might dominate in some or all of these
scenarios. The last two or three rows here would probably be
outperformed, in practice, by a local implementation of BGJ1 running
on the same hardware. The AGPS20 tables, which cost computation but
not memory access, estimate 2^161 operations for BGJ1.

John

D. J. Bernstein

unread,
Dec 9, 2023, 5:27:22 PM12/9/23
to pqc-...@list.nist.gov
John Schanck writes:
> On Sat, Dec 9, 2023 at 7:49 AM D. J. Bernstein <d...@cr.yp.to> wrote:
> > A recent blog post from John Schanck explains in quite a bit of detail
> > how to exploit the structure of memory access in one sieving algorithm
> > to reduce "2^152 adjacent-rack bit moves" to "2^142 adjacent-rack bit
> > moves".
> Please clarify what you mean by "reduce".

The cited blog post presents one sieving algorithm (including specific
parameters) and a cost tally of "2^152 adjacent-rack bit moves" for that
algorithm. It then (as part of the specifically cited 2023-11-17 update)
presents a way to exploit the "bucket" structure of memory accesses to
reduce cost to "2^142 adjacent-rack bit moves".

> The original blog post contains a significant error

To clarify, "error" here means that the first algorithm misses cost
improvements (for the target metric) that are in the second algorithm?

> None of the NIST finalists made the same error.

Obviously the Kyber submission didn't miss reductions in memory-access
costs: it didn't even try to quantify those costs!

My questions to NIST are explicitly about NIST's more recent statement
that "sieving techniques ... require a large amount of (apparently
unstructured) access to a very large memory".

The "apparently unstructured" claim is not backed by a citation. It's
contradicted by the bucket structure in some earlier sieving papers, a
structure exploited in the 2023-11-17 update. Hence my questions.

If you mean to endorse this NIST claim, please clearly say so and please
say why.

> The error was quickly pointed out by other cryptanalysts.

Have there been previous public objections to NIST's "apparently
unstructured" claim?

---D. J. Bernstein
signature.asc

Daniel Apon

unread,
Dec 10, 2023, 6:06:21 AM12/10/23
to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov
Please provide the reference to the NIST statement.

Daniel Apon

unread,
Dec 10, 2023, 6:08:52 AM12/10/23
to pqc-forum, Daniel Apon, D. J. Bernstein, pqc-...@list.nist.gov

John Schanck

unread,
Dec 10, 2023, 11:19:49 AM12/10/23
to pqc-...@list.nist.gov
D. J. Bernstein writes:
> If you mean to endorse this NIST claim, please clearly say so
> and please say why.

Sure. The "apparently unstructured" claim is made in [1], where
Ray Perlner wrote:
> the MATZOV attack is based on sieving techniques, which
> require a large amount of (apparently unstructured) access to
> a very large memory
> [...]
> only 6 bits of security from memory access costs are required
> for Kyber512 to meet category 1
> [...]
> The low end estimate [for memory access cost] of approximately
> 20 bits (from the NTRU submission) is based on a conjecture by
> Ducas that a fully local implementation of the BGJ1 sieving
> algorithm is possible.

BDGL and BGJ1 use a large number of unstructured memory accesses
while bucketing. This is obvious from the descriptions of the
algorithms. The "2023-11-17 update" algorithm interpolates
between BDGL and BGJ1 under varying memory layout constraints.
All of the parameterizations between BDGL and BGJ1 also use a
large number of unstructured memory accesses. (By "large" here I
mean that the cost of unstructured memory accesses is a
significant, if not dominant, part of the total cost. See my
last message in this thread for concrete numbers.)

In [2] you write:
> the corner of the cryptanalytic literature paying attention to
> communication costs usually succeeds in optimizing attack
> _algorithms_ to drastically reduce those costs.
> [...]
> it would be a serious mistake to simply use an attack
> algorithm optimized for "operations".

NIST's "low end estimate" based on BGJ1, which I quoted above,
is precisely this kind of attack algorithm optimization. So is
the "2023-11-17 update".

The high end estimate from [1] based on (a misreading of?) the
NTRU Prime submission makes the "serious mistake" of using BDGL
optimized for "operations" in a model with significant
communication costs. My blog post originally made the same
error.

John

[1] https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/4MBurXr58Rs/m/xHojUDCaBAAJ
[2] https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/W2VOzy0wz_E/m/ZnS_0DwGAwAJ

D. J. Bernstein

unread,
Dec 10, 2023, 4:36:17 PM12/10/23
to pqc-...@list.nist.gov
John Schanck writes:
> NIST's "low end estimate" based on BGJ1, which I quoted above,
> is precisely this kind of attack algorithm optimization. So is
> the "2023-11-17 update".

Agreed. However:

* As I said at the start of the thread, NIST downplayed this as a
conjecture ("The low end estimate of approximately 20 bits (from
the NTRU submission) is based on a conjecture by Ducas that a fully
local implementation of the BGJ1 sieving algorithm is possible").

* NIST also distinguished this conjecture from published algorithms
("[failure] would need local memory access that is much better than
any such published algorithm, and in fact better than any that has
been conjectured").

* The cited NTRU submission also downplayed this as a conjecture, and
also distinguished it from published algorithms: "attacks in local
models have not received the same level of attention, and the
situation is unstable. We discuss a conjecture by Ducas below that,
if true, would cause us to revise our security categories relative
to local models". (The word "would" is weaker than "will".)

* https://web.archive.org/web/20231125213807/https://finiterealities.net/kyber512/
_before_ the 2023-11-17 update downplayed this speedup as "a
conjecture, not an established fact", said that this speedup wasn't
"realistic", and said that this unrealistic speedup plays into "a
false narrative that the cost of lattice attacks has been falling
precipitously".

I don't see how to reconcile any of this with the 2023-11-17 update,
which presents an algorithm with this speedup in quite a bit of detail.
Sure, more details and experimental evidence would be good, but whatever
probability NIST was assigning to this "low end" scenario should be
bumped up close to 100%. (Also, there should be an erratum for the
"false narrative" claim, which still hasn't been crossed out.)

As I said at the start of the thread, the context is NIST's claim that
Kyber-512 is "unlikely" to be easier to break than AES-128 by known
attacks. Hence my question to NIST: "What does the 2023-11-17 update do
to NIST's assessment of the probability of Kyber-512 being easier to
break than AES-128?"

This is a normal risk-assessment question. A bad event that was
previously considered as a _possibility_ has turned into _reality_, so
one has to recalculate any probability assessments flowing from that.
The limited information that NIST has posted about its risk analysis is
missing quantitative weights, so at the moment there's no way for the
public to do this recalculation. (One can't even tell what NIST means
by "unlikely": <50%? <20%? <5%?)

My other question to NIST was where the "apparently unstructured" claim
comes from. Earlier NIST said "we did consult among ourselves and with
the Kyber team"; maybe the records of those discussions show what
happened here, but the records of those discussions are still secret.

I have a transparency request for NIST: Can you please post the records
of those discussions for public review? I also have the same request for
the Kyber team.

In the case of NIST, this transparency is required by law. There's a
NIST regulation, 81 Fed. Reg. 92787, which cites criteria that "will" be
used to evaluate the submissions; those criteria, in turn, say that
"NIST will perform a thorough analysis of the submitted algorithms in a
manner that is open and transparent to the public". Transparency is also
required by FOIA since I filed FOIA requests covering the relevant time
period. More importantly, NIST should be happy to support public review,
so that mistakes are caught as quickly as possible.

> By "large" here I mean that the cost of unstructured memory accesses
> is a significant, if not dominant, part of the total cost.

I agree that interpreting "large" here as "significant" and not as
"dominant" would make the claim correct (rather trivially, since the
claim would then be essentially content-free) when the words are taken
out of context. However, this interpretation (1) doesn't make sense in
context and (2) leaves yet another aspect of NIST's message unjustified.

Specifically, NIST wrote "assessments by the NTRU and NTRUprime teams
gave estimates that would suggest the cost of sieving against category 1
parameters, in models that account for the cost of memory access, is
something like 20 to 40 bits of security more than would be suggested by
the RAM model ... [failure] would require the approximately 40 bits of
additional security quoted as the 'real cost of memory access' by the
NTRUprime submission to be a massive overestimate".

I've separately challenged the text "40 bits of security more than would
be suggested by the RAM model ... approximately 40 bits of additional
security quoted as the 'real cost of memory access' by the NTRUprime
submission" as being (1) not plausible and (2) not what the NTRU Prime
submission says. In short, NIST erroneously multiplied the cost of each
memory access by the number of bit operations, rather than by the number
of memory accesses. None of the purported defenses have even _quoted_
the challenged text, let alone argued that it's correct.

But suppose we ignore the distinction between tallying bit operations
and tallying memory accesses. The structure of NIST's calculation is
still taking a _total_ tally---not just some "significant" part of it---
and multiplying that by the "real cost of memory access".

NIST's "40 bits" is an estimate for the cost of an _unstructured_ memory
access. This perfectly explains NIST writing that sieving requires a
large amount of "apparently unstructured" memory access---if "large" is
understood as talking about the dominant operations. If "large" is just
talking about some fragment of the memory accesses, then it doesn't make
any sense to multiply the _total_ tally of memory accesses by the cost
of each _unstructured_ memory access.

Again, this issue exists independently of the conflation of the number
of bit operations with the number of memory accesses. If I compare
NIST's "40 bits of security more than would be suggested by the RAM
model" to the 2023-11-17 update, then I see three major discrepancies:

* Bit operations sound negligible in the 2023-11-17 update, whereas
they're obviously dominant in "the RAM model".

* The memory access in the 2023-11-17 update is at a vastly smaller
average distance than could possibly justify a 40-bit estimate.

* In the opposite direction, NIST's starting "RAM model" numbers are
considering a wider range of algorithms than the 2023-11-17 update.

Changing the meaning of "large" can shift the portion of NIST's message
that's responsible for the second discrepancy, but doesn't make any of
the discrepancies disappear.

> The high end estimate from [1] based on (a misreading of?) the
> NTRU Prime submission makes the "serious mistake" of using BDGL
> optimized for "operations" in a model with significant
> communication costs.

To clarify, are you claiming that "6. Analysis of known attacks" in the
NTRU Prime documentation missed a publication reporting lower costs for
sieving?

If so, which publication are you referring to? Are you also saying that
the NTRU documentation was wrong in stating that the possibility of such
an algorithm was, at that time, just a "conjecture"?

Regarding "misreading": Yes, obviously. The 2016 NTRU Prime paper
already warned about the "absence of any realistic analyses of sieving
cost for large β". The NTRU Prime documentation has always pointed out
lower-memory options, most obviously enumeration (the exponent of which
then dropped by a third!); evidently the documentation is not endorsing
more limited security analyses that ignore those options.

More broadly, the NTRU Prime documentation contains extensive warnings
about the risks of overestimates and attack improvements. For example,
here are the first two paragraphs in "6.11. A selection of estimates":

There are many proposals of lattice-based cryptosystems, and of
parameter sets for those cryptosystems. What are the pre-quantum and
post-quantum security levels of each of these parameter sets?

The literature contains many different strategies to compute answers
to this question. Given the issues described in Sections 6.4 through
6.9, it seems reasonably clear that all of these strategies are
wrong, but that some strategies are more wrong than others. This
creates serious problems in using the results: for example, deciding
what parameter sizes are needed, or evaluating the merits of more
specific design decisions such as the choice of weight.

These warnings are critical for understanding the meanings of the
numbers, and are not properly reported in NIST's text.

---D. J. Bernstein
signature.asc

Daniel Apon

unread,
Dec 10, 2023, 10:03:30 PM12/10/23
to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov
" In the case of NIST, this transparency is required by law. There's a
NIST regulation, 81 Fed. Reg. 92787, which cites criteria that "will" be
used to evaluate the submissions; those criteria, in turn, say that
"NIST will perform a thorough analysis of the submitted algorithms in a
manner that is open and transparent to the public"."

The public should note that the obfuscated reference in this email means the NIST PQC Call for Papers:

1) " 81 Fed. Reg. 92787 " refers to https://www.govinfo.gov/content/pkg/FR-2016-12-20/pdf/2016-30615.pdf
2) This points to (in text format): https://www.nist.gov/pqcrypto
2) (Taking the likely guess, this means:) https://csrc.nist.gov/CSRC/media/Projects/Post-Quantum-Cryptography/documents/call-for-proposals-final-dec-2016.pdf

The (omitted) text and context is then:

NIST anticipates that the evaluation process for these post-quantum cryptosystems may be significantly more complex than the evaluation of the SHA-3 and AES candidates. [various reasons cited]

As a result of these complexities, NIST believes that its post-quantum standards development process should not be treated as a competition; in some cases, it may not be possible to make a well-supported judgment that one candidate is “better” than another. Rather, NIST will perform a thorough analysis of the submitted algorithms in a manner that is open and transparent to the public, as well as encourage the cryptographic community to also conduct analyses and evaluation. This combined analysis will inform NIST’s decision on the subsequent development of post-quantum standards.

John Schanck

unread,
Dec 11, 2023, 6:27:16 PM12/11/23
to pqc-...@list.nist.gov
D. J. Bernstein writes:
>
> John Schanck writes:
> > The high end estimate from [1] based on (a misreading of?) the
> > NTRU Prime submission makes the "serious mistake" of using BDGL
> > optimized for "operations" in a model with significant
> > communication costs.
>
> To clarify, are you claiming that "6. Analysis of known attacks" in the
> NTRU Prime documentation missed a publication reporting lower costs for
> sieving?

No, I said "The high end estimate from [1]" (and I) made that
mistake.

In [1] Ray writes:
> The largest practical implementation of sieving techniques we
> know of, described in detail in “Advanced Lattice Sieving on
> GPUs, with Tensor Cores” by Ducas, Stevens, and van Woerden
> (https://eprint.iacr.org/2021/141), was forced by memory
> access limitations, to adopt settings for bucket size, that
> would be suboptimal according to the RAM model.

Section 4.2 of the cited ePrint 2021/141 uses BDGL-like
bucketing and says "To optimize the parameters we again balance
the cost of bucketing and reducing. [...] we essentially obtain
bgj1 with [...] time complexity of 2^{0.349n+o(n)}".

Any serious analysis of memory access costs in sieving after
ePrint 2021/141 should have applied similar optimizations /
should have seen that the NTRU Prime documentation used
sub-optimal parameters.

I don't think NIST's conclusion about the security of Kyber512
depended on the NTRU Prime documentation, so I don't think this
oversight is terribly important.

John

[1] https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/4MBurXr58Rs/m/xHojUDCaBAAJ

D. J. Bernstein

unread,
Dec 11, 2023, 7:55:38 PM12/11/23
to pqc-...@list.nist.gov
John Schanck writes:
> Section 4.2 of the cited ePrint 2021/141 uses BDGL-like
> bucketing and says "To optimize the parameters we again balance
> the cost of bucketing and reducing. [...] we essentially obtain
> bgj1 with [...] time complexity of 2^{0.349n+o(n)}".

Wow.

Why was NIST claiming in late 2022 that this speedup was just a
conjecture? NIST said it consulted with the Kyber team; why weren't
there objections from the intersection of the paper authors and the
Kyber team? Why hasn't there been an erratum from NIST?

We really need transparency as to what happened here.

---D. J. Bernstein
signature.asc

Daniel Apon

unread,
Dec 11, 2023, 8:01:48 PM12/11/23
to pqc-...@list.nist.gov

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20231212005450.2415240.qmail%40cr.yp.to.
Reply all
Reply to author
Forward
0 new messages