Structural analysis of McEliece asymptotically better than generic decoding

1,321 views

Skip to first unread message

Hugues Randriam

unread,

Jul 29, 2024, 11:06:52 AM7/29/24

to pqc-forum

Dear pqc-forum members,

I'm pleased to share with you the following recent work:
https://eprint.iacr.org/2024/1193

Up to now the situation with McEliece was as follows:

(1) Despite interesting results these last two years, structural analysis of the McEliece system was either ridiculously slow (i.e. comparable to exhaustive search of the private key), or did not apply to the codes proposed in actual schemes (e.g. required extreme parameters, or non-Goppa codes).

(2) Security of the system was thus defined relative to the best message recovery attacks, through generic decoding algorithms.
For Goppa codes of rate R and length n, Prange's 1962 algorithm has complexity (c(R)+o(1))^(n/log(n)).
After more than six decades, and several dozens of papers on the subject, the best algorithms still have complexity (c(R)+o(1))^(n/log(n)), for the same constant c(R).
All improvements remain confined in the o(1) term. Thus, improving this constant c(R) would be a major breakthrough.

The main claim of this paper is a new distinguisher addressing these two points.
Compared e.g. with the recent Couvreur-Mora-Tillich distinguisher, it has a broader range of distinguishable parameters (including those of the Classic McEliece proposition), and it runs much faster for equal parameters.
Moreover, its asymptotic complexity is better than that of generic decoding algorithms.
Actually, it does better than improving the constant c(R): it improves the n/log(n) exponent, to n*log(log(n))^3/log(n)^2.
Equivalently, it replaces the contant c(R) with a quantity that goes to 1 at speed c'(R)^(log(log(n))^3/log(n)).

However for the moment this is only an asymptotic result.

That said, the paper introduces a new tool for cryptanalysis: higher order modules of syzygies of codes, or of sets of points in projective space, as can be learned e.g. from the first chapters of Eisenbud's _Geometry of syzygies_.
As far as I know this is the first time these are applied in cryptography. Of course syzygies are already used by cryptographers, but only the first module of syzygies. While we make use of the full minimal resolution.
One can expect further progress on the structural analysis of McEliece (and possibly other schemes) when the community will have mastered this new notion.

The paper has not been peer-reviewed yet. Hopefully, making it public as early as possible will help the process.

D. J. Bernstein

unread,

Aug 16, 2024, 3:49:54 PM8/16/24

to pqc-...@list.nist.gov

This is a message from the Classic McEliece team.

We appreciate the extensive community interest in all aspects of the
McEliece cryptosystem. https://classic.mceliece.org/papers.html tracks a
wide range of papers, and we encourage further research. Confidence in
the security of the cryptosystem is founded upon the long, continuing,
publicly documented series of failures to break the cryptosystem.

This message analyzes what the claims in https://ia.cr/2024/1193 mean
for Classic McEliece, assuming the claims are correct. The short summary
is that the paper's syzygy distinguisher does not affect any of the
Classic McEliece security claims. The Classic McEliece security analysis
avoids relying on the indistinguishability property targeted by
2024/1193. Furthermore, the distinguisher has vastly higher cost than
any of the security levels claimed by Classic McEliece, so it would
still not affect the system even if it could somehow be converted into a
key-recovery attack.

The paper's main claims are nevertheless of potential interest for
computational complexity theory. The claims rely on conjectures
formulated in the paper, and we encourage further investigation of
whether the distinguisher works.

The rest of this message is organized as follows: (1) This distinguisher
has cost far above 2^256. (2) Classic McEliece avoids relying on this
indistinguishability property. (3) Notes on the conjectures in
2024/1193. (4) Further comments. (5) Summary of impact.

1. This distinguisher has cost far above 2^256

The syzygy distinguisher is, according to the estimates in 2024/1193,
much slower than a ludicrously expensive key-recovery attack already
explained in the Classic McEliece documentation.

Specifically, an attacker can recover private keys in at most 2^256
key-generation operations by searching through all 256-bit seeds. This
attack is noted on, e.g., page 24 of the Classic McEliece security
guide, https://classic.mceliece.org/mceliece-security-20221023.pdf.

A side effect of recovering a private key is that the public key is
distinguished from a uniform random matrix (with probability extremely
close to 1, since private keys are much shorter than public keys). A
much more important side effect of recovering a private key is being
able to break every ciphertext encapsulated to the public key.

The reason this attack is not of concern is that searching through
guesses for 256-bit secrets is infeasible, even with a future quantum
computer. As page 3 of the security guide puts it, "guessing 256-bit
secrets" is "not a real-world threat".

For comparison, page 26 of paper 2024/1193 considers our smallest
proposed parameters (3488,64), and conjectures that distinguishing
public keys from uniform random matrices has "complexity around 2^529"
(using linear algebra acting on vectors of length "around 2^223").
Whether or not this conjecture is correct, complexity 2^529 is much
slower than searching through 2^256 seeds.

There are many previous papers on infeasible methods to detect the
structure of McEliece keys. Section 3.1 of the security guide says the
following:

For all parameters of interest, the fastest attacks known against the
OW-CPA problem treat G as a general matrix with no evident structure.
... There are also much slower “structural attacks” known that try to
exploit the secret structure of G, for example by recovering
Goppa-code parameters for G.

Distinguishers costing far above 2^256, whether or not they recover
keys, are consistent with this statement and with the rest of the
security analysis. Section 3.6 of the security guide mentions various
other examples of very slow key-recovery attacks.

The "fastest attacks" comment is referring to ISD, which takes about
2^150 bit operations for (3488,64). For precise quantification and high
assurance regarding the number of bit operations used by many ISD
variants, see the CryptAttackTester paper at Crypto 2024.

2. Classic McEliece avoids relying on this indistinguishability property

Beyond being extremely slow, the syzygy distinguisher is attacking a
problem that appears in some papers but that the Classic McEliece
security analysis explicitly avoids relying on.

The security guide organizes the security analysis as follows:

* The Classic McEliece security goal is IND-CCA2. See Section 1. (We
recommend using separate modules for generic transformations that
aim at anything beyond an IND-CCA2 KEM; see Section 6.)

* IND-CCA2 attacks much faster than QROM IND-CCA2 attacks would be
surprising for the selected hash function. See Section 5.3.3. This
lets reviewers focus on QROM IND-CCA2 attacks.

* QROM IND-CCA2 security follows tightly from the hypothesis of
OW-CPA security of the underlying PKE. See Section 5. After
checking this, reviewers can focus on OW-CPA security.

* OW-CPA security of the underlying PKE follows tightly from the
hypothesis of OW-CPA security of the original McEliece PKE. See
Section 4. This lets reviewers focus on McEliece OW-CPA security.

* Section 3 reviews the state of the art in McEliece OW-CPA attacks.

This provides a clear structure for Classic McEliece security reviews.
The only ways that something can possibly go wrong are some sort of
disaster involving SHAKE256, a mistake in the tight reductions, or a
better OW-CPA attack against the original McEliece system.

The literature sometimes mentions the following trivial implication:
McEliece OW-CPA security follows from OW-CPA security of a uniform
random matrix if the McEliece public key is indistinguishable from a
uniform random matrix. Of course, finding a distinguishing attack faster
than the target OW-CPA security level would render this implication
vacuous.

The Classic McEliece security guide notes this motivation for studying
distinguishing attacks, but it also notes that this is "not a two-way
implication---perhaps there are distinguishers even if OW-CPA is
secure". Even fast high-probability distinguishers would not be relevant
to any of the Classic McEliece security claims.

As a (real) pre-quantum analogy for such (hypothetical) distinguishers,
consider the following trivial implication: if RSA public keys are
indistinguishable from uniform random integers of the same size then
OW-CPA for an RSA public key follows from OW-CPA for the same system
with a uniform random integer. This is vacuous, since RSA keys are
efficiently distinguishable from uniform. OW-CPA is nevertheless a
reasonable pre-quantum hypothesis for RSA. The distinguisher doesn't
threaten any of the security properties that NIST claims for RSA.

In other words, for an algorithm to have an effect on the Classic
McEliece security analysis, it would have to be much faster than the
2024/1193 distinguisher _and_ would have to be a key-recovery attack or
something else that breaks OW-CPA.

3. Notes on the conjectures in 2024/1193

Paper 2024/1193 claims that its distinguisher is "subexponential in
the error-correcting capability, hence better than that of generic
decoding algorithms".

Quantitatively, with the usual choice of parameters (n,t) where t is
proportional to n/log n, the cost of ISD is exponential in n/log n (for
almost all matrices, under amply tested assumptions), while this paper
claims a distinguishing cost exponential in n (log log n)^3 / (log n)^2.

(Note that exponential in n (log log n)^3 / (log n)^2 is slightly
subexponential in n/log n. Once n is large enough, the (log log n)^3 in
the numerator is outweighed by the extra log n in the denominator.)

If this claim is correct then presumably a wider range of algebraic
attacks should also try iterating XL to compute higher-order syzygies.
However, the question of whether the claim is correct needs further
investigation.

The largest distinguishing experiments reported in the paper, at the
bottom of page 25, are for t = 3 and n below field size 64. The paper
reports that the distinguisher becomes less reliable for n below 56 and
completely unreliable for n below 50. The paper explains this as being
correlated with n crossing a line in "Heuristic 1". It is surprising
that the abstract does not mention that the paper's claims depend on a
new heuristic formulated in the paper.

The evidence presented for the heuristic consists of other experiments
for n = 56 reported on page 34. Those experiments also detect failures.
Heuristic 1 says "high probability" without quantification; there is no
evidence of how the failure probability scales with n and t. We
encourage the author to carry out a wider range of experiments.

4. Further comments

Four specific comments in the paper might seem to indicate that the
paper's claims contradict Classic McEliece security claims, so we
address those comments here.

4a. The paper claims that "a security proof" for the McEliece
cryptosystem relies on (1) the assumption of indistinguishability from
uniform random matrices and (2) the hardness of decoding random codes,
which the paper says "stands on a firm theoretical ground: the decoding
problem for generic linear codes is known to be NP-hard".

It is true that the literature sometimes assumes indistinguishability,
and sometimes claims connections between NP-hardness and security.
However, the Classic McEliece security analysis does not claim any such
connections, and does not assume indistinguishability. The analysis is
instead explicitly founded upon a simpler assumption, OW-CPA, which is a
major target of McEliece cryptanalysis.

4b. The paper claims to disprove the belief that "we could possibly have
reached the intrinsic complexity of cryptanalysis of this system", and
claims that this is "the first time an analysis of the McEliece
cryptosystem breaks the exponential barrier".

As a counterexample, "sloppy Alice" attacks from the turn of the century
take polynomial time and are analyses of the McEliece cryptosystem.
NTS-KEM and PALOMA are two examples of newer McEliece-based KEMs that
had IND-CCA2 efficiently broken using variants of these attacks. The way
that Classic McEliece obtains QROM IND-CCA2 security from just OW-CPA
security is an important, explicit feature of the design of Classic
McEliece, and also means that a hypothetical fast distinguisher would
still not qualify as a break of Classic McEliece. It is important to be
clear about which problems are being attacked.

4c. The paper claims that "previous distinguishers" have "strong regime
limitations", making those distinguishers inapplicable to "the codes
used in Classic McEliece".

The documentation already notes various examples of attacks finding the
codes used in Classic McEliece. Again, these key-recovery attacks imply
distinguishers and, more importantly, OW-CPA attacks. The reason that
Classic McEliece dismisses these attacks is explicitly that the attacks
are much slower than the state-of-the-art OW-CPA attacks for "all
parameters of interest".

4d. The paper says "Observe [5, §3.4] that the security parameter in
the Classic McEliece system, based on the complexity of generic decoding
algorithms, is linear in t ... our complexity is subexponential in the
security parameter".

Here [5] is the security guide. Presumably the intention here was to say
"claimed security level" rather than "security parameter". (There is no
security parameter in Classic McEliece. The Classic McEliece security
analysis maps _forward_ from the selected choices of parameters n, t,
etc. to security levels; the Classic McEliece rationale says that
mapping backwards from security parameters is not robust.) However,
Section 3.4 is explicitly on "Asymptotic costs of information-set
decoding", and Section 3 is explicitly on "OW-CPA security", not on
broader notions of security.

5. Summary of impact

The claims in 2024/1193 do not contradict any of the Classic McEliece
security claims. There are two independent reasons for this, one
quantitative and one structural:

* The costs conjectured in 2024/1193 are much more expensive than ISD
for all proposed Classic McEliece parameters.

* The costs are for an algorithm that is merely a distinguisher, not
an OW-CPA attack. The targeted indistinguishability assumption is
not used in the Classic McEliece security analysis; it is even
explicitly disclaimed by the Classic McEliece security analysis.

As noted above, even a fast distinguisher wouldn't violate the Classic
McEliece security claims. A distinguisher above the target OW-CPA
security level is even weaker: it doesn't even render vacuous the
aforementioned trivial implication. A hypothetical extension of the
distinguisher to an OW-CPA attack would certainly not make it faster.

---D. J. Bernstein (on behalf of the Classic McEliece team)

signature.asc

Hugues Randriam

unread,

Aug 21, 2024, 9:05:26 AM8/21/24

to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov

Dear Dan, dear forum members,

First, thanks for pointing out elements in the paper that might appear unclear, inaccurate, or misleading. Corrections will be brought in an updated version.

The paper mainly focuses on the general mathematical principles of the McEliece system, not specifically on its Classic McEliece instantiation.
But it is important indeed to understand its implications for the security of Classic McEliece.
On this point, I confirm that I consider your conclusions are in accord with the results claimed.

I apologize for not explicitly providing more experimental data on how the distinguisher runs in practice. Actually, these can be inferred from the tables given page 33.
For instance (Fig. 12) a binary Goppa code with m=10, n=1024, t=9, can be distinguished from a random code by computing beta_{3,4} of its 34-shortened dual. We consistently find beta_{3,4}=30 in the Goppa case vs 0 in the random case.
I was able recently to extend this to m=10, n=1024, t=10, now with the 40-shortened dual.
I will modify the paper to make this more apparent.

Heuristic 1 already includes restrictions on its use, motivated by the discussion pages 22-23. In all experiments, including those mentioned in this message, I saw no deviation from its validity.

Anyway, it would be very desirable indeed to have a proof, and also a more quantitative version of this heuristic.