Question on Classic McEliece definition of FixedWeight


Wrenna Robson

Feb 14, 2023, 9:59:05 AM
to pqc-forum
Hi,

I have a question for the Classic McEliece authors (though if someone
else here has insight I would welcome it). In the Round-4 version of
the specification, it is stated: "Three of these algorithms, namely
FixedWeight, KeyGen, and Encap, are randomized,
generating random bits at specified moments. The set of strings of
random bits allowed as
input for the corresponding mathematical functions is defined as the
set of strings of random
bits consumed by these algorithms."

This makes sense to me. However, what then is the domain of the
FixedWeight mathematical function? As written, steps 3 and 4 can cause
a restart of the algorithm back to 1, which then requires taking more
random bits. So in theory FixedWeight could need an arbitrary number
of random bits before it successfully outputs, and it isn't
deterministic.

This is in contrast to the approach with KeyGen, where KeyGen is
described as a no-input function which generates a random string l,
and then SeededKeyGen uses the pseudorandom bit generator G in order
to generate the required bits: if there is a restart in SeededKeyGen,
the output of G is used to produce a new seed. So once you decide the
initial seed, it is deterministic.

Is there an intent that FixedWeight be given a definition which makes
it deterministic in an analogous way? If so, what is the intended
source of the new bits each loop? Presumably it would not be good to
re-use G (though that is simply an instinct on my part).

Best,

Wrenna Robson

D. J. Bernstein

Feb 24, 2023, 2:51:32 AM
to pqc-forum
The general context here is the Classic McEliece specification defining
exactly how random input is used, including exactly how much randomness
is used. https://classic.mceliece.org/mceliece-rationale-20221023.pdf
includes the security rationale for this in Section 4.

In particular, pure rejection-sampling loops such as "flip a coin until
it's heads" can consume an arbitrary amount of randomness, although the
chance of this happening drops exponentially with the amount. For
careful testing and verification, it's important to note that the
possible coin sequences used are heads, tails-heads, tails-tails-heads,
etc., and not, e.g., heads-tails with the tails being ignored.
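As a toy sketch of this point (mine, not the specification's), here is a coin-flip loop that records exactly the bits it reads, so the possible consumed strings are heads, tails-heads, tails-tails-heads, and so on:

```python
import secrets

def flip_until_heads(coin=None):
    """Flip a coin until heads; return (consumed_bits, result).

    The consumed list records exactly the bits the loop read, so the
    possible traces are 1, 01, 001, ... (1 = heads) -- never a trace
    containing bits that were generated but ignored.
    """
    if coin is None:
        coin = lambda: secrets.randbits(1)
    consumed = []
    while True:
        b = coin()
        consumed.append(b)
        if b == 1:  # heads: accept and stop
            return consumed, "heads"

# Deterministic replay: feeding the trace tails-tails-heads back in
# reproduces the run and consumes exactly those three bits.
bits = iter([0, 0, 1])
trace, result = flip_until_heads(coin=lambda: next(bits))
# trace == [0, 0, 1], result == "heads"
```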

Wrenna Robson writes:
> However, what then is the domain of the FixedWeight mathematical
> function?

The specification "defines each mathematical function F by presenting an
algorithm to compute F" (page 3). For randomized functions such as
FixedWeight, the set of strings of random bits allowed as input "is
defined as the set of strings of random bits consumed by these
algorithms" (page 3).

In particular, the specification presents a FixedWeight algorithm (page
10). The FixedWeight mathematical function is the function computed by
this algorithm. The domain of the function is the set of strings of
random bits consumed by this algorithm. The output of the mathematical
function, for any particular string in the domain, is the output of this
algorithm for that string of random bits.

FixedWeight begins by generating sigma_1 tau random bits (Step 1). It then
performs a specific computation (Steps 2, 3, 4), which may lead the
algorithm to restart; otherwise the algorithm produces an output (Steps
5 and 6). Consequently, the domain consists of

* all accepted strings R_0 of sigma_1 tau bits, where "accepted" means
that the algorithm doesn't restart;

* all strings R_0+R_1 (with + for concatenation) where R_0 is a
rejected string of sigma_1 tau bits ("rejected" means non-accepted)
and R_1 is an accepted string of sigma_1 tau bits;

* all strings R_0+R_1+R_2 where R_0 and R_1 are rejected and R_2 is
accepted;

etc.
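A generic sketch of this domain decomposition (my illustration, not the specification's: the accept predicate below is a hypothetical stand-in for Steps 2-4, and the chunk size is a stand-in for the number of bits Step 1 consumes):

```python
import secrets

CHUNK_BITS = 16  # stand-in for the number of random bits Step 1 consumes

def rejection_sample(accept, rng=None):
    """Consume fixed-size chunks R_0, R_1, ... until one is accepted.

    The randomness actually consumed is R_0 + R_1 + ... + R_k, where
    R_0, ..., R_{k-1} are rejected and R_k is accepted -- exactly the
    domain decomposition described above.
    """
    if rng is None:
        rng = lambda: secrets.randbits(CHUNK_BITS)
    chunks = []
    while True:
        r = rng()
        chunks.append(r)
        if accept(r):
            return chunks  # rejected prefix, then one accepted chunk

# Hypothetical accept predicate: accept chunks whose low bit is 1.
feed = iter([0x1234, 0x5679])  # 0x1234 is rejected, 0x5679 is accepted
consumed = rejection_sample(lambda r: r & 1 == 1, rng=lambda: next(feed))
# consumed == [0x1234, 0x5679]
```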

Note that, in general, algorithms do not necessarily halt. If you're
formalizing the definition of the output of an algorithm, you have to
include an extra output possibility meaning non-halting. See, e.g., the
definition of "M(x) = ↗" on page 20 of Papadimitriou's 1993
Computational Complexity textbook.

In particular, pure rejection-sampling algorithms such as FixedWeight
can consume an infinite string R_0+R_1+... rather than halting---which
doesn't matter in reality for any well-designed algorithm since it
happens with probability 0, but does matter for rigorous proofs. The
domain includes all R_0+R_1+... such that each R_i is rejected.

Something else to watch out for in formalization is that elementary
definitions of randomness for finite sets can't formalize the infinite
RNG-output string as a random object, can't formalize statements such as
"the average number of coin flips until heads is 2", and can't formalize
statements about average run times of typical algorithms. The necessary
definitions for rigorous analysis of probabilistic algorithms come from
measure theory. See https://gilith.com/papers/termination.pdf for an
example of formally verifying the behavior of probabilistic algorithms.

> As written, steps 3 and 4 can cause
> a restart of the algorithm back to 1, which then requires taking more
> random bits. So in theory FixedWeight could need an arbitrary number
> of random bits before it successfully outputs,

Right. Pure rejection-sampling algorithms can consume arbitrarily long
finite strings, or infinite strings in the non-halting case.

> and it isn't deterministic.

Correct. That isn't tied to the number of random bits used or to the
internal rejection-sampling structure: encapsulation algorithms (and
keygen algorithms) always have to be randomized for security.

It's normal for larger systems to make their own choices of RNGs, as
modules separate from cryptosystem specifications. Security review is
partitioned the same way; see Section 4 of the rationale.

> This is in contrast to the approach with KeyGen, where KeyGen is
> described as a no-input function which generates a random string l,
> and then SeededKeyGen uses the pseudorandom bit generator G in order
> to generate the required bits: if there is a restart in SeededKeyGen,
> the output of G is used to produce a new seed. So once you decide the
> initial seed, it is deterministic.

Right. You can think of key generation as being constructed in three
layers: first there's a pure rejection-sampling algorithm; then this is
composed with a stream cipher generating randomness for that algorithm;
then this is composed with the external RNG to generate the cipher key.

The reason for this complication is explained in Section 4.2 of the
rationale. In short, the details support a spectrum of interoperable
compression options for the private key.
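A sketch of that three-layer shape (mine, not the specification's: SHAKE-256 stands in for the spec's generator G, and the toy attempt() predicate is a hypothetical inner algorithm):

```python
import hashlib
import secrets

SEED_BYTES = 32

def G(seed, outlen):
    # Stand-in PRG: SHAKE-256, purely for illustration; the
    # specification defines its own generator G.
    return hashlib.shake_256(seed).digest(outlen)

def seeded_keygen(seed, attempt, need=64):
    """Layers 1+2: expand the seed into randomness for one attempt of
    the inner rejection-sampling algorithm; on rejection, the tail of
    G's output becomes the next seed."""
    out = G(seed, need + SEED_BYTES)
    randomness, next_seed = out[:need], out[need:]
    key = attempt(randomness)
    if key is not None:
        return key
    return seeded_keygen(next_seed, attempt, need)

# Layer 3: the initial seed comes from the external RNG; after that
# the whole computation is deterministic.
seed = secrets.token_bytes(SEED_BYTES)
# Hypothetical inner algorithm: reject unless the first byte is even.
key = seeded_keygen(seed, lambda r: r if r[0] % 2 == 0 else None)
```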

For formalization, one disadvantage of merging symmetric cryptography
into rejection-sampling loops is that suddenly it's no longer clear that
the composed algorithm has probability 0 of running forever. However,
suitable PRF assumptions imply that the probability is extremely close
to 0.

> Is there an intent that FixedWeight be given a definition which makes
> it deterministic in an analogous way?

Any test framework that provides a deterministic randombytes() for test
vectors (e.g., SUPERCOP) is already doing this in a modular way, without
complicating individual cryptosystem specifications.

McTiny needed to compress the encapsulation state to meet extreme
performance requirements, and simply took the modular approach to
compressing this state. McEliece-specific compression options provide
interesting performance tradeoffs for private keys but not for the
encapsulation state.

See Section 4.5 of the rationale: "Applications might plug in an RNG
that generates output from a series of seeds as in key generation,
allowing a ciphertext to be compressed to the final seed. See, e.g.,
[8]. The same compression approach works for rejection sampling in much
more generality."

---D. J. Bernstein (on behalf of the Classic McEliece team)

Wrenna Robson

Feb 24, 2023, 3:09:28 AM
to pqc-forum
Thanks Dan - that's a really useful answer, and I appreciate the clarity and time spent on the response. I don't suppose you have a reference for how large each set of accepted bit strings (R_0, then R_0 + R_1 with a different R_0, I think, etc.) will be? (Obviously they are decreasing in size, and I assume fairly quickly, so that the probability of still looping is near 0 after not many iterations - I just haven't sat down to do the calculations on exactly how large they are, but it occurred to me that you probably already have them somewhere.)

Best,

Wrenna

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20230224075106.560655.qmail%40cr.yp.to.

D. J. Bernstein

Feb 24, 2023, 4:34:44 AM
to pqc-forum
Wrenna Robson writes:
> I don't suppose you have a reference for how
> large each set of accepted bits (R_0, R_0 + R_1 (different R_0 I think?),
> etc. will be?

https://classic.mceliece.org/mceliece-rationale-20221023.pdf#subsection.4.5
includes an exact formula for the acceptance probability of each
FixedWeight iteration: the probability of getting past Step 3 is the sum
of the coefficients of x^t,x^(t+1),...,x^tau in (1-n/q+(n/q)x)^tau, and
then the conditional probability of getting past Step 4 is
(1-1/n)(1-2/n)...(1-(t-1)/n).
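For anyone wanting to evaluate these, here is a sketch in Python with exact rational arithmetic (my code, not the submission's; the example values n = q = 8192, t = tau = 128 are meant to resemble a full-length parameter set, but check the spec for the actual parameters):

```python
from fractions import Fraction
from math import comb

def step3_prob(n, q, t, tau):
    """Probability of getting past Step 3: the sum of the coefficients
    of x^t, x^(t+1), ..., x^tau in (1 - n/q + (n/q)x)^tau."""
    p = Fraction(n, q)
    return sum(comb(tau, j) * p**j * (1 - p)**(tau - j)
               for j in range(t, tau + 1))

def step4_prob(n, t):
    """Conditional probability of getting past Step 4:
    (1 - 1/n)(1 - 2/n)...(1 - (t-1)/n)."""
    r = Fraction(1)
    for i in range(1, t):
        r *= 1 - Fraction(i, n)
    return r

# With n = q every candidate survives Step 3, so the per-iteration
# acceptance probability is just the Step 4 product.
accept = step3_prob(8192, 8192, 128, 128) * step4_prob(8192, 128)
```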

---D. J. Bernstein (speaking for myself)

Wrenna Robson

Feb 24, 2023, 4:51:21 AM
to pqc-forum
Ah - thank you (sorry for asking questions partially/wholly answered by the documents, but thank you for having good documents!)
