A uniform random dxd matrix over F_2 is invertible with probability
exactly (1-1/2)(1-1/4)(1-1/8)...(1-1/2^d). See, e.g., Theorem 99 in
Dickson's 1901 book on linear groups:
https://archive.org/details/lineargroupswith00ledi/page/n89/mode/2up
The probability is within 1/2^d of its limit as d->infinity. The limit
is
prod_{integers d >= 1} (1-1/2^d)
= sum_{integers k} (-1)^k 2^(-k(3k+1)/2)
= binary 0.010010011110111000000100001111111101111100000000001000
000111111111111011111110000000000000010000000011111111
111111110111111111000000000000000000100000000001111111...
= 0.288788095086602421278899721929230780088911904840685784114741...
by Euler's pentagonal-number theorem.
Public keys in the original McEliece cryptosystem, with the usual
parameter choices, are commonly conjectured to be indistinguishable from
uniform random matrices of the same size. This indistinguishability
implies indistinguishability of the leading square matrix from uniform,
in turn implying that an invertibility test doesn't distinguish the
leading square matrix from uniform, i.e., that the invertibility chance
is indistinguishable from (1-1/2)(1-1/4)(1-1/8)...(1-1/2^d).
Statistically pinning down the actual probability is a simple matter of
generating many McEliece matrices and seeing how often the leading
square matrix is invertible; or, for the reciprocal of the probability,
running keygen many times, as in the script below. (For experiments
using deterministic RNG seeds, change "fast" to "known" in the script.)
An experiment generating 1000000 keys for mceliece6960119 used 3466938
matrices in total.
The limited statement that the probability is >=25% implies that there
is a "security difference of at most 2 bits" (to quote the Classic
McEliece submission) between systematic-form public keys and arbitrary
public keys. For formal verification, it's best to include this limited
statement as a hypothesis, since the statement is directly statistically
verifiable, rather than deriving the statement from the hypothesis of
public-key indistinguishability, which is overkill for the security
analysis.
---Dan (speaking for myself)
m=mceliece6960119
mkdir goppasystematic
cd goppasystematic
wget
https://bench.cr.yp.to/supercop/supercop-20220213.tar.xz
tar -xf supercop-20220213.tar.xz
cd supercop-20220213
sed -i 1q okcompilers/c
: > okcompilers/cpp
chmod +t crypto_kem/$m/ref
touch crypto_kem/$m/used
for opi in crypto_kem/$m/*/
do
python3 -c '
import sys
gaussstate = 0
print("long long numgauss = 0;")
print("long long numsystematic = 0;")
for line in sys.stdin:
if gaussstate == 0 and line.find("gauss") >= 0:
gaussstate = 1
print("++numgauss;")
sys.stdout.write(line)
if gaussstate > 0:
gaussstate += line.count("{")
if line.count("}") > 0:
gaussstate -= line.count("}")
if gaussstate == 1:
print("++numsystematic;")
gaussstate = -1
' < "$opi/pk_gen.c" > "$opi/pk_gen.c.new" \
&& mv "$opi/pk_gen.c.new" "$opi/pk_gen.c"
done
./do-part init
./do-part keccak
./do-part crypto_sort int32
./do-part crypto_hash shake256
./do-part crypto_stream chacha20
./do-part crypto_rng
./do-part crypto_kem $m
(
echo '#include <stdio.h>'
echo '#include <stdlib.h>'
echo '#include "crypto_kem_'"$m"'.h"'
echo 'unsigned char pk[crypto_kem_mceliece6960119_PUBLICKEYBYTES];'
echo 'unsigned char sk[crypto_kem_mceliece6960119_SECRETKEYBYTES];'
echo 'void crypto_declassify(void *x,unsigned long long xlen)'
echo '{'
echo '}'
echo 'void randombytes_callback(void *x,unsigned long long xlen)'
echo '{'
echo '}'
echo 'extern long long numgauss,numsystematic;'
echo 'int main(int argc,char **argv)'
echo '{'
echo ' for (long long loop = 0;loop < atoll(argv[1] ? argv[1] : "100");++loop)'
echo ' crypto_kem_mceliece6960119_keypair(pk,sk);'
echo ' printf("gauss %lld systematic %lld\n",numgauss,numsystematic);'
echo ' return 0;'
echo '}'
) > experiment.c
gcc -I bench/*/include/*/constbranchindex -o experiment experiment.c \
bench/*/lib/*/fastrandombytes.o \
bench/*/lib/*/kernelrandombytes.o \
bench/*/lib/*/libsupercop.a \
bench/*/lib/*/libkeccak.a
./experiment 10000