Dear PQC forum,
we would like to add our two cents to this discussion with some considerations, given that we have worked on this topic not only for the LEDAcrypt proposal, but also for other research projects.
To the best of our knowledge, most papers on this topic need to make some assumptions on the decoder behavior, while the three works [Tillich2018, Santini2019, Santini2020] study the Decoding Failure Rate (DFR) of Bit-Flipping (BF) decoders for MDPC codes without any kind of assumption: the first two works evaluate the number of errors to have DFR=0, while the last one proposes a practical method to upper bound the DFR for any number of errors. Note that all these approaches only focus on the first decoder iteration and, consequently, the resulting code parameters are not competitive, since BF decoders achieve their best performance when multiple iterations are considered.
However, studying multiple decoder iterations requires to take the correlation among iterations into account. Indeed, the input of the first iteration is truly random, i.e., it is uniformly distributed among all the vectors with some fixed weight. However, starting from the second iteration and for all the subsequent ones, the distribution of the decoder input changes. For instance, the second iteration will receive a vector which is the outcome of the first iteration: clearly, its distribution is correlated to what happened in the first iteration. The same happens for all the subsequent iterations. At the current state of the art, we do not know how to efficiently deal with this correlation unless: i) some additional assumptions on the decoder behavior are introduced, or ii) somehow degraded decoders, with poorer performance, are considered (e.g., the Step-by-Step decoder in [Sendrier2019]).
Among the modifications we proposed for our LEDAcrypt submission during the second round, we considered a two-iteration BF decoder working on QC-MDPC codes. To study such a decoder, we relied on a statistical analysis that was based on some light assumptions (statistical independence of the parity checks), which are also employed in the second part of [Tillich2018]. The resulting parameters were worse than those of BIKE, because we were: i) considering only two iterations and ii) dealing with the second iteration in a pessimistic manner (i e., assuming that it corrects all patterns with some fixed maximum weight and fails for all the others). These choices are clearly conservative, because: i) we are limiting the number of iterations and ii) we are furthermore underestimating the correction capability of the second iteration. Yet, we chose to sacrifice the possibility of much more competitive parameters for the sake of a simpler decoder model, based on light assumptions which we judged trustworthy.
For what concerns extrapolation, honestly, we don't know whether the underlying concavity assumption (Assumption 1 in Vasseur’s Eprint) should be trusted or not. Namely, it is surely true that the relation r --> log ( DFR(r) ) is, for a lot of decoders, concave in some region (this is backed by numerical simulations), but we do not know whether this holds up to the point where DFR(r) >= 2^-\lambda.
On the other hand, it is true that no quantitative argument has been found to date that denies its validity.
In a recent paper, which was presented at CBCrypto 2021 (and that is available here with some updates: https://eprint.iacr.org/2021/1557 ), we independently obtained some of the results which have been discussed in [Vasseur2021]. Namely, we assessed the decoding failure probability of hard to correct errors, which constitute a subset of all the possible error decoder inputs. Our results provide two lower bounds on the DFR of a QC-MDPC codes, the tighter of which relies on counting the size of error sets which have significantly higher DFR than the average (taken over all the possible errors). Such a bound can be improved in tightness if lower DFRs for the said error sets are assessed via Monte-Carlo simulations.
Exploiting a recently published highly efficient implementation of the BIKE decoder [Chen2021], we were able to push the DFR lower bound further up, reporting values in the range of epsilon = 2^-141 -- 2^-140 for current Bike Category 1 parameters.
We also note that, since all quasi-cyclic shifts of the errors causing abnormally high DFR also result in the same abnormally high DFR, if we take quasi cyclicity into account, the DFR bound further raises by a factor p (the size of the QC blocks). Care however should be taken, as there is a low probability of an error coinciding with the cyclic shift of another, leading to double counts.
Nonetheless, we know for sure that the lower bound we are considering, assuming that we are able to get rid of double counts, is between the values epsilon’ = epsilon*p and epsilon. Our current results report values for epsilon’ in the range of 2^-126 -- 2^-127.
While this is not a hard argument to state that the current BIKE DFR claim is invalid, since epsilon’ does not provide a proper lower bound, it still brings the estimate significantly close. Note that this approach, as well as the one based on near-codewords proposed by Vasseur, is based on numerical simulations, and requires to estimate low DFRs. As a consequence, nontrivial numerical simulations should be run to improve it (current results required ~750 core-days). However, adding further computation time can only increase the value of the estimated DFR.
Note that, for LDPC codes, the topic of estimating the floor location has been studied for decades, via techniques to enumerate the so-called trapping sets. Interestingly enough, the problem is NP hard (see, for instance, Mcgregor2010) and existing approaches take a time which is exponential in the parity-check matrix column weight. Yet, we are not aware of such results for the class of MDPC codes, and the error patterns that Vasseur and ourselves have studied are, in principle, only a subclass of all the possible harmful error patterns.
Also, the validity of the extrapolation should be somehow associated to the decoding strategy. It is well known that, for LDPC codes, the same code can exhibit different error floor heights, depending on the considered decoder. Yet, it appears to us that extrapolation is assumed to hold for any kind of decoder: this is unlikely to be true. So, we feel like this is another point which should require some more study.
Long story short, we are not saying that extrapolation is not valid, nor that the BIKE proposed parameters cannot achieve the desired DFR. Our main concern is in the fact that DFR prediction techniques have not been deeply studied yet and, from our point of view, some aspects need to be clarified before extrapolation can be really trusted to be relying on valid assumptions. Vasseur’s PhD thesis contains some very interesting and useful results to investigate the problem, but does not represent the definitive solution.
References
[Tillich2018] Tillich, Jean-Pierre. "The decoding failure probability of MDPC codes," 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018.
[Santini2019] Santini, Paolo, et al. "Hard-decision iterative decoding of LDPC codes with bounded error rate." 2019 IEEE International Conference on Communications (ICC). IEEE, 2019.
[Santini2020] Santini, Paolo, et al. "Analysis of the error correction capability of LDPC and MDPC codes under parallel bit-flipping decoding and application to cryptography." IEEE Transactions on Communications 68.8 (2020): 4648-4660.
[Sendrier2019] Sendrier, Nicolas, and Valentin Vasseur. "On the decoding failure rate of QC-MDPC bit- flipping decoders." International Conference on Post-Quantum Cryptography. Springer, Cham, 2019.
[Vasseur2021] Vasseur, Valentin. "Post-quantum cryptography: a study of the decoding of QC-MDPC codes." Diss. Université de Paris, 2021.
[McGregor2010] McGregor, Andrew, and Olgica Milenkovic. "On the hardness of approximating stopping and trapping sets." IEEE Transactions on Information Theory 56.4 (2010): 1640-1650.
[Chen2021] Chen, Ming-Shing, et al. "Optimizing BIKE for the Intel Haswell and ARM Cortex-M4." IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(3), 97-124.
From: 'Rafael Misoczki' via
pqc-forum <pqc-...@list.nist.gov>
Sent: Friday 12 november 2021 22:52
To: pqc-...@list.nist.gov
Cc: Carlos Aguilar Melchor
<Carlos.AGUI...@isae-supaero.fr>; Edoardo Persichetti <epersi...@fau.edu>;
Nicolas Sendrier <nicolas....@inria.fr>; Gaborit
<gab...@unilim.fr>; Gilles Zémor
<Gilles...@math.u-bordeaux.fr>; Jean-Christophe Deneuville
<jch.den...@gmail.com>; Loïc Bidoux <loic....@owndata.org>;
Nicolas Aragon <nicolas...@etu.unilim.fr>; Olivier Blazy
<olivie...@unilim.fr>; Paulo Barreto <pbar...@gmail.com>; Shay
Gueron <shay....@gmail.com>; Slim Bettaieb
<slim.b...@gmail.com>; Tim Güneysu <tim.gu...@rub.de>; Valentin
Vasseur <valentin...@inria.fr>; tillich
<jean-pier...@inria.fr>; Ghosh, Santosh
<santos...@intel.com>; Jan Richter-Brockmann
<jan.richte...@ruhr-uni-bochum.de>
Subject: [pqc-forum] BIKE DFR
Dear NIST PQC Forum,
On October 31st, we shared with NIST a document that covers important results
on the Decoding Failure Rate (DFR) estimates for BIKE.
We were waiting for the ePrint (submitted on October 29th) to appear
before sending this note to the forum. It is now published and available at: https://eprint.iacr.org/2021/1458
Topics such as weak keys and near-codewords are covered in detail. The content
comes essentially from our team member Valentin Vasseur's recently published
PhD thesis.
We hope that this document can clarify remaining questions about BIKE decoding
techniques and the associated DFR estimates before NIST makes its final
selection.
Please let us know if you have any questions or comments.
Best Regards,
The BIKE Team