Dear James,
thank you for your interest!
The submission document [3] reports "measurements of a separate FPGA
implementation of the core mathematical functions (not including, e.g.,
hashing)".
Page 34 of [3] says that encapsulation, decapsulation, and key
generation are fast in hardware, and then says that the "hardware speeds
of key generation and decoding are already demonstrated by our FPGA
implementation". Decoding is a major subroutine in decapsulation.
A natural next step is to build a complete implementation of the KEM.
There is plenty of literature on hardware implementations of SHAKE. We
expect that the total hardware resources for the KEM will be not much
more than the core mathematical operations, and that the additional cost
in terms of latency due to the KEM construction will be similar to other
KEM constructions.
On 18/05/2020 17:39, James Howe wrote:
> I wanted to see if you could add any more insights into the hardware
> designs quoted in your latest specifications [3]. The paper by Wang et al.
> [1], cited a lot in the specifications, is for Niederreiter with Goppa
> codes as an encryption scheme, so I'm interested to see the differences
> between this and the McEliece KEM submission.
I am aware of only two references to [1] in the submission document;
neither of them is part of the specification.
As the submission document explains, the FPGA implementation covers only
the core mathematical functions. For example, the implementation
includes decryption, but it does not include the hashing required for
full decapsulation.
> One of my main questions is, as seen in a lot of hardware/software
> implementations before, how much more SHAKE would cost you in terms of area
> and/or performance.
Please see the literature on SHAKE implementations. The focus in [1] and
[2] is on the core mathematical computations.
> Your specifications [3] seems to suggest [1] and/or [2]
> implement the McEliece KEM,
No, [3] does not suggest this. It states that the FPGA implementation is
"of the core mathematical functions (not including, e.g., hashing)".
> as on page 34 of [3] you state that
> "encapsulation and decapsulation are reasonably fast in software, and
> impressively fast in hardware, due to the simple nature of the objects
> (binary vectors) and operations (such as binary matrix-vector
> multiplications)".
This is correct. The next two sentences say "Key generation is also
quite fast in hardware. The hardware speeds of key generation and
decoding are already demonstrated by our FPGA implementation."
> So maybe I'm missing something?
The submission document [3] includes a specification. It also includes a
performance analysis. The performance claims are clear and specific. [1]
and [2] demonstrate the achievable FPGA performance for the core
mathematical operations.
> Perhaps you have a link to a paper
> describing your hardware design [2] that's more up-to-date than [1]?
> Because typically it's also interesting to see how much certain
> components/modules consume, what are the bottlenecks, throughput
> performances, and even % of area used on the device.
Please see [1], [2], and the SHAKE literature for analysis of the main
components.
> Looking at [1], which seems to be a basis for the McEliece KEM hardware
> designs, it seems to be CPA secure instead of CCA? Fig. 2 in [1] doesn't
> have a re-encryption phase and interestingly on page 15 of [1] they state
> that "note that we are using two simple 64-bit Xorshift PRNGs in our design
> to enable deterministic testing. For real deployment, these PRNGs must be
> replaced with a cryptographically secure random-number generator, e.g.,
> [8]". Checking [2] I also see the same 64-bit XOR shift PRNG verilog file.
> I assume you'd replace this with SHAKE256 to comply with the specifications
> and security estimates? Do you have any insights how this would affect
> area/performance?
I do not know what "the McEliece KEM hardware design" is referring to.
[1] and [2] provide an FPGA implementation of the Niederreiter
cryptosystem (as the paper "FPGA-based Niederreiter Cryptosystem using
Binary Goppa Codes" clearly states) including key generation, encryption
and decryption. This is not the same as the Classic McEliece KEM, but it
uses the same core mathematical operations. The performance of those
operations is reported in the Classic McEliece performance analysis.
As the quotes from [1] indicate, [1] describes a CPA secure version of
the Niederreiter PKE. We also have a CCA secure version of the
decryption operation that we can make available. The simple PRNG indeed
must be replaced by a secure PRNG as we point out in [1]. Using e.g.
SHAKE in key generation for a PRNG construction, the overhead in time
and area compared to the entire key generation is small; parallelism can
be exploited to reduce the impact on time, e.g., by performing matrix
row reduction during generating the next private key candidate.
> Anyway, some more clarifications on this would be helpful :)
I hope this helped to clear up some misunderstandings.
Best regards
Ruben (on behalf of the Classic McEliece team)
--
Ruben Niederhagen, PhD
Fraunhofer Institute for Secure Information Technology (SIT)
Head of Advanced Cryptographic Engineering (ACE)
Address: Rheinstraße 75, 64295 Darmstadt, Germany
Phone:
+49 6151 869 - 135
Homepage:
www.sit.fraunhofer.de/en/pqcryptography/