Round 2 Embedded Energy Results and general Comments

Markku-Juhani O. Saarinen

unread,

Apr 14, 2020, 8:07:28 AM4/14/20

to pqc-forum

Hello All,

This is not a brand new research result per se, but I wanted to offer some views on the Round 2 candidates on embedded / IoT targets, specifically on energy consumption and the quantitative work [1] I've done on that front.

The IEEE workshop where the write-up [1] will appear has been moved to August. The slides for the talk I that gave at the ETSI PQC workshop last November also cover the same subject matter [2]. The energy measurement laboratory experiments are intended to be reproducible, described in [3].

Anyway, some broad observations on NIST PQC Candidates on Embedded and Hardware:

- In 2017 substantial implementation work had already been done on lattice schemes but almost none on Isogeny systems (i.e. SIKE). This situation has since radically changed. In a series of papers, the embedded performance (and energy consumption) difference between ring/module lattice and Isogeny systems at the same security level has shrunk from 1000-fold to about 100-fold but is still very significant. Unfortunately, these better SIKE Cortex M4 implementations have not been made publicly available so I could not measure them. Due to the size of the performance gap, the same general conclusions remain.

- On microcontroller targets, one should focus solely on cycle counts and ignore millisecond measurements; ULP circuits such as Cortex M4 have been deliberately designed to have a wide range of clock frequencies and they can run slow specifically because that saves energy. Note that this is unrelated to sleep states. The dynamic power of most circuits is (locally) linearly dependant on clock frequency, but some MCUs also scale up the voltage with higher clock frequencies, hence resulting in higher than linear power consumption. Furthermore, Flash program memory is less likely to keep up, therefore requiring more cycles for the same task. On hardware implementations Cycles x Area is a reasonable energy estimation metric if more advanced measurements are not available.

- A hypothesis about a relatively high "transmission energy cost" of public keys and ciphertexts is still being propagated by some researchers. Transmission energy is measured in Joules/bit -- bit rates have vastly increased from what we had in the early 2000s and this, in turn, has decreased transmission energy. The write-up [1] goes into a bit more detail about this, but a cross-over point can be estimated on "how expensive transmission has to be" to justify greatly increased computation. Based on my research, it is difficult to find a scenario where CPU energy (cycles) is not the dominant cost in the energy budget for these public key algorithms (in mobile systems), so I fee that this hypothesis is largely a myth.

- We urge NIST to be cautious and pragmatic when viewing hardware results. Actual "real world" PQC implementations in smartcards, secure elements, and HSMs are much more likely to be of coprocessor (hardware/software codesign) type than monolithic ("hardware API"). Large monolithic designs that occupy the area equivalent of half a dozen lightweight CPU cores or (as has been proposed) the entire FPGA chip do not make much sense. Furthermore, hardware results based on "high-level synthesis" (HLS) do not tell anything meaningful about the algorithms being investigated, simply that a vendor tool was able to translate a C reference implementation into some HDL.

- A recommendation to those designing hardware support for PQC: Many algorithms benefit very significantly from a fast Keccak (SHA3/SHAKE) permutation; evidence indicates +50% for faster ring/module lattice algorithms, and even more for general lattice algorithms Round5 R5N1 and FrodoKEM. A good memory-mapped Keccak implementation also makes SPHINCS+ and XMSS signatures feasible on IoT and Embedded targets, speeding signatures up 20x or more. Since the area requirement of the permutation is rather large (mostly due to its 1600-bit state), it would not make much sense to package it into a monolithic hardware implementation with a PQC algorithm, but make it available to the CPU in a hardware/software codesign. For those considering hardware support for PQC algorithms, you can’t go wrong with a fast SHA3 core. Our experience in the RISC-V Crypto Task Group indicates that ISA extensions are not as helpful for SHA-3 as for some other algorithms, so a separate SHA3 hardware module is my current “first aid” suggestion when people are asking for PQC hardware acceleration. You can’t go wrong with it.

- On a final note, Dilithium and Falcon are really the preferable signature algorithms on IoT and Hardware targets as they are generally faster than RSA or ECC. The company I work for is also involved with composite (“hybrid”) PKI and signatures and these are the primary algorithms used in that work.

Stay safe everyone..

Cheers,

- markku

[1] M.-J. O. Saarinen: "Mobile Energy Requirements of the Upcoming NIST Post-Quantum Cryptography Standards." To appear in Proc. IEEE MobileCloud 2020. Preprint: https://arxiv.org/abs/1912.00916

[2] M.-J. O. Saarinen: "Towards μJoule PQC." ETSI/IQC Quantum-Safe Cryptography Workshop 2019. Slides: https://docbox.etsi.org/Workshop/2019/201911_QSCWorkshop/TECHNICAL_TRACK/05_PROTOTYPING_RESOURCE_CONSTRAINTS/PQSHIELD_SAARINEN.pdf

[3] M.-J. O. Saarinen: "PQPS (Post-Quantum Power Sandwich)." https://github.com/mjosaarinen/pqps

Dr. Markku-Juhani O. Saarinen <mj...@pqshield.com> PQShield, Oxford UK.

Kevin Chadwick

unread,

Apr 14, 2020, 8:57:30 AM4/14/20

to pqc-...@list.nist.gov

> Based on my research, it is difficult to find a scenario where CPU energy> (cycles) is not the dominant cost in the energy budget for these public key

I assume you mean the TLS model where public keys are processed often. That is
far too wasteful for many microcontroller deployments.

If public key exchanges happen once or infrequently then the energy cost of e.g.
post quantum resistant AES-256 encryption is then tiny in comparison to the data
transmission cost.

Markku-Juhani O. Saarinen

unread,

Apr 14, 2020, 10:10:58 AM4/14/20

to pqc-forum

On Tuesday, April 14, 2020 at 1:57:30 PM UTC+1, Kevin Chadwick wrote:

> Based on my research, it is difficult to find a scenario where CPU energy> (cycles) is not the dominant cost in the energy budget for these public key

I assume you mean the TLS model where public keys are processed often. That is
far too wasteful for many microcontroller deployments.

True. However this is the pqc-forum ran by NIST; microcontrollers are one of the stated evaluation targets in the post-quantum cryptography standardization process. This process evaluates public-key cryptography algorithms only (although they have a set of symmetric components such as SHAKE inside them).

Public-key cryptography is frequently used on microcontrollers, even though the present RSA and ECC standards are computationally less efficient than some of the new candidate algorithms (but more efficient than others). The subject matter of the posting was the energy measurements that I have made of the said algorithms on the NIST reference target, Cortex M4; only using such data can a cryptography engineer such as myself can determine whether or not an algorithm is too wasteful for a given use case, and which algorithm is the best fit for a given task.

It is true that public-key cryptography is not suitable for all microcontroller deployments, and also that there are some current public-key (ECC) deployments where it is very difficult to find a suitable PQC alternative, mainly due to hardcoded protocol limitations or low bandwidth. I am personally hoping that reduced energy consumption might also open up some new use cases for public-key cryptography.

In that paragraph I was discussing the total energy of computation and transmission (in case of an RF link), which appears to be dominated by the cost of computation with these new algorithms, despite having somewhat larger public keys and ciphertext expansion. It all depends on the connectivity technology being used; the current de facto IoT technologies such as BLE are much better than older ones which are still being cited.

If public key exchanges happen once or infrequently then the energy cost of e.g.
post quantum resistant AES-256 encryption is then tiny in comparison to the data
transmission cost.

Yes, this is often true. Unfortunately, this observation has been translated to mean that the transmission cost is also higher for public-key algorithms. This was the main myth I was trying to debunk.

For discussion about the relative merits of lightweight symmetric-key algorithms, I suggest checking out lwc-forum.

Cheers,

- markku

David Jao

unread,

Apr 14, 2020, 12:03:07 PM4/14/20

to pqc-...@list.nist.gov

Hi Markku,

The SIKE Cortex M4 implementations were posted on github just before your email came out, at https://github.com/solowal/SIKE_M4

The companion paper is https://eprint.iacr.org/2020/410 (note that these results are about 1.5x faster than previously announced).

Regarding HW/SW codesigns, the latest work for SIKE is https://eprint.iacr.org/2020/040 and the implementation will be available shortly.

I am planning a more comprehensive email describing these and other implementation advances in time for the April 15 deadline. This is just a short note to respond to the specific points you brought up.

Sorry that we were not able to get these posted sooner. As Dan mentioned before, many of us are dealing with previously unplanned child care responsibilities.

-David

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/33ff1220-645d-476b-9f94-26a097de6cf2%40list.nist.gov.

Patrick Longa

unread,

Apr 14, 2020, 7:25:26 PM4/14/20

to David Jao, pqc-...@list.nist.gov

Hi all,

Just wanted to highlight that the hw/sw co-design implementation of SIKE described in https://eprint.iacr.org/2020/040 is available here: https://github.com/pmassolino/hw-sike

Best,

Patrick

To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/68bde302-0a4f-32a7-05f5-0b96e2f76d07%40math.uwaterloo.ca.

D. J. Bernstein

unread,

Apr 15, 2020, 3:10:50 AM4/15/20

to pqc-...@list.nist.gov

A 1000*bytes+cycles cost metric consists mostly of bytes for typical
lattice systems but mostly of cycles for SIKE. (See Section 5.4 of
https://ntruprime.cr.yp.to/nist/ntruprime-20190330.pdf for a data point
justifying this metric, and for a call for the community to agree on a
spectrum of such data points.)

For at least a year the SIKE team has been quite reasonably arguing that
long-term chip evolution will make SIKE's computation disadvantage less
and less noticeable compared to the communication advantage. This
doesn't mean that SIKE is about to beat lattice systems, but it does
mean that complaints about SIKE's cost need to be tempered by analysis
of how costs will evolve during the lifetime of the NISTPQC standards.

I don't see where the SIKE team ever endorsed the extreme claim that
"CPU/MCU energy consumption is automatically negligible when compared to
transmission energy", and I don't see where the SIKE team ever claimed
that SIKE's current microcontroller cycles are outweighed by its current
savings in "transmission energy cost". I also don't see how these
strawman claims help anyone analyze SIKE's current and future costs.

Related problem: From spot checks of https://arxiv.org/abs/1912.00916, I
extrapolate that the paper already has a problematic volume of obsolete
data. How are typical readers supposed to figure out which information
is obsolete, or to find updated information? The paper does not appear
to be attached to a framework that systematically includes new software.
The paper does not include even the most minimal effort to label
reference implementations. Even worse, the paper mischaracterizes
reference-implementation performance as inherent algorithm performance,
actively denying the reality that people then optimize implementations.

The pqm4 benchmarking system doesn't have the same structural flaws. It
doesn't measure power, but readers who use pqm4 cycle counts as a linear
predictor of power consumption are much less likely to make serious
mistakes than readers trusting https://arxiv.org/abs/1912.00916.

---Dan

signature.asc

Markku-Juhani O. Saarinen

unread,

Apr 15, 2020, 7:23:05 AM4/15/20

to pqc-forum, d...@cr.yp.to

On Wednesday, April 15, 2020 at 8:10:50 AM UTC+1, D. J. Bernstein wrote:

A 1000*bytes+cycles cost metric consists mostly of bytes for typical
lattice systems but mostly of cycles for SIKE. (See Section 5.4 of
https://ntruprime.cr.yp.to/nist/ntruprime-20190330.pdf for a data point
justifying this metric, and for a call for the community to agree on a
spectrum of such data points.)

Hi Dan,

Looking at that section, this is an equation that relates the ciphertext size and transmission speed on a 100 MHz fixed-line connections to the computation time of NTRU Prime on a 3 GHz system and was apparently used to justify some change in NTRU Prime. It is unrelated to embedded energy consumption (the subject of my paper) and a single data point is.. not a metric.

You see, mobile devices have an energy budget ("battery life") and both the RF unit and CPU use up this budget. Energy is measured in Joules. Hence the total energy estimations are not just unitless comparison metrics that somehow relate ciphertext size to cycles but can be used to aid broader system design and even protocol design. This is common in the industry.

Of course there are other ways to relate message sizes to computational cost; it all depends on the context. For example, I don't really know what "consists mostly of bytes for typical lattice systems but mostly of cycles for SIKE" even means but it also completely lacks context -- looking at where it came from, we see that you used it in a completely wrong context (see the subject line of your posting). Hence I 'm quite sure that such an equation should be used to replace evidence-based engineering data in mobile embedded engineering.

For at least a year the SIKE team has been quite reasonably arguing that
long-term chip evolution will make SIKE's computation disadvantage less
and less noticeable compared to the communication advantage. This
doesn't mean that SIKE is about to beat lattice systems, but it does
mean that complaints about SIKE's cost need to be tempered by analysis
of how costs will evolve during the lifetime of the NISTPQC standards.

My work was related to energy consumption, but in the lightweight / IoT context, another obstacle for SIKE adoption is the negative impact of wall-clock latency on user experience, and this will indeed dissipate with technological advances.

The evolution of communication efficiency has also been very significant in wireless networks and continues to improve with 5G and related developments. I'd be prepared to guess that the improvements in de facto Joule/bit will be larger than improvements in computation energy Joule/cycle (normalized equivalent) in the immediate future.

I don't see where the SIKE team ever endorsed the extreme claim that
"CPU/MCU energy consumption is automatically negligible when compared to
transmission energy", and I don't see where the SIKE team ever claimed
that SIKE's current microcontroller cycles are outweighed by its current
savings in "transmission energy cost". I also don't see how these
strawman claims help anyone analyze SIKE's current and future costs.

Yes, I don't think we should think that anyone has made such an extreme argument, right?

Related problem: From spot checks of https://arxiv.org/abs/1912.00916, I
extrapolate that the paper already has a problematic volume of obsolete
data. How are typical readers supposed to figure out which information
is obsolete, or to find updated information? The paper does not appear
to be attached to a framework that systematically includes new software.
The paper does not include even the most minimal effort to label
reference implementations. Even worse, the paper mischaracterizes
reference-implementation performance as inherent algorithm performance,
actively denying the reality that people then optimize implementations.

There are naturally caveats and omissions when presenting such a broad analysis in a compact form. That's why the work is supported by a web site with more information and details about the targets, code and laboratory set-up.

These are the best implementations available on Cortex M4 at the end of the second round. You can't measure things that you don't have (and an implementation engineer cannot be expected to have those either). Of course, the entire experiment is reproducible and the implementations are available at https://github.com/mjosaarinen/pqps

One ought to do more than "spot checks" before claiming what features the system has and doesn't have. There is an automated mechanism to import implementations from pqm4.

Cheers,

- markku

David Jao

unread,

Apr 15, 2020, 9:59:55 AM4/15/20

to pqc-...@list.nist.gov

> My work was related to energy consumption, but in the lightweight /
> IoT context, another obstacle for SIKE adoption is the negative impact
> of wall-clock latency on user experience, and this will indeed
> dissipate with technological advances.

I'm not a usability expert, but people who are usability experts
(https://www.nngroup.com/articles/response-times-3-important-limits/)
list three important thresholds for user response time: 0.1 second, 1
second, and 10 seconds. Also worth noting: "The basic advice regarding
response times has been about the same for thirty years."

SIKE on M4 does not currently meet the 0.1 second threshold, but at
sufficiently high clock speeds it does satisfy the 1 second threshold
(https://eprint.iacr.org/2020/410 Table 8, 0.56 seconds wall clock time
for SIKEp434 @168MHz decapsulation). These can be achieved today,
without relying on future technological advances.

In the hardware space SIKE comfortably outperforms even the 0.1 second
threshold, even under Markku's preferred hw/sw codesign setting. Given
that even ECC alone is already too heavyweight for pure software in
embedded applications (as evidenced by the existence of products such as
https://www.microchip.com/wwwproducts/en/ATECC608A), it seems quite
likely that hardware will be the dominant consideration for PQC in
lightweight environments.

> The evolution of communication efficiency has also been very
> significant in wireless networks and continues to improve with 5G and
> related developments. I'd be prepared to guess that the improvements
> in de facto Joule/bit will be larger than improvements in computation
> energy Joule/cycle (normalized equivalent) in the immediate future.

Our argument here is that future improvements in attacks, both
algorithmic advances and speed improvements from faster hardware, result
in the need for increased key sizes, partially offsetting future gains
in Joules/bit. Of course, we are speculating here about future quantum
computing hardware, and who knows how that will turn out. But I think
anybody in this space must at least implicitly accept some amount of
future progress in quantum computing hardware, or else we might as well
just keep using ECC.

-David

D. J. Bernstein

unread,

Apr 15, 2020, 4:30:52 PM4/15/20

to pqc-...@list.nist.gov

Markku-Juhani O. Saarinen writes:
> Looking at that section, this is an equation that relates the ciphertext size
> and transmission speed on a 100 MHz fixed-line connections to the computation
> time of NTRU Prime on a 3 GHz system and was apparently used to justify some
> change in NTRU Prime.

No.

The general principle is that bytes are much more expensive than cycles.
To quantify this principle, the section considers a quad-core 3GHz CPU
handling a 100Mbps connection---i.e., an environment handling at most

* 1.2*10^10 cycles/second and
* 1.2*10^7 bytes/second.

The ratio "suggests that each cycle used in decapsulation is 1000x less
expensive than each byte used in a ciphertext". The 1000*bytes+cycles
metric balances these two resources, and can be systematically applied
to all submissions.

Obviously one can point to environments where the 1000 is replaced by
other, smaller or larger, ratios; and technology trends are generally
pushing the ratios larger. The section says "It would be useful for the
community to agree upon a spectrum of data points regarding the costs of
communication and the costs of computation in various environments."

The section also applies these general observations to NTRU Prime,
explaining why NTRU Prime uses fixed-weight vectors (fewer bytes, more
cycles) and inversion mod 3 (fewer bytes, more cycles). Contrary to your
"change in NTRU Prime" text, NTRU Prime has always been designed this
way. This small lattice choice is reasonably robust against changes in
the ratio. Your SIKE-vs.-lattices comparison is much more fragile.

> It is unrelated to embedded energy consumption

I already pointed out (email dated 4 Jun 2019 16:52:56 -0000) that your
published guesses of 0.1 mJ for 1 million Cortex-M4 cycles and 75 nJ for
1 bit communicated through Bluetooth Low Energy say that 1 byte of
communication costs as much energy as 6000 cycles of computation in that
environment; i.e., your guesses model energy as 6000*bytes+cycles (times
a constant that's irrelevant to your SIKE-vs.-lattices comparison).

I also pointed out how fragile this 0.1 mJ guess is: "he mentions
numbers from ARM as low as 0.012 mJ but guesses that flash and SRAM add
much more to this---which is another type of communication cost that
would become much smaller inside a coprocessor". Why shouldn't the SIKE
team consider ratios larger than 6000? Is 100000 so hard to imagine?

> For example, I don't really know what "consists mostly of bytes for
> typical lattice systems but mostly of cycles for SIKE" even means

Consider receiving and decapsulating a ciphertext on a Cortex-M4. Model
energy as (6000*bytes+cycles)*irrelevantconstant, which is one of _your_
published guesses (see above).

Current pqm4 benchmarks are, e.g., around 0.98 million cycles for
kyber768 ("m4" implementation), and around 1.1 billion cycles for
sikep434 ("opt" implementation). Presumably pqm4 will soon be updated
for the new sikep434 numbers from https://github.com/solowal/SIKE_M4,
reportedly around 87 million cycles.

For kyber768 the ciphertext is 1088 bytes. 6000*bytes+cycles looks like
6000*1088+0.98*10^6, which is about 13% coming from the cycle count.

For sikep434 the ciphertext is 346 bytes. 6000*bytes+cycles looks like
6000*346+1.1*10^9 with the old numbers, about 0.2% coming from the byte
count; and only 6000*346+87*10^6 with the new numbers, about 11 times
better, and about 2% coming from the byte count.

The sikep434-vs.-kyber768 ratio in this metric has dropped from 146 to
12. The fact that the metric is mostly bytes for kyber768 but mostly
cycles for sikep434 also makes this comparison sensitive to changes in
the 6000.

> we see that you used it in a completely wrong context

I don't see how focusing on energy consumption eliminates, or even
reduces, the difficulties of defending your anti-SIKE hype.

> I'd be prepared to guess that the improvements in de facto Joule/bit
> will be larger than improvements in computation energy Joule/cycle
> (normalized equivalent) in the immediate future.

https://www.openfabrics.org/images/eventpresos/workshops2015/DevWorkshop/Tuesday/tuesday_10.pdf
has Intel stating the opposite trend from your guess: computation cost
will "scale well with process and voltage", while the energy cost of
11.20 pJ per 5 mm to move 8 bytes at 22nm is "more difficult to scale
down".

Switching from wired to wireless communication changes the numbers but
doesn't change the basic point that communication cost (over whatever
user-required distance) improves more slowly than computation cost. The
radio-marketing department can try to fabricate the opposite trend by
comparing old radio + old CPU to new radio + old CPU, but a sensible
prediction of cost ratios in 5 or 10 years has to consider new CPUs.

[ re obsolete data in https://arxiv.org/abs/1912.00916: ]

> There are naturally caveats and omissions when presenting such a broad
> analysis in a compact form. That's why the work is supported by a web
> site with more information and details about the targets, code and
> laboratory set-up.

I don't see how this handles the structural problems I described. For
example, how exactly does the SIKE team get you to update your numbers
for their newly published code, and how exactly are you addressing the
risk of readers taking your obsolete numbers? Sure, you _could_ re-run
everything and publish new numbers after taking new code (via pqm4), but
is this actually going to happen? When?

There's also the problem of inaccurate and deceptive labeling. As I
said before: "The paper does not include even the most minimal effort to

label reference implementations. Even worse, the paper mischaracterizes
reference-implementation performance as inherent algorithm performance,
actively denying the reality that people then optimize implementations."

---Dan

signature.asc

Markku-Juhani O. Saarinen

unread,

Apr 19, 2020, 1:31:57 AM4/19/20

to pqc-forum, d...@cr.yp.to

On Wednesday, April 15, 2020 at 9:30:52 PM UTC+1, D. J. Bernstei

I already pointed out (email dated 4 Jun 2019 16:52:56 -0000) that your
published guesses of 0.1 mJ for 1 million Cortex-M4 cycles and 75 nJ for

Hi Dan,

I had not started this research in June 2019, and I don't think that lifting numbers from a discussion thread 10 months ago is very relevant here. This was before I made any laboratory experiments and started looking a little bit more seriously at the energy costs of PQC algorithms. I would no longer be satisfied with such an anecdotal approach -- people who build low-power IoT systems know that "datasheets lie" as there are many additional system factors. Those numbers were from vendor information.

In the introduction to the lab description at https://github.com/mjosaarinen/pqps I write:

"My original motivation was to establish a straightforward model from "cycle counts" to "Joules" for new (post-quantum) asymmetric cryptographic algorithms. The null hypothesis was that the relationship is largely linear and algorithm-independent. This turned out not to be true for the Cortex M4 microcontrollers; there is 50+% variation in average power depending on what type of cryptographic primitive the MCU is processing."

For me, this was a little bit surprising -- that the specific instructions mix of an algorithm has such a large effect on its energy need as a whole. However, it is a direct consequence of the dynamic power equation and the way these ULP MCUs are designed (designers try to minimize switching since that consumes energy, hence try to not to clock subcircuits unnecessarily even when the core is stepping etc).

For high-level analyses (such as algorithm selection for a specific use case) this 50% variation makes little difference, though; in my mind cycles are still a sufficient indicator of the *magnitude* of energy required. A little bit like how we express security in "bits". Furthermore, some lucky optimization can affect it even more.

I don't see how focusing on energy consumption eliminates, or even
reduces, the difficulties of defending your anti-SIKE hype.

I am sorry if you interpret energy consumption research as anti-SIKE "hype". But I'd think that any criticism on a paper about energy consumption should indeed focus on energy consumption.

I have nothing specific against SIKE (we support it in hardware and software!) but it is fair to say that SIKE is more problematic the mobile device energy budget than many other alternatives.

> I'd be prepared to guess that the improvements in de facto Joule/bit
> will be larger than improvements in computation energy Joule/cycle
> (normalized equivalent) in the immediate future.

https://www.openfabrics.org/images/eventpresos/workshops2015/DevWorkshop/Tuesday/tuesday_10.pdf
has Intel stating the opposite trend from your guess: computation cost
will "scale well with process and voltage", while the energy cost of
11.20 pJ per 5 mm to move 8 bytes at 22nm is "more difficult to scale
down".

That's on-die wire. I hope it is clear that moving bits inside the CPU (at 22 nm) is still CPU energy consumption, not transmission energy? I fail to see the point that you're trying to make with this slide from Intel about the issues they were facing five years ago.

Again, my paper is specifically about mobile devices, hence it is about wireless communication *between* devices.

Later the slide set discusses transmission costs (as in dollars) but that is a different kind of budget from the energy budget. Arguably very important, but someone else can write a paper about that.

I don't see how this handles the structural problems I described. For

example, how exactly does the SIKE team get you to update your numbers
for their newly published code, and how exactly are you addressing the
risk of readers taking your obsolete numbers? Sure, you _could_ re-run
everything and publish new numbers after taking new code (via pqm4), but
is this actually going to happen? When?

Well. if people use numbers from https://arxiv.org/abs/1912.00916 , the usual thing in science is to cite where the numbers are from (which I encourage). For SIKE specifically, the paper already had a section explaining that there were faster implementations in the literature that were not available for measurement.

Note that SIKE folks have also published their own energy measurements now, which is great. What is even better is that others can now measure them too.

Future plans:

I'd think it makes more sense to redo the whole thing for Round 3 as there have been algorithm changes throughout as we enter it. The abstract states specifically that the measured algorithms are Round 2 algorithms. I'll correct editorial errors of course (like the misplaced decimal point for Falcon -- thanks Vadim L!).

As for the rest of the paper -- perhaps people find the proposed benchmark methodology or analysis approaches interesting as I view that as the main contribution. Also one can start looking at things like algorithm-characteristic power in more detail (quite different from energy). During the writing of the paper, I could not find many prior many reviews of PQC energy consumption specifically in mobile devices, but I am expecting more to come as we continue to round 3 and standardization.

Reply all

Reply to author

Forward