"90s" version parameter sets

Simon Hoerder

Jul 23, 2020, 1:36:42 PM
to pqc-forum
Hi,

some candidates that progressed to round 3 have proposed additional
"90s" version parameter sets that replace SHA-3/SHAKE with AES/SHA-2.
What is NIST's position on those parameter sets?

In particular, does NIST consider them only as a crutch to estimate
performance in the "what would it look like if chip x had a SHA-3/SHAKE
hw accelerator instead of AES" scenario? Or is NIST considering
including those parameter sets in a standard, given that AES/SHA-2 have
widespread hw support?

I understand that NIST probably doesn't have a definite answer to this
question. Just trying to get a little more clarity on the state of the
competition in that respect.

Thanks,
Simon

Perlner, Ray A. (Fed)

Jul 24, 2020, 4:37:44 PM
to Simon Hoerder, pqc-forum
Hi Simon,

SHA2 and AES are as much NIST approved cryptographic primitives as SHA3 and SHAKE. When standardizing public key cryptography that uses symmetric primitives, NIST has historically tended to write our standards in such a way as to accommodate the use of any NIST approved symmetric primitive that implements the appropriate functionality. We anticipate that parties who implement our standards will have a variety of different preferences regarding NIST approved symmetric primitives, and NIST will likely continue to seek to accommodate those preferences as long as it doesn’t harm security. As such, we think it’s great that the submitters are thinking about the available options now.

--The NIST PQC team
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/345c82cb-302a-168a-d54a-3df5d35dc63e%40hoerder.net.

Nigel Smart

Jul 25, 2020, 4:10:56 AM
to Perlner, Ray A. (Fed), 'Perlner, Ray A. (Fed)' via pqc-forum, Simon Hoerder
But early on, the submitters went for SHA3 because, following a question on this forum, it was stated that when a RO is instantiated with a XOF, NIST preferred one that could actually be proved indifferentiable. This leaves only the SHA3-based XOFs from the NIST standards.

Has this preference now changed, so that if a XOF is theoretically required, one can use an AES-based PRG instead?
Yours

Nigel

Markku-Juhani O. Saarinen

Jul 25, 2020, 5:52:42 AM
to pqc-forum, ray.p...@nist.gov, si...@hoerder.net
On Saturday, July 25, 2020 at 9:10:56 AM UTC+1, Nigel Smart wrote:
> But early on, the submitters went for SHA3 because, following a question on this forum, it was stated that when a RO is instantiated with a XOF, NIST preferred one that could actually be proved indifferentiable. This leaves only the SHA3-based XOFs from the NIST standards.
>
> Has this preference now changed, so that if a XOF is theoretically required, one can use an AES-based PRG instead?

Nigel, 

Are there AES based PRGs? Should such a mode be standardized in 2020? What would its security properties and acceptable limitations be? Because I'm sure it would have many limitations when compared to SHAKE and derived functions.

I first read that as an RNG, and indeed many have confused an XOF with a DRBG and even an RNG. I'm of course willing to let it pass if an XOF is confused with a hash function or a stream cipher ("keystream generator") in some use cases -- but even a combination of those two ignores domain separation, which the NIST XOF standards readily support (it's admittedly not in the mathematical description of a plain XOF).
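
To make the domain-separation point concrete, here is a minimal sketch using Python's hashlib. It is only an illustration: the one-byte tag prefix and the names xof, TAG_MATRIX and TAG_NOISE are assumptions made up for this example. It is not cSHAKE and not the convention of any submission.

```python
import hashlib

def xof(domain_tag: int, data: bytes, out_len: int) -> bytes:
    """Return out_len bytes of SHAKE-256 output with a one-byte tag prepended.

    The tag-prefix convention is an assumption for illustration only.
    """
    if not 0 <= domain_tag <= 255:
        raise ValueError("domain tag must fit in one byte")
    return hashlib.shake_256(bytes([domain_tag]) + data).digest(out_len)

TAG_MATRIX = 0x01  # hypothetical tag: expanding public data
TAG_NOISE = 0x02   # hypothetical tag: expanding secret randomness

seed = bytes(32)  # all-zero seed, just for the demonstration
public_stream = xof(TAG_MATRIX, seed, 64)
secret_stream = xof(TAG_NOISE, seed, 64)

# Same seed, different tags: the two output streams are unrelated.
print(public_stream != secret_stream)
```

A plain stream cipher or hash used in this role gives you the "arbitrary output length" part or the "arbitrary input" part, but the separation between uses has to be bolted on by the protocol designer.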

Anyway, I'd think that it would be short-sighted to stare at current Intel performance numbers as a justification for some kind of AES-based XOF. As an embedded & CPU architect, I'd like to see this 90s crypto gone, the sooner the better. Hopefully from AEADs too, but there the industry has moved independently of NIST, and the NIST LWC gives hope for "fipsified" versions.

Cheers,
- markku
 
Dr. Markku-Juhani O. Saarinen <mj...@pqshield.com> PQShield, Oxford UK.



Nigel Smart

Jul 25, 2020, 6:41:07 AM
to pqc-...@list.nist.gov
Hi

In the context of the NIST competition a lot of proofs do
the following.

I have a randomized encryption scheme E(m;r) then I
de-randomize it using...
E(m; H(r))
Then we need to instantiate H.

Or in other instances you compress the A matrix in LWE
via..
A = Hash(seed)
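
A rough sketch of both uses, with SHAKE from Python's hashlib standing in for H/Hash. Everything concrete here is an assumption for illustration: the toy sizes N and K, the modulus Q, the index-suffix convention and the function names are not taken from any submission's specification.

```python
import hashlib

Q = 3329  # modulus (assumed, Kyber-like)
N = 8     # toy polynomial length, kept small for illustration
K = 2     # toy matrix dimension

def expand_entry(seed: bytes, i: int, j: int) -> list:
    """Derive one matrix entry from (seed, i, j) by rejection sampling."""
    # Append the indices so every entry uses a distinct XOF input.
    stream = hashlib.shake_128(seed + bytes([i, j])).digest(64 * N)
    coeffs = []
    for off in range(0, len(stream) - 1, 2):
        # Take 12 bits and keep the candidate only if it is below Q.
        cand = int.from_bytes(stream[off:off + 2], "little") & 0x0FFF
        if cand < Q:
            coeffs.append(cand)
            if len(coeffs) == N:
                return coeffs
    raise RuntimeError("XOF output exhausted; request more bytes")

def expand_matrix(seed: bytes) -> list:
    """A = Hash(seed): a K x K matrix of length-N coefficient lists."""
    return [[expand_entry(seed, i, j) for j in range(K)] for i in range(K)]

def derive_coins(r: bytes, out_len: int = 32) -> bytes:
    """H(r): the coins fed to E(m; H(r)) in the de-randomized scheme."""
    return hashlib.shake_256(r).digest(out_len)

A = expand_matrix(bytes(32))
coins = derive_coins(b"short random input r")
```

The matrix expansion only ever sees public input, while the coins are derived from secret input. That difference matters for the implementation-security discussion later in the thread.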

VERY early on in the NIST competition it was asked what
should the choice of H/Hash be. The answer was it should be
something which is in a NIST standard [otherwise it
would cause complexities in standardization]. Now if
H/Hash in the proof needs to be a RO, then basically this
means you should use the SHA-3 based XOF/Hash whatever.

BUT, as everyone knows, this is currently faster if you
use AES-based DRNGs, which have also been standardized
by NIST, but a long, long time ago.

The email from Ray seems to imply to me that the earlier
"advice" to prefer SHA-3 based is seen as less important
now.

I would support sticking to the earlier advice, namely
to use SHA-3 as opposed to the older (and already standardized)
things. You have a choice of two NIST standards, so
pick the best, not the fastest.
- Speed will be solved later.

Nigel



D. J. Bernstein

Jul 25, 2020, 9:15:29 AM
to pqc-...@list.nist.gov
Markku-Juhani O. Saarinen writes:
> Anyway, I'd think that it would be short-sighted to stare at current intel
> performance numbers as a justification for some kind of AES based XOF.  As an
> embedded & CPU architect I'd like to see this 90s crypto gone sooner the better.

The Kyber documentation similarly claims that "the 90s variant is only
really attractive to use if AES hardware support is available".

However, the actual performance numbers on Cortex-M4 (from pqm4) show
the AES version of Kyber being considerably faster than the SHAKE
version. For example, at (maybe) level 3:

kyber768: 887618 keygen, 1047720 enc, 985976 dec
kyber768-90s: 624485 keygen, 695432 enc, 708683 dec

Cortex-M4 is one of NIST's selected embedded platforms.
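
For a quick sense of scale, the ratios implied by the two lines of cycle counts above can be computed as follows. The numbers are copied from this message; the dictionary layout and variable names are just for illustration.

```python
# Cortex-M4 cycle counts quoted above (from pqm4).
cycles = {
    "kyber768": {"keygen": 887618, "enc": 1047720, "dec": 985976},
    "kyber768-90s": {"keygen": 624485, "enc": 695432, "dec": 708683},
}

# Ratio of SHAKE-version cycles to 90s-version cycles, per operation.
ratios = {
    op: cycles["kyber768"][op] / cycles["kyber768-90s"][op]
    for op in cycles["kyber768"]
}

for op, ratio in ratios.items():
    print(f"{op}: SHAKE version takes {ratio:.2f}x the cycles of the 90s version")
```

That works out to roughly 1.4x to 1.5x across the three operations.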

Some of the AES inputs are secret, and this pqm4 AES code uses tables,
so the benchmarks of anything using the pqm4 AES code are benchmarks of
implementations violating the constant-time security policy. It's
amazing to see AES continuing to bite users after 20 years! To address
this, there's a team putting together optimized constant-time AES-256
code on Cortex-M4. I expect this to end up around 150 cycles/byte, which
is a big slowdown compared to the current 50 cycles/byte.

This doesn't mean that implementors will suddenly be happy with the
SHAKE performance---it means that every NIST-approved option is bad. I
generally agree with Nigel's "Speed will be solved later" comment, but
unfortunately NIST is putting massive weight upon performance numbers
and (questionable) claims regarding what this means for applications, so
submissions that want to succeed and that would spend serious time on
SHAKE are under huge pressure to include something faster.

It's tempting for round-3 submissions to replace AES-256 with ChaCha20:
bigger security margin, sufficiently stable security picture after many
attack papers (see https://cr.yp.to/snuffle.html), higher security in
typical applications because of the bigger block size, much faster on
Cortex-M4 (13 cycles/byte), only a minor slowdown on Intel, widespread
adoption in TLS, and easy constant-time implementations. Would NIST
discriminate for non-performance reasons against submissions using
ChaCha20? The call for proposals said

   If the scheme uses a cryptographic primitive that has not been
   approved by NIST, the submitter shall provide an explanation for why
   a NIST-approved primitive would not be suitable.

which seems easily answered by the ongoing AES implementation-security
problems and SHAKE performance problems. Will NIST then say "It's
_allowed_, but we'll complain about it and downgrade you for it"?

---Dan

Peter Schwabe

Jul 25, 2020, 9:33:31 AM
to pqc-...@list.nist.gov
"D. J. Bernstein" <d...@cr.yp.to> wrote:

Dear Dan, dear all,

> Markku-Juhani O. Saarinen writes:
> > Anyway, I'd think that it would be short-sighted to stare at current intel
> > performance numbers as a justification for some kind of AES based XOF.  As an
> > embedded & CPU architect I'd like to see this 90s crypto gone sooner the better.
>
> The Kyber documentation similarly claims that "the 90s variant is only
> really attractive to use if AES hardware support is available".
>
> However, the actual performance numbers on Cortex-M4 (from pqm4) show
> the AES version of Kyber being considerably faster than the SHAKE
> version. For example, at (maybe) level 3:
>
> kyber768: 887618 keygen, 1047720 enc, 985976 dec
> kyber768-90s: 624485 keygen, 695432 enc, 708683 dec
>
> Cortex-M4 is one of NIST's selected embedded platforms.
>
> Some of the AES inputs are secret, and this pqm4 AES code uses tables,
> so the benchmarks of anything using the pqm4 AES code are benchmarks of
> implementations violating the constant-time security policy. It's
> amazing to see AES continuing to bite users after 20 years! To address
> this, there's a team putting together optimized constant-time AES-256
> code on Cortex-M4. I expect this to end up around 150 cycles/byte, which
> is a big slowdown compared to the current 50 cycles/byte.

That's true, but at least for Kyber, a large portion of the AES
computations are used to expand the public matrix A, so there using a
table-based implementation is fine.

As this seems like a good moment to mention this: as soon as we have
optimized constant-time AES for the M4, the pqm4 project will offer two
AES APIs: the default one not using lookup tables and a separate one
that should only be used for public inputs.


All the best,

Peter

Thomas Peyrin

Jul 25, 2020, 10:38:17 AM
to Peter Schwabe, pqc-...@list.nist.gov
Dear all,

Since there is some interest in optimized AES implementations for ARM Cortex-M3/M4, we would like to announce (joint work with Alexandre Adomnicai) that we have applied the fixslicing implementation technique (originally applied to the GIFT block cipher here: https://eprint.iacr.org/2020/412) to AES, and we obtained a constant-time implementation of AES-128 that runs at 83 c/B on ARM Cortex-M3 (when subkeys are precomputed, thus not for very small messages). We haven't yet implemented AES-256, but we believe a factor of exactly 1.4 should apply, so we should get something around 115 c/B for AES-256.

This work is under submission, but we plan to put our code online very soon (we can share the code; simply request it privately).

Regards,

Thomas.



Ilya Mironov

Jul 25, 2020, 2:48:55 PM
to Peter Schwabe, pqc-...@list.nist.gov
Peter,

> That's true, but at least for Kyber, a large portion of the AES
> computations are used to expand the public matrix A, so there using a
> table-based implementation is fine.

I agree with this reasoning, where XOF is used as a RO.  In particular, it means that the choice of the primitive for XOF instantiation should be decoupled from the rest of the security argument. You proposed using SHAKE-128 in your main proposal, and AES-256 in your "90s parameters". Why not AES-128 then?

--Ilya
 

Kevin Chadwick

Jul 25, 2020, 5:27:57 PM
to pqc-...@list.nist.gov
On 2020-07-25 13:15, D. J. Bernstein wrote:
> Markku-Juhani O. Saarinen writes:
>> Anyway, I'd think that it would be short-sighted to stare at current intel
>> performance numbers as a justification for some kind of AES based XOF.  As an
>> embedded & CPU architect I'd like to see this 90s crypto gone sooner the better.
> The Kyber documentation similarly claims that "the 90s variant is only
> really attractive to use if AES hardware support is available".
>
>
> It's tempting for round-3 submissions to replace AES-256 with ChaCha20:
> bigger security margin, sufficiently stable security picture after many
> attack papers (see https://cr.yp.to/snuffle.html), higher security in
> typical applications because of the bigger block size, much faster on
> Cortex-M4 (13 cycles/byte), only a minor slowdown on Intel, widespread
> adoption in TLS, and easy constant-time implementations. Would NIST
> discriminate for non-performance reasons against submissions using
> ChaCha20? The call for proposals said
>
> If the scheme uses a cryptographic primitive that has not been
> approved by NIST, the submitter shall provide an explanation for why
> a NIST-approved primitive would not be suitable.

Perhaps I am missing something, but in my mind, this isn't being fair to NIST's
position.

There are TRNGs based on AES, like the ones Silabs has provided for years now.
With AES you can do MACs like CMAC, which Microsoft has actually switched to,
for performance reasons, for SMB on Windows. I guess this demonstrates that AES
is better accelerated than SHA, or that they didn't want to wait for SHA hw
support.

You can get crypto micros from ST that just add constant-time, hw-accelerated
AES/SHA/ECC and that could be added to some old 8-bit design.

Most new cortex m3/m4 chips now have AES constant time available.

In many designs, constant time isn't even an issue for micros (physical access)
and a constant delay could even be designed in, though why wouldn't you use AES
hw support?

CHACHA is only faster in software right, to be clear? In my mind, software
metrics of AES are pointless, especially now that almost all Intel hardware is
basically flawed and needs fixing with non-Intel or future-gen Intel, due to
Spectre.

The 90s crypto is proven and still secure, so wanting to get rid of it is simply
irrational or at best elitist, without stating why succinctly.

Maybe CHACHA or other alternatives are better and can be hw accelerated even
better, but that needs to be weighed against the time and cost before that
becomes ubiquitous, the risk of whether it can or will be, and whether that
would actually prevent takeup of post-quantum standards, or whether people like
myself may just use AES anyway and ignore the standard.

If the cost of fairly recent hw accel has not yet been recovered by the embedded
chip makers, who knows if they would invest? Certainly not without very powerful
reasoning.

D. J. Bernstein

Jul 26, 2020, 11:56:23 AM
to pqc-...@list.nist.gov
Looking forward to NIST's comments on whether NIST will, for
non-performance reasons, penalize NISTPQC submissions that upgrade from
AES-256-CTR to ChaCha20.

Kevin Chadwick writes:
> The 90s crypto is proven and still secure, so wanting to get rid of it
> is simply irrational or at best elitist, without stating why succinctly.

Surely everyone agrees that rationales should be provided. Here goes.
I'll focus here on AES, not SHA-2.

One problem with AES is that AES was designed to be, and often is,
implemented with table lookups. Table-lookup AES software, in turn, is
dangerous to deploy, because such software has been shown again and
again (see, e.g., https://eprint.iacr.org/2005/271) to leak secret keys
through timing. (The AES designers and NIST had incorrectly claimed that
table lookups are "not susceptible" to timing attacks.)
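
Here is a toy model of why a secret-indexed table lookup is a problem on a CPU with a cache. It is a sketch under stated assumptions, not an attack implementation: it assumes 64-byte cache lines, 4-byte table entries, a line-aligned table, and an attacker who somehow learns which cache line the first-round lookup T[p ^ k] touched. The function names are made up for this example.

```python
ENTRIES_PER_LINE = 16  # assumed: 64-byte cache line / 4-byte table entry

def touched_line(plaintext_byte: int, key_byte: int) -> int:
    """Cache line touched by the first-round lookup T[p ^ k]."""
    return (plaintext_byte ^ key_byte) // ENTRIES_PER_LINE

def recover_high_nibble(plaintext_byte: int, observed_line: int) -> int:
    """High 4 bits of the key byte implied by one observed cache line."""
    return observed_line ^ (plaintext_byte >> 4)

secret_key_byte = 0xA7   # unknown to the attacker
known_plaintext = 0x3C   # known to the attacker

line = touched_line(known_plaintext, secret_key_byte)
print(hex(recover_high_nibble(known_plaintext, line)))  # prints 0xa
```

In this model a single observation of the memory access pattern gives away half of each key byte, which is why the location of the lookup, and hence its timing, must not depend on secret data.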

It's not that disaster is guaranteed. Sometimes attackers can't get
enough timing information; there are platforms where table lookups take
constant time; AES is often implemented in other ways. But if a standard
is specified using AES then this choice will cause frequent security
problems that would have been avoided by SHAKE or ChaCha20. It can't be
comfortable for Kyber to be in the position of

   * attracting NIST attention on the basis of performance while

   * having its top Cortex-M4 speeds using table-lookup AES code, code
     that presumably will lead to security problems if deployed.

_Some_ Cortex-M4 CPUs have constant-time table lookups, but the same
code also works on a variety of CPUs with caches.

It also can't be comfortable for NIST to be in the position of
_encouraging_ these security problems, through a combination of
pressuring submitters to use NIST symmetric crypto (so not ChaCha20),
and pressuring submitters to use something fast in the benchmarks that
NIST is looking at (so not SHAKE).

As far as I know, NIST has never accepted any blame for predictable
implementation-security failures as long as it's _possible_ to implement
things in a safe way. I haven't, for example, seen NIST issuing security
alerts regarding the AES standard; I haven't even seen NIST issuing an
erratum for its incorrect "not susceptible" claim. But I'm optimistic
that all this can be corrected. You shouldn't take NIST's current lack
of AES warnings as representing any sort of security consensus.

Another problem with AES is the 128-bit block size. As a direct result
of this block size, typical constructions built from AES have

   * surprisingly low limits on their levels of provable security (see,
     e.g., https://isg.rhul.ac.uk/~kp/TLS-AEbounds.pdf) and

   * sometimes surprisingly low actual security (e.g., extrapolate from
     64 bits in https://sweet32.info to 128 bits).

There isn't much literature exploring what this means in the NISTPQC
context, and a careful security reviewer has to ask whether there's a
security problem here. For example:

   * Kyber-1024 claims to be category 5, i.e., as hard to break as
     single-target AES-256 key search. I don't see where the "90s"
     variant is excluded from this claim.

   * The Kyber theorems drop all security claims if the attacker
     succeeds in breaking the "PRF" security property of the underlying
     cipher.

   * The "90s" variant uses AES-256-CTR as a PRF. Because of the small
     block size, the PRF advantage of a cheap attack is _much_ higher
     than 1/2^256: it's more like 1/2^117.

Should a submission claiming 2^256 pre-quantum security be allowing
low-cost non-quantum single-target attacks that work with probability
around 1/2^117? NIST says that we can assume at most 2^64 ciphertexts
per user, and presumably we can also assume at most 2^64 users, so an
attacker running an attack against many targets in parallel can't gain
more than a factor 2^128---but 1/2^117 is above 1/2^128.

The bottom line is that, as a direct result of the AES block size,
kyber90s1024 doesn't claim that attack probabilities are noticeably
below 1 against a group of 2^53 users. For comparison, NIST said in

https://eprint.iacr.org/2020/455

that---for another submission claiming 2^256 security---a fast attack
breaking a user's key with probability 1/2^47.72 is a "major, practical
break".
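
The arithmetic behind these numbers can be sketched as follows. One way to arrive at a figure near 1/2^117 is the standard PRP/PRF switching term q(q-1)/2^(n+1) for an n-bit block cipher. The choice of q = 64 output blocks per key is an assumption made here so that the result matches the figure above; it is not stated in this message. The 2^64 ciphertexts per user and the 2^53 users are the figures from the text.

```python
import math

BLOCK_BITS = 128  # AES block size in bits

def prf_switch_log2(blocks: int) -> float:
    """log2 of q(q-1)/2^(n+1), the PRP/PRF switching term for q blocks."""
    return math.log2(blocks * (blocks - 1)) - (BLOCK_BITS + 1)

# Assumption: about 64 blocks of CTR output per key.
single_log2 = prf_switch_log2(64)

# Union bound over 2^64 ciphertexts per user: how many users (log2)
# until the summed bound reaches 1?
ciphertexts_log2 = 64
users_log2 = -single_log2 - ciphertexts_log2

print(f"per-instance advantage ~ 2^{single_log2:.1f}")
print(f"bound reaches 1 at ~ 2^{users_log2:.1f} users")
```

With a 256-bit block the same term would sit near 2^-245, which is the sense in which the block size, rather than the key size, is what limits the bound here.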

One can object that attacking 2^64 ciphertexts from each of 2^53 users
is many orders of magnitude more expensive than realistic attacks---but
performing 2^256 computations is many, many orders of magnitude beyond
that. NIST is suddenly asking _every_ submission to provide an option
for 2^256 security, which the call for proposals didn't do; if NIST is
serious about 2^256 security, then why would it ignore an attack that's
vastly cheaper than 2^256 computations? Yes, there are rules such as at
most 2^64 ciphertexts per user, but what I've explained above fits
within every limit that NIST has announced.

To be clear, at the moment this is only a _proof gap_ for Kyber, and
maybe this proof gap can be fixed, while the other submission has an
actual attack. But maybe Kyber's proof gap reflects an actual Kyber
attack! (Btw, I didn't find warnings about this anywhere in the Kyber
documentation.) The point is that the 128-bit AES block size raises
questions for careful security reviewers, and figuring out whether
there's a security problem here is time taken away from analyzing other
risks. It's hard enough to get post-quantum cryptography right without
this sort of distraction.

I don't mean to suggest that Kyber is uniquely problematic here. The
starting difficulty is that LPR (Product NTRU) isn't a deterministic
PKE. This forces the system specification to include derandomization, so
there has to be a specification of how to expand a short message into
all necessary randomness. NIST's pressures then push people towards AES,
as explained above. (One then might as well also use AES to expand the
public seed---this isn't where the implementation-security concerns are
coming from, and the possibility that AES is less secure than SHAKE here
is very far down my list of risks.) One sees the same effects in, e.g.,
NTRU LPRime---but not in Streamlined NTRU Prime, since Quotient NTRU
_is_ a deterministic PKE.

As a side note, it's interesting that LPR not being deterministic is
also what's stopping LPR from having a tight QROM IND-CCA2 proof even
assuming IND-CPA, whereas Streamlined NTRU Prime (and Quotient NTRU more
generally) has a tight QROM IND-CCA2 proof merely from OW-Passive via
https://eprint.iacr.org/2019/590. (There's only a square-root change in
the success probability, no extra factor such as the computation depth.)
Maybe improved QROM proof techniques will somehow produce a tight LPR
proof, but maybe a tight proof isn't possible, and maybe there's an
actual security loss. I'm not aware of any literature trying to build
this type of attack.

> Most new cortex m3/m4 chips now have AES constant time available.

You mean as coprocessors, not as part of the ARM Cortex-M4 core, right?
Does "most" mean just 51%, or something stronger? Does it mean by model,
or by volume? Are there actual statistics somewhere, or is this just a
guess? Can you show us public software that takes advantage of this, and
documentation of the resulting speeds? How many of these CPUs does the
same software support?

Everyone agrees that AES _can be_ implemented in constant time on every
existing CPU, and that appropriate hardware support makes this the
fastest way to implement AES on _some_ CPUs. The big problem with AES
from a systems-security perspective is that these facts don't add up to
a credible plan to eliminate table-lookup implementations of AES:

   * People who simply want to copy AES software that works are likely
     to find software using table lookups. Why wouldn't they use it?
     (See, e.g., https://github.com/search?q=aes and try several links.)

   * People who want to reimplement AES rather than copying it are
     likely to find documentation of the table-lookup approach. (For
     example, the AES standard points to the AES proposal for
     implementation advice, and that proposal tells implementors in
     detail how to use S-tables and T-tables.)

   * People who hear about bitslicing, CPU-specific assembly, etc. will
     by default reject these on simplicity grounds.

   * Performance pressure _sometimes_ convinces CPU manufacturers to add
     AES hardware, and _sometimes_ convinces software people to use the
     AES hardware, but this is a fragile chain that often fails.

   * Security warnings _sometimes_ reach implementors, and _sometimes_
     convince them to do whatever is necessary to avoid table-lookup AES
     implementations, but again this is fragile.

Constant-time Cortex-M4 AES code will let Kyber advertise safe
performance, while losing nearly 100000 cycles. Requiring the platform
to have an AES coprocessor might do better in speed, while forcing more
forks in the code. None of this changes the bigger problem that a
post-quantum system specifying AES will end up with some table-lookup
implementations.

> CHACHA is only faster in software right, to be clear?

No. Hardware implementations of ChaCha20 are much more efficient than
hardware implementations of AES. The starting point is that ChaCha20
counter-mode encryption is 126 bit operations per bit of plaintext,
while AES-256-CTR is nearly 300 bit operations per bit of plaintext
(depending on the exact S-box strategy). Similar gaps show up in, e.g.,
energy metrics, the amount of hardware required to reach 1Gbps, etc.

The reason that the gap is even larger on ARM Cortex-M4, 13 cycles/byte
for ChaCha20 versus >100 cycles/byte for constant-time AES-256, is that
ChaCha20 takes advantage of the CPU's built-in addition circuits while
AES doesn't.
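
For readers who haven't looked at it, the ChaCha20 quarter round is sketched below in Python, following the published description in RFC 8439. It uses only 32-bit additions, XORs and fixed-distance rotations, with no table lookups and no data-dependent branches. Python is used here purely for readability and says nothing about timing on real hardware; the helper names are my own.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))

def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d

# Test vector from RFC 8439, section 2.1.1.
out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in out])
# ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

Each of the four additions maps directly onto the 32-bit adder that a Cortex-M4 already has, which is the reuse referred to above.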

Variable-time AES can take advantage of the CPU's built-in table-lookup
circuits to get down to pqm4's 50 cycles/byte for AES-256-CTR. This is
still more than you'd expect by extrapolating from ChaCha20's 13
cycles/byte and the ratio of bit operations, because the benefit of
reusing table-lookup circuits in AES isn't as big as the benefit of
reusing addition circuits in ChaCha20.

On non-ARM chips, ChaCha20 software pays for rotations (which in
hardware are just wiring) but still benefits from additions, while
variable-time AES benefits from table lookups. It isn't a surprise for
the speed ratios to end up at roughly the ratio of bit operations.

Basically, the only chance for AES-256-CTR to be faster than ChaCha20 is
to have AES _hardware_ competing with ChaCha20 _software_. This still
isn't a guaranteed win, as the following numbers illustrate:

* 2018 Intel Cannon Lake (cannon): 0.56 ChaCha20, 0.68 AES-256-CTR.
* 2017 Intel Cascade Lake (pmnod076): 0.57 ChaCha20, 0.88 AES-256-CTR.
* 2017 Intel Skylake-X (genji548): 0.57 ChaCha20, 0.89 AES-256-CTR.
* 2016 Intel Kaby Lake (kizomba): 1.16 ChaCha20, 0.98 AES-256-CTR.
* 2015 Intel Skylake (samba): 1.16 ChaCha20, 0.95 AES-256-CTR.
* 2014 Intel Broadwell (bolero): 1.20 ChaCha20, 1.01 AES-256-CTR.
* 2013 Intel Haswell (hiphop): 1.24 ChaCha20, 1.03 AES-256-CTR.

All numbers are from SUPERCOP, and of course anyone who has better
speeds for anything is free to submit code at any time. All these
AES-256-CTR numbers are from Romain Dolbeau's code, beating OpenSSL.
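
For a quick comparison, the numbers listed above can be turned into ratios. A small sketch, assuming the figures are cycles per byte (the message does not state the unit) so that lower is faster:

```python
# (ChaCha20, AES-256-CTR) figures copied from the list above.
# Unit assumed to be cycles per byte; lower is faster.
results = {
    "Cannon Lake":  (0.56, 0.68),
    "Cascade Lake": (0.57, 0.88),
    "Skylake-X":    (0.57, 0.89),
    "Kaby Lake":    (1.16, 0.98),
    "Skylake":      (1.16, 0.95),
    "Broadwell":    (1.20, 1.01),
    "Haswell":      (1.24, 1.03),
}

for name, (chacha20, aes256ctr) in results.items():
    faster = "ChaCha20" if chacha20 < aes256ctr else "AES-256-CTR"
    print(f"{name:13s} AES/ChaCha20 = {aes256ctr / chacha20:.2f}  faster: {faster}")

# Platforms where ChaCha20 software beats AES hardware in these figures.
chacha20_faster_on = [name for name, (c, a) in results.items() if c < a]
print(chacha20_faster_on)
```

On these figures ChaCha20 is ahead on the three newest chips listed and AES-256-CTR on the four older ones.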

Is it possible to invest so much hardware area into AES that it runs
faster than ChaCha20 software? Yes, of course, and AMD has done exactly
this. But this is an unusual corner case. I don't think AES would be the
top choice on _any_ of the selected NISTPQC platforms if NIST weren't
pressuring people to use existing NIST standards.

It seems clear that NIST's pursuit of top benchmark numbers for NISTPQC
won't change during the NISTPQC process. It's terribly risky; it's of
far less real-world value than NIST suggests; if something is horribly
broken in (say) 5 years then users will justifiably curse NIST for being
so careless; but all of this has been amply explained before, and we can
all see that NIST is emphasizing benchmark numbers anyway. So submission
teams will naturally look around for ways to save time. In particular,
submissions currently spending significant time on AES-256-CTR will get
a speed boost on Cortex-M4 by upgrading to ChaCha20, with similar Intel
performance, while dodging the AES security problems mentioned above.
But this raises the question of whether NIST will penalize submissions
using ChaCha20 simply because it isn't a NIST standard.

---Dan

Kevin Chadwick

Jul 26, 2020, 4:36:04 PM7/26/20
to pqc-...@list.nist.gov
On 2020-07-26 15:56, D. J. Bernstein wrote:

I'm wondering if I understand the criticism or definition of "90s crypto" now.
Obviously there are details that I am missing, that you would understand. I
simply wanted to stress that the importance of AES hw support should not be
underestimated.

I therefore wonder if having long term hw(hopefully)/sw and short term hw/aes
variants might be a good way to go?

>> Most new cortex m3/m4 chips now have AES constant time available.
> You mean as coprocessors, not as part of the ARM Cortex-M4 core, right?

I mentioned a separate processor but energy micros chips that silabs acquired,
have had memory mapped AES hardware on their cortex-m3s for > 10 years.

> Does "most" mean just 51%, or something stronger? Does it mean by model,
> or by volume? Are there actual statistics somewhere, or is this just a
> guess?

I haven't done an analysis but I know that our server product has AES hw
acceleration in both its main processor and its microchips. I find it hard to
believe that anyone would produce a design without hw AES being available these
days. I even implemented a CHACHA PRNG in sw but then dropped it when the new
chips with AES TRNGs came out.


> Can you show us public software that takes advantage of this, and
> documentation of the resulting speeds?

https://www.silabs.com/documents/public/application-notes/AN0955.pdf

The above document may be old but states

CBC encrypt 2418 cycles
CBC decrypt 3480 cycles

That isn't the entire picture for us though, as the hw support means that µA
rather than mA are drawn during that time, with the main processor clock shut
off during the block encryption.

The latest code has been deprecated on github but is part of simplicity studio.
I wrote my own without the for loops but here is a silabs example.

"https://github.com/a-v-s/Gecko_SDK/blob/master/platform/emlib/src/em_aes.c"

They do provide the following public example, but it has a lot more abstraction,
which hides the simplicity of the underlying steps:

Configure 256/128
write key to registers
write 4 data blocks in 32 bit chunks to register
run op and wait
read data from register

"https://github.com/SiliconLabs/peripheral_examples/tree/master/series2/cryptoacc/cryptoacc_aescrypt"

There is also the following, but again contains much abstraction.

"https://docs.silabs.com/mbed-tls/latest/"

I can't afford to spend the time producing metrics, sorry.

> How many of these CPUs does the same software support?

I'm not sure that I understand this question? I'm not actually a huge fan of
AES-GCM and prefer AES-SIV myself, but then I am not running a huge cloud service.

Donald Costello

Jul 26, 2020, 5:47:12 PM7/26/20
to Kevin Chadwick, pqc-...@list.nist.gov, dcos...@cse.unl.edu
I wonder if you folks have a grasp on reality.
I am a well-trained Cryptology Professor who thought of himself and was considered by others as an expert in the field.
I read "I'm wondering if I understand the criticism or definition of "90s crypto" now. Obviously there are details that I am missing, that you would understand. I simply wanted to stress that the importance of AES hw support should not be underestimated", know that somebody is being misunderstood, and want to shout "I have no idea what you are talking about".
Do you really think that technical staff in the field know what you are talking about or have time to learn these subtleties?

You have to take time and get reality in check and stop wasting time and money on the "90s" version parameter sets!





Markku-Juhani O. Saarinen

Jul 26, 2020, 6:18:36 PM7/26/20
to pqc-forum
On Sunday, July 26, 2020 at 9:36:04 PM UTC+1, Kevin Chadwick wrote:
> (..)
> Obviously there are details that I am missing, that you would understand. I
> simply wanted to stress that the importance of AES hw support should not be
> underestimated.
>
> I therefore wonder if having long term hw(hopefully)/sw and short term hw/aes
> variants might be a good way to go?

Hi Kevin,

I tend to side with Nigel's point here and not sacrifice security for such short-term gains in performance due to legacy hardware features. Microchip generations come and go. And even AES-256 does not quite reach the security level or versatility of, say, SHAKE-128 and its variants in these use cases.

Some additional commentary:

>>> Most new cortex m3/m4 chips now have AES constant time available.
>> You mean as coprocessors, not as part of the ARM Cortex-M4 core, right?
>
> I mentioned a separate processor but energy micros chips that silabs acquired,
> have had memory mapped AES hardware on their cortex-m3s for > 10 years.

The problem with such ARM Cortex vendor-specific memory-mapped AES implementations is that they're often ignored by middleware. There is no universal feature discovery mechanism I know of for these things. It is typical that vendors do brain surgery on crypto libraries (e.g. MbedTLS) to support them and those modifications disappear on the next MbedTLS update or when a product line (with shared firmware) has that one chip that doesn't have the exact same type of AES hardware. Additionally, many such "AES engines" are limited to AES-128 or CBC.

But you can go that way with SHA-3 too and have it memory-mapped. I don't quite know why MCU vendors are so reluctant to implement new standard (FIPS 202!) cryptography since microcontroller makers typically have chip models galore in all kinds of memory and IO configurations.. :) Anyway, since we build firmware ourselves, our MCU core has a memory-mapped SHA3/SHAKE f1600 permutation so I know how useful it is, especially for this PQC stuff. It's easy to add to an ARM core too -- actually much, much easier to build than SHA-2 hardware support (2x, thanks to two separate SHA-2 algorithms), which I would want to jettison even before AES, to be honest. Quantitatively SHA-2s is not really used that much and can be implemented securely in software (all of those round constants kind of make you want to).

I don't think that it would take too long for microcontroller manufacturers to add f1600/SHAKE/SHA3 to their designs if they want to enter the post-quantum market. It would be a competitive advantage. After all, a Keccak accelerator gives you typically 2x+ speed on lattice KEM finalists and something like 20x for XMSS and hash-based signatures. It's easy to keep in sleep mode when not used. So perhaps SiLabs can just look at adding f1600?

Changing the ISA is not as straightforward but there is progress there too. For RISC-V the Crypto TG is currently adding 32- and 64-bit rotations (from bitmanip) and fused code sequences to support ChaCha and FIPS 202 f1600. Given the "quantitative" philosophy of RISC-V, this nod to ChaCha is easy to justify due to it being in the Linux RNG, TLS 1.3, WireGuard, etc. The Keccak thing is more about future-proofing, but it's just rotations! Especially on RV64IB with 32 general-purpose registers, Keccak is just fantastic (when rotations are available). Of course, other FIPS algorithms AES and SHA-2 have support on the ISA level too but that hardware can't really be used for much else. We'll have to keep having this AES hardware perpetually because it's difficult to implement securely in software even if it's not used very much -- and those software implementations are large and cumbersome. Fortunately for RV32 AES ISA support can be minimized to a single S-Box implementing a T-tables style flow in hardware, so the price is not super high. The Crypto ISA is still unratified, of course, so all of this may change.


> I haven't done an analysis but I know that our server product has AES hw
> acceleration in both its main processor and its microchips. I find it hard to
> believe that anyone would produce a design without hw AES being available these
> days. I even implemented a CHACHA PRNG in sw but then dropped it when the new
> chips with AES TRNGs came out.

In post-quantum cryptography, these algorithms are not used for random number generation. The algorithms are used inside these PQC algorithms in a way that would make such variants incompatible with each other.

I think NIST caused some unnecessary confusion by mixing the SP 800-90A DRBG standards with PQC seed expansion. A FIPS-certified DRBG module can't really be used for this function. Anyway, that standard has not been designed as an XOF. The seeding (input) mechanism is exceedingly weak in comparison.


>> How many of these CPUs does the same software support?
>
> I'm not sure that I understand this question? I'm not actually a huge fan of
> AES-GCM and prefer AES-SIV myself, but then I am not running a huge cloud service.

Yep, I don't think that anyone is a fan of the carryless multiply required by GCM (and some versions of SIV), but realities are such that it is actually required by TLS 1.3 as it is the only mandatory algorithm in it. That circuitry is not used for much else and it's a big multiplier!

Kevin Chadwick

Jul 26, 2020, 7:28:37 PM7/26/20
to pqc-...@list.nist.gov
On 2020-07-26 22:18, Markku-Juhani O. Saarinen wrote:


> I tend to side with Nigel's point here and not sacrifice security for such
> short-term gains in performance due to legacy hardware features. Microchip
> generations come and go. And even AES-256 does not quite reach the security
> level or versatility of, say, SHAKE-128 and its variants in these use cases.

I certainly want the best possible PQ crypto and would suggest NIST algo usage
should only be a tie breaker on reflection.

I guess PQC will only be needed quite a long time from now. However, it's
certainly true that we have some products with an intended long life that may
not be upgradeable to PQC, unless it is based on AES, due to energy usage. It
may be true that workarounds such as long delays or board replacement are
acceptable in the worst case.

>
> The problem with such ARM Cortex vendor-specific memory-mapped AES
> implementations is that they're often ignored by middleware. There is no
> universal feature discovery mechanism I know of for these things. It is typical
> that vendors do brain surgery on crypto libraries (eg MbedTLS) to support them
> and those modifications disappear on the next MbedTLS update or when a product
> line (with shared firmware) has that one chip that doesn't have the exact same
> type of AES hardware. Additional many such "AES engines" are limited to AES-128
> or CBC.
>

AES with a hw engine is actually very simple though. ECC is certainly harder to
maintain hw support for, because it is done with MUL and various other
instructions rather than an all-in-one AES instruction; that does mean that
timing attacks need consideration and the code readability suffers. I don't
actually use the Silabs code, as I implemented ECDH before them, but I do
reference it and it is maintained by them currently.

> But you can go that way with SHA-3 too and have it memory-mapped. I don't quite
> know why MCU vendors are so reluctant to implement new standard (FIPS 202!)
> cryptography since microcontroller makers typically have chip models galore in
> all kinds of memory and IO configurations.. :)

Perhaps it is also the cost of documentation and support for microchips that is
often incorrect.

> Anyway, since we build firmware
> ourselves, our MCU core has a memory-mapped SHA3/SHAKE f1600 permutation so I
> know how useful it is, especially for this PQC stuff. It's easy to add to an ARM
> core too -- actually much, much easier to build than SHA-2 hardware support (2x,
> thanks to two separate SHA-2 algorithms), which I would want to jettison even
> before AES, to be honest. Quantitatively SHA-2s is not really used that much and
> can be implemented securely in software (all of those round constants kind of
> make you want to).

The Silabs devices from the last ~4 years have a SHA2 instruction; it is
slightly more code to use than AES-CTR alone.

> After all, a Keccak accelerator gives you typically 2x+ speed on lattice KEM
> finalists and something like 20x for XMSS and hash-based signatures. It's easy
> to keep in sleep mode when not used. So perhaps SiLabs can just look at adding
> f1600 perhaps?

Interesting, I guess it would speed up SPHINCS too?

My Sphincs+ based on the reference code was a lot slower than the PQC projects.
Not sure how power usage would compare. Thankfully, we only need it very
occasionally.

I guess crypto and security are more widely regarded these days and may well be
upgraded more regularly than I was thinking.

Regards, Kc

Donald Costello

Jul 26, 2020, 8:44:45 PM7/26/20
to Kevin Chadwick, pqc-...@list.nist.gov, dcos...@cse.unl.edu
Could some of you take the time to tell me where the remarks I made recently are wrong?


D. J. Bernstein

Jul 27, 2020, 3:13:36 AM7/27/20
to pqc-...@list.nist.gov
> > How many of these CPUs does the same software support?
> I'm not sure that I understand this question?

I understand that the software that you linked to will work on Silicon
Labs EFM32 CPUs. Will it work on 51% (let's say by volume) of new
Cortex-M4 CPUs? Or even 10%?

The standard path to compatibility of coprocessors across CPUs starts
with ARM's official instructions for talking to coprocessors (mcr etc.),
and continues with CPU designers copying further interface details. The
code you linked to instead uses memory-mapped I/O, which tends to
indicate that the CPU designer never set a goal of compatibility with
CPUs from other designers. (Sure, it's _possible_ that CPU designers
have magically synchronized memory addresses and other details.)

AES hardware can be much faster than AES table-lookup software, but if
the question is "can a NISTPQC standard safely use AES?" then one has to
ask about all the ways that people _won't_ be using constant-time AES
implementations. The combination of

* NIST selecting Cortex-M4 for embedded benchmarks,
* NIST advertising Kyber's performance in general, and
* Kyber's top advertised Cortex-M4 speeds using table-lookup AES

demonstrates that this is a nonzero risk.

---Dan

Kevin Chadwick

Jul 27, 2020, 4:49:21 AM7/27/20
to pqc-...@list.nist.gov
On 2020-07-27 07:13, D. J. Bernstein wrote:
> The standard path to compatibility of coprocessors across CPUs starts
> with ARM's official instructions for talking to coprocessors (mcr etc.),
> and continues with CPU designers copying further interface details. The
> code you linked to instead uses memory-mapped I/O, which tends to
> indicate that the CPU designer never set a goal of compatibility with
> CPUs from other designers. (Sure, it's _possible_ that CPU designers
> have magically synchronized memory addresses and other details.)
>

Of course not; they wanted a competitive advantage. ARM only licenses blueprints
anyway. It is up to the chip makers to decide what to implement, in its entirety.

> AES hardware can be much faster than AES table-lookup software, but if
> the question is "can a NISTPQC standard safely use AES?" then one has to
> ask about all the ways that people _won't_ be using constant-time AES
> implementations. The combination of
>
> * NIST selecting Cortex-M4 for embedded benchmarks,
> * NIST advertising Kyber's performance in general, and
> * Kyber's top advertised Cortex-M4 speeds using table-lookup AES
>
> demonstrates that this is a nonzero risk.

Of course the counter argument is that using the AES engine is easier to
implement than say tweetnacl or CHACHA and has inherent guarantees of no memory
fragmentation or timing issues.

Maybe sw implementations should be discouraged or uncertified. The hw engines
and TRNGs come with NIST certifications.

At the same time, I don't want crypto being held back. Hiding it away in
hardware, may do harm in that respect or others?

Kevin Chadwick

Jul 27, 2020, 5:02:33 AM7/27/20
to pqc-...@list.nist.gov
On 2020-07-27 07:13, D. J. Bernstein wrote:
> demonstrates that this is a nonzero risk.

I wonder: is it not true that a similar argument applies to guaranteeing that
there are no timing issues in sw? What language will processors be running?
Will it be C, when the compilers have consistently prioritised performance over
security for decades?

Or will there be implementations in Rust, Z-lang, tinyGo? I am looking into
whether tinyGo can be used in a way that avoids memory fragmentation.

Peter Schwabe

Jul 27, 2020, 5:06:27 AM7/27/20
to Ilya Mironov, Peter Schwabe, pqc-...@list.nist.gov
Ilya Mironov <mir...@gmail.com> wrote:
> Peter,

Dear Ilya, dear all,
I'm answering only on behalf of myself, not the Kyber team.

It would probably be fine to use AES-128 or something even more
lightweight as long as the output "looks uniformly random and doesn't
interact in weird ways with the lattice problem". I'm not aware of any
proper formalization of what is required from the seed expansion; using
a XOF like SHAKE-128 is maybe overkill, but also feels clean. As Kyber
uses a 32-byte seed and as we needed AES-256 as PRF for the secrets
anyway, we just went with AES-256 for the 90s version.
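
As a rough illustration of what "seed expansion with a XOF" means here, the sketch below expands a 32-byte seed with SHAKE-128 from Python's hashlib. The function name and the two index bytes used for domain separation are illustrative assumptions; this is not a statement of Kyber's exact procedure.

```python
import hashlib

def expand_seed(seed: bytes, i: int, j: int, out_len: int) -> bytes:
    """Expand a 32-byte seed into out_len pseudorandom bytes with SHAKE-128.

    The index bytes (i, j) are a hypothetical domain separator so that one
    seed can yield many independent-looking streams.
    """
    if len(seed) != 32:
        raise ValueError("expected a 32-byte seed")
    return hashlib.shake_128(seed + bytes([i, j])).digest(out_len)

seed = bytes(range(32))                      # fixed demo seed, not a real key
short = expand_seed(seed, 0, 0, 32)
longer = expand_seed(seed, 0, 0, 168)        # 168 bytes = one SHAKE-128 block
other = expand_seed(seed, 0, 1, 32)

print(longer[:32] == short)   # a XOF's longer output extends the shorter one
print(other != short)         # a different index gives a different stream
```

Swapping the XOF for a different primitive (for example AES in counter mode) changes every expanded byte, which is why such variants are not interoperable.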

Regarding the 90s version of Kyber, my recommendation is not to use it,
i.e., I'm with Nigel and recommend the use of Kyber with the
Keccak-derived primitives. As we stated before, the reason to introduce
Kyber-90s was mainly to illustrate how fast Kyber is if the symmetric
crypto is supported by HW acceleration.


All the best,

Peter

Moody, Dustin (Fed)

Jul 30, 2020, 11:48:58 AM7/30/20
to D. J. Bernstein, pqc-forum

"Looking forward to NIST's comments on whether NIST will, for non-performance reasons, penalize NISTPQC submissions that upgrade from AES-256-CTR to ChaCha20."

Indeed, we will also “penalize” NISTPQC submissions that upgrade to Haraka, Ascon, Whirlpool, MD5, BLAKE2, Simon, SM3, or Bass-O-Matic cipher in exactly the same way.

The PQC standardization process is not a competition to choose new symmetric primitives to standardize. (We have one of those going on, too--the lightweight cryptography standard.) Chacha is not a NIST standard, any more than Simon or SM3. Perhaps this is a terrible oversight, and we should have standardized all three. Perhaps we should consider standardizing them in the future. But right now, none of them are NIST standard algorithms, and so we’re not going to treat them as though they were a NIST standard. There are a *lot* of perfectly fine crypto algorithms that aren’t NIST standards.

We want to measure the performance of these PQC algorithms with NIST standard algorithms, so that we can assess how they will work out when used in FIPS compliant hardware or software. We want, as much as possible, for the analysis of these submissions to be something that stands alone, rather than something that turns on some additional cryptographic primitive that we have to analyze at the same time as we analyze all the other parts of the submission.

Now, we aren’t offended if you want to do an implementation of SPHINCS+ based on Haraka (as you have done) and show its performance numbers. That’s kind-of interesting to see. But we’re not going to compare that performance to the performance of other schemes that used SHA2 or SHA3, because that wouldn’t be a fair comparison.

If we eventually standardize (for example) SPHINCS+, we will be expecting FIPS compliant implementations to use NIST standard hashes--SHA2 or SHA3. That’s true, even though Haraka may be just fine in security terms in this application. It’s true, even though some other people not worried about FIPS compliance will implement SPHINCS+ with Blake2 or SM3 or Streebog. The performance numbers that matter for us are the ones using NIST standard algorithms, both for the headline stuff like hashing, and for more complicated stuff like expanding a seed into a large string. Because when we standardize some of these schemes, their performance is going to have to be based on NIST standard algorithms, because that’s what they’ll be using.

There are places where using a nonstandard algorithm is defensible--consider the use of LowMC in Picnic. But also note that the use of this nonstandard algorithm is a major source of concern for us in any eventual standardization decision, and also that the Picnic designers have to spend extra time justifying the way LowMC is used in Picnic (it’s not really assumed to be a general-purpose secure cipher--the attacker basically gets one plaintext/ciphertext pair under an unknown key and can forge signatures if he can find any key that maps the plaintext to the ciphertext.).

Suppose someone sticks a PRG based on Simon into their PQC submission to improve its performance. We are not obliged to accept the claim that this is just as good as using AES unless we can find an attack. (“Hey, we got it from a very trusted source--when have they ever done us wrong?”) Instead, I think we can just say “no, we need you to use a NIST standard algorithm for this function.” And the same is true for Chacha, Whirlpool, Snefru, Streebog, Speck, IDEA, Twofish, and any number of other crypto primitives that are probably quite secure, but that aren’t NIST standard algorithms and so we don’t default to trusting them.

The NIST PQC team






David A. Cooper

Jul 30, 2020, 12:13:59 PM7/30/20
to pqc-...@list.nist.gov
There seems to be a concern in this thread that submitters may feel pressured to use primitives such as AES and SHA-2 in their parameter sets even if they would prefer to use other NIST-approved primitives, such as SHAKE. A scheme that uses primitives such as AES that can take advantage of current hardware will be faster than one that uses SHAKE, and there seems to be a concern that NIST will simply select the scheme that advertises the fastest performance. While the difference in computational performance between a parameter set that uses AES and SHA-2 compared to one that uses Keccak is useful input, it seems very unlikely that this difference will have any impact on the final decision of which KEM to standardize.

The call for proposals mentions many criteria that will be used in evaluating the algorithms, with security being the most important. Cost, including computational efficiency, is just one of the many other criteria that are being considered, so when comparing two schemes that are considered secure and that have somewhat similar costs, other criteria, such as flexibility, are far more likely to be deciding factors than relatively small differences in computational efficiency.

Looking as an example at Kyber-768 (since this parameter set in particular was mentioned), https://bench.cr.yp.to/results-kem.html#amd64-hiphop offers the following numbers:


                Key Gen (cycles)  Enc (cycles)  Dec (cycles)  Public Key (bytes)  Ciphertext (bytes)
Kyber-768-90s   27316             29476         25248         1184                1088
Kyber-768       53464             74040         63916         1184                1088

Simply comparing the cost of key generation plus encapsulation and decapsulation would suggest that Kyber-768 is more than twice as costly as Kyber-768-90s, a very significant difference. However, as has been noted in this forum, in the call for proposals, and in NISTIR 8309, the overall cost of a KEM in most cases also includes the cost of transmitting a public key and a ciphertext. The actual cost of sending data relative to the cost of performing computations may vary significantly depending on the environment, but if we use an estimate of 1000 cycles/byte, then the overall cost of generating a key pair, sending the public key, performing an encapsulation, sending the ciphertext, and performing a decapsulation is only about 4.5% more for Kyber-768 than for Kyber-768-90s. At a cost of 2000 cycles/byte, the difference drops to 2.3%.
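
The overall-cost estimate above can be reproduced from the quoted cycle counts and sizes. A minimal sketch; the function names are illustrative, and since the message does not say which total the percentage is taken against, both conventions are printed.

```python
# Cycle counts (key gen, enc, dec) and sizes quoted above for amd64-hiphop.
COMPUTE_CYCLES = {
    "Kyber-768-90s": (27316, 29476, 25248),
    "Kyber-768":     (53464, 74040, 63916),
}
PUBLIC_KEY_BYTES = 1184
CIPHERTEXT_BYTES = 1088

def overall_cost(name, cycles_per_byte):
    """Computation plus transmission of one public key and one ciphertext."""
    sent = PUBLIC_KEY_BYTES + CIPHERTEXT_BYTES
    return sum(COMPUTE_CYCLES[name]) + cycles_per_byte * sent

def difference_percent(cycles_per_byte):
    """Gap between the two variants, relative to each of the two totals."""
    aes = overall_cost("Kyber-768-90s", cycles_per_byte)
    keccak = overall_cost("Kyber-768", cycles_per_byte)
    gap = keccak - aes
    return 100 * gap / keccak, 100 * gap / aes

for rate in (1000, 2000):
    vs_keccak, vs_aes = difference_percent(rate)
    print(f"{rate} cycles/byte: {vs_keccak:.2f}% of Kyber-768 total, "
          f"{vs_aes:.2f}% of Kyber-768-90s total")
```

Either convention gives a gap of roughly 4.4 to 4.6% at 1000 cycles/byte and 2.3 to 2.4% at 2000 cycles/byte, in line with the figures above.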

Benchmark numbers for Kyber-768 and Kyber-768-90s on Cortex-M4 were mentioned, and using those numbers the overall performance difference is greater -- about 14% at 2000 cycles/byte and about 20% at 1000 cycles/byte. While this difference is larger, it is still not very large compared to the differences in overall costs between Kyber-768 and other schemes that offer a similar security level.

As noted in NISTIR 8309, NIST intends to select at most one of Kyber, NTRU, and Saber for standardization. While we cannot say at this point exactly what factors will result in the selection of one of these three very strong candidates over the others, it seems highly unlikely that the relatively small difference in the overall cost between Kyber-768 and Kyber-768-90s will make a difference in the final selection.

Rainer Urian

Jul 30, 2020, 1:13:57 PM7/30/20
to David A. Cooper, pqc-...@list.nist.gov
A smart card with a 100 MHz Cortex-M3/M4-like CPU and a VHBR NFC interface at 6.8 MBaud is quite standard. In that setting the performance difference becomes significant.


Viele Grüße / Best regards,
Rainer


On 30. Jul 2020, at 18:14, 'David A. Cooper' via pqc-forum <pqc-...@list.nist.gov> wrote:

 There seems to be a concern in this thread that submitters may feel pressured to use primitives such as AES and SHA-2 in their parameter sets even if they would prefer to use other NIST-approved primitives, such as SHAKE. A scheme that uses primitives such as AES that can take advantage of current hardware will be faster than one that uses SHAKE, and there seems to be a concern that NIST will simply select the scheme that advertises the fastest performance. While the difference in computational performance between a parameter set that uses AES and SHA-2 compared to one that uses Keccak is useful input, it seems very unlikely that this difference will have any impact on the final decision of which KEM to standardize.

David A. Cooper

Jul 31, 2020, 1:21:19 PM7/31/20
to Rainer Urian, pqc-...@list.nist.gov
Hello Rainer,

As you note, there may be environments in which, relatively speaking, computations are very slow and communication is very fast. Perhaps if Kyber ends up being selected for standardization, there may be some environments in which there would be a significant benefit to using the 90s parameter sets rather than the parameter sets that use Keccak. However, the main point of my message is that many factors other than computational efficiency will be considered when making a decision about which scheme to standardize, and that any belief that NIST is "pressuring submitters to use something fast in the benchmarks that NIST is looking at (so not SHAKE)" is mistaken.

In order to provide more data for consideration, the table below shows overall costs when transmitting data costs 85 cycles/byte. In order to more accurately reflect the smart card environment, I excluded the cost of key generation, since it seems highly unlikely that an ephemeral key would be generated on a smart card as part of a transaction. The numbers in the table were taken from https://github.com/mupq/pqm4/blob/master/benchmarks.md, with the exception of the second row for Saber, which was taken from https://eprint.iacr.org/2020/268.pdf. The numbers in the first row for Saber and for ntruhps4096821 are crossed out since the amount of memory required by the described PQM4 implementations is more than I would expect a smart card to have.



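|                 | Enc (cycles) | Dec (cycles) | Public key (bytes) | Ciphertext (bytes) | Overall cost (cycles) |
|-----------------|--------------|--------------|--------------------|--------------------|-----------------------|
| Kyber-768-90s   | 695,432      | 708,683      | 1184               | 1088               | 1,597,235             |
| Kyber-768       | 1,047,720    | 985,976      | 1184               | 1088               | 2,226,816             |
| Saber*          | 1,162,082    | 1,198,307    | 992                | 1088               | 2,537,189             |
| Saber           | 1,616,000    | 1,759,000    | 992                | 1088               | 3,551,800             |
| ntruhps4096821* | 1,077,000    | 1,073,275    | 1230               | 1230               | 2,359,375             |

* Crossed out in the original message.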

So, even using these numbers, and even though Kyber-768 is 40% slower than Kyber-768-90s, the ordering of the algorithms is not changed by using one versus the other. Even if the ordering were changed, it is unlikely that that would matter.
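The overall-cost column can be checked with a short sketch, assuming the cost is encapsulation plus decapsulation cycles plus 85 cycles for each byte of public key and ciphertext, as described above. The row labels and the names `ROWS`, `overall_cost`, and `costs` are illustrative. The two Saber labels only distinguish the pqm4 figures from the eprint 2020/268 figures. Under this formula Kyber-768-90s comes to 1,597,235 cycles and Kyber-768 to 2,226,816 cycles, a ratio of about 1.39.

```python
CYCLES_PER_BYTE = 85  # transmission cost assumed in the message above

# (label, enc cycles, dec cycles, public key bytes, ciphertext bytes)
ROWS = [
    ("Kyber-768-90s", 695_432, 708_683, 1184, 1088),
    ("Kyber-768", 1_047_720, 985_976, 1184, 1088),
    ("Saber (pqm4)", 1_162_082, 1_198_307, 992, 1088),
    ("Saber (eprint 2020/268)", 1_616_000, 1_759_000, 992, 1088),
    ("ntruhps4096821", 1_077_000, 1_073_275, 1230, 1230),
]


def overall_cost(enc, dec, pk, ct, cycles_per_byte=CYCLES_PER_BYTE):
    """Enc + Dec cycles plus the cost of sending the public key and ciphertext."""
    return enc + dec + cycles_per_byte * (pk + ct)


costs = {name: overall_cost(enc, dec, pk, ct) for name, enc, dec, pk, ct in ROWS}

# Print the schemes from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} {cost:>10,}")
```

Changing `CYCLES_PER_BYTE` shows how sensitive the ordering is to the assumed cost of communication.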

Even if all of these implementations are constant time, it is unlikely that they are resistant to other types of side-channel attacks, as a smart card implementation would need to be. https://eprint.iacr.org/2020/733.pdf presents a side-channel resistant implementation of Saber, and it says that the "masked implementation features a 2.5x overhead factor." The masked implementation also requires more than 11 kB of dynamic memory, which is more than many smart cards will have. The overhead costs of side-channel resistant implementations of Kyber and NTRU may be greater or less than this, which could make comparisons based on the table above meaningless. It is not even clear (to me) whether there could be a different amount of overhead for a required side-channel resistant implementation of Kyber-768-90s compared to one for Kyber-768.

Even if the numbers were comparing side-channel resistant implementations that could also fit on a smart card, these numbers would still represent just one piece of information among many that would be considered when making a standardization decision. As we have stated, there are many evaluation criteria, and security is the most important. NISTPQC will not simply be selecting the scheme that has the best performance numbers. Having performance information for different platforms is very useful, but it is not the only information that will be considered when making a selection. We also need to take into consideration that the implementations that are currently being used in benchmarks may not be the most efficient possible implementations -- there may be implementations that are faster and/or that use less memory.

While it is unlikely that the difference in computational efficiency between Kyber and the 90s version of Kyber would affect the decision of which scheme to select for standardization, as was noted in the email that Ray Perlner sent, if Kyber is selected, it is very possible both parameter sets would be included in the standard. So, if in a particular environment there were a compelling reason to choose the 90s version over the one that uses Keccak, the option may be available. We will of course take into account the concerns raised by Dan Bernstein, Nigel Smart, and Peter Schwabe, and any additional community feedback we receive, in making any final decisions about whether, and if so how, to endorse the 90s versions of Kyber, if Kyber is selected for standardization.

Mike Hamburg

Jul 31, 2020, 3:58:30 PM7/31/20
to David A. Cooper, Rainer Urian, pqc-...@list.nist.gov
Hello David,

> On Jul 31, 2020, at 10:21 AM, 'David A. Cooper' via pqc-forum <pqc-...@list.nist.gov> wrote:
>
> It is not even clear (to me) whether there could be a different amount of overhead for a required side-channel resistant implementation of Kyber-768-90s compared to one for Kyber-768.

There certainly will be a different ratio, and I expect it to favor the SHAKE versions in most cases. This is because the CCA transform must be protected, which includes most of the hashing and a high fraction of the AES computations. Since SHA-3 is easier to protect than AES and especially SHA-2, I would expect the 90s parameter sets to have less advantage for power- and EM-side-channel-resistant applications.

The existing comparison is not apples-to-apples: it compares AES in hardware (or T-tables on microcontrollers) against SHAKE in software using arithmetic only. For side-channel-resistant applications, this comparison is no longer valid, because accelerators in typical CPUs resist only timing attacks but not other side-channel attacks, and a T-table approach will likely not resist power or EM channels either.

The new comparison point would need to be SCA-resistant software in both cases or (more likely) SCA-resistant hardware in both cases, which IIUC is likely to favor SHAKE at least for speed. I’m not sure about size in hardware: there’s a larger area overhead for protecting AES and SHA2, but fast SHAKE cores start out with a larger area.

The same would be true for a Saber-90s variant, since it also uses a variant of the Fujisaki-Okamoto transform that relies on protected hashing. I expect that it would also apply to hypothetical variants using ChaCha or Blake2, which are amazing in software but take a large hit from DPA protection (this is typical of ARX constructions). It might also apply to Haraka, which is based on AES and so has a higher overhead than SHAKE.

Cheers,
— Mike Hamburg

John Mattsson

Jul 7, 2022, 3:47:42 AM7/7/22
to pqc-...@list.nist.gov

Hi,

As it is now clear that NIST is recommending CRYSTALS-Kyber and CRYSTALS-Dilithium as the primary algorithms, it might be good to revive this thread.

The current CRYSTALS specifications use the following algorithms:

Kyber: SHAKE128, SHAKE256, SHA3-256, SHA3-512
Kyber 90s: AES-256, SHA-256, SHA-512, SHAKE256
Dilithium: SHAKE256, SHAKE128
Dilithium 90s: SHAKE256, AES-256

Kyber 90s looks very messy with four different primitives and will likely lead to more implementations with side-channel vulnerabilities. I would strongly prefer if NIST only standardized CRYSTALS with Keccak. That would hasten general support for hardware acceleration of Keccak. The best would have been if NIST had said clearly 5 years ago that PQC will only use Keccak. That many CPUs today have acceleration of SHA-1 but not SHA-3 is just tragic and should not be rewarded.

The only reason to standardize the 90s versions would be short-term speed improvements before Keccak hardware acceleration is generally available. I am not sure that extra speed is necessary. Kyber and Dilithium already have very good performance and the performance will improve when Keccak hardware acceleration is generally available. Specifying 90s versions would delay general availability of Keccak hardware acceleration and reward vendors that are stuck in the 90s.

I also strongly think NIST should go ahead and specify the AEAD mode of Keccak that we were promised. For vendors wanting crypto-agility and NIST compliance, the lack of a NIST-approved Keccak-based AEAD mode is problematic. A lot of constrained hardware and software implementations would like to implement a single NIST-approved cryptographic primitive and use that for CCA-encryption, CPA-encryption, variable-length hash, variable-length MAC, KDF, etc. Currently that is not possible.

Cheers,

John Prueß Mattsson

Ruben Niederhagen

Jul 7, 2022, 4:09:27 AM7/7/22
to pqc-...@list.nist.gov
On 07/07/2022 15:47, 'John Mattsson' via pqc-forum wrote:
> [...] I would strongly prefer if NIST only standardized CRYSTALS
> with Keccak. [...]
>
> The only reason to standardize the 90s versions would be short-term
> speed improvements before Keccak hardware acceleration is generally
> available. [...]
Keccak has a relatively large state which might be an issue for embedded
devices with strong resource restrictions. This also applies to the size
of hardware accelerators.

Hence, some diversity in the hash-function choice might be desirable for
some applications.

Ruben

John Mattsson

Jul 7, 2022, 4:55:09 AM7/7/22
to Ruben Niederhagen, pqc-...@list.nist.gov
I agree, but the solution is probably not to force implementation of all four primitives AES-256, SHA-256, SHA-512, and SHAKE256. The future LWC "winner" might be a good candidate for this.

Cheers,
John

From: pqc-...@list.nist.gov <pqc-...@list.nist.gov> on behalf of Ruben Niederhagen <ru...@polycephaly.org>
Sent: Thursday, July 7, 2022 10:09:01 AM
To: pqc-...@list.nist.gov <pqc-...@list.nist.gov>
Subject: Re: [pqc-forum] "90s" version parameter sets
 