Our comments on Rijndael-256-256 and similar ciphers - and a proposal for a way to go forward

Roberto Avanzi

Jun 26, 2025, 5:58:45 AM
to ciphermodes-forum
Hello

For everyone's information, and to encourage debate in this forum, we are reposting below, in plain-text format, the email we sent to NIST a few minutes ago.

Authors:
    Roberto Avanzi [1,2],
    Avik Chakraborti [3],
    Bishwajit Chakraborty [4],
    Eik List [5],
    Stefan Lucks [6],
    and William Whyte [1]

Affiliations:
    [1] Qualcomm {ravanzi,wwhyte}@qti.qualcomm.com
    [2] Caesarea Rothschild Institute, University of Haifa, Israel roberto...@icloud.com
    [3] Institute for Advancing Intelligence, TCG Crest, Kolkata, India avik.cha...@tcgcrest.org
    [4] School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore bishu.m...@gmail.com
    [5] Independent Researcher el...@posteo.net
    [6] University of Weimar, Germany stefan...@uni-weimar.de

Comments:

When the AES was standardized, a block size of 128 bits and key sizes of 128, 192, or 256 bits were selected.
The Rijndael algorithm on which the AES is based, however, supports any block and key size that are multiples of 32 bits, up to 256 bits.

The 128-bit block size versions have undergone extensive cryptanalysis.
This analysis has led to significant advances in techniques for evaluating block cipher security.

At the third NIST workshop on cipher modes, it was suggested that NIST standardize Rijndael with 256-bit blocks.
While some participants supported this idea, some even enthusiastically, we believe that standardizing the version described 25 years ago would be unwise for the following reasons:

1. Rijndael-256-256 can be implemented using the AES instructions available in most commodity processors.  While we believe that reusing the AES instructions is an excellent idea, we note that Rijndael-256-256 requires undoing the ShiftRows operation performed by the AES instructions before applying the ShiftRows operation specific to the 256-bit variant.  Vector ISAs are optimized to operate within 128-bit lanes, even when longer vectors are available.  Hence, Rijndael-256-256 requires full cross-lane shuffle operations – these are available but usually have high latency.  As a result, Rijndael-256-256 runs 2-5 times slower than AES-256, and these extra operations waste computational resources.  While we can easily envision future iterations of the ISAs adding specific variants of the AES instructions, all currently deployed machines would be penalized.
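To make the lane problem concrete, here is a small Python sketch (our own illustration, not taken from any implementation) that models the byte permutations involved, using the standard column-major state layout (byte index = row + 4·column).  It shows that the shuffle needed to turn two in-lane AES ShiftRows into the Rijndael-256 ShiftRows (offsets 0, 1, 3, 4) necessarily moves bytes across the 128-bit lane boundary:

```python
# Model ShiftRows as a byte permutation p, where out[i] = in[p[i]],
# with the usual column-major state layout: byte index = row + 4*col.
def shiftrows_perm(ncols, shifts):
    return [row + 4 * ((col + shifts[row]) % ncols)
            for col in range(ncols) for row in range(4)]

def invert(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return inv

# What an AES instruction does on each 128-bit lane: shifts (0,1,2,3).
aes_lane = shiftrows_perm(4, [0, 1, 2, 3])
two_lane_aes = aes_lane + [16 + i for i in aes_lane]

# What Rijndael-256 specifies: shifts (0,1,3,4) over all 8 columns.
rijndael256 = shiftrows_perm(8, [0, 1, 3, 4])

# The fixup shuffle that must accompany the AES instructions so that
# the combined effect equals Rijndael-256's ShiftRows.
inv = invert(two_lane_aes)
fixup = [inv[j] for j in rijndael256]

# Applying two_lane_aes and then fixup reproduces rijndael256 ...
assert [two_lane_aes[fixup[i]] for i in range(32)] == rijndael256
# ... and fixup crosses the 128-bit lane boundary, which is exactly
# the kind of general cross-lane shuffle that is slow on vector ISAs.
assert any((i < 16) != (fixup[i] < 16) for i in range(32))
```

A real implementation would likely fuse the inverse AES ShiftRows and the Rijndael-256 ShiftRows into a single shuffle; the point is that no such shuffle can stay within 128-bit lanes.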

2. The security of Rijndael-256-256 has not received the same scrutiny as the 128-bit variants.  Most cryptanalysis has focused on the 128-bit versions, leaving the security of Rijndael-256-256 largely unexplored.  The two ciphers share only weak security relationships.

3. The AES key schedule shows its age after 25 years.  While the round function remains strong, the key schedule is now known to be a weak design.  AES-192 and AES-256 have weak related-key resistance.  Recent work by Leurent et al. (Eurocrypt 2021) revealed that the key schedule’s apparent diffusion actually operates in four separate 32-bit lanes under a suitable linear transformation.  Modern automated tools now enable the search for simpler, more robust key schedules using only cell shuffles and round constants (for the AES, for instance, see Derbez et al. (Eurocrypt 2013), Boura et al. (ACNS 2024)).

4. The Rijndael family requires storing the key schedule in RAM for good performance — because changing keys is an expensive process.  This creates vulnerability to cold boot attacks.  Kamal and Youssef (SECURWARE 2010) demonstrated that having the full key schedule in memory makes key recovery much easier despite bit decay.  Any new design should improve upon this.

5. A robust and lightweight key schedule facilitates the development of modes of operation where the key is also changed in various parts of the cipher (even in the actual encryption), since no related-key relations can be used against the mode.  In particular, the design of BBB modes of operation would be simplified.  Full related-key security would even allow the use of part of the key as a tweak, or as a field of the key to which a shorter tweak is added (in other words, a tweakey), without the need for a separate tweak schedule.

6. A fast, reversible key schedule enables bidirectional use of the cipher in modes of operation.  The current AES key schedule already restricts mode design; using the original Rijndael-256-256 key schedule would limit this area of research even further.

7. Returning to the 128-bit versions, we note that the properties of the “Megabox” consisting of (say) 2 rounds of AES have been investigated very thoroughly — starting with Daemen and Rijmen’s own work (IET Information Security, Volume 1, No. 1, March 2007, also IACR ePrint 2006/039) 20 years ago.  This implies that one can instead use it as a 128-bit keyed S-Box, and design ciphers around it.  Examples abound, such as Haraka v2, Pholkos, the comparative study by Shiba et al. (doi:10.1049/ISE2.12053) and, more recently, Vistrutah (IACR ePrint 2025/976).  No attacks on Pholkos have been successfully mounted during the last 5 years, and other papers have studied similar constructions; this area of research is now at least 11 years old (see Point 9 below).

8. We propose that NIST consider iterative ciphers built from “steps” consisting of two parallel AES rounds on 128-bit state lanes (or “slices”), followed by a simple layer that mixes the cells across the lanes.  Modern CPUs have several hard-wired permutations that can mix the cells across lanes without the heavy impact of the aforementioned general-purpose cross-lane permutations.

9. For 512-bit constructions with suitable mixing layers, there exist isomorphisms from AES-128 onto such constructions.  These mappings, introduced by Biryukov and Khovratovich (ISC 2014), map AES-128 state cells to 32-bit columns (and then diagonals), columns to 128-bit lanes, and rounds to steps.  Attacks on AES map to the 512-bit cipher with complexity roughly raised to the fourth power — or more due to the fact that the Megaboxes are keyed.  The current best attacks on the AES cover 7 rounds, and they would map to attacks on 14 rounds of the 512-bit cipher.  An 18-round version would provide high security margins while maintaining speed comparable to Rijndael-256-256, because of the reduced number of state permutation operations.

10. While 256-bit versions lack these isomorphisms, modern automated tools facilitate adapting cryptanalysis techniques (see IACR ePrint 2025/976 and references therein).

11. NIST should solicit proposals for key schedules for such 256-bit and (possibly) 512-bit constructions, evaluating them based on security level and implementation efficiency.  With such a specific focus, the selection process can be kept short, within a reasonable time-frame of one year.


We also provide an example of a block cipher that follows the design criteria we recommend: https://eprint.iacr.org/2025/976 .

Jacob Christian Munch-Andersen

Jun 29, 2025, 5:11:27 AM
to ciphermodes-forum, Roberto Avanzi
On Thursday, June 26, 2025 at 11:58:45 AM UTC+2 Roberto Avanzi wrote:

> 4. The Rijndael family requires storing the key schedule in RAM for good performance — because changing keys is an expensive process.  This creates vulnerability to cold boot attacks.  Kamal and Youssef (SECURWARE 2010) demonstrated that having the full key schedule in memory makes key recovery much easier despite bit decay.  Any new design should improve upon this.


I really don't think making a fringe attack scenario slightly more challenging is worth compromising the design in other ways.  In any case, rather than making a lightweight key schedule, I'd say a much better approach is to simply use a CSPRNG to expand the key; that will give no advantage in a cold boot attack, and the cryptographic properties should be top notch.

Roberto Avanzi

Jun 29, 2025, 6:35:00 AM
to ciphermodes-forum, Jacob Christian Munch-Andersen, Roberto Avanzi
Hi Jacob, there are a few errors here, if I may.

First, this is not a fringe attack: cold boot attacks have been demonstrated in practice, and against structured data they are effective.
This means that against a single key they have trouble, but against a "normal" key expansion they recover the key.  Far from being fringe, they are one of the reasons encrypted memory is a requirement.  They are often easier than interposing on the memory bus.

Second, replacing the key schedule with one that has good, proven mathematical properties does not compromise the design.  With the current key schedule, the design may on the other hand be compromised already.  People fear changes to ciphers, because in the past cipher design was mostly "let us try something, and let us hope this does not get broken".  But, thanks also to the AES design and 25 years of cryptanalysis, we now know much better, and when ciphers are broken it is almost always because of one of the following: too sparse round constants (MIDORI, Simpira v1, Haraka v1); too few rounds (in order to get a paper published, some designers work with very tight margins — I always try to be conservative); or poor diffusion (see yoyo attacks).  The AES key schedule is known to be bad and was just jotted down at the time; Leurent's work proves that not much thought was put into it.  For AES-192 and AES-256 it leads to known weaknesses, and for Rijndael-256-256 it has never been properly analyzed.

Third, lightweight does not mean light security.  Often a simple solution is actually better, because one CAN actually prove lower security bounds, whereas for a very complex key schedule this may not be possible unless one works with bit-wise models of the cipher, which, for a 256-bit wide cipher, are normally not feasible.

Now, using a CSPRNG means that for each key we need to recompute an expensive key schedule.  Apart from the fact that we cannot prove related-key security in that case without additional heuristic assumptions, this also presents a problem in HW implementations, where the key schedule for the AES is commonly computed on the fly from the start OR the end values for encryption and decryption, so only two values need to be stored.  Using a CSPRNG would require storing 30 128-bit values in order to get decent performance, which has a very negative effect on HW requirements — imagine a system that needs to work with multiple keys; some even need to juggle hundreds of keys.  This would make the cipher impossible to use in certain scenarios.  The AES is already making some applications, such as link encryption on chipset-based architectures, too expensive (I could provide numbers), and we need lighter, not heavier ciphers (ASCON is also out of the question because it is meant to be lightweight as in small, but not scalable).  If we want something like Rijndael-256-256, at least in an interim phase before we standardize something actually better, let us not make it into a resource hog.

In fact, if you want some "randomness", then encrypt the key before using the encrypted value (this is, incidentally, a way of constructing tweakable ciphers).  Then no related-key relation can be observed.

Finally, note that this was not my only argument. In fact, security and flexibility of usage are my main arguments.

 Roberto

Jacob Christian Munch-Andersen

Jun 29, 2025, 9:12:56 AM
to ciphermodes-forum, Roberto Avanzi, Jacob Christian Munch-Andersen
On Sunday, June 29, 2025 at 12:35:00 PM UTC+2 Roberto Avanzi wrote:
> Hi Jacob, there are a few errors here, if I may.

The attack has been demoed, but I have never heard of it being used in the wild; there are physical-access attacks that are way easier to pull off, like hardware keyloggers.  In any case, you do not actually prevent it; it simply takes a slightly earlier boot, or multiple attacks, each recovering part of the data.

On your second point I completely agree, and that is also why I feel comfortable recommending switching to a CSPRNG key schedule for an AES-like construction; unless there is some very specific design constraint that makes random data unsuitable, there is really no way the CSPRNG could be worse than whatever else you might suggest: at worst the security is equal.

Look, if you can prove anything in symmetric primitives it generally means you are wasting computation trying to fit the algorithm to a proof, and you still only end up with a proof of type "secure if [unprovable conjecture] is true". Saying you can prove something with a lightweight schedule but not with a CSPRNG is the exact opposite position of your second point.

As for hardware acceleration, the model of simply providing a round function seems to work really well, the potential performance is more than plenty, so I don't see why CPU manufacturers would waste silicon building all-rounds instructions.

Roberto Avanzi

Jun 29, 2025, 9:35:17 AM
to ciphermodes-forum, Jacob Christian Munch-Andersen, Roberto Avanzi
Il giorno domenica 29 giugno 2025 alle 15:12:56 UTC+2 Jacob Christian Munch-Andersen ha scritto:
> On Sunday, June 29, 2025 at 12:35:00 PM UTC+2 Roberto Avanzi wrote:
>> Hi Jacob, there are a few errors here, if I may.

> The attack has been demoed, but I have never heard of it being used in the wild; there are physical-access attacks that are way easier to pull off, like hardware keyloggers.  In any case, you do not actually prevent it; it simply takes a slightly earlier boot, or multiple attacks, each recovering part of the data.

This is exactly the opposite of a sound security mentality. The attack exists. So you cannot know whether it has been mounted against someone or not. "never heard of it used in the wild" proves nothing. It is a relatively easy attack.
Also, hardware keyloggers are for a completely different threat scenario.
Yes, they are easier. But they do not recover keys.
 
> On your second point I completely agree, and that is also why I feel comfortable recommending switching to a CSPRNG key schedule for an AES-like construction; unless there is some very specific design constraint that makes random data unsuitable, there is really no way the CSPRNG could be worse than whatever else you might suggest: at worst the security is equal.

But performance and/or memory usage are reasons to NOT recommend a CSPRNG.
 
Look, if you can prove anything in symmetric primitives it generally means you are wasting computation trying to fit the algorithm to a proof, and you still only end up with a proof of type "secure if [unprovable conjecture] is true".

I am not sure you understand what you are speaking about.

Most attacks have a complexity bounded by that of a differential cryptanalysis attack — some may be more efficient, but there is normally a bound.  So you use various mathematical tools to find the most efficient non-related-key or related-key differential cryptanalysis attack, and these give you a bound on the complexity.  I am quite sure that Nicky Mouha would have something to say about these techniques to estimate attack complexity.
 
Saying you can prove something with a lightweight schedule but not with a CSPRNG is the exact opposite position of your second point.

Nope. If your reference for cryptography is Schneier's "Applied Cryptography" then I have to disappoint you, but it is dated.
A CSPRNG can give you the ability to ignore related-key attacks (and it makes key recovery in traditional attacks more difficult) *IF* you can assume that its output is random; you have to prove that first, by using the same techniques I mentioned above or some ad hoc ones.  In theory you can take AES-256 and, with the provided key, encrypt the numbers from 0 to 29, and then use the resulting 30 values as the 128-bit round key halves for Rijndael-256-256.  This is a NIST-approved PRNG.  Are you sure you want something that heavy?
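As a structural illustration only (this is my sketch, not a specification), the counter-mode expansion just described looks like the following in Python, with SHAKE-256 standing in for the AES-256 call so that the snippet is self-contained; the shape of the computation, thirty independent primitive calls per key, is the point:

```python
import hashlib

def ctr_expand(master_key: bytes, n_words: int = 30) -> list[bytes]:
    # Derive each 128-bit round-key half by "encrypting" a counter
    # under the master key.  SHAKE-256 is an illustrative stand-in
    # for the AES-256 invocation described in the text.
    return [hashlib.shake_256(master_key + i.to_bytes(16, "big")).digest(16)
            for i in range(n_words)]

round_key_halves = ctr_expand(b"\x00" * 32)
```

Every key change costs thirty full primitive invocations, and all thirty outputs must then be stored, which is precisely the memory and latency objection raised above for HW implementations.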
 
As for hardware acceleration, the model of simply providing a round function seems to work really well, the potential performance is more than plenty, so I don't see why CPU manufacturers would waste silicon building all-rounds instructions.

I am speaking about HW implementations of a cipher.  It is obvious that if a cipher is standardized — and the requirement to use the AES instructions is there to accelerate SW implementations — then there will also be HW implementations, used in HW circuits, not from software.  My remark applied to those.

However, you say something really, really wrong: "so I don't see why CPU manufacturers would waste silicon building all-rounds instructions."  You are ignoring all the use cases where SW should not see the keys used by the primitive.  There is a reason I sit on the RISC-V TG about "High Assurance Cryptography", and even Intel has instructions to atomically perform AES encryption with keys from a "Key Locker": concrete use cases with specific requirements.

 Roberto




Markku-Juhani O. Saarinen

Jun 29, 2025, 9:55:37 AM
to Jacob Christian Munch-Andersen, ciphermodes-forum, Roberto Avanzi
On Sun, Jun 29, 2025 at 2:12 PM Jacob Christian Munch-Andersen <ebusin...@gmail.com> wrote:
> The attack has been demoed, never heard of it used in the wild, there are physical access attacks that are way easier to pull off, like hardware keyloggers. In any case, you do not actually prevent it, it simply takes a slightly earlier boot, or multiple attacks, each recovering part of the data.

Hi Jacob,

I have used this in the wild, back when it was still possible.  Cold boot attacks are taken very seriously by cryptographers, especially those involved with hardware design and operating system security.  They were one of the main reasons why things like the Intel Key Locker were introduced ( https://cdrdv2-public.intel.com/671438/343965-intel-key-locker-specification.pdf ).  Major operating systems incorporate additional countermeasures against their exploitation, which is the main reason why they are no longer being exploited.  You can no longer extract a disk encryption key from DRAM for the simple reason that it is not there.


> I feel comfortable recommending switching to a CSPRNG key schedule for an AES-like construction; unless there is some very specific design constraint that makes random data unsuitable, there is really no way the CSPRNG could be worse than whatever else you might suggest: at worst the security is equal.


I don't know precisely what you mean by a "CSPRNG" here, but I assume it would be something requiring a relatively large circuit area, some memory (the AES key schedule can be implemented "on the fly" and requires no memory), and increased key-switching latency.  This sounds like a bad idea from a practical-use perspective for many, many reasons.  A hardware module would require hundreds of thousands of gates.  Furthermore, it would not be easy to use a cipher like that in very small embedded systems, or in a device that requires rapid context switching, such as a CDN server or a network router that has to handle thousands of simultaneous connections.


Cheers,
-markku

Dr. Markku-Juhani O. Saarinen <mj...@iki.fi>



Jacob Christian Munch-Andersen

Jun 29, 2025, 2:00:03 PM
to ciphermodes-forum, Markku-Juhani O. Saarinen, ciphermodes-forum, Roberto Avanzi, Jacob Christian Munch-Andersen
> I have used this in the wild, back when it was still possible.
Wild thing to admit in public, but ok.

In any case, it sounds like this attack is being taken care of in the place where it should be, namely the hardware and OS, putting a measure that doesn't really work in the cipher should be moot.

> Most attacks have a complexity bounded by that of a differential cryptanalysis attack — some may be more efficient, but there is normally a bound.  So you use various mathematical tools to find the most efficient non-related-key or related-key differential cryptanalysis attack, and these give you a bound on the complexity.  I am quite sure that Nicky Mouha would have something to say about these techniques to estimate attack complexity.

That gives you an estimate, not a proof, and if your estimate says that attacks should be easier against a CSPRNG than a lightweight key scheduler there is probably something wrong with your estimate.

> I don't know precisely what you mean with a "CSPRNG" here

Anything meeting the standard cryptographic requirements of random number generation.

> but I assume that it would be something requiring a relatively large circuit area

Mainly I just wouldn't dedicate specific circuitry for the function, it shouldn't need to run that often.

A normal CDN server will have absolutely no problem storing an expanded key per connected user, and generating 30×128 bits, or however much it is, is trivial relative to the handshake.

The Intel key locker is an interesting case, but do note that there is not actually that much silicon in it, it is one master key in a special register, and then a bunch of microcode to use the normal ALUs for the operations. A more elaborate key schedule would be slower, but it would not require more hardware. Then again, if you wanted to make this operation as fast as possible you might cache the expanded key in the hardware register file anyway, in which case the key schedule overhead only occurs on the first of a series of calls.

Overall I guess there might be cases where a lighter key schedule is desirable, but you are both way overstating it; it has to be a case where you for some reason can't store the expanded key, and you don't operate in bursts long enough that one key expansion is negligible.  If we are talking usage in TLS, I don't see a case where it matters: the thing you want hardware security for is the asymmetric keys, not the ephemeral symmetric keys.

On a broader note I will add that the optimal cipher in my view is something like AEGIS: encryption and authentication packed in one, the static key replaced with a mutating state, and accelerated by instructions present in most modern CPUs.  The only thing I might ask for is that some of the speed gain could have been traded for a higher security margin.  And I assume we all agree that the 256-bit Rijndael proposal is plain bad; there are just parts of your reasoning I didn't find convincing.

Roberto Avanzi

Jun 29, 2025, 4:49:33 PM
to Jacob Christian Munch-Andersen, ciphermodes-forum, Markku-Juhani O. Saarinen
On 29. Jun 2025, at 20:00, Jacob Christian Munch-Andersen <ebusin...@gmail.com> wrote:

>> I have used this in the wild, back when it was still possible.
> Wild thing to admit in public, but ok.

What if one has done it to help law enforcement? I have no idea why Markku has done it, but it does not have to be a shady motive ;-)

> In any case, it sounds like this attack is being taken care of in the place where it should be, namely the hardware and OS, putting a measure that doesn't really work in the cipher should be moot.

You miss the point.  A lot of programs and components mitigate such attacks for their own purposes, but they are not mitigating cold boot attacks in general.  For instance, the RoT of many systems performs some critical operations, such as boot image verification, in secure hardware that never exposes the keys to DRAM, but such accelerators also have very expensive set-up costs.  That’s why there are general-purpose instructions and why people like myself work on “High Assurance Cryptography” features.  If an implementation of the “new” cipher stores these values in memory, then they can be recovered.  The only thing that could help is memory encryption, but on many client devices, in order to avoid the large performance penalties associated with the AES, memory encryption is not present.  Of course, this is different on high-end chips, where such a feature can be activated.

However, this is *not* the most important point. So maybe we should not debate this ad infinitum.

> Most attacks have a complexity bounded by that of a differential cryptanalysis attack — some may be more efficient, but there is normally a bound.  So you use various mathematical tools to find the most efficient non-related-key or related-key differential cryptanalysis attack, and these give you a bound on the complexity.  I am quite sure that Nicky Mouha would have something to say about these techniques to estimate attack complexity.

> That gives you an estimate, not a proof, and if your estimate says that attacks should be easier against a CSPRNG than a lightweight key scheduler there is probably something wrong with your estimate.

There is nothing wrong with this, at least to a cryptanalyst.  It is computationally feasible to model a relatively simple key schedule, and therefore you CAN get lower bounds for the security.
However, with very complex ones, the complexity of the models increases to a point that they usually run indefinitely.
There is significant literature about modeling ciphers with MILP, SAT, CP tools.
It is mostly a matter of philosophy: I prefer to have *provable* numbers. I had the same discussion with other people that prefer random-looking key schedules.


> I don't know precisely what you mean with a "CSPRNG" here

> Anything meeting the standard cryptographic requirements of random number generation.

>> but I assume that it would be something requiring a relatively large circuit area

> Mainly I just wouldn't dedicate specific circuitry for the function, it shouldn't need to run that often.

> A normal CDN server will have absolutely no problem storing an expanded key per connected user, and generating 30×128 bits, or however much it is, is trivial relative to the handshake.

You are speaking of one application. Just because one should be “easy” to do does not mean that all are.

I mentioned the case of hardware accelerators — some need to be able to juggle several keys (even dozens) simultaneously, for instance memory encryption units. 
The cost of the SRAM needed to store up to 512 or even 4096 keys is a MAJOR cost.  Imagine having dozens of such accelerators, for instance one per memory channel.
Suppose that, instead of just the first and last round keys, one had to store an entire key schedule of 30 128-bit values, for 512 to 4096 keys.
The cost for hardware manufacturers would be untenable.


> The Intel key locker is an interesting case, but do note that there is not actually that much silicon in it, it is one master key in a special register, and then a bunch of microcode to use the normal ALUs for the operations. A more elaborate key schedule would be slower, but it would not require more hardware. Then again, if you wanted to make this operation as fast as possible you might cache the expanded key in the hardware register file anyway, in which case the key schedule overhead only occurs on the first of a series of calls.

You are talking about a Cryptographically Secure PRNG, which means something of the order of 30 AES calls to create a key schedule.
Maybe one can just generate 5+6=11 values and then add them in pairs, but this is still quite significant; or one could generate just 5 values and add them according to the binary expansions of the numbers from 1 to 30.
But in this case *I* would have some issues…
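For concreteness, here is a hypothetical sketch of that last variant (five generated values, combined according to the binary expansions of the round indices 1 to 30; all names and the SHAKE-256 seed derivation are mine, purely for illustration):

```python
import hashlib

def make_seeds(master_key: bytes) -> list[bytes]:
    # Five pseudorandom 128-bit values derived from the master key.
    # SHAKE-256 is an illustrative stand-in for a real CSPRNG/cipher.
    return [hashlib.shake_256(master_key + bytes([j])).digest(16)
            for j in range(5)]

def round_key(seeds: list[bytes], r: int) -> bytes:
    # Round key r (1 <= r <= 30) is the XOR of the seeds selected
    # by the set bits in the binary expansion of r.
    k = bytes(16)
    for j, s in enumerate(seeds):
        if (r >> j) & 1:
            k = bytes(a ^ b for a, b in zip(k, s))
    return k

seeds = make_seeds(b"\x00" * 32)
schedule = [round_key(seeds, r) for r in range(1, 31)]
```

Only five primitive calls are needed per key change, but the round keys now satisfy obvious linear relations (for example, key 3 equals key 1 XOR key 2), which is presumably the kind of thing that would give one "some issues".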

The point of Intel’s Key Locker, however, is that unwrapping the key must have an acceptable cost.  So, this would be incompatible with a seeded CSPRNG.


> Overall I guess there might be cases where a lighter key schedule is desirable, but you are both way overstating it; it has to be a case where you for some reason can't store the expanded key, and you don't operate in bursts long enough that one key expansion is negligible.  If we are talking usage in TLS, I don't see a case where it matters: the thing you want hardware security for is the asymmetric keys, not the ephemeral symmetric keys.

> On a broader note I will add that the optimal cipher in my view is something like AEGIS: encryption and authentication packed in one, the static key replaced with a mutating state, and accelerated by instructions present in most modern CPUs.  The only thing I might ask for is that some of the speed gain could have been traded for a higher security margin.  And I assume we all agree that the 256-bit Rijndael proposal is plain bad,

AEGIS has a very large initial latency and /also/ uses a “trivial” key schedule… 


> there are just parts of your reasoning I didn't find convincing.

This is the reason we have debates. I tend to be very… fond of my arguments. But if I am proven wrong, I have no problem accepting the better argument. 
I am just a strong proponent of lightweight key schedules (but not TOO light), that's it: if they are sufficient to achieve a target security level, there is no need to overdo it.

 Roberto





Jacob Christian Munch-Andersen

Jun 29, 2025, 6:37:45 PM
to ciphermodes-forum, Roberto Avanzi, ciphermodes-forum, Markku-Juhani O. Saarinen, Jacob Christian Munch-Andersen
> It is computationally feasible to model a relatively simple key schedule

This is a paradox, we are trying to make a specific task computationally infeasible, that is really what cryptography is about. While we are talking about two different tasks, both are influenced by the complexity of the key schedule. So do we want to make them both easier or them both harder? I will agree that there is some challenge to designing "blind" in this fashion, but it mainly comes down to padding your security margin a bit in the face of the unknown. You generally gain some performance to security ratio with the blind approach, but you have to be careful not to try to convert too much of it to performance.

Do memory encryption units need more than one key? (I guess if you do one key per VM, not sure how important that is.) In any case I'd consider that a special purpose, they don't need to be compatible with anything else, so I wouldn't worry about them for a general purpose cipher.

> You are talking about a Cryptographically Secure PRNG, which means something of the order of 30 AES calls to create a key schedule.

Those 30 128-bit values are a bit excessive, but you can also design a significantly cheaper CSPRNG.

> AEGIS has a very large initial latency and /also/ uses a “trivial” key schedule…

I'd argue that that initial computation kind of is the key schedule; it does a similar job.  But I'd also like to challenge your subjective "very large" claim: a few hundred bytes into a message you have already earned the initialisation back, simply because AEGIS requires way less computation than AES.  I think you need to recalibrate how you weigh one-time costs versus per-byte costs; in most cases the one-time costs we are talking about are completely irrelevant, and if you can gain even a tiny amount of security by taking on a greater one-time cost, it is generally a better deal than adding to the per-byte cost.

Roberto Avanzi

Jun 30, 2025, 4:04:31 AM
to Jacob Christian Munch-Andersen, ciphermodes-forum, Markku-Juhani O. Saarinen, Jacob Christian Munch-Andersen
Ok, deep breath…:

> Am 30.06.2025 um 00:37 schrieb Jacob Christian Munch-Andersen <ebusin...@gmail.com>:
>
> > It is computationally feasible to model a relatively simple key schedule
>
> This is a paradox, we are trying to make a specific task computationally infeasible, that is really what cryptography is about. While we are talking about two different tasks, both are influenced by the complexity of the key schedule. So do we want to make them both easier or them both harder?

Let me try again.  There are ways to describe how a cipher works.  One can model the cipher using tools from operations research, for instance.  These tools do not actually attack an instance of the cipher, but show the “paths” through the cipher that an attack has to follow.
Then one can try to minimize certain quantities related to the costs of these paths through the cipher, which in turn correspond to lower bounds on the complexity of certain classes of attacks.  The designer runs these programs on several variants of the cipher until a desired security level is achieved, plus some margin.  Before these tools were introduced in cryptanalysis, such paths were found manually or with ad hoc programs.

Now, the running time of these OR programs is usually much lower than those of the actual attacks, but it is also often exponential in the number of variables and relations.

More complex key schedules require more variables and relations and therefore they can make the corresponding optimisation problem effectively unsolvable.
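To make the “paths” picture concrete, here is a toy sketch — purely illustrative, not one of the actual MILP/CP tools. It runs a truncated-differential search over a made-up 2x2-nibble SPN whose mixing layer is assumed to have branch number 3, and computes the minimum number of active S-boxes over a few rounds, which is exactly the kind of quantity those solvers lower-bound. The cipher, its geometry, and all names are invented for illustration:

```python
from functools import lru_cache
from itertools import chain, combinations, product

# Toy 2x2-nibble state (hypothetical, for illustration only):
#   positions  0 1
#              2 3
# Columns are (0, 2) and (1, 3); the toy "ShiftRows" rotates the bottom
# row, i.e. swaps positions 2 and 3.  The mixing layer is assumed to have
# branch number 3 per column: for any non-zero column,
#   (# active input nibbles) + (# active output nibbles) >= 3.

COLS = [(0, 2), (1, 3)]

def shift_rows(p):
    """Apply the toy ShiftRows to an activity pattern (a frozenset)."""
    return frozenset({0: 0, 1: 1, 2: 3, 3: 2}[pos] for pos in p)

def mix_outputs(p):
    """All output activity patterns compatible with branch number 3."""
    per_col = []
    for col in COLS:
        a = sum(1 for pos in col if pos in p)
        if a == 0:
            per_col.append([frozenset()])           # zero in -> zero out
        else:
            per_col.append([
                frozenset(pos for pos, bit in zip(col, bits) if bit)
                for bits in product([0, 1], repeat=2)
                if a + sum(bits) >= 3               # branch-number relation
            ])
    return [c0 | c1 for c0, c1 in product(*per_col)]

@lru_cache(maxsize=None)
def cost(p, rounds):
    """Minimum total active S-boxes over any trail starting at pattern p."""
    if rounds == 0:
        return 0
    best_next = min(cost(q, rounds - 1) for q in mix_outputs(shift_rows(p)))
    return len(p) + best_next

def bound(rounds):
    """Lower bound on active S-boxes for non-zero trails over `rounds` rounds."""
    nonzero = chain.from_iterable(
        combinations(range(4), k) for k in range(1, 5))
    return min(cost(frozenset(p), rounds) for p in nonzero)
```

Real tools encode these same activity variables and branch-number relations as an MILP or SAT/CP instance; the point above is that every extra key-schedule variable and relation enlarges exactly this search space, often exponentially.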

My approach is, simply put: if I can achieve the desired security level with a lighter design, and I can actually prove it for certain classes of attacks, then I prefer the lighter design.

Note that random key schedules have often been proven not to yield a significantly better level of security than more structured ones, for instance in the case of the DES. In the early days of symmetric cryptography there were in fact quite a few results of this type. Many attacks work round by round, and a simple key schedule will automatically deliver many bits of the next round key once the previous round keys have been recovered; but complex, deterministic functions can also be used as filters to reduce the choices for the next step. However, since most of the complexity lies in recovering key bits from the very first rounds attacked, this fact is mostly a side observation. Of course, exceptions always exist, and that is why all the components of a cipher must be designed properly. The ShiftRows in the 256-bit wide version of Rijndael is one such component.

> I will agree that there is some challenge to designing "blind" in this fashion, but it mainly comes down to padding your security margin a bit in the face of the unknown. You generally gain some performance-to-security ratio with the blind approach, but you have to be careful not to try to convert too much of it to performance.

Yes, and this padding is done :-)


> Do memory encryption units need more than one key? (I guess if you do one key per VM, not sure how important that is.)

If we were using memory protection methods that prevent memory corruption and memory replay, as well as providing semantic security, then one key would be enough. However, for performance reasons, because of the unwillingness of the semiconductor industry to implement complex state machines, and also for marketing reasons, each VM must have its own key or, as the Arm CCA specifies, a different tweak (I pushed for this alternative while I was there). AMD and Intel chips support between 512 and 4096 different keys. The Arm CCA allows the key index to be up to 16 bits wide (fewer bits are the norm).

But this is not the only use case: network accelerators, too, must be able to juggle several keys at the same time, even hundreds.


> In any case I'd consider that a special purpose, they don't need to be compatible with anything else, so I wouldn't worry about them for a general purpose cipher.

There is a big issue here. Despite DRAM contents being data in use, and the AES (especially AES-XTS) having been designed for data at rest, customers of semiconductor companies require such algorithms to protect memory. I have colleagues who are waiting for the 256-bit wide cipher to “better” protect memory, and you can easily imagine the technical issues if we had to store thousands of fully unrolled key schedules in the internal SRAM of each memory controller. This ignores the fact that NIST should actually follow the example of the BSI and establish a methodology for the evaluation of cryptographic methods to encrypt memory, allowing low-latency ciphers to be used for that purpose. Standardizing only general-purpose ciphers is actually limiting progress in certain areas. But this is another story.

Putting memory encryption aside, are network accelerators enough of a motivation?


> > You are talking about a Cryptographically Secure PRNG, which means something of the order of 30 AES calls to create a key schedule.
>
> These 30 128-bit numbers are a bit excessive, but you can also design a significantly cheaper CSPRNG.

But then how do you prove that it is a *CS* PRNG? Maybe with a simpler method you can do the modelling described above. But what if you can achieve the same level of security with a simpler schedule?

> > AEGIS has a very large initial latency and /also/ uses a “trivial” key schedule…
>
> I'd argue that that initial computation effectively is the key schedule; it does a similar job. But I'd also like to challenge your subjective "very large" claim: a few hundred bytes into a message you have already earned the initialisation cost back, simply because AEGIS requires far less computation per byte than AES. I think you need to recalibrate how you weigh one-time costs against per-byte costs. In most cases the one-time costs we are talking about are completely irrelevant, and if you can gain even a tiny amount of security by taking on a greater one-time cost, that is generally a better deal than adding to the per-byte cost.

AEGIS becomes faster after 1K-2K bytes, but the difference is not that large. Since we want robustness and (ideally) full commitment, an accordion or similar mode plus a wide general-purpose cipher seems to me to be the best option to get some flexibility. However, either AEGIS or accordion+wide ciphers will make the needs of the special cases even more acute, and I am worried that we are not responding to them.



Jacob Christian Munch-Andersen

Jun 30, 2025, 5:42:24 AM
to ciphermodes-forum, Roberto Avanzi, ciphermodes-forum, Markku-Juhani O. Saarinen, Jacob Christian Munch-Andersen
I do understand all that, and I do understand that a more complex key schedule is not an automatic improvement. But I also understand that what you can prove is only resistance to known types of attacks; when we pad algorithms with more rounds it is to guard against the unknown, and I'd argue a beefy key schedule does a similar job. You obviously can't prove that this is the difference that will save the day, but it looks like a pretty good general-purpose hindrance to me.

What do you mean "network accelerator"? Are we talking network cards? Switches? Are you suggesting they should do TLS? Or a network layer encryption on top of that?

AEGIS uses 4 rounds per 16 bytes, AES-GCM uses 10, with separate authentication on top; I'd say that is a pretty big difference. Actual results obviously vary between implementations, CPUs, and the chosen level of parallelism.
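To put rough numbers on the once-cost versus per-byte trade-off, here is a back-of-envelope sketch. The figures are my assumptions from the AEGIS-128L design (8 AES rounds per 32-byte state update, 10 updates to initialise, 7 to finalise); GHASH and the AES key schedule are ignored entirely, which flatters AES-GCM, so the real break-even point is earlier than this:

```python
# Back-of-envelope AES-round counts.  Assumed AEGIS-128L figures:
# one 32-byte Update costs 8 AES rounds; initialisation runs 10 Updates
# and finalisation 7.  GHASH and the AES key schedule are not counted,
# which makes AES-GCM look cheaper than it is.

UPDATE_ROUNDS = 8            # AES rounds per 32-byte AEGIS-128L update
INIT_ROUNDS = 10 * UPDATE_ROUNDS
FINAL_ROUNDS = 7 * UPDATE_ROUNDS
GCM_ROUNDS_PER_32B = 2 * 10  # two AES-128 blocks of 10 rounds each

def rounds_aegis(nbytes):
    blocks = -(-nbytes // 32)  # ceiling division
    return INIT_ROUNDS + blocks * UPDATE_ROUNDS + FINAL_ROUNDS

def rounds_gcm(nbytes):
    blocks = -(-nbytes // 32)
    return blocks * GCM_ROUNDS_PER_32B

def break_even():
    """Smallest 32-byte-aligned length where AEGIS-128L is cheaper."""
    n = 32
    while rounds_aegis(n) > rounds_gcm(n):
        n += 32
    return n

print(break_even())                              # 384 -> a few hundred bytes
print(rounds_gcm(10**6) / rounds_aegis(10**6))   # approaches 20/8 = 2.5
```

So even with the per-byte savings understated, the initialisation cost is amortised within a few hundred bytes, and the asymptotic round-count ratio approaches 2.5x; measured speedups differ because GHASH runs on carry-less multiply rather than AES rounds.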

Roberto Avanzi

Jun 30, 2025, 10:45:11 AM
to Jacob Christian Munch-Andersen, ciphermodes-forum, Markku-Juhani O. Saarinen, Jacob Christian Munch-Andersen
Well, thank you for the debate. I still prefer to save resources and have more flexible primitives at the same time ;-)

> Am 30.06.2025 um 11:42 schrieb Jacob Christian Munch-Andersen <ebusin...@gmail.com>:
>
> What do you mean "network accelerator"? Are we talking network cards? Switches? Are you suggesting they should do TLS? Or a network layer encryption on top of that?

There are HW accelerators for TLS, and also for other types of links (CXL etc). They often have to juggle several keys because they are to be used for several links at once. Key agility is *very* important there.

> AEGIS uses 4 rounds per 16 bytes, AES-GCM uses 10, with separate authentication on top; I'd say that is a pretty big difference. Actual results obviously vary between implementations, CPUs, and the chosen level of parallelism.

Yes, but there are additional instructions involved, and I remember that in the AEGIS paper the differences were not that large, definitely not 2.5x.