Wider Variant of AES - FPGA Implementation and GPU Optimization


Ahmet MALAL

Jul 25, 2025, 1:35:32 PM
to ciphermodes-forum
Just released a preprint of our research (link) on implementing a wider-block variant of AES (Rijndael-256) on both FPGA and GPU platforms. The paper explores parallelization strategies, hardware-specific optimizations, and performance comparisons across platforms. 

We’re also interested in evaluating alternative wide-block ciphers, such as the Vistrutah algorithm proposed by Roberto Avanzi and his team, and plan to analyze its performance as part of our ongoing work.
Feedback is very welcome! 

ABSTRACT

In response to the recent NIST call for a wider variant of the AES algorithm, we developed a fully pipelined, high-throughput FPGA implementation of the 256-bit block size AES, referred to as WAES-256. This design targets both 7th generation and UltraScale+ FPGAs, focusing on maximizing throughput and efficient hardware utilization. Our work supports AES-128, AES-256, and WAES-256, employing composite field arithmetic in the S-box to reduce critical path delay. All AES layers are fully pipelined, enabling multiple levels of parallelism with minimal architectural changes.
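
Whatever decomposition a hardware S-box uses, its outputs must still match the standard AES S-box. The following is a minimal software reference for checking that, not the composite-field construction from the paper: it uses plain GF(2^8) inversion (modulo x^8 + x^4 + x^3 + x + 1) followed by the AES affine map. The function names are illustrative only.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    # a^254 == a^(-1) because the multiplicative group has order 255.
    result, base, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """Standard AES S-box: field inversion followed by the affine map."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


# Full 256-entry table, e.g. for comparison against a hardware simulation dump.
SBOX_TABLE = [aes_sbox(x) for x in range(256)]
```

A composite-field implementation computes the same inversion through a subfield representation; comparing its 256 outputs against `SBOX_TABLE` is one simple way to test it.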

Our AES-128 implementations achieved the best throughput-per-slice (TPS) ratios reported in the literature for fair comparisons on the same FPGA platforms. For WAES-256, our designs reached 75.73 Gbps on Spartan-7, 72.32 Gbps on Artix-7, 199.46 Gbps on Zynq UltraScale+, and 206.11 Gbps on Kintex UltraScale+. Additionally, our multi-core parallel WAES-256 designs achieved 426.66 Gbps with x2 cores and 742.63 Gbps with x4 cores on the Kintex UltraScale+ platform, demonstrating the scalability of our approach. These results highlight the efficiency and scalability of our architectures, offering high-throughput performance without relying on BRAM, making them well-suited for next-generation cryptographic applications.
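
As a quick reading aid for the multi-core figures, the sketch below computes speedup and scaling efficiency from the Kintex UltraScale+ numbers quoted above. It assumes the 206.11 Gbps figure is the single-core baseline for the same platform; the variable and function names are illustrative.

```python
# Throughput figures (Gbps) quoted in the abstract for Kintex UltraScale+.
SINGLE_CORE_GBPS = 206.11          # assumed to be the one-core baseline
MULTI_CORE_GBPS = {2: 426.66, 4: 742.63}


def scaling(cores: int) -> tuple[float, float]:
    """Return (speedup over one core, fraction of ideal linear scaling)."""
    speedup = MULTI_CORE_GBPS[cores] / SINGLE_CORE_GBPS
    return speedup, speedup / cores


for cores in sorted(MULTI_CORE_GBPS):
    speedup, efficiency = scaling(cores)
    print(f"{cores} cores: {speedup:.2f}x speedup, "
          f"{efficiency:.1%} of linear scaling")
```

Under that assumption, two cores give roughly 2.07x and four cores roughly 3.60x the single-core throughput. The two-core figure slightly exceeds 2x, which suggests the configurations differ in more than core count (clock frequency, for example); the abstract does not say.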

Moreover, we optimized WAES-256 on GPUs and achieved performance comparable to the best AES-256 results. For instance, we achieved 3053.5 Gbps WAES-256 encryption in counter mode of operation on an RTX 4090. Our results show that using FPGAs or GPUs as co-processors for WAES-256 renders encryption free and that transitioning from AES-256 to WAES-256 results in no observable slowdown.
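
Counter mode suits GPUs because every keystream block depends only on the key, the nonce, and its own counter value, so blocks can be computed independently. The sketch below shows that structure with a 256-bit block. It is not the paper's implementation: `placeholder_block_encrypt` is a hypothetical stand-in built from SHA-256 (not Rijndael-256), and the 16-byte nonce plus 16-byte big-endian counter layout is an assumption made for illustration.

```python
import hashlib

BLOCK_BYTES = 32  # 256-bit block, as in WAES-256


def placeholder_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in keyed function. NOT Rijndael-256; for illustration only."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be 32 bytes")
    return hashlib.sha256(key + block).digest()


def ctr_keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Keystream block for one counter value (assumed nonce||counter layout)."""
    if len(nonce) != 16:
        raise ValueError("nonce must be 16 bytes in this sketch")
    counter_block = nonce + counter.to_bytes(16, "big")
    return placeholder_block_encrypt(key, counter_block)


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data in counter mode (the operation is symmetric)."""
    out = bytearray()
    # Each iteration is independent of the others; on a GPU, each one
    # could be assigned to its own thread.
    for offset in range(0, len(data), BLOCK_BYTES):
        keystream = ctr_keystream_block(key, nonce, offset // BLOCK_BYTES)
        chunk = data[offset:offset + BLOCK_BYTES]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)
```

Applying `ctr_xor` twice with the same key and nonce returns the original data, and only the forward direction of the block cipher is ever needed.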


Best regards, 

Ahmet MALAL


Roberto Avanzi

Jul 25, 2025, 1:54:00 PM
to ciphermodes-forum, Ahmet MALAL
Thank you very much for your interest. However, Vistrutah is not by me and my team: Vistrutah is a design by a team of peers that includes me. I do add value, literally: I increase the average age of the team.

 Roberto

Ahmet MALAL

Aug 8, 2025, 5:06:28 PM
to ciphermodes-forum, Roberto Avanzi, Ahmet MALAL
Before I begin implementing it on FPGA platforms, I wanted to ask if you could kindly share the source code or, if not, provide test vectors to help guide my implementation.
Best regards,
Ahmet MALAL

Roberto Avanzi

Aug 8, 2025, 6:29:48 PM
to Ahmet MALAL, ciphermodes-forum, Ahmet MALAL
We do intend to release the portable C reference code and some optimised versions, but the code is not clean enough at this very moment. However, I can send you the portable C reference code; let me extract it from the complete test suite.

 Roberto 

On 08.08.2025 at 23:06, Ahmet MALAL <ahmet...@metu.edu.tr> wrote:

Before I begin implementing it on FPGA platforms, I wanted to ask if you could kindly share the source code or, if not, provide test vectors to help guide my implementation.