Contribution Proposal: SIMD (AVX512) Acceleration for BN_add() and BN_sub()

Subhrajit Das

Oct 31, 2025, 12:28:49 AM

to openssl-project

Dear OpenSSL Dev Team,

We are a group of researchers from the University of Edinburgh and the Indian Institute of Technology Gandhinagar. We are developing a novel algorithm that uses SIMD instructions to significantly accelerate large-integer addition (BN_add()) and subtraction (BN_sub()), and we would like to contribute this work to the OpenSSL project.

The Challenge: Serial Carry Propagation

The large-integer addition in OpenSSL is currently implemented with serial add-with-carry instructions. This carry-propagation dependency prevents the direct use of data-parallel SIMD instructions, limiting throughput.
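For readers less familiar with the dependency, a minimal sketch of a serial limb-by-limb addition follows. It is not OpenSSL's code: the function name limb_add_serial and the limb layout (64-bit limbs, least significant first) are assumptions made for illustration. The point is that each iteration needs the carry produced by the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not OpenSSL's implementation: r = a + b over n
 * 64-bit limbs, least significant limb first. Returns the final carry. */
static uint64_t limb_add_serial(uint64_t *r, const uint64_t *a,
                                const uint64_t *b, size_t n)
{
    uint64_t carry = 0;

    assert(r != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];   /* wraps modulo 2^64 */
        uint64_t c1 = s < a[i];      /* carry out of a[i] + b[i] */
        uint64_t t  = s + carry;     /* depends on the previous iteration */
        uint64_t c2 = t < s;         /* carry out of adding the carry-in */

        r[i]  = t;
        carry = c1 | c2;             /* c1 and c2 are never both set */
    }
    return carry;
}
```

Because `carry` feeds from limb i into limb i+1, the loop iterations cannot simply be spread across independent SIMD lanes.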

Our SIMD-based Solution

Our algorithm parallelizes large-integer arithmetic to leverage SIMD units.

Implementation: We have coded the algorithm in C using AVX512 intrinsics for x86-64 CPUs. Unlike libraries such as the GNU Multiple Precision Arithmetic Library (GMP), our implementation is generic and does not include any microarchitecture-specific optimizations, yet it performs well on all the CPUs we used for experimentation.

Scalability: The approach scales to other vector lengths (e.g., 128-bit and 256-bit), but we have observed the best performance gains with AVX512.

Integration: We have successfully integrated this into OpenSSL 3.5.2 by redirecting calls within BN_add() (specifically bn_add_n()) to our optimized functions.
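This message does not spell out the algorithm itself, so the following is only a sketch of one well-known way to break the carry chain, and should not be read as the method proposed here. It is written in portable C that models eight 64-bit lanes (one 512-bit vector). The name limb_add_blocked and the requirement that n be a multiple of 8 are assumptions for illustration. In a real AVX512 version, steps 1 and 3 would each be a vector operation, with compare-into-mask instructions producing the two bit masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* eight 64-bit limbs model one 512-bit vector */

/* Illustrative only, not the algorithm from the proposal: r = a + b over
 * n limbs (n a multiple of LANES). Returns the final carry (0 or 1). */
static uint64_t limb_add_blocked(uint64_t *r, const uint64_t *a,
                                 const uint64_t *b, size_t n)
{
    unsigned carry = 0;

    assert(n % LANES == 0);
    for (size_t base = 0; base < n; base += LANES) {
        uint64_t s[LANES];
        unsigned gen = 0, prop = 0;

        /* Step 1: lane-wise sums and masks; lanes are independent. */
        for (int i = 0; i < LANES; i++) {
            s[i] = a[base + i] + b[base + i];
            gen  |= (unsigned)(s[i] < a[base + i]) << i;  /* lane overflowed */
            prop |= (unsigned)(s[i] == UINT64_MAX) << i;  /* lane passes a carry on */
        }

        /* Step 2: one small integer addition resolves every carry.
         * Bit i of cin is the carry into lane i; bit LANES is the carry out. */
        unsigned inject = (gen << 1) | carry;
        unsigned cin = (inject + prop) ^ prop;

        /* Step 3: lane-wise fix-up with the resolved carries. */
        for (int i = 0; i < LANES; i++)
            r[base + i] = s[i] + ((cin >> i) & 1u);

        carry = (cin >> LANES) & 1u;
    }
    return carry;
}
```

The trick in step 2 works because a lane cannot both generate and propagate a carry, so adding the propagate mask to the shifted generate mask ripples carries through runs of all-ones lanes in a single scalar add.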

Performance Results

We benchmarked our modified OpenSSL 3.5.2 against the original build on four different CPUs.

BN_add() Speedups

On average, we achieve a ~2.4x speedup across operand sizes. The figure below shows the speedups on the four CPUs for operand sizes ranging from 256 to 65536 bits.

[Figure: BN_add() speedups on the four CPUs for operand sizes from 256 to 65536 bits. Attachment: Screenshot 2025-10-31 at 9.54.25 AM.png]

Validation and Broader Impact

Correctness: We validated our implementation using the full OpenSSL test suite, and all tests passed. We are also in the process of formally verifying its correctness.
BN_sub(): We observed similar, significant speedups for subtraction.
Instruction Count: We also observe up to a 60% reduction in instruction count for both operations.
BN_mul(): Because multiplication recursively relies on addition/subtraction (e.g., for Karatsuba), our changes resulted in an average performance improvement of 13% for BN_mul() across various operand sizes.
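As a reminder of why addition and subtraction sit on the multiplication path, the textbook Karatsuba identity is shown below. The symbols are assumptions for illustration: each operand is split into a high and a low half around a base B, and the exact variant used inside OpenSSL may differ in signs and bookkeeping.

$$
(a_1 B + a_0)(b_1 B + b_0) = a_1 b_1 B^2 + \big[(a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0\big] B + a_0 b_0
$$

Each recursion level trades one half-size multiplication for several multi-limb additions and subtractions, so faster add/sub routines can shorten multiplication as well.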

We have submitted this work to a systems conference and are eager to contribute the code to future OpenSSL builds.

We can provide a patch, a link to our fork, or the conference paper for your review. Please let us know the best way to proceed.

Looking forward to hearing from you.

--

Subhrajit Das
PhD Scholar
Computer Science and Engineering
Indian Institute of Technology, Gandhinagar

Matt Caswell

Oct 31, 2025, 7:42:13 AM

to Subhrajit Das, openssl-project

On Fri, 31 Oct 2025 at 04:28, 'Subhrajit Das' via openssl-project <openssl...@openssl.org> wrote:

> On average, we achieve ~2.4x speedup across operand sizes. The figure below shows the speedups across the four CPUs over operand sizes ranging from 256 to 65536 bits

Does this translate to a measurable speed up of common cryptographic operations? E.g. do we see a speed up in RSA, DH, ECDSA etc?

> Validation and Broader Impact
> Correctness: We validated our implementation using the full OpenSSL test suite, and all tests passed. We are also in the process of formally verifying its correctness.
> BN_sub(): We observed similar, significant speedups for subtraction.
> Instruction Count: We also observe up to 60% instruction count reduction for both operations.
> BN_mul(): Because multiplication recursively relies on addition/subtraction (e.g., for Karatsuba), our changes resulted in an average performance improvement of 13% for BN_mul() across various operand sizes.
> We have submitted this work to a Systems conference and are eager to contribute the code to future OpenSSL builds.
> We can provide a patch, a link to our fork, or the conference paper for your review. Please let us know the best way to proceed.

A GitHub pull request is the correct way to propose changes to OpenSSL so that they can be properly reviewed. We would also need CLAs from all contributors before it could be accepted.

Matt


To view this discussion visit https://groups.google.com/a/openssl.org/d/msgid/openssl-project/0ecd214c-bb58-47c8-affa-3317bd948e80n%40openssl.org.

Michael Richardson

Oct 31, 2025, 2:16:33 PM

to openssl-project

Matt Caswell <ma...@openssl.org> wrote:
>> On average, we achieve ~2.4x speedup across operand sizes. The figure
>> below shows the speedups across the four CPUs over operand sizes
>> ranging from 256 to 65536 bits
>>

> Does this translate to a measurable speed up of common cryptographic
> operations? E.g. do we see a speed up in RSA, DH, ECDSA etc?
