Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

84 views
Skip to first unread message

Manuel Bluhm via RT

unread,
Aug 28, 2013, 1:06:14 AM8/28/13
to
Hello all,

This patch is a contribution to OpenSSL.

It offers an efficient and constant-time implementation of the elliptic
curve point multiplication, for the following standard NIST/SECG binary
elliptic curves:
sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1,
sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1,
sect571k1, and sect571r1.

The patch implements several improvements at the algorithmic and the
coding levels (using SSE/AVX and PCLMULQDQ instructions).

Depending on the curve and architecture, this patch offers a speedup of
between 4x to 10x for ECDH and ECDSA, compared to the current
implementation of OpenSSL 1.0.1e.
Additionally, it adds side channel protection to avoid (cache) timing
attacks using a number of mechanisms.

The code is written in C and uses compiler intrinsics, for simplicity
and portability. The following results were obtained with gcc 4.8.1.

For detailed explanations of the rationale and algorithms of this code
refer to [1].


ECDH performance
--------------------------------------------------------------------------

The performance was measured by using openssl speed utility as follows:

$ openssl speed ecdh


The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) in ECDH op/s:

Curve || OpenSSL 1.0.1e || This patch || Speedup ||
------------||----------------||-------------||----------||
|| || || ||
(nistk163) || 6586.9 || 67029.6 || 10.18 ||
(nistk233) || 5121.9 || 39441.3 || 7.70 ||
(nistk283) || 2825.7 || 27718.5 || 9.81 ||
(nistk409) || 1745.8 || 11634.2 || 6.66 ||
(nistk571) || 763.2 || 5930.9 || 7.77 ||
(nistb163) || 6382.5 || 60729.6 || 9.52 ||
(nistb233) || 4881.9 || 35230.4 || 7.22 ||
(nistb283) || 2651.6 || 24456.4 || 9.22 ||
(nistb409) || 1640.3 || 10228.6 || 6.24 ||
(nistb571) || 693.8 || 5172.1 || 7.45 ||
|| || || ||
------------||----------------||-------------||----------||


The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) in ECDH op/s:

Curve || OpenSSL 1.0.1e || This patch || Speedup ||
------------||----------------||-------------||----------||
|| || || ||
(nistk163) || 3271.5 || 28087.3 || 8.59 ||
(nistk233) || 2504.9 || 15106.0 || 6.03 ||
(nistk283) || 1317.0 || 9030.5 || 6.86 ||
(nistk409) || 772.1 || 3880.8 || 5.03 ||
(nistk571) || 327.3 || 1821.1 || 5.56 ||
(nistb163) || 3067.9 || 24357.1 || 7.94 ||
(nistb233) || 2424.9 || 3147.3 || 5.42 ||
(nistb283) || 1227.0 || 7765.1 || 6.33 ||
(nistb409) || 709.7 || 3319.9 || 4.68 ||
(nistb571) || 296.2 || 1563.9 || 5.28 ||
|| || || ||
------------||----------------||-------------||----------||



ECDSA performance
--------------------------------------------------------------------------

The performance was measured by using openssl speed utility as follows:

$ openssl speed ecdsa


The results for a Core i7-4770 CPU @ 3.40GHz (Haswell):

Curve || OpenSSL 1.0.1e || This patch || Speedup ||
-----------||-----------------||-------------------||-----------------||
|| sign/s verify/s || sign/s verify/s || sign/s verify/s ||
||-----------------||-------------------||-----------------||
(nistk163) || 6,465.3 3,159.5 || 36,872.6 26,508.4 || 5.70 8.39 ||
(nistk233) || 3,259.2 2,419.8 || 22,998.4 15,557.1 || 7.06 6.43 ||
(nistk283) || 2,204.7 1,355.7 || 16,884.9 11,003.2 || 7.66 8.12 ||
(nistk409) || 977.0 839.1 || 8,150.0 4,845.0 || 8.34 5.77 ||
(nistk571) || 466.4 368.3 || 4,424.1 2,533.6 || 9.49 6.88 ||
(nistb163) || 6,487.3 3,043.9 || 35,110.0 24,904.8 || 5.41 8.18 ||
(nistb233) || 3,279.2 2,348.0 || 21,468.8 14,095.6 || 6.55 6.00 ||
(nistb283) || 2,196.4 1,283.5 || 15,602.7 9,888.5 || 7.10 7.70 ||
(nistb409) || 976.3 786.9 || 7,423.1 4,361.9 || 7.60 5.54 ||
(nistb571) || 466.6 341.0 || 3,977.0 2,251.6 || 8.52 6.60 ||
|| || || ||
-----------||-----------------||-------------------||-----------------||


The results for a Core i5-3210M CPU @ 2.50 GHz (Ivy Bridge):

Curve || OpenSSL 1.0.1e || This patch || Speedup ||
-----------||-----------------||-------------------||-----------------||
|| sign/s verify/s || sign/s verify/s || sign/s verify/s ||
||-----------------||-------------------||-----------------||
(nistk163) || 3,749.9 1,578.6 || 17,721.8 11,688.1 || 4.73 7.40 ||
(nistk233) || 1,881.7 1,211.6 || 10,359.0 6,439.4 || 5.51 5.31 ||
(nistk283) || 1,267.5 639.3 || 6,688.9 3,951.1 || 5.28 6.18 ||
(nistk409) || 542.2 361.9 || 3,140.9 1,757.1 || 5.79 4.86 ||
(nistk571) || 257.6 159.9 || 1,556.0 834.6 || 6.04 5.22 ||
(nistb163) || 3,766.5 1,514.5 || 16,203.5 10,453.8 || 4.30 6.90 ||
(nistb233) || 1,893.1 1,150.4 || 9,386.5 5,711.9 || 4.96 4.97 ||
(nistb283) || 1,265.7 594.2 || 5,962.3 3,445.5 || 4.71 5.80 ||
(nistb409) || 539.3 344.2 || 2,763.4 1,522.4 || 5.12 4.42 ||
(nistb571) || 257.2 145.7 || 1,354.8 724.9 || 5.27 4.98 ||
|| || || ||
-----------||-----------------||-------------------||-----------------||



Changes to OpenSSL-1.0.1e
--------------------------------------------------------------------------

crypto/bn:

bn_gf2m_xmm.c : New file, contains XMM GF2m implementation
bn.h : Added new function declarations
bn_gf2m.c : Added constant time bn operations
Makefile : Added bn_gf2m_xmm.c to makefile

crypto/ec:

ec2_nist_mult.c: New file, implements Montgomery point multiplication
ec2_nist.c : New file, implements EC methods
ec2_nist_prec.c: New file, implements method to get precomputated values

ec.h : Added function declarations (ec_methods)
ec_lcl.h : Added function declarations (all functions in the ec_method)
ec_curve.c: Added new EC methods to builtin curves

Makefile : Added new files to makefile




Configuration flags
--------------------------------------------------------------------------

-DOPENSSL_FAST_EC2M : Enable the fast implementation of binary curves
-DFAST_PCLMUL : Enable the pclmul reduction for pentanomial curves

-mpclmul : Enable pclmulqdq
-msse4 : Enable SSE4
-mavx : Enable AVX
-mavx2 : Enable AVX2
-march=native : Enable all instruction subsets


The results above have been created with the following configurations:

(1) Core i7-4770 @ 3.40GHz (Haswell):

./config -mavx2 -mpclmul -DFAST_PCLMUL -DOPENSSL_FAST_EC2M

(2) Core i5-3210M @ 2.50 GHz (Ivy Bridge):

./config -mavx -mpclmul -DOPENSSL_FAST_EC2M




[1] M. Bluhm, S. Gueron, Fast Software Implementation of Binary Elliptic
Curve Cryptography (2013; to be published)

Developers and authors:
***************************************************************************
Manuel Bluhm (1) and Shay Gueron (2, 3)
(1) Ruhr University Bochum, Germany
(2) Intel Corporation, Israel Development Center, Haifa, Israel
(3) University of Haifa, Israel
***************************************************************************

FAST_EC2M.patch

Andrey Kulikov

unread,
Sep 2, 2013, 2:47:19 PM9/2/13
to
Dear Manuel,

Exciting news!
While your paper still unpublished, could you please advice, it there anything even nearly similar possible for curves over primary fields?
(e.g. curves secp* )

Best regards,
Andrey

Manuel Bluhm

unread,
Sep 3, 2013, 4:05:32 AM9/3/13
to
Dear Andrey,

the scope of this work is limited to binary curves only. However, there might be ways to
speed up prime curves with vector instructions, but not the same way as we did for
binary curves. Most of the performance gain comes from the fast binary field arithmetic
implementation, which is different from the prime field arithmetic.

Best regards,
Manuel

Manuel Bluhm

unread,
Sep 2, 2013, 4:56:58 PM9/2/13
to
Dear Andrey,

the scope of this work is limited to binary curves only, however, there might be ways to
speed up prime curves with vector instructions.

David Jacobson

unread,
Sep 2, 2013, 10:57:00 PM9/2/13
to
Let me chime in with an amendment to Audrey's message.  It would be nice if the tables included performance numbers for prime modulus curves, even if the technique's of Manuel's patch are not applicable there.  Many people would like to know whether there is significant performance gains to be had by switching from GF(p) to GF(2^k) curves.

    --David Jacobson

handshak3

unread,
Apr 20, 2017, 5:36:10 PM4/20/17
to
Reviving this old thread as i am trying to explore ways to improve the performance of 163k curve.

Was this match ever accepted in openssl?

-Rohit
0 new messages