
[PATCH RFC -v2] Add support to Intel AES-NI instruction set for


Huang Ying

Dec 23, 2008, 2:01:59 AM


This patch adds support for the Intel AES-NI instruction set on the
x86_64 platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processors, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197. The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.
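
As a rough illustration of what the key expansion has to produce, here is a minimal sketch of the round counts from FIPS 197 and the resulting key schedule sizes. The helper names aes_rounds and aes_schedule_bytes are hypothetical and are not part of this patch. The largest schedule (AES-256) is 240 bytes, which matches the offset at which the assembly in this patch stores the round count (`movl %esi, 240(%rdx)`).

```c
#include <assert.h>
#include <stddef.h>

/* Number of rounds (Nr in FIPS 197) for a given key size in bits.
 * Returns -1 for an unsupported size. Hypothetical helper for illustration. */
int aes_rounds(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return -1;
    }
}

/* Size of the expanded key: one 16-byte round key per round,
 * plus the initial round key that is XORed in before round 1. */
size_t aes_schedule_bytes(int key_bits)
{
    int nr = aes_rounds(key_bits);

    assert(nr > 0);
    return (size_t)(nr + 1) * 16;
}
```

For example, aes_schedule_bytes(128) is 176 and aes_schedule_bytes(256) is 240.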

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf


AES-NI support is implemented as an engine in crypto/engine/.


ChangeLog:

v2:

- AES-NI support is implemented as an engine instead of a "branch".

- ECB and CBC modes are implemented in a parallel style to take
advantage of the pipelined hardware implementation.

- The AES key scheduling algorithm is re-implemented for higher performance.
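
To illustrate why a parallel style helps, here is a minimal sketch of CBC decryption that handles four blocks per iteration. Each plaintext block is D(C[i]) XOR C[i-1], so the four block decryptions do not depend on each other and can overlap in a pipelined unit. CBC encryption cannot be restructured this way, because each block needs the previous ciphertext block first. The toy_encrypt/toy_decrypt functions are an insecure stand-in chosen only to keep the example self-contained (they are NOT AES), and all names here are hypothetical, not taken from the patch. Input and output buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy, insecure stand-in for one block operation (NOT AES). */
static void toy_encrypt(const unsigned char *in, unsigned char *out,
                        const unsigned char *key)
{
    for (int i = 0; i < BLK; i++) {
        unsigned char x = in[i] ^ key[i];
        out[i] = (unsigned char)((x << 3) | (x >> 5));   /* rotate left by 3 */
    }
}

static void toy_decrypt(const unsigned char *in, unsigned char *out,
                        const unsigned char *key)
{
    for (int i = 0; i < BLK; i++) {
        unsigned char x = (unsigned char)((in[i] >> 3) | (in[i] << 5));
        out[i] = x ^ key[i];
    }
}

/* Sequential by nature: block i needs ciphertext block i-1. */
void cbc_encrypt(const unsigned char *in, unsigned char *out, size_t len,
                 const unsigned char *key, const unsigned char *iv)
{
    unsigned char chain[BLK], tmp[BLK];

    assert(len % BLK == 0);
    memcpy(chain, iv, BLK);
    for (size_t off = 0; off < len; off += BLK) {
        for (int i = 0; i < BLK; i++)
            tmp[i] = in[off + i] ^ chain[i];
        toy_encrypt(tmp, out + off, key);
        memcpy(chain, out + off, BLK);
    }
}

/* Four independent block decryptions per iteration, then the XOR chain. */
void cbc_decrypt4(const unsigned char *in, unsigned char *out, size_t len,
                  const unsigned char *key, const unsigned char *iv)
{
    const unsigned char *prev = iv;
    size_t off = 0;

    assert(len % BLK == 0);
    while (len - off >= 4 * BLK) {
        for (int b = 0; b < 4; b++)          /* no data dependency here */
            toy_decrypt(in + off + b * BLK, out + off + b * BLK, key);
        for (int b = 0; b < 4; b++) {
            const unsigned char *p = b ? in + off + (b - 1) * BLK : prev;
            for (int i = 0; i < BLK; i++)
                out[off + b * BLK + i] ^= p[i];
        }
        prev = in + off + 3 * BLK;
        off += 4 * BLK;
    }
    for (; off < len; off += BLK) {          /* leftover blocks, one at a time */
        toy_decrypt(in + off, out + off, key);
        for (int i = 0; i < BLK; i++)
            out[off + i] ^= prev[i];
        prev = in + off;
    }
}

/* Encrypt then decrypt nblocks blocks; returns 1 if the plaintext comes back. */
int cbc_roundtrip_ok(size_t nblocks)
{
    unsigned char key[BLK], iv[BLK];
    unsigned char pt[16 * BLK], ct[16 * BLK], back[16 * BLK];
    size_t len = nblocks * BLK;

    assert(nblocks >= 1 && nblocks <= 16);
    for (size_t i = 0; i < BLK; i++) {
        key[i] = (unsigned char)(0xA5 ^ i);
        iv[i] = (unsigned char)(i * 7);
    }
    for (size_t i = 0; i < len; i++)
        pt[i] = (unsigned char)(i * 31 + 1);
    cbc_encrypt(pt, ct, len, key, iv);
    cbc_decrypt4(ct, back, len, key, iv);
    return memcmp(pt, back, len) == 0;
}
```

ECB has no chaining at all, so both directions can be processed four blocks at a time in the same way.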


Known issues:

- How should conditional compilation be added for eng_intel_asm.pl? It
cannot be compiled on non-x86 platforms.

- No NID for CTR mode can be found. How can CTR be supported in the engine?

- CFB1, CFB8, OFB1, OFB8 modes are not supported. If AES-NI support is
needed for them, I can add it.
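
On the C side, the combination of a compile-time guard and a run-time capability check can be sketched as below. This only illustrates the guard pattern; it does not answer the Makefile question for eng_intel_asm.pl. It assumes a GCC or Clang toolchain that provides <cpuid.h> and __get_cpuid; the function name cpu_has_aesni is hypothetical. AES-NI is reported in CPUID leaf 1, ECX bit 25, which I believe corresponds to the bit 57 test on OPENSSL_ia32cap_P in the patch, assuming ECX occupies the upper 32 bits of that word.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper; toolchain assumption */
#define HAVE_X86_CPUID 1
#endif

/* Returns 1 if the CPU reports AES-NI (CPUID leaf 1, ECX bit 25), else 0.
 * On non-x86 targets the x86-only code is compiled out and 0 is returned. */
int cpu_has_aesni(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* CPUID leaf 1 not available */
    return (int)((ecx >> 25) & 1);
#else
    return 0;
#endif
}
```

The same idea applies to the engine: the x86-only parts are compiled out elsewhere, and on x86 the engine is bound only when the capability bit is set.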


Signed-off-by: Huang Ying <ying....@intel.com>

---
crypto/engine/Makefile | 11
crypto/engine/eng_all.c | 3
crypto/engine/eng_intel.c | 589 ++++++++++++++++++++++++++
crypto/engine/eng_intel_asm.pl | 918 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 1519 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/crypto/engine/eng_intel.c
@@ -0,0 +1,589 @@
+/*
+ * Support for Intel AES-NI instruction set
+ * Author: Huang Ying <ying....@intel.com>
+ *
+ * Some code is copied from engines/e_padlock.c
+ *
+ * cfb and ofb mode code is copied from crypto/aes/aes_cfb.c and
+ * crypto/aes/aes_ofb.c.
+ */
+
+/* ====================================================================
+ * Copyright (c) 1999-2001 The OpenSSL Project. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * 3. All advertising materials mentioning features or use of this
+ * software must display the following acknowledgment:
+ * "This product includes software developed by the OpenSSL Project
+ * for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)"
+ *
+ * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
+ * endorse or promote products derived from this software without
+ * prior written permission. For written permission, please contact
+ * lice...@OpenSSL.org.
+ *
+ * 5. Products derived from this software may not be called "OpenSSL"
+ * nor may "OpenSSL" appear in their names without prior written
+ * permission of the OpenSSL Project.
+ *
+ * 6. Redistributions of any form whatsoever must retain the following
+ * acknowledgment:
+ * "This product includes software developed by the OpenSSL Project
+ * for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)"
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
+ * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ * ====================================================================
+ *
+ * This product includes cryptographic software written by Eric Young
+ * (e...@cryptsoft.com). This product includes software written by Tim
+ * Hudson (t...@cryptsoft.com).
+ *
+ */
+
+
+#include <openssl/opensslconf.h>
+
+#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_INTEL_AES_NI) && !defined(OPENSSL_NO_AES)
+
+#define INTEL_AES_MIN_ALIGN 16
+#define ALIGN(x,a) (((unsigned long)(x)+(a)-1)&(~((a)-1)))
+#define INTEL_AES_ALIGN(x) ALIGN(x,INTEL_AES_MIN_ALIGN)
+
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <openssl/crypto.h>
+#include <openssl/dso.h>
+#include <openssl/engine.h>
+#include <openssl/evp.h>
+#include <openssl/aes.h>
+#include <openssl/err.h>
+#include <cryptlib.h>
+
+int intel_AES_set_encrypt_key(const unsigned char *userKey, const int bits,
+ AES_KEY *key);
+int intel_AES_set_decrypt_key(const unsigned char *userKey, const int bits,
+ AES_KEY *key);
+
+void intel_AES_encrypt(const unsigned char *in, unsigned char *out,
+ const AES_KEY *key);
+void intel_AES_decrypt(const unsigned char *in, unsigned char *out,
+ const AES_KEY *key);
+
+void intel_AES_ecb_encrypt(const unsigned char *in,
+ unsigned char *out,
+ const unsigned long length,
+ const AES_KEY *key,
+ const int enc);
+void intel_AES_cbc_encrypt(const unsigned char *in,
+ unsigned char *out,
+ const unsigned long length,
+ const AES_KEY *key,
+ unsigned char *ivec, const int enc);
+static void intel_AES_cfb128_encrypt(const unsigned char *in,
+ unsigned char *out,
+ size_t length, /* matches the definition below */
+ const AES_KEY *key,
+ unsigned char *ivec, int *num,
+ const int enc);
+static void intel_AES_ofb128_encrypt(const unsigned char *in,
+ unsigned char *out,
+ size_t length, /* matches the definition below */
+ const AES_KEY *key,
+ unsigned char *ivec, int *num);
+
+/* AES-NI is available *ONLY* on some x86 CPUs. Not only that it
+ doesn't exist elsewhere, but it even can't be compiled on other
+ platforms! */
+#undef COMPILE_HW_INTEL_AES_NI
+#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_AMD64)) && !defined(I386_ONLY)
+#define COMPILE_HW_INTEL_AES_NI
+static ENGINE *ENGINE_intel_aes_ni (void);
+#endif
+
+void ENGINE_load_intel_aes_ni (void)
+{
+/* On non-x86 CPUs it just returns. */
+#ifdef COMPILE_HW_INTEL_AES_NI
+ ENGINE *toadd = ENGINE_intel_aes_ni();
+ if (!toadd)
+ return;
+ ENGINE_add (toadd);
+ ENGINE_free (toadd);
+ ERR_clear_error ();
+#endif
+}
+
+#ifdef COMPILE_HW_INTEL_AES_NI
+/* Function for ENGINE detection and control */
+static int intel_aes_ni_init(ENGINE *e);
+
+/* Cipher Stuff */
+static int intel_aes_ni_ciphers(ENGINE *e, const EVP_CIPHER **cipher,
+ const int **nids, int nid);
+
+/* Engine names */
+static const char *intel_aes_ni_id = "INTEL_AES_NI";
+static const char *intel_aes_ni_name = "INTEL_AES_NI";
+
+/* ===== Engine "management" functions ===== */
+
+/* Prepare the ENGINE structure for registration */
+static int
+intel_aes_ni_bind_helper(ENGINE *e)
+{
+ if (!(OPENSSL_ia32cap_P & (1UL << 57)))
+ return 0;
+
+ /* Register everything or return with an error */
+ if (!ENGINE_set_id(e, intel_aes_ni_id) ||
+ !ENGINE_set_name(e, intel_aes_ni_name) ||
+
+ !ENGINE_set_init_function(e, intel_aes_ni_init) ||
+ !ENGINE_set_ciphers (e, intel_aes_ni_ciphers))
+ return 0;
+
+ /* Everything looks good */
+ return 1;
+}
+
+/* Constructor */
+static ENGINE *
+ENGINE_intel_aes_ni(void)
+{
+ ENGINE *eng = ENGINE_new();
+
+ if (!eng) {
+ return NULL;
+ }
+
+ if (!intel_aes_ni_bind_helper(eng)) {
+ ENGINE_free(eng);
+ return NULL;
+ }
+
+ return eng;
+}
+
+/* Check availability of the engine */
+static int
+intel_aes_ni_init(ENGINE *e)
+{
+ return 1;
+}
+
+#if defined(NID_aes_128_cfb128) && ! defined (NID_aes_128_cfb)
+#define NID_aes_128_cfb NID_aes_128_cfb128
+#endif
+
+#if defined(NID_aes_128_ofb128) && ! defined (NID_aes_128_ofb)
+#define NID_aes_128_ofb NID_aes_128_ofb128
+#endif
+
+#if defined(NID_aes_192_cfb128) && ! defined (NID_aes_192_cfb)
+#define NID_aes_192_cfb NID_aes_192_cfb128
+#endif
+
+#if defined(NID_aes_192_ofb128) && ! defined (NID_aes_192_ofb)
+#define NID_aes_192_ofb NID_aes_192_ofb128
+#endif
+
+#if defined(NID_aes_256_cfb128) && ! defined (NID_aes_256_cfb)
+#define NID_aes_256_cfb NID_aes_256_cfb128
+#endif
+
+#if defined(NID_aes_256_ofb128) && ! defined (NID_aes_256_ofb)
+#define NID_aes_256_ofb NID_aes_256_ofb128
+#endif
+
+/* List of supported ciphers. */
+static int intel_aes_ni_cipher_nids[] = {
+ NID_aes_128_ecb,
+ NID_aes_128_cbc,
+ NID_aes_128_cfb,
+ NID_aes_128_ofb,
+
+ NID_aes_192_ecb,
+ NID_aes_192_cbc,
+ NID_aes_192_cfb,
+ NID_aes_192_ofb,
+
+ NID_aes_256_ecb,
+ NID_aes_256_cbc,
+ NID_aes_256_cfb,
+ NID_aes_256_ofb,
+};
+static int intel_aes_ni_cipher_nids_num =
+ (sizeof(intel_aes_ni_cipher_nids)/sizeof(intel_aes_ni_cipher_nids[0]));
+
+/* Function prototypes ... */
+static int intel_aes_init_key(EVP_CIPHER_CTX *ctx, const unsigned char *key,
+ const unsigned char *iv, int enc);
+static int intel_aes_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,
+ const unsigned char *in, size_t inl);
+
+typedef struct
+{
+ AES_KEY ks;
+ unsigned int _pad1[3];
+} INTEL_AES_KEY;
+
+#define AES_BLOCK_SIZE 16
+
+#define EVP_CIPHER_block_size_ECB AES_BLOCK_SIZE
+#define EVP_CIPHER_block_size_CBC AES_BLOCK_SIZE
+#define EVP_CIPHER_block_size_OFB 1
+#define EVP_CIPHER_block_size_CFB 1
+
+/* Declaring so many ciphers by hand would be a pain.
+ Instead introduce a bit of preprocessor magic :-) */
+#define DECLARE_AES_EVP(ksize,lmode,umode) \
+static const EVP_CIPHER intel_aes_##ksize##_##lmode = { \
+ NID_aes_##ksize##_##lmode, \
+ EVP_CIPHER_block_size_##umode, \
+ ksize / 8, \
+ AES_BLOCK_SIZE, \
+ 0 | EVP_CIPH_##umode##_MODE, \
+ intel_aes_init_key, \
+ intel_aes_cipher, \
+ NULL, \
+ sizeof(INTEL_AES_KEY), \
+ EVP_CIPHER_set_asn1_iv, \
+ EVP_CIPHER_get_asn1_iv, \
+ NULL, \
+ NULL \
+}
+
+DECLARE_AES_EVP(128,ecb,ECB);
+DECLARE_AES_EVP(128,cbc,CBC);
+DECLARE_AES_EVP(128,cfb,CFB);
+DECLARE_AES_EVP(128,ofb,OFB);
+
+DECLARE_AES_EVP(192,ecb,ECB);
+DECLARE_AES_EVP(192,cbc,CBC);
+DECLARE_AES_EVP(192,cfb,CFB);
+DECLARE_AES_EVP(192,ofb,OFB);
+
+DECLARE_AES_EVP(256,ecb,ECB);
+DECLARE_AES_EVP(256,cbc,CBC);
+DECLARE_AES_EVP(256,cfb,CFB);
+DECLARE_AES_EVP(256,ofb,OFB);
+
+static int
+intel_aes_ni_ciphers (ENGINE *e, const EVP_CIPHER **cipher,
+ const int **nids, int nid)
+{
+ /* No specific cipher => return a list of supported nids ... */
+ if (!cipher) {
+ *nids = intel_aes_ni_cipher_nids;
+ return intel_aes_ni_cipher_nids_num;
+ }
+
+ /* ... or the requested "cipher" otherwise */
+ switch (nid) {
+ case NID_aes_128_ecb:
+ *cipher = &intel_aes_128_ecb;
+ break;
+ case NID_aes_128_cbc:
+ *cipher = &intel_aes_128_cbc;
+ break;
+ case NID_aes_128_cfb:
+ *cipher = &intel_aes_128_cfb;
+ break;
+ case NID_aes_128_ofb:
+ *cipher = &intel_aes_128_ofb;
+ break;
+
+ case NID_aes_192_ecb:
+ *cipher = &intel_aes_192_ecb;
+ break;
+ case NID_aes_192_cbc:
+ *cipher = &intel_aes_192_cbc;
+ break;
+ case NID_aes_192_cfb:
+ *cipher = &intel_aes_192_cfb;
+ break;
+ case NID_aes_192_ofb:
+ *cipher = &intel_aes_192_ofb;
+ break;
+
+ case NID_aes_256_ecb:
+ *cipher = &intel_aes_256_ecb;
+ break;
+ case NID_aes_256_cbc:
+ *cipher = &intel_aes_256_cbc;
+ break;
+ case NID_aes_256_cfb:
+ *cipher = &intel_aes_256_cfb;
+ break;
+ case NID_aes_256_ofb:
+ *cipher = &intel_aes_256_ofb;
+ break;
+
+ default:
+ /* Sorry, we don't support this NID */
+ *cipher = NULL;
+ return 0;
+ }
+
+ return 1;
+}
+
+/* Prepare the encryption key for AES NI usage */
+static int
+intel_aes_init_key (EVP_CIPHER_CTX *ctx, const unsigned char *user_key,
+ const unsigned char *iv, int enc)
+{
+ int ret;
+ AES_KEY *key = (AES_KEY *)INTEL_AES_ALIGN(ctx->cipher_data);
+
+ if ((ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_CFB_MODE
+ || (ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_OFB_MODE
+ || enc)
+ ret=intel_AES_set_encrypt_key(user_key, ctx->key_len * 8, key);
+ else
+ ret=intel_AES_set_decrypt_key(user_key, ctx->key_len * 8, key);
+
+ if(ret < 0) {
+ EVPerr(EVP_F_AES_INIT_KEY,EVP_R_AES_KEY_SETUP_FAILED);
+ return 0;
+ }
+
+ return 1;
+}
+
+static int
+intel_aes_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,
+ const unsigned char *in, size_t inl)
+{
+ AES_KEY *key = (AES_KEY *)INTEL_AES_ALIGN(ctx->cipher_data);
+
+ switch (EVP_CIPHER_CTX_mode(ctx)) {
+ case EVP_CIPH_ECB_MODE:
+ intel_AES_ecb_encrypt(in, out, inl, key, ctx->encrypt);
+ break;
+ case EVP_CIPH_CBC_MODE:
+ intel_AES_cbc_encrypt(in, out, inl, key,
+ ctx->iv, ctx->encrypt);
+ break;
+ case EVP_CIPH_CFB_MODE:
+ intel_AES_cfb128_encrypt(in, out, inl, key, ctx->iv,
+ &ctx->num, ctx->encrypt);
+ break;
+ case EVP_CIPH_OFB_MODE:
+ intel_AES_ofb128_encrypt(in, out, inl, key,
+ ctx->iv, &ctx->num);
+ break;
+ default:
+ return 0;
+ }
+
+ return 1;
+}
+
+/* The input and output encrypted as though 128bit cfb mode is being
+ * used. The extra state information to record how much of the
+ * 128bit block we have used is contained in *num;
+ */
+
+static void
+intel_AES_cfb128_encrypt(const unsigned char *in, unsigned char *out,
+ size_t length, const AES_KEY *key,
+ unsigned char *ivec, int *num, const int enc)
+{
+ unsigned int n;
+ size_t l = 0;
+
+ assert(in && out && key && ivec && num);
+
+ n = *num;
+
+#if !defined(OPENSSL_SMALL_FOOTPRINT)
+ if (AES_BLOCK_SIZE%sizeof(size_t) == 0) { /* always true actually */
+ if (enc) {
+ if (n) {
+ while (length) {
+ *(out++) = ivec[n] ^= *(in++);
+ length--;
+ if(!(n = (n + 1) % AES_BLOCK_SIZE))
+ break;
+ }
+ }
+#if defined(STRICT_ALIGNMENT)
+ if (((size_t)in|(size_t)out|(size_t)ivec)%sizeof(size_t) != 0)
+ goto enc_unaligned;
+#endif
+ while ((l + AES_BLOCK_SIZE) <= length) {
+ unsigned int i;
+ intel_AES_encrypt(ivec, ivec, key);
+ for (i=0;i<AES_BLOCK_SIZE;i+=sizeof(size_t)) {
+ *(size_t*)(out+l+i) =
+ *(size_t*)(ivec+i) ^= *(size_t*)(in+l+i);
+ }
+ l += AES_BLOCK_SIZE;
+ }
+
+ if (l < length) {
+ intel_AES_encrypt(ivec, ivec, key);
+ do { out[l] = ivec[n] ^= in[l];
+ l++; n++;
+ } while (l < length);
+ }
+ } else {
+ if (n) {
+ while (length) {
+ unsigned char c;
+ *(out++) = ivec[n] ^ (c = *(in++)); ivec[n] = c;
+ length--;
+ if(!(n = (n + 1) % AES_BLOCK_SIZE))
+ break;
+ }
+ }
+#if defined(STRICT_ALIGNMENT)
+ if (((size_t)in|(size_t)out|(size_t)ivec)%sizeof(size_t) != 0)
+ goto dec_unaligned;
+#endif
+ while (l + AES_BLOCK_SIZE <= length) {
+ unsigned int i;
+ intel_AES_encrypt(ivec, ivec, key);
+ for (i=0;i<AES_BLOCK_SIZE;i+=sizeof(size_t)) {
+ size_t t = *(size_t*)(in+l+i);
+ *(size_t*)(out+l+i) = *(size_t*)(ivec+i) ^ t;
+ *(size_t*)(ivec+i) = t;
+ }
+ l += AES_BLOCK_SIZE;
+ }
+
+ if (l < length) {
+ intel_AES_encrypt(ivec, ivec, key);
+ do { unsigned char c;
+ out[l] = ivec[n] ^ (c = in[l]); ivec[n] = c;
+ l++; n++;
+ } while (l < length);
+ }
+ }
+ *num = n;
+ return;
+ }
+#endif
+
+ /* this code would be commonly eliminated by x86* compiler */
+ if (enc) {
+#if defined(STRICT_ALIGNMENT) && !defined(OPENSSL_SMALL_FOOTPRINT)
+ enc_unaligned:
+#endif
+ while (l<length) {
+ if (n == 0) {
+ intel_AES_encrypt(ivec, ivec, key);
+ }
+ out[l] = ivec[n] ^= in[l];
+ l++;
+ n = (n+1) % AES_BLOCK_SIZE;
+ }
+ } else {
+#if defined(STRICT_ALIGNMENT) && !defined(OPENSSL_SMALL_FOOTPRINT)
+ dec_unaligned:
+#endif
+ while (l<length) {
+ unsigned char c;
+ if (n == 0) {
+ intel_AES_encrypt(ivec, ivec, key);
+ }
+ out[l] = ivec[n] ^ (c = in[l]); ivec[n] = c;
+ l++;
+ n = (n+1) % AES_BLOCK_SIZE;
+ }
+ }
+
+ *num=n;
+}
+
+/* The input and output encrypted as though 128bit ofb mode is being
+ * used. The extra state information to record how much of the
+ * 128bit block we have used is contained in *num;
+ */
+static void intel_AES_ofb128_encrypt(const unsigned char *in,
+ unsigned char *out,
+ size_t length, const AES_KEY *key,
+ unsigned char *ivec, int *num)
+{
+ unsigned int n;
+ size_t l=0;
+
+ assert(in && out && key && ivec && num);
+
+ n = *num;
+
+#if !defined(OPENSSL_SMALL_FOOTPRINT)
+ if (AES_BLOCK_SIZE%sizeof(size_t) == 0)
+ do { /* always true actually */
+ if (n) {
+ while (length) {
+ *(out++) = ivec[n] ^ *(in++);
+ length--;
+ if(!(n = (n + 1) % AES_BLOCK_SIZE))
+ break;
+ }
+ }
+#if defined(STRICT_ALIGNMENT)
+ if (((size_t)in|(size_t)out|(size_t)ivec)%sizeof(size_t) != 0)
+ break;
+#endif
+ while ((l + AES_BLOCK_SIZE) <= length) {
+ unsigned int i;
+ intel_AES_encrypt(ivec, ivec, key);
+ for (i=0;i<AES_BLOCK_SIZE;i+=sizeof(size_t)) {
+ *(size_t*)(out+l+i) =
+ *(size_t*)(ivec+i) ^ *(size_t*)(in+l+i);
+ }
+ l += AES_BLOCK_SIZE;
+ }
+
+ if (l < length) {
+ intel_AES_encrypt(ivec, ivec, key);
+ do { out[l] = ivec[n] ^ in[l];
+ l++; n++;
+ } while (l < length);
+ }
+ *num = n;
+ return;
+ } while(0);
+#endif
+
+ /* this code would be commonly eliminated by x86* compiler */
+ while (l<length) {
+ if (n == 0) {
+ intel_AES_encrypt(ivec, ivec, key);
+ }
+ out[l] = ivec[n] ^ in[l];
+ l++;
+ n = (n+1) % AES_BLOCK_SIZE;
+ }
+
+ *num=n;
+}
+
+#endif /* COMPILE_HW_INTEL_AES_NI */
+#endif /* !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_INTEL_AES_NI) && !defined(OPENSSL_NO_AES) */
--- a/crypto/engine/eng_all.c
+++ b/crypto/engine/eng_all.c
@@ -71,6 +71,9 @@ void ENGINE_load_builtin_engines(void)
#if defined(__OpenBSD__) || defined(__FreeBSD__)
ENGINE_load_cryptodev();
#endif
+#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_INTEL_AES_NI)
+ ENGINE_load_intel_aes_ni();
+#endif
ENGINE_load_dynamic();
#ifndef OPENSSL_NO_STATIC_ENGINE
#ifndef OPENSSL_NO_HW
--- /dev/null
+++ b/crypto/engine/eng_intel_asm.pl
@@ -0,0 +1,918 @@
+#
+# ====================================================================
+# Written by Intel Corporation for the OpenSSL project to add support
+# for Intel AES-NI instructions. Rights for redistribution and usage
+# in source and binary forms are granted according to the OpenSSL
+# license.
+#
+# Author: Huang Ying <ying....@intel.com>
+# Vinodh Gopal <vinodh...@intel.com>
+# Kahraman Akdemir
+# ====================================================================
+#
+
+$output=shift;
+
+$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
+( $xlate="${dir}x86_64-xlate.pl" and -f $xlate ) or
+( $xlate="${dir}../perlasm/x86_64-xlate.pl" and -f $xlate) or
+die "can't locate x86_64-xlate.pl";
+
+open STDOUT,"| $^X $xlate $output";
+
+$code=".text\n";
+
+$state="%xmm0";
+$state1="%xmm0";
+$key="%xmm1";
+$in="%xmm2";
+$in1="%xmm2";
+$iv="%xmm3";
+$state2="%xmm4";
+$state3="%xmm5";
+$state4="%xmm6";
+$in2="%xmm7";
+$in3="%xmm8";
+$in4="%xmm9";
+
+$inp="%r11";
+$len="%rdx";
+$outp="%r10";
+$keyp="%r9";
+$ivp="%r8";
+$rnds="%esi";
+$t1="%rdi";
+$t1d="%edi";
+$tkeyp=$t1;
+$t2="%rcx";
+$t3="%rax";
+
+$code.=<<___;
+.type _key_expansion_128,\@abi-omnipotent
+.align 16
+_key_expansion_128:
+_key_expansion_256a:
+ pshufd \$0b11111111, %xmm1, %xmm1
+ shufps \$0b00010000, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ shufps \$0b10001100, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ pxor %xmm1, %xmm0
+ movaps %xmm0, (%rcx)
+ add \$0x10, %rcx
+ ret
+.size _key_expansion_128, . - _key_expansion_128
+___
+
+$code.=<<___;
+.type _key_expansion_192a,\@abi-omnipotent
+.align 16
+_key_expansion_192a:
+ pshufd \$0b01010101, %xmm1, %xmm1
+ shufps \$0b00010000, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ shufps \$0b10001100, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ pxor %xmm1, %xmm0
+
+ movaps %xmm2, %xmm5
+ movaps %xmm2, %xmm6
+ pslldq \$4, %xmm5
+ pshufd \$0b11111111, %xmm0, %xmm3
+ pxor %xmm3, %xmm2
+ pxor %xmm5, %xmm2
+
+ movaps %xmm0, %xmm1
+ shufps \$0b01000100, %xmm0, %xmm6
+ movaps %xmm6, (%rcx)
+ shufps \$0b01001110, %xmm2, %xmm1
+ movaps %xmm1, 0x10(%rcx)
+ add \$0x20, %rcx
+ ret
+.size _key_expansion_192a, . - _key_expansion_192a
+___
+
+$code.=<<___;
+.type _key_expansion_192b,\@abi-omnipotent
+.align 16
+_key_expansion_192b:
+ pshufd \$0b01010101, %xmm1, %xmm1
+ shufps \$0b00010000, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ shufps \$0b10001100, %xmm0, %xmm4
+ pxor %xmm4, %xmm0
+ pxor %xmm1, %xmm0
+
+ movaps %xmm2, %xmm5
+ pslldq \$4, %xmm5
+ pshufd \$0b11111111, %xmm0, %xmm3
+ pxor %xmm3, %xmm2
+ pxor %xmm5, %xmm2
+
+ movaps %xmm0, (%rcx)
+ add \$0x10, %rcx
+ ret
+.size _key_expansion_192b, . - _key_expansion_192b
+___
+
+$code.=<<___;
+.type _key_expansion_256b,\@abi-omnipotent
+.align 16
+_key_expansion_256b:
+ pshufd \$0b10101010, %xmm1, %xmm1
+ shufps \$0b00010000, %xmm2, %xmm4
+ pxor %xmm4, %xmm2
+ shufps \$0b10001100, %xmm2, %xmm4
+ pxor %xmm4, %xmm2
+ pxor %xmm1, %xmm2
+ movaps %xmm2, (%rcx)
+ add \$0x10, %rcx
+ ret
+.size _key_expansion_256b, . - _key_expansion_256b
+___
+
+# int intel_AES_set_encrypt_key(const unsigned char *userKey, const int bits,
+# AES_KEY *key)
+$code.=<<___;
+.globl intel_AES_set_encrypt_key
+.type intel_AES_set_encrypt_key,\@function,3
+.align 16
+intel_AES_set_encrypt_key:
+ call _intel_AES_set_encrypt_key
+ ret
+.size intel_AES_set_encrypt_key, . - intel_AES_set_encrypt_key
+
+.type _intel_AES_set_encrypt_key,\@abi-omnipotent
+.align 16
+_intel_AES_set_encrypt_key:
+ test %rdi, %rdi
+ jz .Lenc_key_invalid_param
+ test %rdx, %rdx
+ jz .Lenc_key_invalid_param
+ movups (%rdi), %xmm0 # user key (first 16 bytes)
+ movaps %xmm0, (%rdx)
+ lea 0x10(%rdx), %rcx # key addr
+ pxor %xmm4, %xmm4 # xmm4 is assumed 0 in _key_expansion_x
+ cmp \$256, %esi
+ jnz .Lenc_key192
+ mov \$14, %esi
+ movl %esi, 240(%rdx) # 14 rounds for 256
+ movups 0x10(%rdi), %xmm2 # other user key
+ movaps %xmm2, (%rcx)
+ add \$0x10, %rcx
+ # aeskeygenassist \$0x1, %xmm2, %xmm1 # round 1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01
+ call _key_expansion_256a
+ # aeskeygenassist \$0x1, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01
+ call _key_expansion_256b
+ # aeskeygenassist \$0x2, %xmm2, %xmm1 # round 2
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02
+ call _key_expansion_256a
+ # aeskeygenassist \$0x2, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02
+ call _key_expansion_256b
+ # aeskeygenassist \$0x4, %xmm2, %xmm1 # round 3
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04
+ call _key_expansion_256a
+ # aeskeygenassist \$0x4, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04
+ call _key_expansion_256b
+ # aeskeygenassist \$0x8, %xmm2, %xmm1 # round 4
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08
+ call _key_expansion_256a
+ # aeskeygenassist \$0x8, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x08
+ call _key_expansion_256b
+ # aeskeygenassist \$0x10, %xmm2, %xmm1 # round 5
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10
+ call _key_expansion_256a
+ # aeskeygenassist \$0x10, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10
+ call _key_expansion_256b
+ # aeskeygenassist \$0x20, %xmm2, %xmm1 # round 6
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20
+ call _key_expansion_256a
+ # aeskeygenassist \$0x20, %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x20
+ call _key_expansion_256b
+ # aeskeygenassist \$0x40, %xmm2, %xmm1 # round 7
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40
+ call _key_expansion_256a
+ xor %rax, %rax
+ ret
+.Lenc_key192:
+ cmp \$192, %esi
+ jnz .Lenc_key128
+ mov \$12, %esi
+ movl %esi, 240(%rdx) # 12 rounds for 192
+ movq 0x10(%rdi), %xmm2 # other user key
+ # aeskeygenassist \$0x1, %xmm2, %xmm1 # round 1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01
+ call _key_expansion_192a
+ # aeskeygenassist \$0x2, %xmm2, %xmm1 # round 2
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02
+ call _key_expansion_192b
+ # aeskeygenassist \$0x4, %xmm2, %xmm1 # round 3
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04
+ call _key_expansion_192a
+ # aeskeygenassist \$0x8, %xmm2, %xmm1 # round 4
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08
+ call _key_expansion_192b
+ # aeskeygenassist \$0x10, %xmm2, %xmm1 # round 5
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10
+ call _key_expansion_192a
+ # aeskeygenassist \$0x20, %xmm2, %xmm1 # round 6
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20
+ call _key_expansion_192b
+ # aeskeygenassist \$0x40, %xmm2, %xmm1 # round 7
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40
+ call _key_expansion_192a
+ # aeskeygenassist \$0x80, %xmm2, %xmm1 # round 8
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x80
+ call _key_expansion_192b
+ xor %rax, %rax
+ ret
+.Lenc_key128:
+ cmp \$128, %esi
+ jnz .Lenc_key_invalid_key_bits
+ mov \$10, %esi
+ movl %esi, 240(%rdx) # 10 rounds for 128
+ # aeskeygenassist \$0x1, %xmm0, %xmm1 # round 1
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01
+ call _key_expansion_128
+ # aeskeygenassist \$0x2, %xmm0, %xmm1 # round 2
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02
+ call _key_expansion_128
+ # aeskeygenassist \$0x4, %xmm0, %xmm1 # round 3
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04
+ call _key_expansion_128
+ # aeskeygenassist \$0x8, %xmm0, %xmm1 # round 4
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x08
+ call _key_expansion_128
+ # aeskeygenassist \$0x10, %xmm0, %xmm1 # round 5
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10
+ call _key_expansion_128
+ # aeskeygenassist \$0x20, %xmm0, %xmm1 # round 6
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x20
+ call _key_expansion_128
+ # aeskeygenassist \$0x40, %xmm0, %xmm1 # round 7
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x40
+ call _key_expansion_128
+ # aeskeygenassist \$0x80, %xmm0, %xmm1 # round 8
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x80
+ call _key_expansion_128
+ # aeskeygenassist \$0x1b, %xmm0, %xmm1 # round 9
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x1b
+ call _key_expansion_128
+ # aeskeygenassist \$0x36, %xmm0, %xmm1 # round 10
+ .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x36
+ call _key_expansion_128
+ xor %eax, %eax
+ ret
+.Lenc_key_invalid_param:
+ mov \$-1, %rax
+ ret
+.Lenc_key_invalid_key_bits:
+ mov \$-2, %rax
+ ret
+.size _intel_AES_set_encrypt_key, . - _intel_AES_set_encrypt_key
+___
+
+
+# int intel_AES_set_decrypt_key(const unsigned char *userKey, const int bits,
+# AES_KEY *key)
+$code.=<<___;
+.globl intel_AES_set_decrypt_key
+.type intel_AES_set_decrypt_key,\@function,3
+.align 16
+intel_AES_set_decrypt_key:
+ call _intel_AES_set_encrypt_key
+ test %rax, %rax
+ jnz .Ldec_key_exit
+ lea 0x10(%rdx), %rcx
+ shl \$4, %esi
+ add %rdx, %rsi
+ mov %rsi, %rdi
+.align 4
+.Ldec_key_reorder_loop:
+ movaps (%rdx), %xmm0
+ movaps (%rsi), %xmm1
+ movaps %xmm0, (%rsi)
+ movaps %xmm1, (%rdx)
+ lea 0x10(%rdx), %rdx
+ lea -0x10(%rsi), %rsi
+ cmp %rdx, %rsi
+ ja .Ldec_key_reorder_loop
+.align 4
+.Ldec_key_inv_loop:
+ movaps (%rcx), %xmm0
+ # aesimc %xmm0, %xmm1
+ .byte 0x66, 0x0f, 0x38, 0xdb, 0xc8
+ movaps %xmm1, (%rcx)
+ lea 0x10(%rcx), %rcx
+ cmp %rdi, %rcx
+ jnz .Ldec_key_inv_loop
+.Ldec_key_exit:
+ ret
+.size intel_AES_set_decrypt_key, . - intel_AES_set_decrypt_key
+___
+
+# void intel_AES_encrypt (const void *inp,void *out,const AES_KEY *key);
+$code.=<<___;
+.globl intel_AES_encrypt
+.type intel_AES_encrypt,\@function,3
+.align 16
+intel_AES_encrypt:
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %rdx, $keyp
+ mov 240($keyp), $rnds # round count
+ movups ($inp), $state # input
+ call _intel_AES_encrypt1
+ movups $state, ($outp) # output
+ ret
+.size intel_AES_encrypt, . - intel_AES_encrypt
+___
+
+# _intel_AES_encrypt1: internal ABI
+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state: initial state (input)
+# output:
+# $state: final state (output)
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;
+.type _intel_AES_encrypt1,\@abi-omnipotent
+.align 16
+_intel_AES_encrypt1:
+ movaps ($keyp), $key # key
+ mov $keyp, $tkeyp
+ pxor $key, $state # round 0
+ lea 0x30($tkeyp), $tkeyp
+ cmp \$12, $rnds
+ jb .Lenc128
+ lea 0x20($tkeyp), $tkeyp
+ je .Lenc192
+ lea 0x20($tkeyp), $tkeyp
+ movaps -0x60($tkeyp), $key
+ aesenc $key, $state
+ movaps -0x50($tkeyp), $key
+ aesenc $key, $state
+.align 4
+.Lenc192:
+ movaps -0x40($tkeyp), $key
+ aesenc $key, $state
+ movaps -0x30($tkeyp), $key
+ aesenc $key, $state
+.align 4
+.Lenc128:
+ movaps -0x20($tkeyp), $key
+ aesenc $key, $state
+ movaps -0x10($tkeyp), $key
+ aesenc $key, $state
+ movaps ($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x10($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x20($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x30($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x40($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x50($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x60($tkeyp), $key
+ aesenc $key, $state
+ movaps 0x70($tkeyp), $key
+ aesenclast $key, $state # last round
+ ret
+.size _intel_AES_encrypt1, . - _intel_AES_encrypt1
+___
+
+# _intel_AES_encrypt4: internal ABI
+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state1: initial state (input)
+# $state2
+# $state3
+# $state4
+# output:
+# $state1: final state (output)
+# $state2
+# $state3
+# $state4
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;
+.type _intel_AES_encrypt4,\@abi-omnipotent
+.align 16
+_intel_AES_encrypt4:
+ movaps ($keyp), $key # key
+ mov $keyp, $tkeyp
+ pxor $key, $state1 # round 0
+ pxor $key, $state2
+ pxor $key, $state3
+ pxor $key, $state4
+ lea 0x30($tkeyp), $tkeyp
+ cmp \$12, $rnds
+ jb .L4enc128
+ lea 0x20($tkeyp), $tkeyp
+ je .L4enc192
+ lea 0x20($tkeyp), $tkeyp
+ movaps -0x60($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps -0x50($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+.align 4
+.L4enc192:
+ movaps -0x40($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps -0x30($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+.align 4
+.L4enc128:
+ movaps -0x20($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps -0x10($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps ($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x10($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x20($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x30($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x40($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x50($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x60($tkeyp), $key
+ aesenc $key, $state1
+ aesenc $key, $state2
+ aesenc $key, $state3
+ aesenc $key, $state4
+ movaps 0x70($tkeyp), $key
+ aesenclast $key, $state1 # last round
+ aesenclast $key, $state2
+ aesenclast $key, $state3
+ aesenclast $key, $state4
+ ret
+.size _intel_AES_encrypt4, . - _intel_AES_encrypt4
+___
+
+# void intel_AES_decrypt (const void *inp,void *out,const AES_KEY *key);
+$code.=<<___;
+.globl intel_AES_decrypt
+.type intel_AES_decrypt,\@function,3
+.align 16
+intel_AES_decrypt:
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %rdx, $keyp
+ mov 240($keyp), $rnds # round count
+ movups ($inp), $state # input
+ call _intel_AES_decrypt1
+ movups $state, ($outp) # output
+ ret
+.size intel_AES_decrypt, . - intel_AES_decrypt
+___
+
+# _intel_AES_decrypt1: internal ABI
+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state: initial state (input)
+# output:
+# $state: final state (output)
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;
+.type _intel_AES_decrypt1,\@abi-omnipotent
+.align 16
+_intel_AES_decrypt1:
+ movaps ($keyp), $key # key
+ mov $keyp, $tkeyp
+ pxor $key, $state # round 0
+ lea 0x30($tkeyp), $tkeyp
+ cmp \$12, $rnds
+ jb .Ldec128
+ lea 0x20($tkeyp), $tkeyp
+ je .Ldec192
+ lea 0x20($tkeyp), $tkeyp
+ movaps -0x60($tkeyp), $key
+ aesdec $key, $state
+ movaps -0x50($tkeyp), $key
+ aesdec $key, $state
+.align 4
+.Ldec192:
+ movaps -0x40($tkeyp), $key
+ aesdec $key, $state
+ movaps -0x30($tkeyp), $key
+ aesdec $key, $state
+.align 4
+.Ldec128:
+ movaps -0x20($tkeyp), $key
+ aesdec $key, $state
+ movaps -0x10($tkeyp), $key
+ aesdec $key, $state
+ movaps ($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x10($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x20($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x30($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x40($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x50($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x60($tkeyp), $key
+ aesdec $key, $state
+ movaps 0x70($tkeyp), $key
+ aesdeclast $key, $state # last round
+ ret
+.size _intel_AES_decrypt1, . - _intel_AES_decrypt1
+___
+
+# _intel_AES_decrypt4: internal ABI
+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state1: initial state (input)
+# $state2
+# $state3
+# $state4
+# output:
+# $state1: final state (output)
+# $state2
+# $state3
+# $state4
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;
+.type _intel_AES_decrypt4,\@abi-omnipotent
+.align 16
+_intel_AES_decrypt4:
+ movaps ($keyp), $key # key
+ mov $keyp, $tkeyp
+ pxor $key, $state1 # round 0
+ pxor $key, $state2
+ pxor $key, $state3
+ pxor $key, $state4
+ lea 0x30($tkeyp), $tkeyp
+ cmp \$12, $rnds
+ jb .L4dec128
+ lea 0x20($tkeyp), $tkeyp
+ je .L4dec192
+ lea 0x20($tkeyp), $tkeyp
+ movaps -0x60($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps -0x50($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+.align 4
+.L4dec192:
+ movaps -0x40($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps -0x30($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+.align 4
+.L4dec128:
+ movaps -0x20($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps -0x10($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps ($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x10($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x20($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x30($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x40($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x50($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x60($tkeyp), $key
+ aesdec $key, $state1
+ aesdec $key, $state2
+ aesdec $key, $state3
+ aesdec $key, $state4
+ movaps 0x70($tkeyp), $key
+ aesdeclast $key, $state1 # last round
+ aesdeclast $key, $state2
+ aesdeclast $key, $state3
+ aesdeclast $key, $state4
+ ret
+.size _intel_AES_decrypt4, . - _intel_AES_decrypt4
+___
+
+# void intel_AES_ecb_encrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# const int enc);
+$code.=<<___;
+.globl intel_AES_ecb_encrypt
+.type intel_AES_ecb_encrypt,\@function,5
+.align 16
+intel_AES_ecb_encrypt:
+ test $len, $len # check length
+ jz .Lecb_just_ret
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %r8d, $t1d # clear upper half of enc
+ mov %rcx, $keyp
+ mov 240($keyp), $rnds
+ test $t1, $t1
+ jz .Lecb_decrypt
+#--------------------------- ENCRYPT ------------------------------#
+ cmp \$16, $len
+ jb .Lecb_just_ret
+ cmp \$64, $len
+ jb .Lecb_enc_loop1
+.align 4
+.Lecb_enc_loop4:
+ movups ($inp), $state1
+ movups 0x10($inp), $state2
+ movups 0x20($inp), $state3
+ movups 0x30($inp), $state4
+ call _intel_AES_encrypt4
+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lecb_enc_loop4
+ cmp \$16, $len
+ jb .Lecb_just_ret
+.align 4
+.Lecb_enc_loop1:
+ movups ($inp), $state1
+ call _intel_AES_encrypt1
+ movups $state1, ($outp)
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lecb_enc_loop1
+ jmp .Lecb_just_ret
+#--------------------------- DECRYPT ------------------------------#
+.Lecb_decrypt:
+ cmp \$16, $len
+ jb .Lecb_just_ret
+ cmp \$64, $len
+ jb .Lecb_dec_loop1
+.align 4
+.Lecb_dec_loop4:
+ movups ($inp), $state1
+ movups 0x10($inp), $state2
+ movups 0x20($inp), $state3
+ movups 0x30($inp), $state4
+ call _intel_AES_decrypt4
+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lecb_dec_loop4
+ cmp \$16, $len
+ jb .Lecb_just_ret
+.align 4
+.Lecb_dec_loop1:
+ movups ($inp), $state1
+ call _intel_AES_decrypt1
+ movups $state1, ($outp)
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lecb_dec_loop1
+.Lecb_just_ret:
+ ret
+.size intel_AES_ecb_encrypt, . - intel_AES_ecb_encrypt
+___
+
+# void intel_AES_cbc_encrypt (const unsigned char *inp, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivp,const int enc);
+$code.=<<___;
+.globl intel_AES_cbc_encrypt
+.type intel_AES_cbc_encrypt,\@function,6
+.align 16
+intel_AES_cbc_encrypt:
+ test $len, $len # check length
+ jz .Lcbc_just_ret
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %r9d, $t1d # clear upper half of enc
+ mov %rcx, $keyp
+ mov 240($keyp), $rnds
+ test $t1, $t1
+ jz .Lcbc_decrypt
+#--------------------------- ENCRYPT ------------------------------#
+ movups ($ivp), $state # load iv as initial state
+ cmp \$16, $len
+ jb .Lcbc_enc_tail
+.align 4
+.Lcbc_enc_loop:
+ movups ($inp), $in # load input
+ pxor $in, $state
+ call _intel_AES_encrypt1
+ movups $state, ($outp) # store output
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lcbc_enc_loop
+ test \$0xf, $len
+ jnz .Lcbc_enc_tail
+ movups $state, ($ivp)
+ jmp .Lcbc_just_ret
+.Lcbc_enc_tail:
+ mov $len, %rcx
+ mov $inp, %rsi
+ mov $outp, %rdi
+ .long 0x9066A4F3 # rep movsb
+ mov 240($keyp), $rnds # restore $rnds (%esi)
+ mov \$16, %rcx # zero tail
+ sub $len, %rcx
+ xor %rax, %rax
+ .long 0x9066AAF3 # rep stosb
+ mov $outp, $inp # this is not a mistake!
+ movq \$16, $len # len=16
+ jmp .Lcbc_enc_loop # one more spin
+#--------------------------- DECRYPT ------------------------------#
+.Lcbc_decrypt:
+ movups ($ivp), $iv
+ cmp \$16, $len
+ jb .Lcbc_dec_tail
+ cmp \$64, $len
+ jb .Lcbc_dec_loop1
+.align 4
+.Lcbc_dec_loop4:
+ movups ($inp), $in1
+ movaps $in1, $state1
+ movups 0x10($inp), $in2
+ movaps $in2, $state2
+ movups 0x20($inp), $in3
+ movaps $in3, $state3
+ movups 0x30($inp), $in4
+ movaps $in4, $state4
+ call _intel_AES_decrypt4
+ pxor $iv, $state1
+ pxor $in1, $state2
+ pxor $in2, $state3
+ pxor $in3, $state4
+ movaps $in4, $iv
+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lcbc_dec_loop4
+ cmp \$0, $len
+ jz .Lcbc_dec_ret
+ cmp \$16, $len
+ jb .Lcbc_dec_tail
+.align 4
+.Lcbc_dec_loop1:
+ movups ($inp), $in
+ movaps $in, $state
+ call _intel_AES_decrypt1
+ pxor $iv, $state
+ movups $state, ($outp)
+ movaps $in, $iv
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lcbc_dec_loop1
+ test \$0xf, $len
+ jz .Lcbc_dec_ret
+.Lcbc_dec_tail:
+ movups ($inp), $in
+ movaps $in, $state
+ call _intel_AES_decrypt1
+ pxor $iv, $state
+ movaps $in, $iv
+ sub \$16, %rsp # alloc temporary space
+ movups $state, (%rsp)
+ mov $outp, %rdi
+ mov %rsp, %rsi
+ mov $len, %rcx
+ .long 0x9066A4F3 # rep movsb
+ mov %rsp, %rdi # clear stack
+ mov \$16, %rcx
+ xor %rax, %rax
+ .long 0x9066AAF3 # rep stosb
+ add \$16, %rsp
+.Lcbc_dec_ret:
+ movups $iv, ($ivp)
+.Lcbc_just_ret:
+ ret
+.size intel_AES_cbc_encrypt, . - intel_AES_cbc_encrypt
+___
+
+$code.=<<___;
+ .long 0x80808080, 0x80808080, 0xfefefefe, 0xfefefefe
+ .long 0x1b1b1b1b, 0x1b1b1b1b, 0, 0
+.asciz "AES for Intel AES-NI, CRYPTOGAMS by <ying.huang\@intel.com>"
+.align 64
+___
+
+$code =~ s/\`([^\`]*)\`/eval($1)/gem;
+
+print $code;
+
+close STDOUT;
--- a/crypto/engine/Makefile
+++ b/crypto/engine/Makefile
@@ -11,6 +11,8 @@ MAKEFILE= Makefile
AR= ar r

CFLAGS= $(INCLUDES) $(CFLAG)
+ASFLAGS= $(INCLUDES) $(ASFLAG)
+AFLAGS= $(ASFLAGS)

GENERAL=Makefile
TEST= enginetest.c
@@ -21,12 +23,14 @@ LIBSRC= eng_err.c eng_lib.c eng_list.c e
eng_table.c eng_pkey.c eng_fat.c eng_all.c \
tb_rsa.c tb_dsa.c tb_ecdsa.c tb_dh.c tb_ecdh.c tb_rand.c tb_store.c \
tb_cipher.c tb_digest.c tb_pkmeth.c tb_asnmth.c \
- eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c
+ eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c \
+ eng_intel.c
LIBOBJ= eng_err.o eng_lib.o eng_list.o eng_init.o eng_ctrl.o \
eng_table.o eng_pkey.o eng_fat.o eng_all.o \
tb_rsa.o tb_dsa.o tb_ecdsa.o tb_dh.o tb_ecdh.o tb_rand.o tb_store.o \
tb_cipher.o tb_digest.o tb_pkmeth.o tb_asnmth.o \
- eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o
+ eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o \
+ eng_intel.o eng_intel_asm.o

SRC= $(LIBSRC)

@@ -45,6 +49,9 @@ lib: $(LIBOBJ)
$(RANLIB) $(LIB) || echo Never mind.
@touch lib

+eng_intel_asm.s: eng_intel_asm.pl
+ $(PERL) eng_intel_asm.pl $(PERLASM_SCHEME) > $@
+
files:
$(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
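
A note on the unrolled _intel_AES_encrypt*/_intel_AES_decrypt* routines in the patch above: the point at which they enter the round sequence depends on the round count read from offset 240 of the key structure. The C sketch below only models that address arithmetic. The function names are hypothetical and not part of the patch, and it assumes 16-byte round keys stored back to back, as the movaps offsets suggest.

```c
#include <stddef.h>

/* AES round count for a key of the given size: 10, 12 or 14. */
static int aes_rounds(int key_bits)
{
    return key_bits / 32 + 6;
}

/* Byte offset that $tkeyp holds, relative to $keyp, once the routine
 * reaches the shared 128-bit tail (.Lenc128 / .Ldec128). */
static size_t tkeyp_offset(int rounds)
{
    size_t off = 0x30;          /* lea 0x30($tkeyp), $tkeyp */

    if (rounds >= 12)
        off += 0x20;            /* executed for 192- and 256-bit keys */
    if (rounds > 12)
        off += 0x20;            /* executed for 256-bit keys only */
    return off;
}

/* Offset of the round key used by the first aesenc / aesdec. */
static size_t first_round_key_offset(int rounds)
{
    size_t back = 0x20;         /* 128-bit entry starts at -0x20($tkeyp) */

    if (rounds >= 12)
        back += 0x20;           /* 192-bit entry starts at -0x40($tkeyp) */
    if (rounds > 12)
        back += 0x20;           /* 256-bit path starts at -0x60($tkeyp) */
    return tkeyp_offset(rounds) - back;
}

/* Offset of the round key used by aesenclast / aesdeclast. */
static size_t last_round_key_offset(int rounds)
{
    return tkeyp_offset(rounds) + 0x70;
}
```

Under these assumptions the first round key after the initial pxor is always at offset 0x10, and the last one is at 16 * rounds (0xa0, 0xc0 and 0xe0 for 10, 12 and 14 rounds), which is why a single unrolled tail can serve all three key sizes.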


--=-wxlSkMZ7lEbVOuYRYYOP
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEABECAAYFAklQjMkACgkQKhFGF+eHlpiH5gCfY7u8+Re+am6k5culNtM817za
JBcAn0NHZMABwjwPSV3uhWDdrqfJf+9D
=g1mB
-----END PGP SIGNATURE-----

--=-wxlSkMZ7lEbVOuYRYYOP--

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List opens...@openssl.org
Automated List Manager majo...@openssl.org

Andy Polyakov

Dec 23, 2008, 11:58:33 AM
>> This patch adds support to Intel AES-NI instruction set for x86_64
>> platform.
>
> Cool. I'm relying on Andy to provide a more thorough review

Even after a short glance I can tell there will be a lot of comments and
even work to do, but I'm planning to get to it later...

> Also, if you have no philosophical objection, I think the file and symbol
> naming should be based on the interface rather than the manufacturer
> (particularly for "intel", who provide lots of h/w and interfaces that
> have nothing to do with AES-NI). Perhaps eng_aesni.c rather than
> eng_intel.c.

I second that. Ying, nothing prevents us from renaming files and
functions ourselves (assuming that you have no philosophical objections),
but *if* you choose to submit another patch with alternative naming, could
you look into crypto/modes and use it? On an earlier occasion you commented
"hope that it can be merged quickly," but it had been committed to OpenSSL
CVS before I mentioned it... Or perhaps you had not pulled it into your
repository, in which case it's something we have no power to make
quicker...

Out of curiosity, what does "NI" stand for anyway? Or is it just
something the knights kept saying? But didn't they stop doing so? Cheers. A.

Huang Ying

Dec 23, 2008, 10:12:29 PM

--=-soGQPDluQVT96g6xpEN0
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

This patch adds support for the Intel AES-NI instruction set on the x86_64
platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processors, as of 2009. These instructions enable fast and secure
data encryption and decryption using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197. The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf


AES-NI support is implemented as an engine in crypto/engine/.


ChangeLog:

v3:

- Rename INTEL or INTEL_AES stuff to AESNI

- Use cfb and ofb modes implementation of crypto/modes instead of copying.
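
To illustrate the second v3 item: a generic mode routine only needs a one-block encrypt callback, so the same mode code can sit on top of any block cipher implementation. The sketch below is a simplified OFB routine written for this note, not the crypto/modes source. The names block_fn, toy_block and ofb128_sketch are hypothetical, and toy_block is a stand-in that is neither AES nor secure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, shaped like a one-block encrypt routine:
 * it reads 16 bytes from in and writes 16 bytes to out. */
typedef void (*block_fn)(const unsigned char in[16], unsigned char out[16],
                         const void *key);

/* Toy stand-in for a block cipher; NOT AES and not secure. */
static void toy_block(const unsigned char in[16], unsigned char out[16],
                      const void *key)
{
    const unsigned char *k = key;
    unsigned char tmp[16];
    int i;

    for (i = 0; i < 16; i++)
        tmp[i] = (unsigned char)((in[(i + 1) & 15] ^ k[i]) + i);
    memcpy(out, tmp, 16);       /* tmp makes in == out safe */
}

/* OFB: the IV is repeatedly encrypted in place to form the key stream.
 * *num records how many key-stream bytes of the current block have
 * already been used, so a message can be processed in several calls. */
static void ofb128_sketch(const unsigned char *in, unsigned char *out,
                          size_t len, const void *key,
                          unsigned char ivec[16], int *num, block_fn block)
{
    unsigned int n = (unsigned int)*num;
    size_t i;

    for (i = 0; i < len; i++) {
        if (n == 0)
            block(ivec, ivec, key);     /* next key-stream block */
        out[i] = in[i] ^ ivec[n];
        n = (n + 1) & 15;
    }
    *num = (int)n;
}
```

Since OFB and CFB only ever call the block cipher in the encrypt direction, an engine built this way needs just the encryption key schedule for those modes, for both encryption and decryption.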

v2:

- AES-NI support is implemented as an engine instead of "branch".

- ECB and CBC modes are implemented in parallel style to take
advantage of pipelined hardware implementation.

- AES key scheduling algorithm is re-implemented with higher performance.
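
As background for the "parallel style" item: CBC encryption is a serial chain, but in CBC decryption each block decryption is independent, and only the final XOR needs the previous ciphertext block. That is what lets four blocks be in flight at once. The sketch below shows the data flow with a toy invertible cipher standing in for AES. All names (toy_enc, toy_dec, cbc_enc, cbc_dec4) are hypothetical and the toy cipher is not secure.

```c
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy invertible "cipher" standing in for AES; NOT secure. */
static void toy_enc(unsigned char b[BLK], const unsigned char k[BLK])
{
    int i;
    for (i = 0; i < BLK; i++)
        b[i] = (unsigned char)((b[i] ^ k[i]) + k[BLK - 1 - i]);
}

static void toy_dec(unsigned char b[BLK], const unsigned char k[BLK])
{
    int i;
    for (i = 0; i < BLK; i++)
        b[i] = (unsigned char)((b[i] - k[BLK - 1 - i]) ^ k[i]);
}

static void xor_blk(unsigned char *d, const unsigned char *s)
{
    int i;
    for (i = 0; i < BLK; i++)
        d[i] ^= s[i];
}

/* CBC encryption: block i needs ciphertext i-1, so it stays serial. */
static void cbc_enc(const unsigned char *in, unsigned char *out, size_t nblk,
                    const unsigned char k[BLK], unsigned char iv[BLK])
{
    size_t i;
    for (i = 0; i < nblk; i++) {
        xor_blk(iv, in + i * BLK);
        toy_enc(iv, k);
        memcpy(out + i * BLK, iv, BLK);
    }
}

/* CBC decryption, four blocks per step: the four decryptions do not
 * depend on each other; the chaining XORs come afterwards.
 * nblk must be a multiple of 4 in this sketch. */
static void cbc_dec4(const unsigned char *in, unsigned char *out, size_t nblk,
                     const unsigned char k[BLK], unsigned char iv[BLK])
{
    unsigned char c[4][BLK], s[4][BLK];
    size_t i;
    int j;

    for (i = 0; i + 4 <= nblk; i += 4) {
        memcpy(c, in + i * BLK, sizeof(c));   /* keep ciphertext copies */
        memcpy(s, c, sizeof(s));
        for (j = 0; j < 4; j++)               /* independent of each other */
            toy_dec(s[j], k);
        xor_blk(s[0], iv);
        for (j = 1; j < 4; j++)
            xor_blk(s[j], c[j - 1]);
        memcpy(iv, c[3], BLK);                /* next IV = last ciphertext */
        memcpy(out + i * BLK, s, sizeof(s));
    }
}
```

With a pipelined aesdec unit, the four independent decryptions in the inner loop can overlap, while the encrypt direction cannot be restructured the same way. ECB has no chaining at all, so both directions can be interleaved there.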


Known issues:

- How to add conditional compilation for eng_aesni_asm.pl? It cannot
be compiled on non-x86 platforms.

- No NID for CTR mode can be found; how can it be supported in an engine?

- CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
to add AES-NI support for them, I can add them.
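
On the C side, one way to approach the first issue is to combine a compile-time platform test with a run-time capability test, so the engine quietly becomes a no-op where AES-NI cannot exist. The sketch below is only an illustration of that pattern. The macro and function names are hypothetical, and it assumes the capability word carries CPUID leaf 1 ECX in its upper 32 bits, which is what a test of bit 57 implies (AES-NI is ECX bit 25, and 32 + 25 = 57). It does not address how the Makefile should skip the .pl file.

```c
#include <stdint.h>

/* Compile-time switch based on the usual x86_64 predefined macros. */
#if defined(__x86_64) || defined(__x86_64__) || defined(_M_AMD64)
# define SKETCH_COMPILE_AESNI 1
#else
# define SKETCH_COMPILE_AESNI 0
#endif

/* CPUID leaf 1 reports AES-NI in ECX bit 25; with ECX stored in the
 * upper half of a 64-bit capability word this becomes bit 57. */
#define SKETCH_AESNI_BIT 57

/* Returns 1 only when built for x86_64 AND the capability bit is set. */
static int aesni_usable(uint64_t cap)
{
#if SKETCH_COMPILE_AESNI
    return (int)((cap >> SKETCH_AESNI_BIT) & 1);
#else
    (void)cap;                  /* nothing to test on other platforms */
    return 0;
#endif
}
```

Using a fixed-width 64-bit type for the capability word avoids depending on the width of unsigned long, which differs between LP64 and LLP64 targets.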


Signed-off-by: Huang Ying <ying....@intel.com>

---
crypto/engine/Makefile | 11
crypto/engine/eng_aesni.c | 409 ++++++++++++++++++
crypto/engine/eng_aesni_asm.pl | 918 +++++++++++++++++++++++++++++++++++++++++
crypto/engine/eng_all.c | 3
crypto/engine/engine.h | 1
5 files changed, 1340 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/crypto/engine/eng_aesni.c
@@ -0,0 +1,409 @@


+/*
+ * Support for Intel AES-NI instruction set
+ * Author: Huang Ying <ying....@intel.com>
+ *

+ * Intel AES-NI is a new set of Single Instruction Multiple Data
+ * (SIMD) instructions that are going to be introduced in the next
+ * generation of Intel processor, as of 2009. These instructions
+ * enable fast and secure data encryption and decryption, using the
+ * Advanced Encryption Standard (AES), defined by FIPS Publication
+ * number 197. The architecture introduces six instructions that
+ * offer full hardware support for AES. Four of them support high
+ * performance data encryption and decryption, and the other two
+ * instructions support the AES key expansion procedure.
+ *
+ * The white paper can be downloaded from:
+ * http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
+ *
+ * This file is based on engines/e_padlock.c

+#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) && !defined(OPENSSL_NO_AES)


+
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <openssl/crypto.h>
+#include <openssl/dso.h>
+#include <openssl/engine.h>
+#include <openssl/evp.h>
+#include <openssl/aes.h>
+#include <openssl/err.h>
+#include <cryptlib.h>

+#include "crypto/modes/modes.h"


+
+/* AES-NI is available *ONLY* on some x86 CPUs. Not only does it not
+ exist elsewhere, but this code cannot even be compiled on other
+ platforms! */

+#undef COMPILE_HW_AESNI


+#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_AMD64)) && !defined(I386_ONLY)

+#define COMPILE_HW_AESNI
+static ENGINE *ENGINE_aesni (void);
+#endif
+
+void ENGINE_load_aesni (void)


+{
+/* On non-x86 CPUs it just returns. */

+#ifdef COMPILE_HW_AESNI
+ ENGINE *toadd = ENGINE_aesni();


+ if (!toadd)
+ return;
+ ENGINE_add (toadd);
+ ENGINE_free (toadd);
+ ERR_clear_error ();
+#endif
+}
+

+#ifdef COMPILE_HW_AESNI
+int aesni_set_encrypt_key(const unsigned char *userKey, const int bits,
+ AES_KEY *key);
+int aesni_set_decrypt_key(const unsigned char *userKey, const int bits,
+ AES_KEY *key);
+
+void aesni_encrypt(const unsigned char *in, unsigned char *out,
+ const AES_KEY *key);
+void aesni_decrypt(const unsigned char *in, unsigned char *out,


+ const AES_KEY *key);
+

+void aesni_ecb_encrypt(const unsigned char *in,


+ unsigned char *out,
+ const unsigned long length,
+ const AES_KEY *key,
+ const int enc);

+void aesni_cbc_encrypt(const unsigned char *in,


+ unsigned char *out,
+ const unsigned long length,
+ const AES_KEY *key,
+ unsigned char *ivec, const int enc);
+

+/* Function for ENGINE detection and control */

+static int aesni_init(ENGINE *e);


+
+/* Cipher Stuff */

+static int aesni_ciphers(ENGINE *e, const EVP_CIPHER **cipher,


+ const int **nids, int nid);
+

+#define AESNI_MIN_ALIGN 16
+#define AESNI_ALIGN(x) \
+ ((void *)(((unsigned long)(x)+AESNI_MIN_ALIGN-1)&~(AESNI_MIN_ALIGN-1)))


+
+/* Engine names */

+static const char *aesni_id = "AESNI";
+static char *aesni_name = "AESNI";


+
+/* ===== Engine "management" functions ===== */
+
+/* Prepare the ENGINE structure for registration */
+static int

+aesni_bind_helper(ENGINE *e)


+{
+ if (!(OPENSSL_ia32cap_P & (1UL << 57)))
+ return 0;
+
+ /* Register everything or return with an error */

+ if (!ENGINE_set_id(e, aesni_id) ||
+ !ENGINE_set_name(e, aesni_name) ||
+
+ !ENGINE_set_init_function(e, aesni_init) ||
+ !ENGINE_set_ciphers (e, aesni_ciphers))


+ return 0;
+
+ /* Everything looks good */
+ return 1;
+}
+
+/* Constructor */
+static ENGINE *

+ENGINE_aesni(void)


+{
+ ENGINE *eng = ENGINE_new();
+
+ if (!eng) {
+ return NULL;
+ }
+

+ if (!aesni_bind_helper(eng)) {


+ ENGINE_free(eng);
+ return NULL;
+ }
+
+ return eng;
+}
+
+/* Check availability of the engine */
+static int

+aesni_init(ENGINE *e)

+static int aesni_cipher_nids[] = {


+ NID_aes_128_ecb,
+ NID_aes_128_cbc,
+ NID_aes_128_cfb,
+ NID_aes_128_ofb,
+
+ NID_aes_192_ecb,
+ NID_aes_192_cbc,
+ NID_aes_192_cfb,
+ NID_aes_192_ofb,
+
+ NID_aes_256_ecb,
+ NID_aes_256_cbc,
+ NID_aes_256_cfb,
+ NID_aes_256_ofb,
+};

+static int aesni_cipher_nids_num =
+ (sizeof(aesni_cipher_nids)/sizeof(aesni_cipher_nids[0]));


+
+/* Function prototypes ... */

+static int aesni_init_key(EVP_CIPHER_CTX *ctx, const unsigned char *key,


+ const unsigned char *iv, int enc);

+static int aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,


+ const unsigned char *in, size_t inl);
+
+typedef struct
+{
+ AES_KEY ks;
+ unsigned int _pad1[3];

+} AESNI_KEY;


+
+#define AES_BLOCK_SIZE 16
+
+#define EVP_CIPHER_block_size_ECB AES_BLOCK_SIZE
+#define EVP_CIPHER_block_size_CBC AES_BLOCK_SIZE
+#define EVP_CIPHER_block_size_OFB 1
+#define EVP_CIPHER_block_size_CFB 1
+
+/* Declaring so many ciphers by hand would be a pain.
+ Instead introduce a bit of preprocessor magic :-) */
+#define DECLARE_AES_EVP(ksize,lmode,umode) \

+static const EVP_CIPHER aesni_##ksize##_##lmode = { \


+ NID_aes_##ksize##_##lmode, \
+ EVP_CIPHER_block_size_##umode, \
+ ksize / 8, \
+ AES_BLOCK_SIZE, \
+ 0 | EVP_CIPH_##umode##_MODE, \

+ aesni_init_key, \
+ aesni_cipher, \
+ NULL, \
+ sizeof(AESNI_KEY), \


+ EVP_CIPHER_set_asn1_iv, \
+ EVP_CIPHER_get_asn1_iv, \
+ NULL, \
+ NULL \
+}
+
+DECLARE_AES_EVP(128,ecb,ECB);
+DECLARE_AES_EVP(128,cbc,CBC);
+DECLARE_AES_EVP(128,cfb,CFB);
+DECLARE_AES_EVP(128,ofb,OFB);
+
+DECLARE_AES_EVP(192,ecb,ECB);
+DECLARE_AES_EVP(192,cbc,CBC);
+DECLARE_AES_EVP(192,cfb,CFB);
+DECLARE_AES_EVP(192,ofb,OFB);
+
+DECLARE_AES_EVP(256,ecb,ECB);
+DECLARE_AES_EVP(256,cbc,CBC);
+DECLARE_AES_EVP(256,cfb,CFB);
+DECLARE_AES_EVP(256,ofb,OFB);
+
+static int

+aesni_ciphers (ENGINE *e, const EVP_CIPHER **cipher,


+ const int **nids, int nid)
+{
+ /* No specific cipher => return a list of supported nids ... */
+ if (!cipher) {
+ *nids = aesni_cipher_nids;
+ return aesni_cipher_nids_num;


+ }
+
+ /* ... or the requested "cipher" otherwise */
+ switch (nid) {
+ case NID_aes_128_ecb:
+ *cipher = &aesni_128_ecb;
+ break;
+ case NID_aes_128_cbc:
+ *cipher = &aesni_128_cbc;
+ break;
+ case NID_aes_128_cfb:
+ *cipher = &aesni_128_cfb;
+ break;
+ case NID_aes_128_ofb:
+ *cipher = &aesni_128_ofb;
+ break;
+
+ case NID_aes_192_ecb:
+ *cipher = &aesni_192_ecb;
+ break;
+ case NID_aes_192_cbc:
+ *cipher = &aesni_192_cbc;
+ break;
+ case NID_aes_192_cfb:
+ *cipher = &aesni_192_cfb;
+ break;
+ case NID_aes_192_ofb:
+ *cipher = &aesni_192_ofb;
+ break;
+
+ case NID_aes_256_ecb:
+ *cipher = &aesni_256_ecb;
+ break;
+ case NID_aes_256_cbc:
+ *cipher = &aesni_256_cbc;
+ break;
+ case NID_aes_256_cfb:
+ *cipher = &aesni_256_cfb;
+ break;
+ case NID_aes_256_ofb:
+ *cipher = &aesni_256_ofb;
+ break;
+
+ default:
+ /* Sorry, we don't support this NID */
+ *cipher = NULL;
+ return 0;
+ }
+
+ return 1;
+}
+
+/* Prepare the encryption key for AES NI usage */
+static int

+aesni_init_key (EVP_CIPHER_CTX *ctx, const unsigned char *user_key,


+ const unsigned char *iv, int enc)
+{
+ int ret;

+ AES_KEY *key = AESNI_ALIGN(ctx->cipher_data);


+
+ if ((ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_CFB_MODE
+ || (ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_OFB_MODE
+ || enc)
+ ret=aesni_set_encrypt_key(user_key, ctx->key_len * 8, key);
+ else
+ ret=aesni_set_decrypt_key(user_key, ctx->key_len * 8, key);


+
+ if(ret < 0) {
+ EVPerr(EVP_F_AES_INIT_KEY,EVP_R_AES_KEY_SETUP_FAILED);
+ return 0;
+ }
+
+ return 1;
+}
+
+static int

+aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,


+ const unsigned char *in, size_t inl)
+{

+ AES_KEY *key = AESNI_ALIGN(ctx->cipher_data);


+
+ switch (EVP_CIPHER_CTX_mode(ctx)) {
+ case EVP_CIPH_ECB_MODE:

+ aesni_ecb_encrypt(in, out, inl, key, ctx->encrypt);


+ break;
+ case EVP_CIPH_CBC_MODE:

+ aesni_cbc_encrypt(in, out, inl, key,


+ ctx->iv, ctx->encrypt);
+ break;
+ case EVP_CIPH_CFB_MODE:

+ CRYPTO_cfb128_encrypt(in, out, inl, key, ctx->iv,
+ &ctx->num, ctx->encrypt,
+ aesni_encrypt);


+ break;
+ case EVP_CIPH_OFB_MODE:

+ CRYPTO_ofb128_encrypt(in, out, inl, key,
+ ctx->iv, &ctx->num,
+ aesni_encrypt);


+ break;
+ default:
+ return 0;
+ }
+
+ return 1;
+}
+

+#endif /* COMPILE_HW_AESNI */
+#endif /* !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) && !defined(OPENSSL_NO_AES) */
--- a/crypto/engine/eng_all.c
+++ b/crypto/engine/eng_all.c
@@ -71,6 +71,9 @@ void ENGINE_load_builtin_engines(void)
#if defined(__OpenBSD__) || defined(__FreeBSD__)
ENGINE_load_cryptodev();
#endif

+#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI)
+ ENGINE_load_aesni();


+#endif
ENGINE_load_dynamic();
#ifndef OPENSSL_NO_STATIC_ENGINE
#ifndef OPENSSL_NO_HW
--- /dev/null

+++ b/crypto/engine/eng_aesni_asm.pl

+# int aesni_set_encrypt_key(const unsigned char *userKey, const int bits,


+# AES_KEY *key)
+$code.=<<___;

+.globl aesni_set_encrypt_key
+.type aesni_set_encrypt_key,\@function,3
+.align 16
+aesni_set_encrypt_key:
+ call _aesni_set_encrypt_key
+ ret
+.size aesni_set_encrypt_key, . - aesni_set_encrypt_key
+
+.type _aesni_set_encrypt_key,\@abi-omnipotent
+.align 16
+_aesni_set_encrypt_key:

+.size _aesni_set_encrypt_key, . - _aesni_set_encrypt_key
+___
+
+
+# int aesni_set_decrypt_key(const unsigned char *userKey, const int bits,


+# AES_KEY *key)
+$code.=<<___;

+.globl aesni_set_decrypt_key
+.type aesni_set_decrypt_key,\@function,3
+.align 16
+aesni_set_decrypt_key:
+ call _aesni_set_encrypt_key

+.size aesni_set_decrypt_key, . - aesni_set_decrypt_key
+___
+
+# void aesni_encrypt (const void *inp,void *out,const AES_KEY *key);
+$code.=<<___;
+.globl aesni_encrypt
+.type aesni_encrypt,\@function,3
+.align 16
+aesni_encrypt:


+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %rdx, $keyp
+ mov 240($keyp), $rnds # round count
+ movups ($inp), $state # input

+ call _aesni_encrypt1


+ movups $state, ($outp) # output
+ ret

+.size aesni_encrypt, . - aesni_encrypt
+___
+
+# _aesni_encrypt1: internal ABI


+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state: initial state (input)
+# output:
+# $state: final state (output)
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;

+.type _aesni_encrypt1,\@abi-omnipotent
+.align 16
+_aesni_encrypt1:

+.size _aesni_encrypt1, . - _aesni_encrypt1
+___
+
+# _aesni_encrypt4: internal ABI


+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state1: initial state (input)
+# $state2
+# $state3
+# $state4
+# output:
+# $state1: final state (output)
+# $state2
+# $state3
+# $state4
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;

+.type _aesni_encrypt4,\@abi-omnipotent
+.align 16
+_aesni_encrypt4:

+.size _aesni_encrypt4, . - _aesni_encrypt4
+___
+
+# void aesni_decrypt (const void *inp,void *out,const AES_KEY *key);
+$code.=<<___;
+.globl aesni_decrypt
+.type aesni_decrypt,\@function,3
+.align 16
+aesni_decrypt:


+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %rdx, $keyp
+ mov 240($keyp), $rnds # round count
+ movups ($inp), $state # input

+ call _aesni_decrypt1


+ movups $state, ($outp) # output
+ ret
+.size aesni_decrypt, . - aesni_decrypt
+___
+
+# _aesni_decrypt1: internal ABI


+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state: initial state (input)
+# output:
+# $state: final state (output)
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;

+.type _aesni_decrypt1,\@abi-omnipotent
+.align 16
+_aesni_decrypt1:

+.size _aesni_decrypt1, . - _aesni_decrypt1
+___
+
+# _aesni_decrypt4: internal ABI


+# input:
+# $keyp: key struct pointer
+# $rnds: round count
+# $state1: initial state (input)
+# $state2
+# $state3
+# $state4
+# output:
+# $state1: final state (output)
+# $state2
+# $state3
+# $state4
+# changed:
+# $key
+# $tkeyp ($t1)
+$code.=<<___;

+.type _aesni_decrypt4,\@abi-omnipotent
+.align 16
+_aesni_decrypt4:

+.size _aesni_decrypt4, . - _aesni_decrypt4
+___
+
+# void aesni_ecb_encrypt(const unsigned char *in, unsigned char *out,


+# size_t length, const AES_KEY *key,
+# const int enc);
+$code.=<<___;

+.globl aesni_ecb_encrypt
+.type aesni_ecb_encrypt,\@function,5
+.align 16
+aesni_ecb_encrypt:


+ test $len, $len # check length
+ jz .Lecb_just_ret
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %r8d, $t1d # clear upper half of enc
+ mov %rcx, $keyp
+ mov 240($keyp), $rnds
+ test $t1, $t1
+ jz .Lecb_decrypt
+#--------------------------- ENCRYPT ------------------------------#
+ cmp \$16, $len
+ jb .Lecb_just_ret
+ cmp \$64, $len
+ jb .Lecb_enc_loop1
+.align 4
+.Lecb_enc_loop4:
+ movups ($inp), $state1
+ movups 0x10($inp), $state2
+ movups 0x20($inp), $state3
+ movups 0x30($inp), $state4

+ call _aesni_encrypt4


+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lecb_enc_loop4
+ cmp \$16, $len
+ jb .Lecb_just_ret
+.align 4
+.Lecb_enc_loop1:
+ movups ($inp), $state1

+ call _aesni_encrypt1


+ movups $state1, ($outp)
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lecb_enc_loop1
+ jmp .Lecb_just_ret
+#--------------------------- DECRYPT ------------------------------#
+.Lecb_decrypt:
+ cmp \$16, $len
+ jb .Lecb_just_ret
+ cmp \$64, $len
+ jb .Lecb_dec_loop1
+.align 4
+.Lecb_dec_loop4:
+ movups ($inp), $state1
+ movups 0x10($inp), $state2
+ movups 0x20($inp), $state3
+ movups 0x30($inp), $state4

+ call _aesni_decrypt4


+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lecb_dec_loop4
+ cmp \$16, $len
+ jb .Lecb_just_ret
+.align 4
+.Lecb_dec_loop1:
+ movups ($inp), $state1

+ call _aesni_decrypt1


+ movups $state1, ($outp)
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lecb_dec_loop1
+.Lecb_just_ret:
+ ret

+.size aesni_ecb_encrypt, . - aesni_ecb_encrypt
+___
+
+# void aesni_cbc_encrypt (const unsigned char *inp, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivp,const int enc);
+$code.=<<___;

+.globl aesni_cbc_encrypt
+.type aesni_cbc_encrypt,\@function,6
+.align 16
+aesni_cbc_encrypt:


+ test $len, $len # check length
+ jz .Lcbc_just_ret
+ mov %rdi, $inp
+ mov %rsi, $outp
+ mov %r9d, $t1d # clear upper half of enc
+ mov %rcx, $keyp
+ mov 240($keyp), $rnds
+ test $t1, $t1
+ jz .Lcbc_decrypt
+#--------------------------- ENCRYPT ------------------------------#
+ movups ($ivp), $state # load iv as initial state
+ cmp \$16, $len
+ jb .Lcbc_enc_tail
+.align 4
+.Lcbc_enc_loop:
+ movups ($inp), $in # load input
+ pxor $in, $state

+ call _aesni_encrypt1

+ call _aesni_decrypt4


+ pxor $iv, $state1
+ pxor $in1, $state2
+ pxor $in2, $state3
+ pxor $in3, $state4
+ movaps $in4, $iv
+ movups $state1, ($outp)
+ movups $state2, 0x10($outp)
+ movups $state3, 0x20($outp)
+ movups $state4, 0x30($outp)
+ sub \$64, $len
+ add \$64, $inp
+ add \$64, $outp
+ cmp \$64, $len
+ jge .Lcbc_dec_loop4
+ cmp \$0, $len
+ jz .Lcbc_dec_ret
+ cmp \$16, $len
+ jb .Lcbc_dec_tail
+.align 4
+.Lcbc_dec_loop1:
+ movups ($inp), $in
+ movaps $in, $state

+ call _aesni_decrypt1


+ pxor $iv, $state
+ movups $state, ($outp)
+ movaps $in, $iv
+ sub \$16, $len
+ add \$16, $inp
+ add \$16, $outp
+ cmp \$16, $len
+ jge .Lcbc_dec_loop1
+ test \$0xf, $len
+ jz .Lcbc_dec_ret
+.Lcbc_dec_tail:
+ movups ($inp), $in
+ movaps $in, $state

+ call _aesni_decrypt1


+ pxor $iv, $state
+ movaps $in, $iv
+ sub \$16, %rsp # alloc temporary space
+ movups $state, (%rsp)
+ mov $outp, %rdi
+ mov %rsp, %rsi
+ mov $len, %rcx
+ .long 0x9066A4F3 # rep movsb
+ mov %rsp, %rdi # clear stack
+ mov \$16, %rcx
+ xor %rax, %rax
+ .long 0x9066AAF3 # rep stosb
+ add \$16, %rsp
+.Lcbc_dec_ret:
+ movups $iv, ($ivp)
+.Lcbc_just_ret:
+ ret

+.size aesni_cbc_encrypt, . - aesni_cbc_encrypt


+___
+
+$code.=<<___;
+ .long 0x80808080, 0x80808080, 0xfefefefe, 0xfefefefe
+ .long 0x1b1b1b1b, 0x1b1b1b1b, 0, 0

+.asciz "AES for Intel AESNI, CRYPTOGAMS by <ying.huang\@intel.com>"

+ eng_aesni.c eng_aesni_asm.pl


LIBOBJ= eng_err.o eng_lib.o eng_list.o eng_init.o eng_ctrl.o \
eng_table.o eng_pkey.o eng_fat.o eng_all.o \
tb_rsa.o tb_dsa.o tb_ecdsa.o tb_dh.o tb_ecdh.o tb_rand.o tb_store.o \
tb_cipher.o tb_digest.o tb_pkmeth.o tb_asnmth.o \
- eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o
+ eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o \

+ eng_aesni.o eng_aesni_asm.o

SRC= $(LIBSRC)

@@ -45,6 +49,9 @@ lib: $(LIBOBJ)
$(RANLIB) $(LIB) || echo Never mind.
@touch lib

+eng_aesni_asm.s: eng_aesni_asm.pl
+ $(PERL) eng_aesni_asm.pl $(PERLASM_SCHEME) > $@


+
files:
$(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO

--- a/crypto/engine/engine.h
+++ b/crypto/engine/engine.h
@@ -346,6 +346,7 @@ void ENGINE_load_gost(void);
#endif
#endif
void ENGINE_load_cryptodev(void);
+void ENGINE_load_aesni(void);
void ENGINE_load_builtin_engines(void);

/* Get and set global flags (ENGINE_TABLE_FLAG_***) for the implementation
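
A note on the AESNI_ALIGN macro and the _pad1[3] member in eng_aesni.c above: movaps requires 16-byte aligned round keys, so the key schedule is placed at the next 16-byte boundary inside cipher_data. The sketch below shows the rounding arithmetic. It uses uintptr_t rather than the unsigned long cast of the patch, and the names align16 and align16_shift are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_MIN_ALIGN 16

/* Round a pointer up to the next 16-byte boundary. */
static void *align16(void *p)
{
    uintptr_t v = (uintptr_t)p;

    v = (v + SKETCH_MIN_ALIGN - 1) & ~(uintptr_t)(SKETCH_MIN_ALIGN - 1);
    return (void *)v;
}

/* Number of bytes skipped by align16(); always in the range 0..15. */
static size_t align16_shift(void *p)
{
    return (size_t)((unsigned char *)align16(p) - (unsigned char *)p);
}
```

For an arbitrary pointer the shift can be as large as 15 bytes. The 12 bytes of padding provided by _pad1[3] cover the worst case only if cipher_data is at least 4-byte aligned, in which case the shift is 0, 4, 8 or 12. Presumably that is the assumption behind the struct layout.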


--=-soGQPDluQVT96g6xpEN0


Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEABECAAYFAklRqH0ACgkQKhFGF+eHlpj//wCfXhaZppeqi+oxqJvk/9ggBdnh
334AoKT5SA4u6NTvnjMpNf3+Ws30wlpA
=XA9m
-----END PGP SIGNATURE-----

--=-soGQPDluQVT96g6xpEN0--
