Misdetection of MIPS endianness & How to get fast AES calls?


Ted Krovetz

Jul 16, 2010, 2:48:08 PM
to Crypto++ Users
Hello,

A few things:

1) I compiled Crypto++ on a little-endian MIPS machine under debian
squeeze and encountered problems. First, config.h assumes anything
that defines __mips__ is big endian. I recompiled with
-DIS_LITTLE_ENDIAN, and all was well.

In my own programming, I've given up trying to determine host
endianness using the preprocessor. Instead I declare

const union { unsigned x; unsigned char endian; } little = { 1 };

in each function where I care about byte order, and use little.endian as a guard in
conditionals. Since it's constant, the compiler optimizes away the
untaken branches.
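
A minimal sketch of this kind of runtime check, written in C++ for illustration. The helper names host_is_little_endian and load_be32 are hypothetical and are not part of Crypto++. The sketch reads the first byte with memcpy rather than through a union, because reading an inactive union member is not formally defined in C++ (GCC does support it); the idea is otherwise the same, and an optimizing compiler can usually fold the test to a constant.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte of an
// unsigned value is stored first in memory (little-endian host).
inline bool host_is_little_endian() {
    const unsigned probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: read a 32-bit big-endian value from a byte buffer.
// The byte swap is guarded by the runtime check instead of a preprocessor test.
inline std::uint32_t load_be32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    if (host_is_little_endian()) {
        v = ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
            ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
    }
    return v;
}
```

With this, load_be32 applied to the bytes 01 02 03 04 yields 0x01020304 on either kind of host.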

2) the GNUmakefile didn't detect GCC42_OR_LATER. g++ --version returns
"g++ (Debian 4.4.4-6) 4.4.4" on its first line, which doesn't match
the grep regex. Not much depended on this macro, so I hand-changed the
makefile.

3) My biggest problem is trying to get good performance with
individual AES block encryptions. In the Crypto++ API I could only
find AES::ProcessBlock() as the method for enciphering single AES
blocks. However, this call appears to entail so much overhead that
performance is poor. (My OCB implementation is peaking at around 23
cycles per byte while it theoretically should be closer to 13 or 14
cpb since CTR is around 11.) Is there a higher-performance interface
to the raw block cipher?

Thanks. Good work.

-Ted

Wei Dai

Jul 16, 2010, 7:29:36 PM
to Ted Krovetz, Crypto++ Users
Thanks for your report, Ted.

> 1) I compiled Crypto++ on a little-endian MIPS machine under debian
> squeeze and encountered problems. First, config.h assumes anything
> that defines __mips__ is big endian. I recompiled with
> -DIS_LITTLE_ENDIAN, and all was well.

In SVN, this has already been fixed by using __MIPSEB__ instead.
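
For illustration only, a preprocessor check along these lines might look like the sketch below. It is not the actual config.h code. SKETCH_HOST_BIG_ENDIAN and sketch_runtime_big_endian are hypothetical names; __MIPSEB__ and __MIPSEL__ are the macros GCC predefines for big- and little-endian MIPS targets, and the __BYTE_ORDER__ fallback is an assumption that only holds on newer GCC-compatible compilers.

```cpp
#include <cstring>

// Decide byte order at compile time. Testing __mips__ alone is not enough,
// since MIPS can run in either byte order.
#if defined(__MIPSEB__)
#  define SKETCH_HOST_BIG_ENDIAN 1
#elif defined(__MIPSEL__)
#  define SKETCH_HOST_BIG_ENDIAN 0
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
#  define SKETCH_HOST_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#else
#  error "byte order unknown; define SKETCH_HOST_BIG_ENDIAN by hand"
#endif

// Hypothetical runtime cross-check, useful in a self-test to catch
// a wrong compile-time guess.
inline bool sketch_runtime_big_endian() {
    const unsigned probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}
```

A start-up self-test that compares the macro against the runtime check would have flagged the misdetection described above.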

> 2) the GNUmakefile didn't detect GCC42_OR_LATER. g++ --version returns
> "g++ (Debian 4.4.4-6) 4.4.4" on its first line, which doesn't match
> the grep regex. Not much depended on this macro, so I hand-changed the
> makefile.

I changed the regex to "\((Debian|GCC\)) (4.[2-9]|[5-9])", but I wonder if
other Linux distributions also modify the GCC version output.
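
To see what the pattern accepts, here is a small sketch that applies the same expression with std::regex instead of grep (for this pattern the two syntaxes agree). The function name looks_like_gcc42_or_later is hypothetical, and the Ubuntu-style string mentioned afterwards is only an illustrative input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the pattern quoted above to the first line
// of `g++ --version` output.
inline bool looks_like_gcc42_or_later(const std::string& version_line) {
    // "\(" is a literal parenthesis, followed by either "Debian" or "GCC)",
    // then a space and a version of 4.2-4.9 or a major version of 5-9.
    static const std::regex pattern(R"re(\((Debian|GCC\)) (4.[2-9]|[5-9]))re");
    return std::regex_search(version_line, pattern);
}
```

This accepts "g++ (Debian 4.4.4-6) 4.4.4" and "g++ (GCC) 4.4.4" and rejects "g++ (GCC) 4.1.2". A line such as "g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3" would not match, which is the kind of distribution-specific output the question above is about.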

> 3) My biggest problem is trying to get good performance with
> individual AES block encryptions. In the Crypto++ API I could only
> find AES::ProcessBlock() as the method for enciphering single AES
> blocks. However, this call appears to entail so much overhead that
> performance is poor. (My OCB implementation is peaking at around 23
> cycles per byte while it theoretically should be closer to 13 or 14
> cpb since CTR is around 11.) Is there a higher-performance interface
> to the raw block cipher?

Is this on x86/x64? If so, you can use AdvancedProcessBlocks (search for it
in cryptlib.h), which will give you a big performance boost, since it
encrypts multiple blocks with only one set of overhead. That function lets
you XOR either the input or the output of AES with something, but not both
(which is what OCB calls for), so you would have to do the second XOR yourself.
I just recalled, though, that there is currently no assembly code for AES block
decryption, so OCB decryption in Crypto++ would be pretty slow right now.
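
The shape of that approach can be sketched as follows. This is not Crypto++ code and does not call AdvancedProcessBlocks: toy_encrypt_blocks_xor_input is a hypothetical stand-in for a batched cipher call that XORs a mask into its input, and it uses a trivial byte-wise map instead of AES so that the sketch is runnable on its own. masked_encrypt shows the caller doing the second XOR, giving out = E(in XOR offset) XOR offset for every byte of every block.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a batched "XOR the input, then encrypt" call.
// NOT AES: a toy invertible byte map, used only to make the sketch runnable.
inline void toy_encrypt_blocks_xor_input(const std::uint8_t* in,
                                         const std::uint8_t* mask,
                                         std::uint8_t* out,
                                         std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        const std::uint8_t x = static_cast<std::uint8_t>(in[i] ^ mask[i]);
        out[i] = static_cast<std::uint8_t>(x * 5u + 1u);
    }
}

// OCB-style masking over a run of blocks with one batched cipher call.
inline void masked_encrypt(const std::uint8_t* in,
                           const std::uint8_t* offsets,
                           std::uint8_t* out,
                           std::size_t length) {
    // First XOR is done inside the batched call, once for all blocks.
    toy_encrypt_blocks_xor_input(in, offsets, out, length);
    // Second XOR is done by the caller on the cipher output.
    for (std::size_t i = 0; i < length; ++i) {
        out[i] = static_cast<std::uint8_t>(out[i] ^ offsets[i]);
    }
}
```

The point of the design is that the per-call overhead is paid once per run of blocks rather than once per block, while the extra XOR pass over the output is cheap by comparison.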

BTW, I just did an implementation of AES and GCM using the new AES-NI and
CLMUL instructions, and got 3.5 cpb for AES-GCM, of which 1.4 is for AES-CTR
and 2.1 is for GMAC. Looking at the OCB description, it seems like it should
be possible for AES-OCB to clock in at less than 2 cpb. (Too bad OCB is
patented.)
