
hardware encryption


brai...@posteo.net

Jan 20, 2021, 5:50:02 AM
This is probably not the proper place for this question, but I've got to
start somewhere.
Hardware-accelerated encryption is a bit of a mystery to me.
Some processors advertise it, but how do we know if it's being used?
Is there a way to test whether hardware-accelerated encryption is being used,
or is it just advertising hype?
If I'm encrypting my data and want to reduce the load on the CPU as much
as possible, what processor would be best?
Maybe someone can point me to the right place for such a discussion.

Paul Wise

Jan 20, 2021, 8:30:02 AM
On Wed, Jan 20, 2021 at 10:40 AM <brai...@posteo.net> wrote:

> hardware accelerated encryption

For Linux kernel crypto stuff (disk encryption, TLS offload etc),
/proc/crypto lists what is available, some of it will be generic stuff
and others will be crypto accelerators on the SoC or elsewhere.

For CPUs with crypto instructions, probably they will be listed in the
CPU flags in /proc/cpuinfo
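
As a rough sketch of that check, the shell function below pulls the crypto-related entries out of the flags line. The function name `crypto_flags` and the list of flag names it looks for are assumptions for illustration (the list is not exhaustive), and the file argument exists only so the function can be tried on a saved copy of /proc/cpuinfo.

```shell
# Print the crypto-related CPU flags found in a cpuinfo-style file.
# Usage: crypto_flags [file]   (defaults to /proc/cpuinfo)
# The flag list is illustrative: aes/pmull/sha1/sha2/sha3/sha512 on ARM,
# aes/pclmulqdq/sha_ni/vaes on x86.
crypto_flags() {
    grep -i -E '^(flags|features)[ 	]*:' "${1:-/proc/cpuinfo}" | head -n1 |
        cut -d: -f2 | tr ' ' '\n' |
        grep -E '^(aes|pmull|sha1|sha2|sha3|sha512|sha_ni|pclmulqdq|vaes)$' |
        sort -u | paste -sd' ' -
}

# Example call on the running system:
# crypto_flags
```

An empty result suggests the CPU does not advertise crypto instructions; a non-empty one only shows that they are available, not that a given program uses them.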

--
bye,
pabs

https://wiki.debian.org/PaulWise

Diederik de Haas

Jun 3, 2021, 11:00:02 AM
On Wednesday, 20 January 2021 11:40:26 CEST brai...@posteo.net wrote:
> hardware accelerated encryption is a bit of a mystery to me
> some processors advertise it but how do we know if it's being used
> is there a way to test if hardware accelerated encryption is being used
> or if it's just advertising hype

I would very much like to understand this as well.
I have several Rock64 devices, which are supposed to have ARMv8 Cryptography
Extensions according to https://wiki.pine64.org/wiki/ROCK64#CPU_Architecture.

Due to bug #976635 several CRYPTO modules got enabled in the 5.10 kernel.
But I don't know whether that's relevant for ARMv8 CE.

https://turecki.net/content/getting-most-out-ssh-hardware-acceleration-tuning-aes-ni
contains a test to check the speed of some crypto operations.
Based on that I've made a procedure which I've now run on several devices:

# adduser test
$ ssh-add                      # make sure the ssh agent is running
$ ssh-copy-id test@localhost
$ ssh test@localhost           # verify that key-based auth works
$ exit
$ for i in $(ssh -Q cipher); do dd if=/dev/zero bs=1M count=100 2> /dev/null | \
  ssh -c "$i" test@localhost "(time -p cat) > /dev/null" 2>&1 | grep real | \
  awk -v c="$i" '{print c ": " 100 / $2 " MB/s"}'; done
$ grep -i -E "(flags|features)" /proc/cpuinfo | tail -n1

On a Rock64 with kernel 5.8.0-1-arm64, I got these results:
aes128-ctr: 45.8716 MB/s
aes192-ctr: 45.6621 MB/s
aes256-ctr: 44.6429 MB/s
aes12...@openssh.com: 49.505 MB/s
aes25...@openssh.com: 48.7805 MB/s
chacha20...@openssh.com: 36.9004 MB/s

Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

But on kernel 5.10.0-7-arm64, with those CRYPTO modules, I got this:
aes128-ctr: 42.735 MB/s
aes192-ctr: 44.4444 MB/s
aes256-ctr: 44.0529 MB/s
aes12...@openssh.com: 48.0769 MB/s
aes25...@openssh.com: 46.0829 MB/s
chacha20...@openssh.com: 37.037 MB/s

Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

If you run the test several times, you'll get slightly different results
each time, so I consider these results the same.

For comparison (I don't remember which kernel version) on Ryzen 7 1800X:
aes128-ctr: 714.286 MB/s
aes192-ctr: 714.286 MB/s
aes256-ctr: 769.231 MB/s
aes12...@openssh.com: 1000 MB/s
aes25...@openssh.com: 1000 MB/s
chacha20...@openssh.com: 294.118 MB/s

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp
lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext
perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1
avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1
xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale
vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic
v_vmsave_vmload vgif overflow_recov succor smca

with kernel 5.10.0-7-amd64:
aes128-ctr: 714.286 MB/s
aes192-ctr: 769.231 MB/s
aes256-ctr: 714.286 MB/s
aes12...@openssh.com: 909.091 MB/s
aes25...@openssh.com: 909.091 MB/s
chacha20...@openssh.com: 500 MB/s

very odd that aes192-ctr and aes256-ctr seem to have switched, but the values
are otherwise EXACTLY the same :-O
Very impressive speed improvement with chacha20-poly1305 though :D
(Note that the aforementioned bug report was about arm64, not amd64)
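
A possible explanation for the repeated values, offered as an assumption and not verified on these machines: the loop divides 100 MB by the `real` time printed by `time -p`, and if that time is reported with two decimal places, only a handful of results are possible at these speeds. The sketch below reproduces the arithmetic; the helper name `mbps` is made up for illustration.

```shell
# Throughput implied by a 100 MB transfer for a given wall-clock time,
# mirroring the awk expression used in the benchmark loop.
mbps() { awk -v t="$1" 'BEGIN { print 100 / t }'; }

# Candidate wall-clock times in seconds, 0.01 s apart where it matters.
for t in 0.10 0.11 0.13 0.14 0.20 0.34; do
    printf '%s s -> %s MB/s\n' "$t" "$(mbps "$t")"
done
```

The results are 1000, 909.091, 769.231, 714.286, 500 and 294.118 MB/s, the same figures that appear in the Ryzen runs, so a jump from 714 to 769 MB/s may be no more than 0.01 s of timer granularity. Transferring more data per run (a larger `count=`) would make the measurement finer-grained.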

On a RPi2, the values were around 12 MB/s


I don't find the scores of the Rock64 impressive, but that may be because
I've read somewhere that ARMv8 Cryptography Extensions could/should
result in a FACTOR 10 speed improvements with cryptography.

There could be a number of issues here:
1) The 'factor 10' is horseshit
2) The 'factor 10' is true, but it doesn't work on Rock64 (yet?)
3) The 'factor 10' is true and working and without it, the scores would be abysmal.
4) The test is all wrong

If I do 'cat /proc/crypto' I get a long list, but I have no idea what the output means.
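
One way to make that list more digestible, sketched here under some assumptions: each entry in /proc/crypto has a `name` (the algorithm), a `driver` (the implementation) and a `priority`, and when several drivers provide the same algorithm the kernel prefers the one with the highest priority. The function name `crypto_summary` is hypothetical, and the sketch assumes `name` and `driver` appear before `priority` within each entry.

```shell
# Summarise a /proc/crypto-style file as "priority algorithm driver",
# highest priority first.
# Usage: crypto_summary [file]   (defaults to /proc/crypto)
crypto_summary() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "name"     { name = $2 }     # algorithm, e.g. cbc(aes)
        $1 == "driver"   { driver = $2 }   # implementation providing it
        $1 == "priority" { print $2, name, driver }
    ' "${1:-/proc/crypto}" | sort -k1,1nr -k2,2
}

# Example call on the running system, limited to AES entries:
# crypto_summary | grep aes
```

Driver names ending in `-generic` are usually the portable C implementations, while names containing `-ce` (ARMv8 Crypto Extensions) or `aesni` usually point to an instruction-set-accelerated one. Treat that naming as a convention, not a guarantee.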


So essentially I have the same question as OP.
How can I/we know if it's present and working as intended?
What kind of speed improvement can/should one expect?
What is needed to take advantage of it? Kernel modules and if so, which?
The CRYPTO_XYZ_CE ones? Others? Something else entirely?

Cheers,
Diederik

Jeffrey Walton

Jun 3, 2021, 11:40:03 AM
On Wed, Jan 20, 2021 at 5:40 AM <brai...@posteo.net> wrote:
> ...
> this thing about hardware accelerated encryption is a bit of a mystery
> to me
> some processors advertise it but how do we know if it's being used
> is there a way to test if hardware accelerated encryption is being used
> or if it's just advertising hype

You usually cannot tell when hardware acceleration is being used.
Most libraries don't expose their implementation details.
About all you can do is check whether the CPU offers the acceleration.

One library that provides the algorithmic details is Crypto++.
Crypto++ is a C++ class library. Classes like AES and SHA have a
member function AlgorithmProvider(). If the C++ implementation is
used, then the string "C++" is returned. If hardware acceleration is
used, then the string will be "AES", "SHA" or "NEON", "ASIMD" or
"ARMv7", depending what is fastest.

I can't tell if you are asking how to check that a hardware
implementation, like AES or SHA acceleration, is actually faster than
C, C++, ASM, etc. For that you have to benchmark the algorithm.

And one thing to be aware of... NEON (ARMv7) and ASIMD (ARMv8) are
like Intel SSE acceleration. Some algorithms slow down when using NEON
or ASIMD. For example, BLAKE2 is fastest when using C or C++ code. If
you use NEON or ASIMD then the code slows down by about 3 cycles per
byte (cpb).[1] The slowdown is due to a slow double-word (64-bit)
shift that can only be issued from one port. That holds for ARM A53's,
A57's and Apple's M1.

[1] https://github.com/weidai11/cryptopp/blob/master/blake2.cpp#L30

> if i'm encrypting my data and want to reduce the load on the cpu as much
> as possible what processor would be best

Efficiency is one reason, but a more important one is side channels.
Using AES acceleration will avoid most side channel attacks.

Once the implementation is correct, then it can be sped-up to be faster :)

Jeff

Jeffrey Walton

Jun 3, 2021, 12:00:03 PM
I _think_ OpenSSH uses OpenSSL, not kernel crypto. Or they use that
LibreSSL port of OpenSSL.

To benchmark OpenSSL, you use something like:

# C implementation
openssl speed aes-128-cbc

# Hardware acceleration
openssl speed -evp aes-128-cbc

You can see the difference in the numbers below. Below, I'm on a Core i7-8700.

$ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 57736814 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 14943316 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 3741357 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 944345 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 118246 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 59132 aes-128 cbc's in 3.00s
OpenSSL 1.1.1f 31 Mar 2020
built on: Wed Apr 28 00:37:28 2021 UTC
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes
8192 bytes 16384 bytes
aes-128 cbc 307929.67k 318790.74k 319262.46k 322336.43k
322890.41k 322939.56k

$ openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 186837731 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 78857865 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 20276035 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 5088201 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 636732 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 318374 aes-128-cbc's in 3.00s
OpenSSL 1.1.1f 31 Mar 2020
built on: Wed Apr 28 00:37:28 2021 UTC
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes
8192 bytes 16384 bytes
aes-128-cbc 999800.57k 1682301.12k 1730221.65k 1736772.61k
1738702.85k 1738746.54k
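
To put a number on the difference, the two result rows above can be divided column by column. This is plain arithmetic on the figures shown; the helper name `speedup` is hypothetical.

```shell
# Ratio of two "openssl speed" figures (trailing "k" removed).
# $1: figure from the -evp run, $2: figure from the run without -evp
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

speedup 999800.57 307929.67     # 16-byte blocks:    prints 3.25
speedup 1738746.54 322939.56    # 16384-byte blocks: prints 5.38
```

So on this machine `-evp` is roughly 3x faster for 16-byte blocks and a bit over 5x faster for 16 KiB blocks.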

I don't like OpenSSL's output. It should report cycles per byte (cpb),
since that metric is mostly independent of clock speed when measuring performance.
Jeff

Diederik de Haas

Jun 3, 2021, 1:40:03 PM
On Thursday, 3 June 2021 17:52:50 CEST Jeffrey Walton wrote:
> I _think_ OpenSSH uses OpenSSL, not kernel crypto.

If that means that hardware/accelerated crypto is dependent on
the program being used, that would suck

> To benchmark OpenSSL, you use something like:
> # C implementation
> openssl speed aes-128-cbc
> # Hardware acceleration
> openssl speed -evp aes-128-cbc
>
> You can see the difference in the numbers below ... on a Core i7-8700.
>
> $ openssl speed aes-128-cbc
> ...
> OpenSSL 1.1.1f 31 Mar 2020
> built on: Wed Apr 28 00:37:28 2021 UTC
> ...
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> aes-128 cbc 307929.67k 318790.74k 319262.46k 322336.43k 322890.41k 322939.56k
>
> $ openssl speed -evp aes-128-cbc
> ...
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> aes-128-cbc 999800.57k 1682301.12k 1730221.65k 1736772.61k 1738702.85k 1738746.54k

$ openssl speed aes-128-cbc
...
version: 3.0.0-alpha16
built on: built on: Thu May 6 19:54:38 2021 UTC
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k

$ openssl speed -evp aes-128-cbc
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k

At first glance there seems to be some improvement, particularly with 16/64 bytes,
but the difference is nowhere near as significant as in your results.

But I also tried it a few more times and generally speaking 16/64 bytes saw
higher scores with '-evp', but I've also had higher scores on the larger types
WITHOUT '-evp' ?!?

(Included the version as it was very different; turns out mine is from experimental)


Thanks for your reply,
Diederik

Ryutaroh Matsumoto

Jun 3, 2021, 3:20:02 PM
Your Rock64 is significantly faster than my RPi4B. I wonder how such a big
difference appears.

From: Diederik de Haas <didi....@cknow.org>
Date: Thu, 03 Jun 2021 19:34:19 +0200

> $ openssl speed aes-128-cbc
> ...
> version: 3.0.0-alpha16
> built on: built on: Thu May 6 19:54:38 2021 UTC
> ...
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k
>
> $ openssl speed -evp aes-128-cbc
> ...
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k

On my RPi4B I have:

# openssl speed aes-128-cbc
...
OpenSSL 1.1.1k 25 Mar 2021
built on: Thu Mar 25 20:49:34 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-YhzaKF/openssl-1.1.1k=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k
# openssl speed -evp aes-128-cbc
...
OpenSSL 1.1.1k 25 Mar 2021
built on: Thu Mar 25 20:49:34 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-YhzaKF/openssl-1.1.1k=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k
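
Since this message states that the CPU frequency is fixed at 1.5 GHz, the figures above can be converted to cycles per byte (cpb), the metric suggested earlier in the thread: divide the clock frequency by the throughput in bytes per second. A minimal sketch, assuming the cores really run at 1.5 GHz throughout the test (no throttling); the helper name `cpb` is hypothetical.

```shell
# cycles per byte = clock frequency (Hz) / throughput (bytes per second)
# $1: clock in GHz (assumed constant during the run)
# $2: an "openssl speed" figure with the trailing "k" removed (1000s of bytes/s)
cpb() {
    awk -v ghz="$1" -v kbytes="$2" \
        'BEGIN { printf "%.1f\n", (ghz * 1e9) / (kbytes * 1000) }'
}

cpb 1.5 79442.42    # 16384-byte blocks, without -evp: prints 18.9
cpb 1.5 42382.97    # 16384-byte blocks, with -evp:    prints 35.4
```

Expressed this way, results from boards running at different clock speeds become easier to compare.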


Note that this OpenSSL version is much older, but it is the one bundled with Debian Bullseye.
The kernel is upstream 5.10.39, built with almost the same compilation options as the
Debian RT kernel. The CPU frequency is fixed at 1.5 GHz with
"cpupower frequency-set -g performance".

Best regards, Ryutaroh

Ryutaroh Matsumoto

Jun 3, 2021, 3:50:02 PM
From: Ryutaroh Matsumoto <ryut...@ict.e.titech.ac.jp>
Date: Fri, 04 Jun 2021 04:18:25 +0900 (JST)
> Note that openssl version is much older but it is bundled with Debian Bullseye.

I installed openssl ver. 3 from Debian experimental,
and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
on the same hardware and kernel, as below. Interesting...
I wonder if I should file a bug to Debian BTS...

# openssl speed aes-128-cbc
...
version: 3.0.0-alpha16
built on: built on: Thu May 6 19:54:38 2021 UTC
options:bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-UqeSFN/openssl-3.0.0~~alpha16=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0x83
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 37858.56k 40995.79k 41736.44k 42339.69k 41984.00k 42350.33k

# openssl speed -evp aes-128-cbc
...
version: 3.0.0-alpha16
built on: built on: Thu May 6 19:54:38 2021 UTC
options:bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-UqeSFN/openssl-3.0.0~~alpha16=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0x83
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-128-CBC 38057.99k 41038.28k 41973.03k 41930.50k 42233.35k 42308.27k
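
The `CPUINFO: OPENSSL_armcap=0x83` line in this output is a bit mask of the CPU capabilities OpenSSL detected. The sketch below decodes it. The bit assignments (bit 0 NEON, 1 TICK, 2 AES, 3 SHA1, 4 SHA256, 5 PMULL, 6 SHA512, 7 CPUID) are an assumption taken from OpenSSL's `arm_arch.h` and are worth checking against the header of the version in use; the function name `decode_armcap` is hypothetical.

```shell
# Decode an OPENSSL_armcap value into capability names.
# Bit positions are an assumption based on OpenSSL's arm_arch.h.
decode_armcap() {
    cap=$(( $1 ))
    bit=0
    out=""
    for name in NEON TICK AES SHA1 SHA256 PMULL SHA512 CPUID; do
        if [ $(( cap & (1 << bit) )) -ne 0 ]; then
            out="$out $name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "${out# }"
}

decode_armcap 0x83    # prints: NEON TICK CPUID
```

If those bit assignments hold, 0x83 contains neither the AES nor the SHA bits, which would be consistent with this board offering no ARMv8 crypto instructions for OpenSSL to use.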

Best regards, Ryutaroh

Diederik de Haas

Jun 3, 2021, 5:20:02 PM
On Thursday, 3 June 2021 21:40:04 CEST Ryutaroh Matsumoto wrote:
> > Note that openssl version is much older but it is bundled with Debian
> > Bullseye.
> I installed openssl ver. 3 from Debian experimental,
> and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
> on the same hardware and kernel, as below. Interesting...
> I wonder if I should file a bug to Debian BTS...

Interesting.
I should probably test with version 1.1.1 too (but likely not tonight).

Upstream seems a better place, but it's ofc useful to also track it in
Debian's BTS.

Cheers,
Diederik

Diederik de Haas

Jun 3, 2021, 5:20:02 PM
On Thursday, 3 June 2021 21:18:25 CEST Ryutaroh Matsumoto wrote:
> Your Rock64 is significantly faster than my RPi4B. I wonder how such a big
> difference appears.

IIRC, the place where I read about the 10x crypto speed improvement with the
ARM Crypto Extensions also said that Broadcom does NOT
have a license for them.

> From: Diederik de Haas <didi....@cknow.org>
> Date: Thu, 03 Jun 2021 19:34:19 +0200
>
> > $ openssl speed aes-128-cbc
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> > aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k
> >
> > $ openssl speed -evp aes-128-cbc
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> > AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k
>
> On my RPi4B I have:
> # openssl speed aes-128-cbc
> OpenSSL 1.1.1k 25 Mar 2021
> built on: Thu Mar 25 20:49:34 2021 UTC
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k
>
> # openssl speed -evp aes-128-cbc
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
> aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k

What I find the most odd is that with '-evp' your scores are much lower than without.
The RPi4B's lack of ARM CE could explain why the Rock64's scores are better than one
otherwise would expect, even though the speedup is (still) lower than I expected.

> Note that openssl version is much older

It appears that OpenSSL made a jump from 1.1.1X to 3.0.x.
The most likely reason I had 3.0.0-alpha16 is a YOLO action by me whereby
I upgrade almost everything to experimental (KDE from exp was intentional)

> Kernel version is upstream 5.10.39 with almost the same kernel
> compilation options with Debian RT kernel.

You may want to verify whether the options enabled bc of bug #976635
are enabled with your kernel as well.

> CPU frequency is fixed to 1.5GHz by "cpupower frequency-set -g performance".

I still have to investigate what's possible on the Rock64, but my main problem
is heat: I have no cooling, and that causes problems.

> Best regards, Ryutaroh

Cheers,
Diederik

Diederik de Haas

Jun 4, 2021, 7:50:03 AM
[Note that I've combined the output of several posts for comparison]
[As a result I've also made several edits for consistency/readability]

On Thursday, 3 June 2021 21:40:04 CEST Ryutaroh Matsumoto wrote:
> From: Ryutaroh Matsumoto <ryut...@ict.e.titech.ac.jp>
> > Note that openssl version is much older ... with Debian Bullseye.
> I installed openssl ver. 3 from Debian experimental,
> and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
> on the same hardware and kernel, as below. Interesting...
>
> # openssl speed aes-128-cbc
> OpenSSL 1.1.1k 25 Mar 2021
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192bytes 16384 bytes
> aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k
> for easy comparison, I'm adding *your* 3.0.0-alpha16 scores directly below
> aes-128-cbc 37858.56k 40995.79k 41736.44k 42339.69k 41984.00k 42350.33k
>
> # openssl speed -evp aes-128-cbc
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192bytes 16384 bytes
> aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k
> for easy comparison, I'm adding *your* 3.0.0-alpha16 scores directly below
> AES-128-CBC 38057.99k 41038.28k 41973.03k 41930.50k 42233.35k 42308.27k

$ openssl speed aes-128-cbc
OpenSSL 1.1.1k 25 Mar 2021
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128 cbc 44008.81k 51444.78k 53902.17k 54553.60k 54730.75k 54717.10k
> for easy comparison, I'm adding my 3.0.0-alpha16 scores directly below
aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k

$ openssl speed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 84829.88k 269672.83k 575085.57k 836608.00k 963663.19k 974023.34k
> for easy comparison, I'm adding my 3.0.0-alpha16 scores directly below
AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k


This is indeed *quite* interesting!
In your case, the test with OpenSSL 1.1.1k without '-evp' stands (far) out
from your other scores in a positive way.
In my case, it is the same combination that stands out in a negative way.


https://openwrt.org/docs/techref/hardware/cryptographic.hardware.accelerators#finding_out_what_s_available_in_the_kernel
is the only page I found wrt /proc/crypto and I do indeed have
several 'skcipher' and 'shash' nodes with prio >= 300.
That article also speaks about /dev/crypto, but I don't have that.

So there's a reasonable chance I indeed do have HW accelerated crypto,
but it doesn't seem to be near '10x' speed improvements.

Thermal issues may also play a role. I noticed that running a test
after letting the device idle for a while (so it can cool off?) resulted
in higher scores. The Rock64 tends to get (quite) hot pretty quickly
and that _may_ mean it throttles back quickly. I haven't done any
measuring, so I may be completely off on this.
I'm pretty sure the RPi (4) has had more man-power devoted to this issue.
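
To check the throttling theory, the temperature can be sampled while the benchmark runs. A minimal sketch, assuming the board exposes the standard Linux thermal sysfs interface, where each `thermal_zone*/temp` file holds millidegrees Celsius; the function name `show_temps` and its directory argument are hypothetical.

```shell
# Print each thermal zone's temperature in degrees Celsius.
# $1: thermal class directory (defaults to /sys/class/thermal)
show_temps() {
    for z in "${1:-/sys/class/thermal}"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        # The kernel reports millidegrees, so divide by 1000.
        awk -v zone="${z##*/}" '{ printf "%s: %.1f C\n", zone, $1 / 1000 }' "$z/temp"
    done
}

# Example: sample once per second in another terminal during a benchmark.
# while sleep 1; do show_temps; done
```

If the reported temperature climbs towards the board's throttling point as the scores drop, that would support the thermal explanation.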

Cheers,
Diederik

Jeffrey Walton

Jun 4, 2021, 8:10:02 AM
> https://openwrt.org/docs/techref/hardware/cryptographic.hardware.accelerators#finding_out_what_s_available_in_the_kernel
> is the only page I found wrt /proc/crypto and I do indeed have
> several 'skcipher' and 'shash' nodes with prio >= 300.
> That article also speaks about /dev/crypto, but I don't have that.

Yeah, kernel crypto is not well documented (in my opinion).

> So there's a reasonable chance I indeed do have HW accelerated crypto,
> but it doesn't seem to be near '10x' speed improvements.

Yeah, you won't see that kind of speedup across all algorithms.

On Aarch64, you will see the following speedups (give or take) over a
quality C implementation:

* AES - 6x
* SHA1 - 3.5x
* SHA2 - 9.5x
* PMULL - 12x

SHA3 is available on ARMv8.2. Apple M1's ship with it. I don't have
benchmark numbers for it (yet).

> Thermal issues may also play a role. I noticed that if I did a test
> after letting the device idle for a while, so it can cool off (?), did result
> in higher scores.

You should probably use an active cooling solution, like a fan.

Then, before you run benchmarks, move the CPU from standby mode to
performance mode. See, for example,
https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh.

Standby mode is a kind of slow start. When in standby mode the first
couple of algorithms you benchmark will be off. By about the third
algorithm, the cpu is no longer in standby mode.

By using performance mode you avoid the slow start that throws off
your benchmarks.
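
On Linux the governor is usually exposed through the cpufreq sysfs files. A minimal sketch, assuming the board's kernel provides `scaling_governor` under /sys/devices/system/cpu; the function name `show_governors` is hypothetical, and the write is left commented out because it needs root.

```shell
# Print the current cpufreq governor of every CPU.
# $1: sysfs CPU directory (defaults to /sys/devices/system/cpu)
show_governors() {
    for f in "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f%/cpufreq/scaling_governor}
        echo "${cpu##*/}: $(cat "$f")"
    done
}

# Switching all CPUs to the performance governor (needs root):
# echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Running `show_governors` before and after the switch confirms whether the change took effect.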

Jeff

Diederik de Haas

Jun 4, 2021, 4:20:03 PM
On Friday, 4 June 2021 14:05:28 CEST Jeffrey Walton wrote:
> Yeah, kernel crypto is not well documented (in my opinion).

It's a complicated subject with various nuances, and if, like me, you're not
"in the know", it's very hard to, for example, qualify/quantify whether having
"ARM CE" is just nice marketing or actually substantial.
It could be that the rock64 achieves '6x', but without knowing/understanding
the baseline, that's mostly still meaningless.
For example, I had expected a greater difference with the RPi4 because it's Broadcom, but if the
baseline is so much lower, then what I get is still 'good' (but relatively).

> > So there's a reasonable chance I indeed do have HW accelerated crypto,
> > but it doesn't seem to be near '10x' speed improvements.
>
> Yeah, you won't see that kind of speedup across all algorithms.
>
> On Aarch64, you will see the following speedups (give or take) over a
> quality C implementation:
>
> * AES - 6x
> * SHA1 - 3.5x
> * SHA2 - 9.5x
> * PMULL - 12x

Good to know, thanks.

> > Thermal issues may also play a role. I noticed that if I did a test
> > after letting the device idle for a while, so it can cool off (?), did
> > result in higher scores.
>
> You should probably use an active cooling solution, like a fan.

I asked around (on irc:Pine64:/#rock64) and there was one person that used
active cooling, but that was custom made, which is beyond my skill set.
There is an aluminum case available (in Pine64's store) whereby the whole case
is practically a heat sink and that seems to work well, so I'll go for that.

> move the CPU from standby mode to performance mode.

That's still a research area for me: whether anything could or needs to be
done, and if so what, to keep the device from crashing (due to thermals) while
still getting the most out of it.
That could take quite a while, though. If needed, it would probably become part
of a dedicated 'rock64' thread.

Cheers,
Diederik


