Hardware acceleration for libsrtp


Bobby Mah

Oct 15, 2019, 12:14:50 PM
to discuss-webrtc
Hello all!

I notice that libsrtp does not use hardware acceleration. Are there any plans to support it? Perhaps it's already supported?

Thanks

Dubois, Sean

Oct 15, 2019, 12:19:58 PM
to discuss...@googlegroups.com

Hey Bobby,

 

https://bugs.chromium.org/p/chromium/issues/detail?id=713701#c22

 

I have also brought this up in the IETF/W3C but haven’t had much luck either.

 

IMO, overnight this would mean a dramatic improvement for underpowered devices; high client CPU usage is among the top 5 complaints I get from people building with Pion WebRTC. If anyone has any idea how to get this jump-started or escalated, I would be so grateful 😊


Harald Alvestrand

Oct 16, 2019, 8:58:17 AM
to discuss...@googlegroups.com
I stopped worrying about libsrtp performance when I found that the time needed for encrypting a video frame on a regular workstation was on the order of microseconds.
In most video-using apps, most of the CPU is consumed by the video encoder, with the decoder a very clear second; even when these are hardware-supported, a lot of power goes there.
It's fairly rare for people to have apps where a lot of the time is spent in encrypt/decrypt.

But libsrtp is an open-source project, so if someone who actually has the issue wants to contribute patches to support the hardware of the platform they're running on, I'm sure it will be welcome.
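Harald's per-frame view and the SFU case raised later in the thread can both hold, because a forwarding SFU does no encoding but multiplies SRTP work by its fan-out. The sketch below is a back-of-envelope model only. It assumes each incoming packet is decrypted once and re-encrypted once per receiver (each leg has its own keys), and that every receiver gets every packet. The per-operation cost and packet rates are hypothetical inputs, not measurements from this thread.

```c
#include <stdint.h>

/* Back-of-envelope SRTP load at a forwarding-only SFU (no encoding).
 * Assumption: each incoming packet is decrypted once (unprotect) and then
 * re-encrypted once per receiver (protect), since every leg has its own keys. */
static uint64_t sfu_srtp_ops_per_sec(uint64_t incoming_pps, uint64_t receivers) {
    return incoming_pps * (1 + receivers);
}

/* Fraction of one CPU core spent on SRTP, for an assumed cost per operation. */
static double core_fraction(uint64_t ops_per_sec, double us_per_op) {
    return (double)ops_per_sec * us_per_op / 1e6;
}
```

For example, 1,000 incoming packets/s fanned out to 9 receivers is 10,000 SRTP operations per second; at an assumed 2 µs each, that is about 2% of one core. The cost grows linearly with both the receiver count and the per-packet cost, which is why it can matter for a large SFU even when it is negligible for a single endpoint.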


Sean DuBois

Oct 16, 2019, 11:55:10 AM
to discuss...@googlegroups.com
Hey Harald,

What brought it to my attention was a user trying to run SFUs at a very large scale. I then started telling Pion users about it, and people have been enabling it locally and telling me it has made a big difference on some cheaper devices they have.

libsrtp already supports it. Firefox <-> Firefox already has it enabled. All the dev work is done in Chrome, someone just needs to turn it on.

Harald Alvestrand

Oct 16, 2019, 12:05:43 PM
to discuss...@googlegroups.com
Can you file a bug with instructions on how to turn it on?


Philipp Hancke

Oct 16, 2019, 12:07:43 PM
to discuss...@googlegroups.com

Harald Alvestrand

Oct 16, 2019, 12:31:07 PM
to discuss...@googlegroups.com
Ah, didn't recognize it as the same - that one is about AES-GCM, which uses less CPU but bloats the packets, while the thread here was about hardware support for cipher (which I don't know anything about).

Bobby Mah

Oct 16, 2019, 12:37:02 PM
to discuss-webrtc

Thanks all for the information

Harald, in my scenario there's no encoding, and SRTP encryption and decryption take up 20% of the CPU.

Sean, thanks for the information about AES GCM. 

But, I was requesting information about hardware enabled crypto, so not sure if my question was answered.

Thanks again everyone!

Sean DuBois

Oct 16, 2019, 12:50:28 PM
to discuss...@googlegroups.com
Always happy to help! Feel free to reach out directly or on the Pion slack if you have questions better asked real-time :)

AES GCM is what you want if you want hardware acceleration, no other suites support it AFAIK!

How much of an issue is 6 more bytes? Would love to know what that actually looks like for the ‘average real world’ call.
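The "6 more bytes" is the difference in authentication tag length: SRTP_AES128_CM_HMAC_SHA1_80 appends a 10-byte (80-bit) HMAC-SHA1 tag (RFC 3711), while AEAD_AES_128_GCM appends a 16-byte tag (RFC 7714). A rough way to size that for a call is sketched below. The packet rates and sizes are assumed inputs, for instance 50 packets/s for audio sent in 20 ms frames.

```c
#include <stdint.h>

/* Authentication tag lengths in bytes (RFC 3711 default suite, RFC 7714 GCM). */
enum {
    TAG_AES_CM_HMAC_SHA1_80 = 10, /* 80-bit HMAC-SHA1 tag */
    TAG_AEAD_AES_128_GCM    = 16  /* 128-bit GCM tag */
};

/* Extra bits per second from growing every packet by extra_bytes. */
static uint32_t extra_bitrate_bps(uint32_t extra_bytes, uint32_t packets_per_sec) {
    return extra_bytes * 8u * packets_per_sec;
}

/* The same growth as a percentage of the original packet size. */
static double overhead_increase_pct(uint32_t extra_bytes, uint32_t packet_bytes) {
    return 100.0 * (double)extra_bytes / (double)packet_bytes;
}
```

Under those assumptions, 6 extra bytes at 50 packets/s is 2,400 bit/s per audio stream. That is 6% on an assumed 100-byte audio packet but only 0.5% on an assumed 1,200-byte video packet, so the tradeoff weighs most on small-packet audio.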


Justin Uberti

Oct 16, 2019, 1:16:04 PM
to discuss-webrtc
6 bytes/packet is a real issue for a meaningful fraction of real-world calls, but I think that's largely beside the point: the plan in https://bugs.chromium.org/p/chromium/issues/detail?id=713701 is to support GCM as an available but non-default mode, so applications can control whether those bytes are an acceptable tradeoff.

Right now this bug is simply waiting for someone to have cycles to do the work associated with rolling this out.

Bobby Mah

Oct 16, 2019, 1:21:39 PM
to discuss-webrtc
Sean,

"AES GCM is what you want if you want hardware acceleration, no other suites support it AFAIK!"

That's excellent, good to know. Thanks for the Slack offer; my questions are not real-time yet... :)

Perhaps you already know this: if Chrome is offered a call whose SDP offer contains the AES-GCM crypto attribute, will it be able to answer the call with AES-GCM?

I will try to test it next week 


Dubois, Sean

Oct 16, 2019, 1:31:04 PM
to discuss...@googlegroups.com

When https://bugs.chromium.org/p/chromium/issues/detail?id=713701 is resolved Chrome will be able to use SRTP AES-GCM! You can test this today by enabling the experiment.

 

Chrome <-> Chrome will never use it. There is no W3C API to influence SRTP cipher suite order. The non-accelerated suites will be the default.

 

You can see what suites are being used today by using wireshark and looking at the DTLS ClientHello/ServerHello use_srtp values. The ClientHello is all the suites supported by the client, the ServerHello then contains the suite that was selected.
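Wireshark decodes use_srtp for you. To check a capture programmatically, or when the decode is missing, the extension body (extension type 14) is small enough to parse by hand. The sketch below follows the UseSRTPData layout from RFC 5764: a 2-byte list length, a list of 2-byte profile IDs, then a 1-byte MKI length and the MKI. The profile IDs are the registered values from RFC 5764 and RFC 7714. The function names are illustrative, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* SRTP protection profile IDs registered by RFC 5764 and RFC 7714. */
static const char *srtp_profile_name(uint16_t id) {
    switch (id) {
    case 0x0001: return "SRTP_AES128_CM_HMAC_SHA1_80";
    case 0x0002: return "SRTP_AES128_CM_HMAC_SHA1_32";
    case 0x0005: return "SRTP_NULL_HMAC_SHA1_80";
    case 0x0006: return "SRTP_NULL_HMAC_SHA1_32";
    case 0x0007: return "SRTP_AEAD_AES_128_GCM";
    case 0x0008: return "SRTP_AEAD_AES_256_GCM";
    default:     return "unknown";
    }
}

/* Parses a use_srtp extension body. Stores up to max_out profile IDs in
 * out, in the order listed, and returns the total number of profiles,
 * or -1 if the body is malformed. */
static int parse_use_srtp(const uint8_t *data, size_t len,
                          uint16_t *out, size_t max_out) {
    if (len < 3) return -1; /* 2-byte list length + 1-byte MKI length */
    size_t list_len = ((size_t)data[0] << 8) | data[1];
    if (list_len == 0 || list_len % 2 != 0 || 2 + list_len + 1 > len)
        return -1;
    size_t count = list_len / 2;
    for (size_t i = 0; i < count && i < max_out; i++)
        out[i] = (uint16_t)((data[2 + 2 * i] << 8) | data[3 + 2 * i]);
    size_t mki_len = data[2 + list_len];
    if (2 + list_len + 1 + mki_len != len) return -1; /* bad MKI length */
    return (int)count;
}
```

A ClientHello lists every profile the client supports. RFC 5764 requires the ServerHello to carry exactly one profile, so expect a return value of 1 there; that profile is the one actually used.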

 


Bobby Mah

Oct 16, 2019, 1:43:12 PM
to discuss-webrtc
Sean,

Thanks! will do

Bobby Mah

Oct 22, 2019, 3:35:21 PM
to discuss-webrtc
Hi Sean,

I tried the chrome flags but here is what I see in the client and server `Hellos` respectively

Client hello

[Screenshot: Wireshark ClientHello]


Server hello

[Screenshot: Wireshark ServerHello]


As you can see, it's not choosing GCM. Perhaps it's an issue in the server, and I am looking into that. It's just that I expected the Chrome client to send GCM as the first cipher suite.


Dubois, Sean

Oct 22, 2019, 3:40:27 PM
to discuss...@googlegroups.com

Hey Bobby,

 

You will want to look at the use_srtp extension value. Inserted a screenshot so you can see what it looks like.

 

 

 

 


Bobby Mah

Oct 22, 2019, 4:32:00 PM
to discuss-webrtc
Odd, I don't see that:

[Screenshot: ClientHello with no use_srtp extension visible]


Roman Shpount

Oct 22, 2019, 4:40:13 PM
to discuss-webrtc
As far as I know, all AES suites, including AES_CM_128_HMAC_SHA1_80/AES_CM_128_HMAC_SHA1_32, should be supported by AES-NI and OpenSSL. Are you sure you are not missing some compilation setting that prevents you from using assembler-optimized encryption?



Philipp Hancke

Oct 22, 2019, 4:42:42 PM
to discuss...@googlegroups.com
The issue seems to be that the hash for the MAC is not accelerated. The fact that GCM doesn't compute that MAC is the most credible explanation for the numbers Sean supplied in the issue.


Bobby Mah

Oct 22, 2019, 4:45:53 PM
to discuss-webrtc
Hi Roman!

It's possible that I am missing a compiler option. I used https://github.com/sourcey/webrtc-builds. If you know the option, I can figure out how to pass it in.

Thanks

Roman Shpount

Oct 22, 2019, 4:58:51 PM
to discuss-webrtc
I think this is libsrtp implementation issue: OpenSSL includes EVP_aes_128_cbc_hmac_sha1 (see https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aesni-sha1-x86_64.pl for more details), which combines AES and SHA1 HMAC calculation, but it is not used by libsrtp.

Justin Uberti

Oct 22, 2019, 6:41:46 PM
to discuss-webrtc
I would focus on what Sean mentioned, the missing use_srtp extension. That's what actually determines the cryptosuite used for SRTP.


Roman Shpount

Oct 22, 2019, 7:01:11 PM
to discuss-webrtc
There is a good chance that performance of AES_CM_128_HMAC_SHA1_80 in libsrtp is about half of what it should be if it were calculating AES and HMAC at the same time. So, before switching the cryptosuite, it could be possible to significantly improve SRTP performance by using a better optimized routine.

Bobby Mah

Oct 22, 2019, 11:30:47 PM
to discuss-webrtc
Hey Sean,

I don't see use_srtp anywhere in the ClientHello. Is there a way to enable it?

Roman Shpount

Oct 23, 2019, 6:58:56 PM
to discuss-webrtc
Browser should send it automatically. Make sure you are looking at the right handshake (i.e. DTLS handshake over UDP which is used for media, not TLS handshake over TCP which is used for web pages, web sockets, etc).

Harald Alvestrand

Oct 24, 2019, 4:33:37 AM
to discuss...@googlegroups.com
I'm not sure what people are testing here.

FWIW - we have said that we want to enable the use of GCM_SHA384 (I think that's the right name) with people who want to use it.
We have NOT said that we're going to make Chrome choose that by default - the impact on small-packet audio calls is probably still worrisome, if I understand correctly how this thing hangs together.

Thus, if you're sending the ClientHello from Chrome, it's the server's responsibility to respond with a ServerHello picking GCM_SHA384, and that server won't be Chrome.

If you're sending the ClientHello from non-Chrome, the only way you'll get Chrome to choose GCM_SHA384 is to omit all the cryptosuites that Chrome would prefer using.

(Disclaimer: I haven't started looking into whether we have a flag that would change Chrome's response behavior in WebRTC yet, so what I say above might not be all the truth.)
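Harald's point is that in DTLS-SRTP the server side of the handshake picks one profile from the client's use_srtp list (RFC 5764). The sketch below assumes a server that walks its own preference list and takes the first profile the client also offered; real DTLS stacks may apply a different policy, and the function name is hypothetical. It shows why a responder that prefers AES-CM only lands on GCM when the client offers nothing it likes better.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical server-side policy: return the first profile in server_prefs
 * that also appears in the client's offer, or 0 if there is no common
 * profile. */
static uint16_t select_srtp_profile(const uint16_t *client_offer, size_t n_offer,
                                    const uint16_t *server_prefs, size_t n_prefs) {
    for (size_t i = 0; i < n_prefs; i++)
        for (size_t j = 0; j < n_offer; j++)
            if (server_prefs[i] == client_offer[j])
                return server_prefs[i];
    return 0;
}
```

With a server that ranks SRTP_AES128_CM_HMAC_SHA1_80 (0x0001) first, an offer listing SRTP_AEAD_AES_128_GCM (0x0007) first still yields 0x0001. Only an offer containing GCM alone, or a server that ranks GCM first, ends up on GCM.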


Bobby Mah

Oct 24, 2019, 12:44:39 PM
to discuss-webrtc
Thanks Roman, I found it earlier, was looking for the wrong type.

Harald, thanks for your comments. My plan was to respond with a "ServerHello" selecting GCM_SHA384. I was eventually planning to omit all other cryptosuites and send a "ClientHello" to Chrome with only GCM_SHA384. I just needed a quick test to verify that GCM_SHA384 would be executed using the AES-NI and AVX CPU instructions, reducing the overall load on the CPU. There is a flag that can be set in Chrome, #enable-webrtc-srtp-aes-gcm. I ran my test and noticed that CPU usage was still higher than expected, so I ran my code under "perf", and the top functions in the perf report were...

# Samples: 2K of event 'cycles:ppp'
# Event count (approx.): 528404110
#
# Overhead  Command          Shared Object        Symbol
# ........  ...............  ...................  ..........................................................................................................................................................
#
     7.24%  Thread 0x0x7f30  auction_unit_tests   [.] _x86_64_AES_encrypt_compact
     1.75%  Thread 0x0x7f30  auction_unit_tests   [.] gcm_ghash_4bit

I noticed that there is a function called "gcm_ghash_avx". I would hope that it, along with the AES-NI functions, would be used from BoringSSL.


Roman Shpount

Oct 24, 2019, 1:04:41 PM
to discuss-webrtc
Bobby,

I am not familiar with internals of BoringSSL, but I think AES_encrypt_compact is not hardware optimized. 

Justin Uberti

Oct 24, 2019, 7:54:04 PM
to discuss-webrtc
Sounds like you are at least negotiating a GCM crypto suite, which is good. Can you confirm by reading out 'googSrtpCipher' from PeerConnection.getStats? (or chrome://webrtc-internals).

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.

Bobby Mah

Oct 24, 2019, 8:55:57 PM
to discuss-webrtc
Justin,

I tried printing the "getStats()", however I do not see any key by the name "googSrtpCipher"

Thanks

Bobby Mah

Oct 24, 2019, 9:19:37 PM
to discuss-webrtc
I checked using this as well http://webrtc.github.io/samples/src/content/peerconnection/constraints/

but even that does not show 'googSrtpCipher'

Bobby Mah

Oct 24, 2019, 9:28:00 PM
to discuss-webrtc
From Wireshark in the "ServerHello" I see this

[Screenshot: Wireshark ServerHello]


Philipp Hancke

Oct 25, 2019, 2:29:26 AM
to discuss...@googlegroups.com
At the top of chrome://webrtc-internals you'll see a dropdown "Read stats from"; choose the "legacy non-standard" (but *more useful*) option, then search for googComponent in the stats table.

I've filed https://bugs.chromium.org/p/chromium/issues/detail?id=1018077 since at least srtpCipher should be readily available.


Bobby Mah

Oct 25, 2019, 11:36:38 AM
to discuss-webrtc
Philipp, thanks for your help. Here is what it shows:

[Screenshot: chrome://webrtc-internals legacy stats]


Bobby Mah

Oct 25, 2019, 12:37:33 PM
to discuss-webrtc
As my post shows, the SRTP cipher is GCM. However, it's still not using the hardware crypto support in the CPU. Any ideas why?

Justin Uberti

Oct 27, 2019, 5:12:06 PM
to discuss-webrtc
srtpCipher looks good, so yeah, you just want to figure out why the NI version of the AES encrypt routine isn't being invoked.  Suggest setting a breakpoint in srtp_aes_gcm_openssl_encrypt and tracing downward from there.


Bobby Mah

Oct 27, 2019, 6:45:46 PM
to discuss-webrtc
Thanks Justin, will do

Bobby Mah

Oct 28, 2019, 8:17:42 PM
to discuss-webrtc
Hi Justin,

Looks like the issue is that the function hwaes_capable, in third_party/boringssl/src/crypto/fipsmodule/aes/internal.h, returns 0:

#if !defined(OPENSSL_NO_ASM)

#if defined(OPENSSL_X86_64)
#define HWAES
#define HWAES_ECB

static int hwaes_capable(void) {
  //printf("hwaes_capable x86_64 %d\n", OPENSSL_ia32cap_P[1]);
  return (OPENSSL_ia32cap_P[1] & (1 << (57 - 32))) != 0;
}

OPENSSL_ia32cap_P[1] is 0

I checked my machine and it does support AES and AVX

grep -o aes /proc/cpuinfo | wc -l
4
grep -o avx /proc/cpuinfo | wc -l
8
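The hwaes_capable() check above tests bit 25 (57 - 32) of OPENSSL_ia32cap_P[1], the word that OpenSSL/BoringSSL fill from CPUID leaf 1 ECX. Bit 25 there is the AES-NI flag that /proc/cpuinfo reports as "aes". A value of 0 for the whole word, as seen here, suggests the capability vector was never populated rather than that the CPU lacks AES-NI. The sketch below reads the flag directly so you can compare it with what the library sees. It assumes GCC or Clang (for <cpuid.h>), and the helper names are illustrative.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper for the CPUID instruction */
#endif

/* Same test as hwaes_capable(): word 1 of the ia32cap vector holds CPUID
 * leaf 1 ECX, and bit 25 (57 - 32) of it is the AES-NI flag. */
static int ia32cap_word1_has_aesni(uint32_t word1) {
    return (word1 & (1u << (57 - 32))) != 0;
}

/* Query the CPU directly; returns 1 if AES-NI is reported, 0 otherwise. */
static int cpu_has_aesni(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0; /* CPUID leaf 1 unavailable */
    return ia32cap_word1_has_aesni(ecx);
#else
    return 0; /* not an x86 CPU: AES-NI does not apply */
#endif
}
```

If cpu_has_aesni() returns 1 while the library's word is still 0, that points at capability initialization in the process rather than at the hardware.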

Justin Uberti

Oct 29, 2019, 12:35:52 AM
to discuss-webrtc
Are you calling OPENSSL_init_ssl? That should ensure OPENSSL_cpuid_setup is called, which fills in those capability bits.

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.

Bobby Mah

Oct 29, 2019, 11:09:34 AM
to discuss-webrtc
Thanks for your response, Justin.

I will try that. I wasn't aware that my program would have to call that explicitly; I thought BoringSSL would do so behind the scenes.

Bobby Mah

Oct 29, 2019, 3:46:39 PM
to discuss-webrtc
That worked out nicely Justin, CPU usage went from 40% to 25%.

Thanks so much for your help, Justin!

Thanks to everyone else who contributed too!

Sean DuBois

Oct 29, 2019, 3:59:16 PM
to discuss-webrtc
That is fantastic news! Exciting to see how much of an impact it has.

This is the best thread I have seen on this list in a while, multiple people helping out really great stuff :)

Justin Uberti

Oct 29, 2019, 7:04:40 PM
to discuss-webrtc
👊

Bobby, can you share more detail on your numbers? This might be useful for others to know.




Roman Shpount

Oct 29, 2019, 7:40:26 PM
to discuss-webrtc
Since it seems to be a group concerned with SRTP performance, is there any interest in trying to optimize AES-CBC-SHA1 and AES-GCM? 


Also, SFU sending the same data to multiple end points using different encryption keys would greatly benefit from multi-buffer support (https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/communications-ia-multi-buffer-paper.pdf and https://github.com/intel/intel-ipsec-mb). This can likely result in over 2x performance improvement for both AES-CBC-SHA1 and AES-GCM.



Dubois, Sean

Oct 29, 2019, 7:56:07 PM
to discuss...@googlegroups.com

For SFU performance I am most excited by https://tools.ietf.org/html/draft-ietf-perc-srtp-ekt-diet-10. I have no contact with the authors, but would love to support it with Pion. Maybe I will reach out directly.

 

I am all for improving the performance of AES-CBC-SHA1, but I hit some roadblocks and decided it wasn’t worth the effort (and pushing for AES-GCM was more realistic)

 

* Most of the developers I have worked with are sticking with older stuff for stability reasons, so even if things land it will take a while for them to get it. Hardware Accelerated AES-GCM works today.

* Security reviewers see AES-CBC-SHA1 and instantly say it is a problem. It might not be fair, but this has come up at multiple companies. They are just ticking a box, and AES-GCM wins every time.

* This isn’t just an OpenSSL/libsrtp thing. Go doesn’t have a performant primitive for AES-CBC-SHA1, but AES-GCM is hardware accelerated and available everywhere today.

 

If there were a clear path forward and someone were paying me, I'm always up for an adventure! 😊


Justin Uberti

Oct 29, 2019, 9:43:20 PM
to discuss-webrtc
Let's write up the quantified perf gains for GCM and then we can go see if someone wants to take another look at CTR perf.

Bobby Mah

Oct 29, 2019, 9:56:10 PM
to discuss-webrtc
Hi all!

Here are some of the numbers from "perf" after running it for 20 seconds on both scenarios

Non-hardware based crypto:

# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  ............................................................
    13.00%  Thread 0x0x4597  auction_unit_tests    [.] _x86_64_AES_encrypt_compact
     0.36%  Thread 0x0x4597  auction_unit_tests    [.] aes_nohw_encrypt
     0.12%  Thread 0x0x4597  auction_unit_tests    [.] aes_gcm_ctrl
     0.10%  Thread 0x0x4597  auction_unit_tests    [.] AES_encrypt
     0.05%  Thread 0x0x4597  auction_unit_tests    [.] aes_gcm_cipher
     0.05%  Thread 0x0x4597  auction_unit_tests    [.] aes_gcm_init_key
     0.03%  Thread 0x0x4597  auction_unit_tests    [.] srtp_aes_gcm_openssl_set_iv
     0.03%  Thread 0x0x4597  auction_unit_tests    [.] srtp_aes_gcm_openssl_decrypt
     0.02%  Thread 0x0x4597  auction_unit_tests    [.] srtp_aes_gcm_openssl_set_aad
     0.01%  Thread 0x0x4597  auction_unit_tests    [.] srtp_aes_gcm_openssl_get_tag
     0.01%  Thread 0x0x4597  auction_unit_tests    [.] srtp_aes_gcm_openssl_encrypt
     0.00%  Thread 0x0x4597  auction_unit_tests    [.] _x86_64_AES_set_encrypt_key


Hardware based crypto:

# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  ............................................................
     0.44%  Thread 0x0x4717  auction_unit_tests    [.] _aesni_ctr32_ghash_6x
     0.15%  Thread 0x0x4717  auction_unit_tests    [.] aes_gcm_ctrl
     0.14%  Thread 0x0x4717  auction_unit_tests    [.] aes_hw_encrypt
     0.12%  Thread 0x0x4717  auction_unit_tests    [.] aes_gcm_cipher
     0.09%  Thread 0x0x4717  auction_unit_tests    [.] aes_hw_ctr32_encrypt_blocks
     0.07%  Thread 0x0x4717  auction_unit_tests    [.] _aesni_ctr32_6x
     0.06%  Thread 0x0x4717  auction_unit_tests    [.] aes_gcm_init_key
     0.04%  Thread 0x0x4717  auction_unit_tests    [.] aesni_gcm_encrypt
     0.03%  Thread 0x0x4717  auction_unit_tests    [.] srtp_aes_gcm_openssl_set_iv
     0.03%  Thread 0x0x4717  auction_unit_tests    [.] aesni_gcm_decrypt
     0.01%  Thread 0x0x4717  auction_unit_tests    [.] srtp_aes_gcm_openssl_set_aad
     0.01%  Thread 0x0x4717  auction_unit_tests    [.] srtp_aes_gcm_openssl_decrypt
     0.01%  Thread 0x0x4717  auction_unit_tests    [.] _aesni_encrypt8
     0.01%  Thread 0x0x4717  auction_unit_tests    [.] srtp_aes_gcm_openssl_encrypt
     0.00%  Thread 0x0x4717  auction_unit_tests    [.] srtp_aes_gcm_openssl_get_tag
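To put the two perf listings side by side, the sketch below adds up the crypto-related rows shown for each 20-second run. The caveats are that perf percentages are shares of sampled cycles within each run, only the listed rows are counted, and the 40% to 25% CPU figure reported earlier is a separate whole-process measurement. The array and function names are illustrative.

```c
#include <stddef.h>

/* Sum of perf "Overhead" percentages over a set of symbol rows. */
static double total_overhead(const double *pct, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += pct[i];
    return sum;
}

/* Crypto-related rows from the two 20-second perf runs above. */
static const double software_aes[] = {13.00, 0.36, 0.12, 0.10, 0.05, 0.05,
                                      0.03, 0.03, 0.02, 0.01, 0.01, 0.00};
static const double aesni[] = {0.44, 0.15, 0.14, 0.12, 0.09, 0.07, 0.06, 0.04,
                               0.03, 0.03, 0.01, 0.01, 0.01, 0.01, 0.00};
```

That works out to roughly 13.8% versus 1.2% of sampled cycles, about an 11x reduction in the crypto path, with _x86_64_AES_encrypt_compact alone accounting for most of the difference.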

Roman Shpount

Oct 30, 2019, 2:06:07 PM
to discuss-webrtc
Inline


On Tuesday, October 29, 2019 at 7:56:07 PM UTC-4, Dubois, Sean wrote:

> For SFU performance I am most excited by https://tools.ietf.org/html/draft-ietf-perc-srtp-ekt-diet-10. I have no contact with the authors, but would love to support it with Pion. Maybe I will reach out directly.

This proposal is definitely interesting, especially since it allows end-to-end encryption through an SFU. The big question is if and when it is going to be implemented.

> I am all for improving the performance of AES-CBC-SHA1, but I hit some roadblocks and decided it wasn’t worth the effort (and pushing for AES-GCM was more realistic)

Do you mind sharing what roadblocks you hit?

> * Most of the developers I have worked with are sticking with older stuff for stability reasons, so even if things land it will take a while for them to get it. Hardware Accelerated AES-GCM works today.

It works with recent versions of libsrtp when libsrtp is correctly compiled with OpenSSL or BoringSSL and you force the correct encryption profile to be negotiated. A lot of developers I work with get it wrong. Adding a new crypto implementation should not be a big hurdle, especially considering interop with older implementations.

> * Security reviewers see AES-CBC-SHA1 and instantly say it is a problem. It might not be fair, but this has come up at multiple companies. They are just ticking a box, and AES-GCM wins every time.

AES-CBC-SHA1 is old, but it is also the most commonly deployed. At least for me, it is not going away for the next 5-10 years due to legacy IP phone support.

> * This isn’t just an OpenSSL/libsrtp thing. Go doesn’t have a performant primitive for AES-CBC-SHA1, but AES-GCM is hardware accelerated and available everywhere today.

As far as I know, crypto in Go is just a set of assembler functions. It should not be difficult to write or contribute a new one implementing optimized AES-CBC-SHA1.

P.S. I was also talking about a multi-buffer AES-GCM implementation. Even hardware-accelerated AES-GCM has some room to improve.

pablo platt

Oct 30, 2019, 2:34:12 PM
to discuss...@googlegroups.com
Roman,

Do you know how performance of AES-CBC-SHA1 compares to AES-GCM on a machine that has SHA hardware extensions like AMD EPYC and the upcoming Intel Ice Lake?

Is it possible to use both stitching and multi-buffer or do I need to choose one?

This [1] looks like a command-line example for stitching. Does OpenSSL have something like the EVP_ functions for stitching that I can use in my project?





Roman Shpount

Oct 30, 2019, 3:06:26 PM
to discuss-webrtc
Inline:


On Wednesday, October 30, 2019 at 2:34:12 PM UTC-4, pablo wrote:
> Do you know how performance of AES-CBC-SHA1 compares to AES-GCM on a machine that has SHA hardware extensions like AMD EPYC and the upcoming Intel Ice Lake?

Not sure. Even with stitching and without SHA1 hardware optimization, AES128-CBC-SHA1 is roughly the same speed as or faster than AES256-GCM. It is, however, less secure.

> Is it possible to use both stitching and multi-buffer or do I need to choose one?

Yes, it is possible to use both. When you are using multi-buffer, you will need to encode multiple packets at the same time to achieve optimal performance. When you combine stitching and multi-buffer, you achieve optimal performance with fewer streams.

Multi-buffer can also be used with AES-GCM, where you do not need stitching.

> This [1] looks like a command-line example for stitching. Does OpenSSL have something like the EVP_ functions for stitching that I can use in my project?

It does: EVP_aes_128_cbc_hmac_sha1. You will need to update libsrtp to use it or write your own SRTP implementation that uses it.

pablo platt

Oct 30, 2019, 5:45:50 PM
to discuss...@googlegroups.com
Seems like  EVP_aes_128_cbc_hmac_sha1 is not intended for public use [1]:
WARNING: this is not intended for usage outside of TLS and requires calling of some undocumented ctrl functions. These ciphers do not conform to the EVP AEAD interface.



pablo platt

Oct 30, 2019, 6:16:36 PM
to discuss...@googlegroups.com
aes-128-gcm processes about 5.5x as many 1024-byte blocks per second as aes-128-cbc-hmac-sha1.
Tested on a cloud VM with 1 Skylake vCPU.
SHA hardware acceleration might close the gap.

openssl speed -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 16 size blocks: 47865664 aes-128-cbc-hmac-sha1's in 2.98s
Doing aes-128-cbc-hmac-sha1 for 3s on 64 size blocks: 17255780 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 256 size blocks: 5719920 aes-128-cbc-hmac-sha1's in 2.98s
Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1706089 aes-128-cbc-hmac-sha1's in 2.98s
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 232634 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 16384 size blocks: 116536 aes-128-cbc-hmac-sha1's in 2.99s
OpenSSL 1.1.1  11 Sep 2018
built on: Thu Jun 20 17:36:28 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-cn9tZy/openssl-1.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type                                16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc-hmac-sha1   256996.85k   369354.49k   491375.68k   586253.40k   637370.48k   638570.51k

openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 77821813 aes-128-gcm's in 2.98s
Doing aes-128-gcm for 3s on 64 size blocks: 48905347 aes-128-gcm's in 2.98s
Doing aes-128-gcm for 3s on 256 size blocks: 24037056 aes-128-gcm's in 2.98s
Doing aes-128-gcm for 3s on 1024 size blocks: 9516449 aes-128-gcm's in 2.98s
Doing aes-128-gcm for 3s on 8192 size blocks: 1490621 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 16384 size blocks: 760296 aes-128-gcm's in 2.99s
OpenSSL 1.1.1  11 Sep 2018
built on: Thu Jun 20 17:36:28 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-cn9tZy/openssl-1.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm     417835.24k  1050316.18k  2064928.30k  3270081.80k  4084002.42k  4166116.94k
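The summary rows are derived from the "Doing ..." lines: blocks processed × block size ÷ elapsed seconds, reported in 1000s of bytes per second. A minimal Python sketch, assuming the line format shown in the output above (the regex and the helper name `throughput_kbps` are illustrative, not part of OpenSSL), reproduces the 1024-byte figures and the roughly 5.5x ratio:

```python
import re

# Matches lines such as:
# "Doing aes-128-gcm for 3s on 1024 size blocks: 9516449 aes-128-gcm's in 2.98s"
DOING_RE = re.compile(
    r"Doing (?P<alg>\S+) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+'s in (?P<secs>[\d.]+)s"
)


def throughput_kbps(lines):
    """Return {block_size: 1000s of bytes per second}, as `openssl speed` reports it."""
    result = {}
    for line in lines:
        m = DOING_RE.search(line)
        if m:
            size = int(m.group("size"))
            count = int(m.group("count"))
            secs = float(m.group("secs"))
            result[size] = count * size / secs / 1000.0
    return result


cbc = throughput_kbps([
    "Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1706089 aes-128-cbc-hmac-sha1's in 2.98s",
])
gcm = throughput_kbps([
    "Doing aes-128-gcm for 3s on 1024 size blocks: 9516449 aes-128-gcm's in 2.98s",
])

print(f"cbc-hmac-sha1: {cbc[1024]:.2f}k")  # 586253.40k, matching the summary row
print(f"gcm:           {gcm[1024]:.2f}k")  # 3270081.80k
print(f"ratio:         {gcm[1024] / cbc[1024]:.2f}x")  # about 5.58x
```

Feeding it every "Doing" line from a run gives the whole summary row at once.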

Roman Shpount

Oct 30, 2019, 8:04:22 PM
to discuss-webrtc
I am comparing aes-128-cbc-hmac-sha1 vs. aes-256-gcm and seeing about a 2x speedup for blocks of 64 bytes and larger:

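Block (bytes)   aes-128-cbc-hmac-sha1   aes-256-gcm   Speedup
16                         45,610,752    46,566,133      1.02
64                         14,598,340    31,844,585      2.18
256                         5,498,918    11,588,535      2.11
1024                        1,635,463     3,219,048      1.97
8192                          217,085       414,262      1.91
16384                         108,950       207,887      1.91

(Blocks processed in each ~3 s openssl speed run; Speedup = aes-256-gcm count / aes-128-cbc-hmac-sha1 count.)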
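The Speedup column divides raw block counts, while the runs below took slightly different times (2.99 s vs. 3.00 s for some block sizes). A small Python sketch, using the counts and times copied from those runs (the helper name `speedups` is illustrative), computes both the raw ratio and a per-second ratio; for these numbers the difference does not change the rounded result:

```python
# block size -> (blocks processed, elapsed seconds), copied from the runs below
cbc_hmac_sha1 = {
    16: (45610752, 2.99), 64: (14598340, 3.00), 256: (5498918, 3.00),
    1024: (1635463, 3.00), 8192: (217085, 3.00), 16384: (108950, 3.00),
}
aes_256_gcm = {
    16: (46566133, 3.00), 64: (31844585, 3.00), 256: (11588535, 3.00),
    1024: (3219048, 3.00), 8192: (414262, 2.99), 16384: (207887, 3.00),
}


def speedups(base, other):
    """Return {block_size: (raw count ratio, time-normalized ratio)}."""
    out = {}
    for size, (base_count, base_secs) in base.items():
        other_count, other_secs = other[size]
        raw = other_count / base_count
        per_second = (other_count / other_secs) / (base_count / base_secs)
        out[size] = (raw, per_second)
    return out


for size, (raw, per_second) in speedups(cbc_hmac_sha1, aes_256_gcm).items():
    print(f"{size:>6}  raw {raw:.2f}x  per-second {per_second:.2f}x")
```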


$ openssl speed -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 16 size blocks: 45610752 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 64 size blocks: 14598340 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 256 size blocks: 5498918 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1635463 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 217085 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 16384 size blocks: 108950 aes-128-cbc-hmac-sha1's in 3.00s

OpenSSL 1.1.1  11 Sep 2018
built on: Thu Jun 20 17:36:28 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-cn9tZy/openssl-1.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc-hmac-sha1   244070.91k   311431.25k   469241.00k   558238.04k   592786.77k   595012.27k

$ openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: 46566133 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 31844585 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 11588535 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 3219048 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 414262 aes-256-gcm's in 2.99s
Doing aes-256-gcm for 3s on 16384 size blocks: 207887 aes-256-gcm's in 3.00s

OpenSSL 1.1.1  11 Sep 2018
built on: Thu Jun 20 17:36:28 2019 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-cn9tZy/openssl-1.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     248352.71k   679351.15k   988888.32k  1098768.38k  1134994.75k  1135340.20k

However, on our media server running 1,000 concurrent SRTP channels, CPU utilization is roughly the same with either crypto suite. I may need to look at exactly how these encryption suites are implemented.
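One way to put the benchmark in context is to estimate how much of one core bulk crypto alone would need at a given aggregate bitrate. A back-of-envelope Python sketch, assuming a hypothetical 1,000 kbit/s per channel (not a figure from this thread) and the single-core 1024-byte throughputs from the two runs above; it ignores per-packet overhead, smaller real packet sizes, and forwarding work, so it is only a rough lower bound:

```python
def crypto_cores(channels, kbps_per_channel, cipher_kbytes_per_sec):
    """Estimate CPU cores spent on bulk SRTP crypto alone.

    kbps_per_channel: media bitrate in kilobits per second (hypothetical input).
    cipher_kbytes_per_sec: single-core throughput from `openssl speed`,
    in 1000s of bytes per second (the 'k' figures in its summary).
    """
    bytes_per_sec = channels * kbps_per_channel * 1000 / 8
    return bytes_per_sec / (cipher_kbytes_per_sec * 1000)


# 1024-byte-block figures from the runs above; 1,000 channels at a
# hypothetical 1,000 kbit/s each.
for name, kbs in [("aes-128-cbc-hmac-sha1", 558238.04), ("aes-256-gcm", 1098768.38)]:
    print(f"{name}: {crypto_cores(1000, 1000, kbs):.2f} cores")  # ~0.22 vs ~0.11
```

Under these assumptions the absolute difference is a fraction of one core, which on a multi-core server may be small next to everything else the media server does.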

pablo platt

Oct 31, 2019, 11:19:04 AM
to discuss...@googlegroups.com
Roman, I don't think OpenSSL supports stitching for AES-SHA1 in encrypt-then-MAC order.
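The ordering matters because in encrypt-then-MAC the HMAC is computed over the ciphertext, so on the sending side the hash consumes the cipher's output; the stitched OpenSSL routine targets TLS-style MAC-then-encrypt instead. A simplified Python sketch of the encrypt-then-MAC order, with a 10-byte (80-bit) HMAC-SHA1 tag as in SRTP's AES_CM_128_HMAC_SHA1_80 profile. Assumptions: the stdlib has no AES, so `toy_keystream` is a hypothetical, insecure stand-in for AES counter mode, and real SRTP's key derivation and rollover-counter input to the MAC are omitted:

```python
import hashlib
import hmac


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hypothetical stand-in for AES counter mode: SHA-256(key || nonce || counter).
    NOT a real or secure cipher; used only to show the ordering."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def protect(enc_key, auth_key, nonce, header, payload, tag_len=10):
    # 1. Encrypt the payload first.
    ks = toy_keystream(enc_key, nonce, len(payload))
    ciphertext = bytes(p ^ k for p, k in zip(payload, ks))
    # 2. Then MAC header + ciphertext (encrypt-then-MAC), truncated to 80 bits.
    tag = hmac.new(auth_key, header + ciphertext, hashlib.sha1).digest()[:tag_len]
    return header + ciphertext + tag


def unprotect(enc_key, auth_key, nonce, packet, header_len, tag_len=10):
    header = packet[:header_len]
    ciphertext = packet[header_len:-tag_len]
    tag = packet[-tag_len:]
    # Verify the tag over the ciphertext before decrypting anything.
    expected = hmac.new(auth_key, header + ciphertext, hashlib.sha1).digest()[:tag_len]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    ks = toy_keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

On the receiving side the order reverses (verify, then decrypt), so the MAC there runs over input that is already available.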


Roman Shpount

Oct 31, 2019, 3:33:11 PM
to discuss-webrtc
You are correct. A new assembler routine would need to be written for SRTP, but performance would likely be about the same.

Another thing to look at for server-side hardware acceleration is Intel QAT (QuickAssist Technology), which should already be available with Intel Xeon Scalable CPUs or on add-on boards.

Philipp Hancke

Oct 31, 2019, 5:01:51 PM
to discuss...@googlegroups.com
We just landed https://webrtc-review.googlesource.com/c/src/+/158404, which implements Justin's suggested strategy of not offering this by default.
It will require some manual testing to confirm that it works as intended, but since it does not change the default behaviour (much), it *might* just ride the release train without a flag.


pablo platt

Apr 18, 2020, 3:51:40 AM
to discuss...@googlegroups.com
Hello @Roman
How hard would it be to add a stitched aes-128-hmac-sha1 (encrypt-then-MAC) to OpenSSL?
There is an implementation of MAC-then-encrypt, but I'm not familiar with Perl or assembly.
Is it just a matter of reordering things in [1], or will I need a deep understanding?

Does Intel QAT replace stitched aes-128-hmac-sha1, or can I use both at the same time for acceleration?
Do I need to build OpenSSL with QAT manually [2] on Ubuntu? Do I need to call a new OpenSSL function explicitly, or will it just work?

Is there a multi-buffer implementation that is easy for end users to use?


Philipp Hancke

Apr 23, 2020, 2:16:06 PM
to discuss...@googlegroups.com
BTW, GCM support has landed in M84 without a flag; please test:
I apologize for spoiling the bug's birthday party!

Roman Shpount

Apr 23, 2020, 6:39:07 PM
to discuss-webrtc
Adding it to OpenSSL might be more complicated than necessary, but adding an optimized crypto implementation to libsrtp should be simple enough. It should be possible to modify aesni-sha1-x86_64.pl to do encrypt-then-MAC. Doing all the plumbing for this crypto implementation and then getting it accepted by OpenSSL would probably be more work than adding it directly to libsrtp.

The QAT_Engine is something else altogether. It is a dedicated hardware crypto engine that is present only on some Intel server platforms. It could also be used to offload SRTP to hardware, but I have no experience with this engine and do not have access to such hardware.

There is also a third option, available only for servers/SFUs: instead of stitching MAC and encryption, batch up processing across multiple packets. This way, the MAC is computed for multiple packets at the same time using SSE, and the encryption is then done with the AES offload instructions. This option requires a custom SRTP library.
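The batching idea can be outlined as a two-phase loop over groups of packets. A minimal Python sketch of the sending direction (encrypt every packet in the batch, then MAC every ciphertext). Assumptions: `LANES = 4` is a hypothetical batch width chosen to suggest four 32-bit SHA-1 states per 128-bit SSE register, the `encrypt` callable stands in for an AES-NI routine, and the lanes here run one after another; only a real multi-buffer SHA-1 implementation would process them in parallel:

```python
import hashlib
import hmac
from typing import Callable, List

LANES = 4  # hypothetical batch width (e.g. four SHA-1 states in one SSE register)


def protect_batch(packets: List[bytes], encrypt: Callable[[bytes], bytes],
                  auth_key: bytes, tag_len: int = 10) -> List[bytes]:
    """Encrypt a whole batch, then MAC the whole batch (encrypt-then-MAC).

    Keeping the phases separate is what would let a multi-buffer
    implementation hash LANES packets at once with SIMD instructions.
    """
    out = []
    for start in range(0, len(packets), LANES):
        batch = packets[start:start + LANES]
        # Phase 1: encryption (AES-NI in a real implementation).
        ciphertexts = [encrypt(p) for p in batch]
        # Phase 2: HMAC-SHA1 over each ciphertext, tag truncated to 80 bits.
        tags = [hmac.new(auth_key, c, hashlib.sha1).digest()[:tag_len]
                for c in ciphertexts]
        out.extend(c + t for c, t in zip(ciphertexts, tags))
    return out
```

On the receiving side the phases swap: verify the tags for a batch first, then decrypt the packets that passed.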
