ssl_certificate_by_lua slow down?


Filippos Slavik

Jun 30, 2017, 11:50:00 AM
to openresty-en
I've been experimenting with the ssl_certificate_by_lua directive in the latest official OpenResty release to dynamically control the SSL handshake. Everything works beautifully and as expected; however, I've been measuring roughly a 13% to 15% slowdown in "SSL negotiations per second" compared to a "static" NGINX vhost configuration where the SSL handshake is, let's say, performed "natively", without the ssl_certificate_by_lua* directive in place. Let me explain further.

The test is performed on the same system (Debian Jessie) with OpenResty from the official deb packages, using a recent build of the siege CLI benchmarking tool that supports SNI. There is therefore no SSL session cache involved, and the benchmarked "aspect" is "how many SSL handshakes the configuration/setup can pull". I'm aware that the actual SSL handshake is a rather expensive operation (and most of the CPU cycles are spent in OpenSSL). However, I'm puzzled to see such a measurable difference on the same system, with the same versions of OpenResty/NGINX/OpenSSL, between setups that do and do not use the ssl_certificate_by_lua directive.

The code in my ssl_certificate_by_lua handler used for the benchmark was very simple, almost identical to the "Synopsis" code listing in the official ngx.ssl documentation. The only difference is that the actual cert is read in the init_by_lua* hook and cached in a Lua table for simplicity.
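Roughly, the setup I benchmarked looks like this (a simplified sketch; the certificate paths and the `cert_cache` table name are placeholders, not my exact code):

```nginx
# init_by_lua* runs once in the master process: read the PEM,
# convert it to DER, and cache the result in a global Lua table.
init_by_lua_block {
    local ssl = require "ngx.ssl"

    local f = assert(io.open("/etc/ssl/example.com.pem", "r"))
    local cert_pem = f:read("*a")
    f:close()

    local kf = assert(io.open("/etc/ssl/example.com.key.pem", "r"))
    local key_pem = kf:read("*a")
    kf:close()

    -- global table, inherited by the workers on fork
    cert_cache = {
        der_cert = assert(ssl.cert_pem_to_der(cert_pem)),
        der_key  = assert(ssl.priv_key_pem_to_der(key_pem)),
    }
}

server {
    listen 443 ssl;
    # nginx still requires a placeholder cert/key at config time
    ssl_certificate     /etc/ssl/placeholder.crt;
    ssl_certificate_key /etc/ssl/placeholder.key;

    ssl_certificate_by_lua_block {
        local ssl = require "ngx.ssl"
        -- drop the placeholder cert/key, then install the cached DER pair
        assert(ssl.clear_certs())
        assert(ssl.set_der_cert(cert_cache.der_cert))
        assert(ssl.set_der_priv_key(cert_cache.der_key))
    }
}
```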

Are you seeing similar results in your own use cases, or do you think I'm missing something?

Cheers!
Filip

jona...@findmeon.com

Jun 30, 2017, 1:33:41 PM
to openresty-en
Assuming you used the synopsis as-is...

The synopsis code reads the PEM format and converts it to DER on every handshake. Your overhead likely comes from that.

You can instead convert the PEM to a cdata pointer, save it into the per-worker cache, and then use set_cert. Priming the cache adds a slight one-time cost, but after that it is considerably faster.

I open-sourced our failover cert caching code here: https://github.com/aptise/peter_sslers-lua-resty

Filippos Slavik

Jun 30, 2017, 3:11:36 PM
to openresty-en
Hi,

Thanks for the comment; however, I did not use the synopsis 100% as-is. As noted in my initial post, the test code reads the cert in an init_by_lua* block and caches the parsed certificate, in DER format, in a global Lua table that the worker then uses to look up the cert and set it during the SSL handshake (with set_der_cert & set_der_priv_key). I've tested with both a global Lua table and an LRU cache, with no significant difference in overall performance.

> i opensourced our failover cert caching code here:  https://github.com/aptise/peter_sslers-lua-resty

Thanks, I'll definitely check the code.

Cheers!
Filip

jona...@findmeon.com

Jun 30, 2017, 5:00:44 PM
to openresty-en


On Friday, June 30, 2017 at 3:11:36 PM UTC-4, Filippos Slavik wrote:
> Thanks for the comment; however, I did not use the synopsis 100% as-is. As noted in my initial post, the test code reads the cert in an init_by_lua* block and caches the parsed certificate, in DER format, in a global Lua table that the worker then uses to look up the cert and set it during the SSL handshake (with set_der_cert & set_der_priv_key). I've tested with both a global Lua table and an LRU cache, with no significant difference in overall performance.

Yes, I understand, but there will still be a performance hit with that strategy. OpenResty supports multiple hooks/formats for SSL certs.

* PEM is converted to DER via `cert_pem_to_der`, then activated via `set_der_cert`. The DER can be cached via lua-resty-lrucache (per worker) or lua_shared_dict (shared across all worker processes).

* PEM is converted to a cdata pointer via `parse_pem_cert`, then activated via `set_cert`. The cdata pointer can be cached via lua-resty-lrucache (per worker only).

Using a cached value from `parse_pem_cert` via `set_cert` is much more performant than using a cached value from `cert_pem_to_der` via `set_der_cert` -- but the cdata pointer can only be cached within a single worker, not shared across processes.

The `parse_pem_cert` + `set_cert` combo is closer to what nginx does with file-based certs.
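The faster combo can be sketched roughly like this (a minimal illustration, not production code; the SNI-derived file paths and the global `cert_cache` are placeholder assumptions -- real code would keep the cache in a module-level local):

```nginx
ssl_certificate_by_lua_block {
    local ssl = require "ngx.ssl"
    local lrucache = require "resty.lrucache"

    -- per-worker cache of parsed cdata certs; a global is used here
    -- only for brevity (it triggers a "setting global" warning)
    cert_cache = cert_cache or assert(lrucache.new(100))

    local sni = ssl.server_name() or "default"
    local cert = cert_cache:get(sni .. ":cert")
    local key  = cert_cache:get(sni .. ":key")

    if not cert then
        -- cache miss: read and parse the PEM once per worker
        local f = assert(io.open("/etc/ssl/" .. sni .. ".pem", "r"))
        local cert_pem = f:read("*a")
        f:close()
        cert = assert(ssl.parse_pem_cert(cert_pem))

        local kf = assert(io.open("/etc/ssl/" .. sni .. ".key.pem", "r"))
        local key_pem = kf:read("*a")
        kf:close()
        key = assert(ssl.parse_pem_priv_key(key_pem))

        cert_cache:set(sni .. ":cert", cert)
        cert_cache:set(sni .. ":key", key)
    end

    assert(ssl.clear_certs())
    -- cdata pointers: no PEM/DER parsing per handshake
    assert(ssl.set_cert(cert))
    assert(ssl.set_priv_key(key))
}
```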

When you use `cert_pem_to_der`, you're only caching a preliminary transcode. `set_der_cert` still has to read/process the certificate on every handshake; it's just doing so from the DER format.

These two hooks exist because, historically, certificate loading was done via the DER format. The PEM->DER conversion was a convenience method, since certificates are commonly stored/distributed/archived in PEM format. A year or two ago, someone was nice enough to write lower-level hooks into the SSL logic that bypass the whole certificate loading/transcoding step.

Filippos Slavik

Jul 2, 2017, 12:13:43 PM
to openresty-en
Hi Jonathan,

You are absolutely right. The performance slowdown was due to the DER certificate format I had been using in my code. Following your suggestion, I simply switched to the opaque cdata pointers returned by parse_pem_cert, and my benchmarks now show nearly identical SSL handshake rates compared to a "static" file-based nginx certificate configuration.

Thank you!
Filip

jona...@findmeon.com

Jul 2, 2017, 1:13:56 PM
to openresty-en


On Sunday, July 2, 2017 at 12:13:43 PM UTC-4, Filippos Slavik wrote:
> Thank you!

No problem! The docs aren't clear on this nuance.
 