can't get SSL to work and there are no ssl errors in the logs

adam....@gmail.com

May 31, 2017, 11:37:10 PM
to Envoy Users
Hi folks,

I've modeled my service on the example front-proxy (https://github.com/lyft/envoy/tree/master/examples/front-proxy), and I'm running it on Kubernetes.  This works like a charm.

However, when I add TLS into the mix and then hit the edge-envoy, I get "upstream connect error or disconnect/reset before headers".  Here's what I've tried:

* Logging onto the envoy-edge container and running curl against the service-envoy directly, using the exact same ca_cert_file defined in the envoy edge configs, works like a charm.
* The logs only show a 503 with a UF error; turning on trace logging doesn't show any additional SSL error info (unfortunately; that would be sweet if it did!).
* If I pull out all the ssl_context blocks on both the service and edge envoys, it works.

I've posted configs and some curl calls here: https://gist.github.com/skippy/410197ac2d9e192e7af9458bcf5b469f

Is there a better way to dig into what is going on?

Thanks,
adam



adam....@gmail.com

Jun 1, 2017, 1:09:13 AM
to Envoy Users, adam....@gmail.com
Ah!  I'm using Vault to generate the certs, and they used 384-bit elliptic curve keys.  If I use 256-bit EC keys or Vault's default RSA 2048, it works.  I've attached the 384-bit EC certs to the gist in case they are needed for debugging.

At a high level, I think the following two GitHub issues should be opened:

* Support the full set of valid ECC options.
* When running in trace mode, showing the TLS/SSL handshake steps, as curl does, would be super useful.  Right now the only thing I see in the logs is this: 2017-06-01T04:22:37.162475152Z [2017-06-01T04:22:37.160Z] "GET /service-pdf/status.json HTTP/1.1" 503 UF 0 57 2 - "-" "curl/7.51.0" "85da5a32-026c-4088-9dad-9e2927d8d30c" "192.168.99.100:32529" "10.0.0.68:443"

Having better visibility into why the cluster failed would be *awesome* and a huge time saver.
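
Until richer TLS logging exists, the access log line quoted above carries the one clue available: the response flag. Here is a minimal Python sketch for pulling it out. It assumes the line follows Envoy's default access log format (the quoted line appears to match it) and that the leading container-runtime timestamp has already been stripped. The field names and the `parse_access_log` helper are made up for illustration. In Envoy's access log documentation, `UF` denotes an upstream connection failure, which is consistent with a failed TLS handshake to the upstream but does not name the cause.

```python
import shlex

# Field names are illustrative labels for Envoy's default access log format;
# they are not identifiers taken from Envoy itself.
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "bytes_received", "bytes_sent", "duration_ms", "upstream_service_time",
    "x_forwarded_for", "user_agent", "request_id", "authority", "upstream_host",
]


def parse_access_log(line):
    """Split one default-format access log line into a dict of named fields."""
    tokens = shlex.split(line)  # honours the double-quoted fields
    if len(tokens) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(tokens)))
    return dict(zip(FIELDS, tokens))


# The log line from this message, minus the container-runtime timestamp prefix.
LINE = ('[2017-06-01T04:22:37.160Z] "GET /service-pdf/status.json HTTP/1.1" '
        '503 UF 0 57 2 - "-" "curl/7.51.0" '
        '"85da5a32-026c-4088-9dad-9e2927d8d30c" '
        '"192.168.99.100:32529" "10.0.0.68:443"')

entry = parse_access_log(LINE)
print(entry["response_code"], entry["response_flags"], entry["upstream_host"])
```

Filtering a whole log for entries whose `response_flags` is `UF` at least narrows the failure down to specific upstream hosts.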

But, before diving into logging... how can I help?  I can't share my company dev env publicly, but I can easily run some test cases or grab more tracing info if someone can walk me through it.

But boy, I'm feeling much better at the moment!

Adam

Matt Klein

Jun 5, 2017, 6:55:41 PM
to adam....@gmail.com, Piotr Sikora, Envoy Users
Adam,

I've added Piotr. I'm not sure if he usually monitors this list or not. He is our TLS guru who usually helps with these types of issues. Hopefully he will have some ideas.

Thanks,
Matt

--
You received this message because you are subscribed to the Google Groups "Envoy Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-users+unsubscribe@googlegroups.com.
To post to this group, send email to envoy...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/05e7e5f4-c177-4bfd-a066-c1852649a5b5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Matt Klein
Software Engineer
mkl...@lyft.com / 206.327.4515

adam....@gmail.com

Jun 6, 2017, 10:50:39 AM
to Envoy Users, adam....@gmail.com, piotr...@google.com
Thanks, Matt!  We all need TLS gurus on our teams :)

Piotr, this may seem like just a quirk, but it actually comes into play for folks trying to implement NIST SP 800-57.  Sure, we can use large RSA keys, but EC performs much better.  384 bits is the minimum recommended key size, which is why I was trying that combination.  FYI.

Adam

Piotr Sikora

Jun 6, 2017, 5:53:55 PM
to adam....@gmail.com, Matt Klein, Envoy Users
Hey Adam,
What's the content of /etc/envoy/cert/cert.ca in edge-envoy?

If it's the same as service-envoy-cert.ca, then it should work just
fine (and indeed, it works perfectly fine using provided gist files in
my local setup), otherwise, you need to make sure that edge-envoy's CA
file can be used to validate service-envoy-cert.crt.

Best regards,
Piotr Sikora

adam....@gmail.com

Jun 7, 2017, 11:29:10 AM
to Envoy Users, adam....@gmail.com, mkl...@lyft.com
Hi Piotr,

The only thing I changed to get it to work was to have Vault move from EC 384 to RSA 2048.  The cert.ca files on both were the same.  I should have mentioned earlier that I'm using the lyft/envoy:1.3.0 Docker image; perhaps we're linking to different versions of BoringSSL or some such thing.

With that said, I'll go back through everything and reverify yet again.  If I can recreate it, I'll let you know, and then would you mind hopping off Google Groups and chatting 1-on-1 so I can share more details on our company environment?  I think I can give you repeatable instructions for k8s.  But first things first: let me dive back in and recreate it to verify.

Thank you,
Adam

adam....@gmail.com

Jun 9, 2017, 1:03:43 AM
to Envoy Users, adam....@gmail.com, mkl...@lyft.com
Hey Piotr,

I was able to recreate it.  I just changed the Vault role from EC 256 -> EC 384, reloaded envoy-edge and envoy-service to pick up the new certs, and it failed.  The CA is the same on both servers.

Piotr, if you want to drop offline, I can share my setup with you.  It is 'easy' to recreate, but if you've never used Vault before, there are a few steps needed to get it set up (init, unseal, load configs, etc.).  But I have it documented so you can run through the steps.  Or, if you would find it easier, I can create various certs, make them last longer than the current 2hr expiration, and pass those along.

Thanks,
adam

Matt Klein

Jun 9, 2017, 9:09:11 PM
to Adam Greene, David Benjamin, Adam Langley, Envoy Users
+David/AdamL for visibility in case this is a boringssl issue.

AdamG, I think at this point it would be best if you could open a GH issue that provides a self-contained way to repro the cert loading problem; then it will be easier to investigate.

Thanks,
Matt


Adam Langley

Jun 9, 2017, 9:21:06 PM
to Matt Klein, Adam Greene, David Benjamin, Envoy Users
On Fri, Jun 9, 2017 at 6:09 PM, Matt Klein <mkl...@lyft.com> wrote:
> +David/AdamL for visibility in case this is a boringssl issue.
>
> AdamG, I think at this point it would be best if you could open a GH issue that provides a self contained way to repro the cert loading problem and then it will be easier to investigate.

If it's a parse error of the certs or keys, maybe just send the cert / key.


Cheers

AGL 

Adam Greene

Jun 9, 2017, 9:22:09 PM
to Adam Langley, Matt Klein, David Benjamin, Envoy Users
Hey Adam,

I'll start there.  I'll try to get to it this evening.  Thanks kindly.  


Matt Klein

Jun 10, 2017, 12:52:25 PM
to Adam Greene, Adam Langley, David Benjamin, Envoy Users
https://github.com/lyft/envoy/issues/1082 for those not monitoring Envoy repo.

David Benjamin

Jun 10, 2017, 6:36:06 PM
to Matt Klein, Adam Greene, Adam Langley, Envoy Users
Is Envoy acting as a client talking to some backend P-384 servers, or is it trying to serve a P-384 certificate?

If the former, Envoy's default curves don't include P-384, and I don't see any indication of separate client and server configurations. That is probably the problem.

As a server, this means that you will only negotiate those two curves for ECDH, which is a sensible configuration. I don't believe, in our implementation, it affects what certificates the server is willing to serve. But as a client it is the acceptable curve list for *both* ECDH and ECDSA. TLS negotiates the two with the same list. (I got this fixed in TLS 1.3, but earlier versions of TLS are what they are.)

This is one of those places where client and server configurations may wish to differ. Envoy's defaults are good for a server. Servers get to pick curves and a lot more effort is spent making P-256 implementations fast and constant-time. But clients often have to deal with whatever servers people deploy and that means they often need to tolerate P-384.


adam....@gmail.com

Jun 10, 2017, 6:58:25 PM
to Envoy Users, mkl...@lyft.com, adam....@gmail.com, a...@google.com, davi...@google.com
Hi David,

> Is Envoy acting as a client talking to some backend P-384 servers, or is it trying to serve a P-384 certificate?

I believe the answer is both.

The breaking flow is:
external client ---> Envoy Edge  ---> Envoy Service (serving encrypted http2 w/P-384) --->localhost service (unencrypted)

Envoy Edge seems to be telling me that no hosts are found in the Envoy Service cluster.  So Envoy Edge seems to be able to serve a P-384 certificate, but when Envoy Edge talks to Envoy Service, it fails and thus does not register it as a healthy member of the cluster.  So the break is with Envoy serving P-384 to an Envoy client.  That is my guess, anyway.

FYI, if I do this (hit envoy service directly):
external client ---> Envoy Service (serving encrypted http/1.1 w/P-384) --->localhost service (unencrypted)

it works.
 

Matt Klein

Jun 10, 2017, 7:04:25 PM
to Adam Greene, Envoy Users, Adam Langley, David Benjamin
I don't think it's a matter of not healthy. I think the issue is that when Envoy is acting as a client, it's not negotiating the P-384 curve correctly.

David, if the server (envoy) serves a P-384 cert, what ECDH config should the client (envoy) have? This is all way beyond me, but I think everything is configurable that we need to be configurable.

There is a larger issue of whether we should have different defaults in general and for client/server.


David Benjamin

Jun 11, 2017, 12:32:43 AM
to Matt Klein, Adam Greene, Envoy Users, Adam Langley
On Sat, Jun 10, 2017 at 7:04 PM Matt Klein <mkl...@lyft.com> wrote:
> I don't think it's a matter of not healthy. I think the issue is that when Envoy is acting as a client, it's not negotiating the P-384 curve correctly.
>
> David, if the server (envoy) serves a P-384 cert, what ECDH config should the client (envoy) have? This is all way beyond me, but I think everything is configurable that we need to be configurable.

If the server serves a P-384 certificate, that means you are using P-384 ECDSA. The ECDH portion of the protocol can be P-256 or P-384 or whatever.

However, until TLS 1.3, it is impossible to advertise different capabilities for ECDH and ECDSA. The client "curve list" applies to both. That is, your DEFAULT_ECDH_CURVES variable is misnamed. It should say DEFAULT_CURVES, as it applies to both. If the server serves a P-384 certificate, your client needs to accept P-384 in the curve list for the connection to work.

Parameter negotiation in TLS 1.2 and earlier is unfortunately a mess. ECDSA use is affected by three different parameter lists (cipher suites must be an ECDSA cipher, curve list must contain suitable curve, and signature algorithm list must contain ECDSA signing algorithms).
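
The three conditions above can be written down as a small check. This Python sketch is only a toy model of the TLS 1.2 rules described in this message, not BoringSSL's or Envoy's actual logic. The function name, the data shapes, and the sample cipher-suite and signature-algorithm strings are illustrative assumptions. The default client curve list of X25519 and P-256 is inferred from this thread, which describes the defaults as two curves that exclude P-384.

```python
def ecdsa_cert_problems(client, cert_curve):
    """Return the reasons a TLS 1.2 client would reject an ECDSA certificate
    on `cert_curve`; an empty list means all three checks pass."""
    problems = []
    # 1. At least one offered cipher suite must use ECDSA authentication.
    if not any("ECDSA" in suite for suite in client["cipher_suites"]):
        problems.append("no ECDSA cipher suite offered")
    # 2. Before TLS 1.3, one curve list covers both ECDH and ECDSA.
    if cert_curve not in client["curves"]:
        problems.append("curve %s not in client curve list" % cert_curve)
    # 3. The signature algorithm list must include an ECDSA algorithm.
    if not any(alg.startswith("ecdsa_") for alg in client["signature_algorithms"]):
        problems.append("no ECDSA signature algorithm offered")
    return problems


default_client = {
    "cipher_suites": ["ECDHE-ECDSA-AES128-GCM-SHA256",
                      "ECDHE-RSA-AES128-GCM-SHA256"],
    "curves": ["X25519", "P-256"],  # assumed defaults, inferred from the thread
    "signature_algorithms": ["ecdsa_secp256r1_sha256",
                             "ecdsa_secp384r1_sha384",
                             "rsa_pkcs1_sha256"],
}
# Same client with P-384 appended to the curve list.
fixed_client = dict(default_client, curves=["X25519", "P-256", "P-384"])

print(ecdsa_cert_problems(default_client, "P-384"))
print(ecdsa_cert_problems(default_client, "P-256"))
print(ecdsa_cert_problems(fixed_client, "P-384"))
```

In this model only the first call reports a problem, the curve-list one, which matches the symptom in this thread: P-256 certificates work with the defaults and P-384 certificates do not.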


Matt Klein

Jun 11, 2017, 2:32:14 PM
to David Benjamin, Piotr Sikora, Adam Greene, Envoy Users, Adam Langley
Adding Piotr again since he added the ability to configure curves.

David, sorry for being totally dense, but what values need to be specified in "cipher_suites" and "ecdh_curves" (https://lyft.github.io/envoy/docs/configuration/cluster_manager/cluster_ssl.html) for this case to work correctly? If I'm understanding correctly, I think that Adam's certs should work with the right advertisements?

There is a larger question as to whether we have the right defaults for the client side of things.


Piotr Sikora

Jun 12, 2017, 8:57:37 AM
to David Benjamin, Matt Klein, Adam Greene, Adam Langley, Envoy Users
Hey David,
Thanks, that was it!

My Envoy and bssl tool both negotiated TLSv1.3, that's why it worked
fine for me :(

Anyway, Adam (Greene), you can add:

        "ssl_context": {
          "ecdh_curves": "X25519:P-256:P-384"
        },

to your cluster configuration and it will work just fine.
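
For context, here is roughly where that fragment would sit in a cluster definition. This Python sketch just assembles and prints the JSON. The cluster name and host URL are hypothetical placeholders, not taken from Adam's gist. Only `ssl_context`, `ca_cert_file`, `ecdh_curves`, and the CA path appear in this thread; the remaining keys are my recollection of the v1 JSON cluster schema and should be checked against the cluster documentation.

```python
import json

# Placeholder cluster definition in the style of Envoy's v1 JSON config.
# Everything outside "ssl_context" is an assumption made for illustration.
cluster = {
    "name": "service_pdf",                        # hypothetical cluster name
    "connect_timeout_ms": 250,
    "type": "strict_dns",
    "lb_type": "round_robin",
    "hosts": [{"url": "tcp://service-pdf:443"}],  # hypothetical upstream address
    "ssl_context": {
        "ca_cert_file": "/etc/envoy/cert/cert.ca",
        # P-384 added so the client side accepts a P-384 certificate.
        "ecdh_curves": "X25519:P-256:P-384",
    },
}

print(json.dumps(cluster, indent=2))
```

The setting belongs on the cluster (the client side of the edge-to-service hop), since that is where the curve list restricts which server certificates are acceptable.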

David, is there any reason why P-384 shouldn't be enabled by default
when acting as client?

Best regards,
Piotr Sikora



David Benjamin

Jun 12, 2017, 11:12:50 AM
to Piotr Sikora, Matt Klein, Adam Greene, Adam Langley, Envoy Users
On Mon, Jun 12, 2017 at 8:57 AM Piotr Sikora <piotr...@google.com> wrote:
> David, is there any reason why P-384 shouldn't be enabled by default
> when acting as client?

Chrome enables it. Yeah, I would expect most clients would need it enabled to talk to servers that are out there.

Matt Klein

Jun 12, 2017, 12:21:09 PM
to David Benjamin, Piotr Sikora, Adam Greene, Adam Langley, Envoy Users
It should be very straightforward to split ContextConfigImpl into ClientContextConfigImpl and ServerContextConfigImpl with a common base class so that we can have different defaults. It sounds like this is what we should do.
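
The shape of that split can be sketched in a few lines. This is Python rather than Envoy's C++, and it is only an illustration of the idea: the class names echo the ones proposed above, while the attribute names, the constructor, and the specific default lists are assumptions (the client list simply adds P-384, as discussed in this thread).

```python
class ContextConfig:
    """Common base: shared parsing, with per-role defaults set by subclasses."""
    DEFAULT_CURVES = ()

    def __init__(self, config=None):
        config = config or {}
        # An explicit "ecdh_curves" setting wins; otherwise use the role default.
        raw = config.get("ecdh_curves")
        self.curves = tuple(raw.split(":")) if raw else self.DEFAULT_CURVES


class ServerContextConfig(ContextConfig):
    # Servers get to pick the curve, so a short list is reasonable.
    DEFAULT_CURVES = ("X25519", "P-256")


class ClientContextConfig(ContextConfig):
    # Clients must tolerate whatever servers deploy, so P-384 is included.
    DEFAULT_CURVES = ("X25519", "P-256", "P-384")


print(ServerContextConfig().curves)
print(ClientContextConfig().curves)
print(ClientContextConfig({"ecdh_curves": "P-256"}).curves)
```

Keeping the override logic in the base class means an explicit `ecdh_curves` value behaves the same for both roles, and only the fallback differs.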



Michael Payne

Jun 12, 2017, 5:36:29 PM
to envoy-users, davi...@google.com, mkl...@lyft.com, adam....@gmail.com, a...@google.com
> My Envoy and bssl tool both negotiated TLSv1.3, that's why it worked
> fine for me :(

We have chatted about enabling TLS 1.3 before but I didn't know you had it working. Are you running a forked build? Will you add this to Envoy master?
