Problems with SSL on a custom domain

Kai van Duuren

Aug 21, 2015, 7:14:50 AM
to Google App Engine
We are having problems serving our web app on App Engine with SSL on a custom domain. We had followed the instructions for uploading a certificate to a custom domain via the admin console. This had been working without problems for several weeks, but since yesterday we have been observing some SSL connection errors.

Retrieving the main page from our location in the UK results in a connection error. Accessing via a proxy in San Francisco loads the page without problem.

We have tried deleting and reinstalling the certificate via the admin console but this had no effect.

The url in question is https://dashboard.geospock.com

What could be causing this? Is this a routing issue internal to Google or could it be a configuration problem on our end?

Thanks.

Patrice (Cloud Platform Support)

Aug 21, 2015, 2:56:53 PM
to Google App Engine
Hi Kai,

What exact error do you get when you connect from the UK? A screenshot with the exact error might be interesting to see so we can further troubleshoot this.

Cheers!

Jon Travers

Aug 22, 2015, 5:04:01 AM
to Google App Engine
Hi Patrice, thanks for your reply. I'm Kai's colleague here in the UK, and have been working with him on this problem. To answer your question:

The manifestation of the problem when we pointed a browser at our web app (https://dashboard.geospock.com), and forced a full refresh, was that the browser would refuse to connect at all, showing an error something like "Safari was unable to establish a secure connection to the website". As Kai mentioned, whether or not it worked was dependent on which network we connected from: it failed on our UK office broadband connection, and from a VPS located in London, whereas it worked fine on our UK mobile data connections, from a San Francisco VPS, and also on my UK broadband connection at home (different ISP). The easiest way I found to test was to run the following openssl command:

openssl s_client -servername dashboard.geospock.com -connect dashboard.geospock.com:443 -showcerts


If successful, this prints out our custom SSL certificate (the app is hosted via a custom Google Apps domain). When it failed, openssl would crash, with the following output:

CONNECTED(00000003)

58671:error:14077417:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert illegal parameter:/SourceCache/OpenSSL098/OpenSSL098-52.30.1/src/ssl/s23_clnt.c:593:


Where possible, we made a note of the IP addresses resolved for dashboard.geospock.com in the locations we were testing. For our (failing) office network, we were always routed to: 64.233.167.121, from the London VPS (not working) we got 64.233.166.121, whereas from the SFO VPS (which worked), we got 173.194.79.121, and from my home broadband (also working) I get 173.194.67.121.

I notice that currently when I run the openssl test, but connect it directly to one of the failing resolved IP addresses, or try the London VPS (which was failing), it's working fine. I suppose this means the problem has now been resolved. It certainly was still failing yesterday when we originally posted, and for at least 24 hours before that.
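The direct-to-IP check described above can also be scripted. What follows is a minimal sketch, not a definitive tool: it opens a TCP connection to one specific front-end IP address while still sending the real hostname as the SNI name, then reports whether the TLS handshake completed. The names `probe`, `failing_ips` and `CANDIDATE_IPS` are made up for this illustration, and the sketch uses Python's standard `ssl` module rather than the openssl command line, so the ClientHello it sends will differ from the one OpenSSL 0.9.8 sends.

```python
import socket
import ssl


def probe(ip, hostname, port=443, timeout=10.0):
    """Try a TLS handshake with `ip`, sending `hostname` as the SNI name."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            # server_hostname sets the SNI extension and is also used for
            # certificate hostname verification.
            with context.wrap_socket(raw, server_hostname=hostname) as tls:
                return "ok ({})".format(tls.version())
    except ssl.SSLError as exc:
        # Covers handshake alerts such as "illegal parameter".
        return "TLS error: {}".format(getattr(exc, "reason", None) or exc)
    except OSError as exc:
        # Covers timeouts, refused connections and routing failures.
        return "network error: {}".format(exc)


def failing_ips(results):
    """Return the IPs whose probe result does not start with 'ok'."""
    return sorted(ip for ip, outcome in results.items()
                  if not outcome.startswith("ok"))


# Front-end addresses quoted earlier in this post.
CANDIDATE_IPS = [
    "64.233.167.121",   # office network (failing at the time)
    "64.233.166.121",   # London VPS (failing at the time)
    "173.194.79.121",   # SFO VPS (working)
    "173.194.67.121",   # home broadband (working)
]
```

Building `{ip: probe(ip, "dashboard.geospock.com") for ip in CANDIDATE_IPS}` on each network and passing it to `failing_ips` would show whether a failure follows the front-end IP or the client's network. The addresses are the ones observed in August 2015 and may no longer serve this host.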

The whole situation is very concerning for us, since all our production apps are hosted on App Engine with custom SSL certificates served via Google Apps domains. If our apps are failing intermittently for periods of several days in some locations, that will have a very serious impact on our business. We need to understand exactly what happened, and how we can prevent it in the future.

Thanks
Jon

Hugo Visser

Aug 23, 2015, 9:37:23 AM
to Google App Engine
I have a similar report from the UK from a user of my mobile app. Access over SSL is not working for them, while it works for the majority of other users. I also had one such report in the past, which resolved itself.

Hugo

Jon Travers

Aug 24, 2015, 7:39:10 AM
to Google App Engine
A quick update on this. As of now, 24 Aug 11:30am UTC, it's failing again, but only from our office broadband connection; it's working from our London VPS. The resolved IP addresses seem to have shifted around a bit in some locations: currently dashboard.geospock.com is resolving to 64.233.166.121 from our office connection.

Would really appreciate some information on what's going on here.
Jon

Patrice (Cloud Platform Support)

Aug 24, 2015, 10:57:47 AM
to Google App Engine
Hi Jon,

Thank you for all that extra information, this definitely will help in finding out what issue you're having.

Do you have on hand the IPs of the different locations/VPSes? I'm trying to investigate, but having the exact IP that each request came from would definitely be helpful.

Cheers

Jon Travers

Aug 24, 2015, 11:09:22 AM
to Google App Engine
Yes certainly. For the two locations where we've seen problems: our office broadband is using IP address 94.10.92.68, and our London VPS is on 46.101.45.203.

Hope that helps
Jon

Patrice (Cloud Platform Support)

Aug 26, 2015, 10:11:32 AM
to Google App Engine
Hi Jon,

Sorry for the delay, there was a lot of movement and investigation to get to the bottom of this. What seems to be the issue is that older clients that support SSLv3 but not TLS appeared to get errors.

Running $ openssl s_client -ssl3 -connect <fqdn>:443 -servername <fqdn>
would always fail with a handshake error. As soon as we drop "-ssl3", everything goes OK. Looking into the "illegal parameter" you get, we cannot reproduce it, and we have to pin it down to the version of software that you're running. Looking online, every report of OpenSSL throwing "illegal parameter" seemed to have to do with the version of OpenSSL or a client-side config.

I then went to check with the back-end team to see what was happening there. Turns out it is indeed working as intended. SSLv3 does not support SNI. To get such a certificate up and running, I would suggest moving to a Virtual IP, which can help your situation.

I hope that this will provide enough to shed some light on this.

Cheers

Jon Travers

Aug 26, 2015, 11:05:05 AM
to Google App Engine
Hi Patrice, thanks for your reply. Unfortunately, your explanation does not fit the facts which I've supplied. Let me try to convince you by restating in more detail two parts of my investigation:

1. Ignoring openssl for a moment, it is a FACT that if I open up the latest version of Chrome on my laptop here in our office, and type in the URL https://dashboard.geospock.com, the page doesn't load, and displays the following error:

SSL connection error
ERR_SSL_PROTOCOL_ERROR
Unable to make a secure connection to the server. This may be a problem with the server, or it may be requiring a client authentication certificate that you don't have.

It's doing this right now, as I type this. If I repeat the test on my home broadband connection, it loads and works just fine. The latest version of Chrome quite clearly supports all the latest TLS versions (right?), so you can't explain that away by claiming that I need to upgrade to a client that supports more than SSLv3. That's what I'm doing!

2. Looking at the openssl results, I also observed the debug info mentions SSLv3, and found that very odd. I experimented with using the openssl switches to disallow SSLv3 connections, and it had no effect on the behaviour. I also noted that in cases where it works (on the same computer, with the exact same command line), openssl indicates that it has established a TLSv1 connection, not SSLv3. So my conclusion is that the SSLv3 info in the crash is there because the SSL negotiation is somehow being corrupted, causing openssl to attempt to fall back to SSLv3, and then crashing it. For the record, the version of openssl on my computer is OpenSSL 0.9.8zg 14 July 2015.

So the bottom line here, unless you're prepared to investigate further, is that serving App Engine apps on a custom Google Apps domain using a custom certificate via SNI is unusable (because it might fail randomly at any time in any geographical location). I suppose we could try using a virtual IP address instead, but that absolutely shouldn't be necessary. For our business-critical apps, we've now rerouted our traffic via our own SSL proxy to achieve exactly what Google Apps is supposed to be doing (our proxy is using SNI), and that's working just fine, while the Google infrastructure continues to fail. I find this unacceptable. Is there really nothing more you can do?

Jon Travers

Aug 26, 2015, 11:16:54 AM
to Google App Engine
Here is some more detailed debugging information from the failing openssl SSL negotiation. Perhaps this would give you a clue what's actually going on?:

$ openssl s_client -debug -msg -servername dashboard.geospock.com -connect dashboard.geospock.com:443 -showcerts
CONNECTED(00000003)
write to 0x7f96b8d006a0 [0x7f96b9802000] (131 bytes => 131 (0x83))
0000 - 16 03 01 00 7e 01 00 00-7a 03 01 55 dd d7 93 c7   ....~...z..U....
0010 - f0 b2 0d ee ea f4 1c 2b-ee 50 b6 ff 0f e6 8f 59   .......+.P.....Y
0020 - 8b 81 9e 05 2f 17 84 e2-20 ed b7 00 00 2e 00 39   ..../... ......9
0030 - 00 38 00 35 00 16 00 13-00 0a 00 33 00 32 00 2f   .8.5.......3.2./
0040 - 00 9a 00 99 00 96 00 05-00 04 00 15 00 12 00 09   ................
0050 - 00 14 00 11 00 08 00 06-00 03 00 ff 01 00 00 23   ...............#
0060 - 00 00 00 1b 00 19 00 00-16 64 61 73 68 62 6f 61   .........dashboa
0070 - 72 64 2e 67 65 6f 73 70-6f 63 6b 2e 63 6f 6d 00   rd.geospock.com.
0080 - 23                                                #
0083 - <SPACES/NULS>
>>> TLS 1.0 Handshake [length 007e], ClientHello
    01 00 00 7a 03 01 55 dd d7 93 c7 f0 b2 0d ee ea
    f4 1c 2b ee 50 b6 ff 0f e6 8f 59 8b 81 9e 05 2f
    17 84 e2 20 ed b7 00 00 2e 00 39 00 38 00 35 00
    16 00 13 00 0a 00 33 00 32 00 2f 00 9a 00 99 00
    96 00 05 00 04 00 15 00 12 00 09 00 14 00 11 00
    08 00 06 00 03 00 ff 01 00 00 23 00 00 00 1b 00
    19 00 00 16 64 61 73 68 62 6f 61 72 64 2e 67 65
    6f 73 70 6f 63 6b 2e 63 6f 6d 00 23 00 00
read from 0x7f96b8d006a0 [0x7f96b9807600] (7 bytes => 7 (0x7))
0000 - 15 03 01 00 02 02 2f                              ....../
<<< TLS 1.0 Alert [length 0002], fatal illegal_parameter
    02 2f
8175:error:14077417:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert illegal parameter:/SourceCache/OpenSSL098/OpenSSL098-52.40.1/src/ssl/s23_clnt.c:593:
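The seven bytes read back from the server in the trace above can be decoded by hand. The sketch below is only an illustration (the function name `decode_alert_record` and the lookup tables are made up here, and the tables cover just a handful of the defined alert codes). It shows that the reply is a fatal `illegal_parameter` alert carried in a record whose version field is TLS 1.0. The "sslv3 alert" wording in the error line appears to be OpenSSL's name for the alert reason rather than a sign that SSLv3 was negotiated.

```python
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                 22: "handshake", 23: "application_data"}
RECORD_VERSIONS = {(3, 0): "SSL 3.0", (3, 1): "TLS 1.0",
                   (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
# A small subset of the alert descriptions defined for TLS.
ALERT_DESCRIPTIONS = {0: "close_notify", 10: "unexpected_message",
                      40: "handshake_failure", 47: "illegal_parameter",
                      70: "protocol_version", 112: "unrecognized_name"}


def decode_alert_record(record):
    """Decode a single unencrypted TLS alert record (header + 2-byte body)."""
    if len(record) < 7:
        raise ValueError("an alert record needs at least 7 bytes")
    if CONTENT_TYPES.get(record[0]) != "alert":
        raise ValueError("not an alert record")
    version = (record[1], record[2])
    return {
        "record_version": RECORD_VERSIONS.get(version, str(version)),
        "length": int.from_bytes(record[3:5], "big"),
        "level": ALERT_LEVELS.get(record[5], str(record[5])),
        "description": ALERT_DESCRIPTIONS.get(record[6], str(record[6])),
    }


# The bytes shown after "read from ..." in the openssl output above.
SERVER_REPLY = bytes.fromhex("15 03 01 00 02 02 2f")
print(decode_alert_record(SERVER_REPLY))
```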

Patrice (Cloud Platform Support)

Aug 26, 2015, 3:09:52 PM
to Google App Engine
Hi again Jon,

I continued trying to find what is exactly happening here. So here goes :

Running this in San Francisco :

openssl s_client -debug -msg -servername dashboard.geospock.com -connect dashboard.geospock.com:443 -showcerts 

works perfectly and consistently. So I think we can all see that SF won't have issues connecting here.

I then connected through a European location, and from there, trying this also succeeds:

openssl s_client -servername dashboard.geospock.com -connect 64.233.167.121:443 -showcerts 
curl -k -I --resolve dashboard.geospock.com:443:64.233.167.121 https://dashboard.geospock.com/ 

From your error and your message, I get you're using TLS 1.0, while I am running with TLS 1.2. I started looking into the version of OpenSSL we're both using, and while you're using 0.9.8zg, I am on 1.0.1f.

I believe SNI support has been part of OpenSSL since 0.9.8j, so you using 0.9.8zg should work fine, but as a test, do you mind trying to run the same with the latest 1.0.1? I believe they are up to 1.0.1p. If I remember correctly, there was a backport patch implemented in 0.9.8j to let OpenSSL work with SNI (https://en.wikipedia.org/wiki/Server_Name_Indication).
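When comparing clients like this, it helps to record exactly which TLS library each one is built against. As a small sketch, the snippet below reports the OpenSSL build that a Python interpreter is linked to. Note the assumption: this is the library behind Python's `ssl` module, which may differ from the `openssl` command-line binary on the same machine, so it complements `openssl version` and does not replace it. The function name `tls_client_summary` is made up for this example.

```python
import ssl


def tls_client_summary():
    """Describe the TLS library behind this interpreter's ssl module."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,        # human-readable string
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        "sni_supported": ssl.HAS_SNI,                  # can send server_name
        "tls_1_2_supported": ssl.HAS_TLSv1_2,
    }


for key, value in tls_client_summary().items():
    print("{}: {}".format(key, value))
```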

The basis of the issue we have here is that I'm incapable of reproducing any of the behaviors you're experiencing, so it's hard to investigate further. I'm definitely interested in continuing this, but without a reliable way for me to replicate, it's impossible to send it up the chain to get it looked at and possibly fixed. If you have a specific test case that consistently fails, that I can then reproduce on my side as consistently, I'll be able to get some traction on this.

Thanks

Hugo Visser

Aug 29, 2015, 2:37:01 PM
to Google App Engine
Like I mentioned previously, today I got another user from the UK reporting a similar issue. For some reason the SNI SSL served version of my app isn't reachable or working for them, but going to https://myapp.appspot.com works. I don't see a huge drop in usage in the app so I can't really tell what is causing it, just that those users can't establish an SSL connection to my custom domain url from their devices.

Hugo

Patrice (Cloud Platform Support)

Sep 1, 2015, 11:16:34 AM
to Google App Engine
Hi Hugo, 

Continuing to look into this, I realized that a lot of people who have similar issues are all using one specific ISP. Do you know the ISP that your users have?

Cheers!

Hugo Visser

Sep 1, 2015, 2:58:41 PM
to Google App Engine
The latest report came from a user on Sky. He also mentioned that sometimes it works, and sometimes it doesn't. I can't verify if that's true though. I've asked both users to open a browser to my custom domain app url. I've now resorted to updating the app and setting the endpoint to the appspot domain, which is not what I'd want ideally, but I don't want to lose any users over it either.

The user that contacted me stated that it broke several weeks ago.

Hugo

Patrice (Cloud Platform Support)

Sep 1, 2015, 3:03:50 PM
to Google App Engine
Hi Hugo,

Exactly what I was expecting. Whoever reports these always seems to be on Sky, which could be the problem.

I don't know if you'll be able to get this since it's your users and not you directly, but if you could get a tcpdump and a print screen of the page "chrome://net-internals" from your affected user, maybe we'll be able to figure something out.

Once you get these, don't send them publicly to the group. You can use the arrow pointing down by the reply button and select "reply privately to author" on one of my messages. 

Cheers!

Jon Travers

Sep 1, 2015, 6:06:51 PM
to Google App Engine
Hi Patrice

I can confirm that our office broadband, where we've consistently seen this problem, is indeed a Sky connection. Since I last posted, I've done some further investigation using Wireshark to capture and decode the TLS protocol traffic being sent by Safari, Chrome, and curl. My conclusion was that the connection failure is caused by a very specific combination of factors:
  1. There is some specific property of the TLS client setup that must be present. I'm unable to identify exactly what, but on my machine it fails with up-to-date versions of Safari and Chrome, and also with the built-in OpenSSL (0.9.8zg), but not with the version of curl that I have, nor with an updated version of OpenSSL. I can see differences in the connection parameters between these clients, but it's unclear which one is the cause.
  2. The Initial TLS "ServerHello" message is being modified in a specific way by some IP routing node between my computer, and the endpoint at ghs.googlehosted.com. Based on all the evidence, we believe this is definitely happening inside our ISP, Sky Broadband, but the same issue could also occur on other routes. Whether this is the result of a configuration error at Sky, or the result of something they're doing deliberately, and how widespread this problem is, is all unclear.
  3. The server at the destination of the SNI TLS connection, ghs.googlehosted.com, has been configured in such a way that the combination of 1 and 2 is regarded as a terminal error, and the connection is immediately terminated with an 'Illegal parameter' error. Connecting to another SNI SSL server (our own proxy) does not trigger the same error response, but since I can't see the traffic as it arrives at Google, and I don't have access to any further debug information from the server, I can't determine whether the rejection is valid. Most likely it is, and our own proxy works simply because it has a less strict security setup.
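One way to narrow down point 1 is to decode each client's ClientHello and compare the offered parameters side by side. The sketch below is an illustration only: it parses the ClientHello bytes from the `openssl s_client -msg` output posted earlier in this thread, and the helper names (`u16`, `parse_client_hello`, `sni_name`) and the short `EXTENSION_NAMES` table are made up for this example. It handles a single, unfragmented ClientHello and does no bounds checking.

```python
# ClientHello handshake message copied from the `openssl s_client -msg`
# output posted earlier in this thread (126 bytes).
CLIENT_HELLO = bytes.fromhex("""
    01 00 00 7a 03 01 55 dd d7 93 c7 f0 b2 0d ee ea
    f4 1c 2b ee 50 b6 ff 0f e6 8f 59 8b 81 9e 05 2f
    17 84 e2 20 ed b7 00 00 2e 00 39 00 38 00 35 00
    16 00 13 00 0a 00 33 00 32 00 2f 00 9a 00 99 00
    96 00 05 00 04 00 15 00 12 00 09 00 14 00 11 00
    08 00 06 00 03 00 ff 01 00 00 23 00 00 00 1b 00
    19 00 00 16 64 61 73 68 62 6f 61 72 64 2e 67 65
    6f 73 70 6f 63 6b 2e 63 6f 6d 00 23 00 00
""")

# A few well-known extension numbers; anything else is printed in hex.
EXTENSION_NAMES = {0: "server_name", 10: "elliptic_curves",
                   11: "ec_point_formats", 13: "signature_algorithms",
                   35: "session_ticket", 0xff01: "renegotiation_info"}


def u16(data, pos):
    """Read a big-endian 16-bit integer at `pos`."""
    return int.from_bytes(data[pos:pos + 2], "big")


def parse_client_hello(msg):
    """Extract version, cipher suites and extensions from a ClientHello."""
    if msg[0] != 1:
        raise ValueError("not a ClientHello handshake message")
    end = 4 + int.from_bytes(msg[1:4], "big")
    version = (msg[4], msg[5])
    pos = 6 + 32                      # skip client_version and random
    pos += 1 + msg[pos]               # skip session_id
    suites_len = u16(msg, pos)
    pos += 2
    suites = [u16(msg, i) for i in range(pos, pos + suites_len, 2)]
    pos += suites_len
    pos += 1 + msg[pos]               # skip compression_methods
    extensions = {}
    if pos < end:                     # the extensions block is optional
        ext_end = pos + 2 + u16(msg, pos)
        pos += 2
        while pos < ext_end:
            ext_type, ext_len = u16(msg, pos), u16(msg, pos + 2)
            extensions[ext_type] = msg[pos + 4:pos + 4 + ext_len]
            pos += 4 + ext_len
    return {"version": version, "cipher_suites": suites,
            "extensions": extensions}


def sni_name(server_name_ext):
    """Return the first host name from a server_name extension body."""
    # Layout: list length (2), name type (1), name length (2), name bytes.
    name_len = u16(server_name_ext, 3)
    return server_name_ext[5:5 + name_len].decode("ascii")


hello = parse_client_hello(CLIENT_HELLO)
print("version:", hello["version"])
print("cipher suites:", [hex(s) for s in hello["cipher_suites"]])
print("extensions:",
      [EXTENSION_NAMES.get(t, hex(t)) for t in hello["extensions"]])
print("SNI:", sni_name(hello["extensions"][0]))
```

For the OpenSSL 0.9.8 hello above, this reports a TLS 1.0 ClientHello offering 23 cipher suites with only two extensions, server_name and session_ticket. Feeding it the ClientHello bytes exported from a Wireshark capture of Chrome, Safari or curl would give lists that can be compared directly.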
Now with these conclusions established, the only feasible solution I could see was to try running the connection via a dedicated virtual IP address (which costs $39 per month), rather than using the (free) SNI setup. I finally gave in, paid our $39, updated our DNS records, and it worked perfectly, no more connection errors. It's not exactly clear why this works, but I'm actually very happy with this as a solution (other than having to pay), especially since even if Sky fixes their routing error, I can still be confident that if the same problem crops up somewhere else on the Internet we won't be affected.

Bottom line: I would strongly advise anybody reading this to regard the Google Apps SNI SSL service as inherently unreliable. Pay the money and go with a virtual IP if at all possible, because the last thing you want is for some of your customers to be unable to access your app depending on where they are. I also think this advice should be clearly given in the Google documentation.

Now Patrice, if you still want to investigate the specific issue at Sky, then I'm very happy to provide you with additional debug information, once I get into the office tomorrow morning (it's 11pm here now and I'm at home). Can you be more specific about what you want? When you talk about tcpdump output, are you looking for a capture of the network packets? So my Wireshark capture would also be OK?

Cheers
Jon

Jon Travers

Sep 1, 2015, 6:16:28 PM
to Google App Engine
Sorry, in point 2 of my explanation above I should have said the "ClientHello" message, not "ServerHello".

Hugo Visser

Sep 2, 2015, 7:40:01 AM
to Google App Engine
OK, got another one on Sky. I'll ask for the info and will try to reach out to the other ones that have contacted me.

Hugo

Patrice (Cloud Platform Support)

Sep 2, 2015, 4:56:21 PM
to Google App Engine
Hi to you two,

To answer Jon's question : yeah a Wireshark capture should give us enough. That and the "chrome://net-internals" page would be the best things you can send up so we continue investigating on this.

In the meantime, the use of VIP is definitely a workaround for this issue, and as you point out, if other ISPs start having the same behavior, the VIP will make sure this is stable throughout changes on those fronts.

Thank you in advance for the information (again, use the "reply privately" button).

Cheers!

Patrice (Cloud Platform Support)

Sep 3, 2015, 1:51:51 PM
to Google App Engine
Hi again to both of you.

Just doing a quick check, as I've heard from another user that was reporting this that apparently Sky made some changes and it now works for them.

Could you confirm if it is fixed on your end as well?

Thanks :)

Jon Travers

Sep 15, 2015, 9:43:54 AM
to Google App Engine
Hi Patrice

My apologies for not replying sooner. I can confirm that SNI connections via Sky to custom domains on App Engine are now working OK with Safari and Chrome. Interestingly, connections from openssl v0.9.8 continue to exhibit the failing behaviour (i.e. they fail on Sky's network, but work just fine from elsewhere). My confidence in SNI on App Engine remains low, so we're sticking with paying for the VIP routing on our most critical apps, but thanks anyway for taking the time to investigate this for us. I don't suppose you'd like to share any more details of what the problem was at Sky? I'd be curious to know.

Thanks again
Jon