Slow response times using default Ingress on GKE


Dave Jensen

Apr 18, 2018, 8:14:28 PM
to Kubernetes user discussion and Q&A
We have what I believe to be a very straightforward Ingress setup on GKE. However, we started noticing random slowdowns almost immediately. On further investigation, the time to first byte (TTFB) was sporadically very slow (1-3 seconds). Sometimes it was a pre-flight OPTIONS request, sometimes an application request, and other times a static file. Even the echoserver would sporadically have a long TTFB.

I set up a port-forward to one of the pods serving our REST API server. Sure enough, the slowdown was eliminated.

Before I go down the rabbit hole of trying other ingress controllers, I figured I'd ask the community if I was doing something wrong.

ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: brewd-ingress
spec:
  tls:
  - hosts:
    - redacted
    secretName: redacted
  rules:
  - host: redacted
    http:
      paths:
      - backend:
          serviceName: gateway-service
          servicePort: 7000
  - host: redacted
    http:
      paths:
      - backend:
          serviceName: web-service
          servicePort: 8080
  - host: redacted
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 8080

gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway-service
  labels: 
    app: gateway
spec:
  type: NodePort
  ports:
  - port: 7000
  selector:
    app: gateway
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: gateway-deployment
spec:
  selector:
    matchLabels:
      app: gateway
  replicas: 1
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
      - name: gateway
        image: redacted
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
        env:
        - name: REDACTED_ENV
          value: stage

The web-service YAML looks almost exactly the same as the above.

Dave Jensen

Apr 21, 2018, 6:19:25 PM
to Kubernetes user discussion and Q&A
Is there a community of Kubernetes/GCP users that is more active than this Google Group?

Rodrigo Campos

Apr 22, 2018, 2:07:15 PM
to kubernet...@googlegroups.com
Not that I know of. But I don't use GCP, so I'm not sure I can help.

Have you tried using the service's NodePort? You can send traffic to a single node; it will be round-robined to all pods anyway. That would show whether it happens there too.

You can also try (not sure on GCP) manually configuring a load balancer in front of a NodePort service (just set all nodes as backends), or a Service of type LoadBalancer, and see if it happens in those cases.

That will give a hint if the issue seems to be with the ingress, load balancer or kube-proxy.
--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscribe@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.

fa...@google.com

Apr 22, 2018, 2:15:59 PM
to Kubernetes user discussion and Q&A

Hello Dave,

Since your question is technical, you may get help from community experts and enthusiasts at serverfault.com. Make sure you include proper tags when you ask your question. For example, if using Google Cloud Kubernetes Engine [1] you may tag it as [google-kubernetes-engine], and [kubernetes]; but if using a Kubernetes cluster on Compute Engine, you may tag it as [kubernetes], and [google-compute-engine].

On the other hand, while checking the Ingress prerequisites [2], I noted that “Google Kubernetes Engine deploys an ingress controller on the master” for you and, as in this tutorial [3], creates the load balancer. According to the prerequisites and this GitHub page [4], the controller is still in beta; you can check the limitations and expectations around latency [5] on the same page.
That said, if you suspect a defect in the ingress controller or Google Cloud Load Balancer, you may open a report through the issue tracker [6], but only after verifying the correct settings with the community and confirming that this is not expected behavior. I hope that helps.

[1] https://console.cloud.google.com/kubernetes
[2] https://kubernetes.io/docs/concepts/services-networking/ingress/#prerequisites
[3] https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer
[4] https://github.com/kubernetes/ingress-gce/blob/master/BETA_LIMITATIONS.md#glbc-beta-limitations
[5] https://github.com/kubernetes/ingress-gce/blob/master/BETA_LIMITATIONS.md#latency
[6] https://cloud.google.com/support/docs/issue-trackers

Ahmet Alp Balkan

Apr 23, 2018, 12:57:26 PM
to kubernet...@googlegroups.com
I suspect this issue is rooted in the GCP infrastructure and doesn't have much to do with the open-source part of the system. There's a good #gke community on the Kubernetes Slack; however, I don't think anyone there will be able to help with this particular issue.

I recommend reaching out to the Google Cloud Platform Support. They should be able to help by routing your issue through the right channels.

I also recommend coming up with a good way to reproduce and measure the TTFB, and providing some endpoints that Google folks can hit on their end to measure the tail latency. You can use tools like ApacheBench (ab) or https://github.com/rakyll/hey to send some load and look at the tail latencies.


Dave Jensen

May 1, 2018, 7:05:00 PM
to Kubernetes user discussion and Q&A
The @GCPcloud Twitter account said that I would receive a follow-up from support early last week. There has been no response. I've decided to file an issue on the project's GitHub repo.

Dave Jensen

May 1, 2018, 7:08:48 PM
to Kubernetes user discussion and Q&A
Hello fa...,

We followed the tutorial [3]; it technically works, but it does not perform to expectations.

Dave

Dave Jensen

May 11, 2018, 11:47:37 AM
to Kubernetes user discussion and Q&A
> Have you tried using the service nodeport? You can do that and send traffic to one node, this will be round robin to all pods anyways. To see if it happens too.

I just tried a port-forward to the service (instead of a single pod) and it's blazing fast.

Either the GCP LB is bad or ingress-gce is bad. It's just super frustrating because no matter what channel I take, there is no support. I realize this is the Kubernetes mailing list, but there is no dedicated GCP mailing list; all the GCP docs point here. Given our super simple setup, built by following tutorials, something is wrong.

mars...@upstream.tech

May 11, 2018, 3:11:18 PM
to Kubernetes user discussion and Q&A
Hi Dave, we are experiencing identical conditions with a very similar setup. Please let us know if you learn anything and we will do the same.

Rodrigo Campos

May 11, 2018, 4:29:08 PM
to kubernet...@googlegroups.com
Have you tried what I suggested?

Also, isn't it possible to open a ticket with Google Cloud support about GKE? (I've never used Google Cloud, so I don't know whether it's free or what.)

Kenneth Massada

May 11, 2018, 4:50:54 PM
to Kubernetes user discussion and Q&A
Dave, are you able to use our support center to file a case (https://cloud.google.com/support/)? If you already have, could you share the case number with us? I'll make sure we get someone to follow up.

Dave Jensen

May 11, 2018, 5:16:52 PM
to kubernet...@googlegroups.com
I attempted to file a case but was denied because we're on bronze support. We are a Spark customer, but I also cannot find a way to set up the 1:1 Office Hours.

However, now that there is at least one other person having this issue (see Marshall above), I feel it would be good for this to be handled in a public forum. It means there is a defect somewhere, possibly a documentation defect.

Rodrigo, if I'm not mistaken, I essentially accessed the service via NodePort when I port-forwarded to the service. I have been trying to set up Contour as a load balancer, with limited success.

Manually setting up a load balancer on GCP seems like busy work that will result in the same setup. When I apply my ingress YAML, a GCP load balancer is created for me; I could copy all of those settings, but then I'd just have a copy of the same setup.


Rodrigo Campos

May 12, 2018, 9:39:29 AM
to kubernet...@googlegroups.com
For debugging's sake, I think it's better to confirm than to assume. And there can be differences, for sure. But do what you want, of course :-)



Bowei Du

May 12, 2018, 8:09:53 PM
to Kubernetes user discussion and Q&A
Hi Dave,

What version of GKE (master) are you currently using?

Is there a URL that can be tested/probed (i.e., a repro case)?

As a start, it would be useful to post an anonymized `tcpdump` of the HTTP session that exhibits the delay, the output of curl with timing (something like: https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl), or the output of the tool you are using to measure TTFB.

Bowei

francois...@polynom.io

May 14, 2018, 5:10:55 PM
to Kubernetes user discussion and Q&A
Hi guys, I have the exact same problem. If I deploy the service as a LoadBalancer, it is blazing fast, but if I use the basic ingress on GKE, I see random latency, with response times ranging from 50ms to 5s.

ama...@upstream.tech

May 16, 2018, 11:08:26 AM
to Kubernetes user discussion and Q&A
Hi Francois,

We are having a similar issue. Could you give more information on how you deployed as a LoadBalancer? We'd like to get that set up while we continue looking for a solution to this.

Nicks

May 16, 2018, 5:41:32 PM
to Kubernetes user discussion and Q&A
I created an HTTP LB setup on GCP using a Go HTTP server, without Kubernetes, and was able to see rare long-tail latencies of more than 1 second. After I set `IdleTimeout` to larger than ten minutes, I stopped seeing those slow responses. The echoheaders image uses nginx and doesn't set `keepalive_timeout` (I sent a PR to update this).

Dave Jensen

May 17, 2018, 2:37:28 PM
to Kubernetes user discussion and Q&A
Thank you, I think this solved the issue. We set the IdleTimeout (in Golang) to 620s and, in our staging environment, I have not seen a request take longer than 200ms.

Dave Jensen

May 21, 2018, 6:59:21 PM
to Kubernetes user discussion and Q&A
The issue appears to be back. For the past few nights I've been seeing sporadic 4-5 second response times on calls. Again, it's calls like OPTIONS that really stand out.

rmu...@texastribune.org

May 21, 2018, 10:06:58 PM
to Kubernetes user discussion and Q&A
Just want to note we're seeing this on our end too. We narrowed it down to the GCE ingress by testing our deployment behind the older-style LoadBalancer service, where we are seeing sub-200ms responses. We also confirmed Cloudflare wasn't the issue by bypassing it and taking DNS out of the equation. At least for us, all signs point to the GCE ingress being the problem.

I also tried what was suggested to Dave and adjusted our IdleTimeout/keepalive options, but saw no change.

shyam kishore alapati

May 23, 2018, 10:50:01 AM
to kubernet...@googlegroups.com
This looks like something to do with the Google load balancer or kube-proxy. I set up a Google HTTP load balancer pointing to the NodePort and see the same issue. If I expose the service as a LoadBalancer and access it directly, I don't see any issues.

Dave Jensen

May 29, 2018, 12:41:55 PM
to Kubernetes user discussion and Q&A
Despite multiple independent reports of this issue, apparently it's our problem and we're on our own to solve it, or not solve it.

Here's the response I received from Google Support after they contacted the Kubernetes Engineering Team:

> According to the stats you have provided, we can confirm that some requests are taking a pretty long time. However, we cannot see bottlenecks between the client and GCLB (Google Cloud Load Balancer); it may be worth doing an MTR trace and getting some reports on the average latency between the client and GCLB. If those stats look fine, there can be multiple reasons behind the LB and in the routing to the backend service.
>
> Inspecting the GCLB's overall latency, there does not seem to be much latency due to GCLB; there was only one spike of more than 800ms, on May 16th. The average is over 100ms but no more than 200ms.

I wish I had been given more details on the "multiple reasons behind the LB" but I was not.

Dave Jensen

May 29, 2018, 12:58:59 PM
to Kubernetes user discussion and Q&A
Question for the four or more others having the same problem, what is your stack?

We have two web servers. One serves static content, which is our single-page app; it is built in Go using the Echo framework, which, IIRC, uses Go's http.Server behind the scenes. The other server handles API calls from the SPA. It is also built in Go with Echo; this is the server where we notice the issues. This server also talks gRPC to another server, which talks to MongoDB.

(Again, when we go _around_ the load balancer, there are zero latency issues. Load times are as expected.)

rmu...@texastribune.org

May 29, 2018, 3:02:20 PM
to Kubernetes user discussion and Q&A

We ended up moving back to ingress-nginx (https://github.com/kubernetes/ingress-nginx), where the problem evaporated. It wasn't what I wanted, but I had to get things done.

We have Ruby on Rails, Django (Python), and Express-based Node.js apps all running in this cluster, and all three platforms were seeing this identical type of delay.

shyam kishore alapati

May 29, 2018, 4:34:43 PM
to kubernet...@googlegroups.com
We have nginx reverse-proxying a Java Spring Boot app. I tried creating an ingress directly to both nginx and the backend app and see the same latencies either way. If I set them up as LoadBalancer services, they work fine without any issues.

Dave Jensen

Jun 4, 2018, 5:38:34 PM
to Kubernetes user discussion and Q&A
I'm getting nowhere with Google Support. Basically, Support sends the issue to the Google Kubernetes team, who in turn respond that there was a GCLB spike on a specific day at a specific time, and that's all I get. Support goes on to recommend profiling. Given that we have five people experiencing this on three or four different tech stacks, I'm sure we all understand that profiling isn't going to reveal much. So...

I think a GitHub repo with a reproducible case would help. I'll have time to do this at the beginning of July, but if anybody happens to have time sooner than that, please feel free to post it here. I'll pass it along with my open support case; otherwise, I'll let you all know what happens in July.