Kubernetes API hanging during autoscaling of a non-default node pool when adding a pod through the Kubernetes API

Ludovic Havet

Sep 6, 2018, 11:30:17 AM
to Kubernetes user discussion and Q&A
Here are more details on the problem.

The idea is to manually schedule pods onto the non-default node pool.

The node pool has autoscaling enabled. It is not the default node pool but an additional one, created manually through the gcloud API.
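
Roughly how the pool is created (the names, zone and sizes below are placeholders, not our real values):

    gcloud container node-pools create my-pool \
      --cluster=my-cluster \
      --zone=europe-west1-b \
      --enable-autoscaling \
      --min-nodes=0 \
      --max-nodes=3 \
      --machine-type=n1-standard-2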

When a pod is created in the node pool and there are no resources left, autoscaling kicks in and a new node is created.
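
We place pods on that pool via GKE's per-node-pool label (a minimal sketch; the pod name, image and pool name are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod
    spec:
      # GKE sets this label on every node of the pool
      nodeSelector:
        cloud.google.com/gke-nodepool: my-pool
      containers:
      - name: app
        image: nginx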

The problem is that the whole cluster hangs and its API is not accessible while the new node is being created for the node pool, which takes about 1-2 minutes.

It is as if Kubernetes stopped responding on the API while one of its non-default node pools is autoscaling.
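
A simple way to observe the symptom (a rough sketch; the 5s timeout and polling interval are arbitrary):

    # poll the API server health endpoint and log when it stops answering
    while true; do
      date
      kubectl get --raw /healthz --request-timeout=5s || echo "API unavailable"
      sleep 5
    done

While the new node is being provisioned the calls time out; once it has joined, the endpoint answers "ok" again.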

This does not happen when autoscaling occurs on the default node pool.

I think it is an issue with GKE.

Any help would be appreciated :)

Thanks!



Aleksandra Malinowska

Sep 7, 2018, 7:47:47 AM
to Kubernetes user discussion and Q&A
Hi,

What is the Kubernetes master version of the affected cluster? Does this happen every time a new node is added, or only for the first node in the new node pool?

One reason the Kubernetes API server may be temporarily unavailable is that it's being restarted (this can happen e.g. due to a configuration update, auto-upgrade, or resizing). The exact behavior depends on the version and the cluster size (number of nodes), so it's hard to pinpoint the reason without this information.
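
If you want to check whether the master was restarted around that time, listing recent cluster operations should show upgrades, repairs and resizes (the cluster name and zone are placeholders, and the filter is just one way to narrow the output down):

    gcloud container operations list \
      --zone=europe-west1-b \
      --filter="targetLink:my-cluster" \
      --sort-by=~startTime \
      --limit=10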

Thanks,
Aleksandra

Ludovic Havet

Sep 7, 2018, 12:49:57 PM
to Kubernetes user discussion and Q&A
Hi,

Actually, it has happened on all versions since we started development.

We are currently on 1.10.6-gke.1, but it has always been like that.

I guess we can easily see in the system logs whether Kubernetes is being restarted?
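
For instance, I assume something like this against Stackdriver Logging would show cluster-level events (the filter is a guess on my part):

    gcloud logging read 'resource.type="gke_cluster"' --limit=20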

Thanks
Ludovic

Aleksandra Malinowska

Sep 10, 2018, 8:54:02 AM
to Kubernetes user discussion and Q&A
Hi,

At this version, master restarts shouldn't be quite as frequent (especially due to autoscaling changes), although they may still happen when the cluster size exceeds some threshold.

It's hard to guess what may be causing it in this case without having a look at the cluster - can you open a ticket with GCP support, providing the cluster details?

Thanks,
Aleksandra