Re: [kubernetes/kubernetes] apiserver timeouts after rolling-update of etcd cluster (#47131)


k8s-ci-robot

Jun 7, 2017, 12:53:25 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@javipolo: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-misc.

In response to this:

@kubernetes/sig-api-machinery-misc I guess :)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.



Xiang Li

Jun 7, 2017, 12:59:13 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@javipolo Can you provide more logging, say a 10-minute window? The compaction error is expected, since the connection between etcd and the API server was cut off, but it should disappear once etcd is back and the resync has finished.

javipolo

Jun 7, 2017, 1:14:46 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Yep, before I only provided the ERROR part of the logs; now I'm including the INFO part as well. The logs run from 16:50 until the restart. The rolling update of etcd was done around 17:45. If you need anything else, just ask and I'll try to provide it :) (thanks!)

Log file created at: 2017/06/07 16:23:48
Running on machine: kubeapi-05
Binary: Built with gc go1.7.5 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
.
.
.
.
I0607 16:50:59.279231   19698 trace.go:61] Trace "List /api/v1/namespaces/search/events" (started 2017-06-07 16:50:58.733629042 +0200 CEST):
[12.69µs] [12.69µs] About to List from storage
[280.765661ms] [280.752971ms] Listing from storage done
[293.367569ms] [12.601908ms] Self-linking done
[545.53202ms] [252.164451ms] Writing http response done (4300 items)
"List /api/v1/namespaces/search/events" [545.534815ms] [2.795µs] END
E0607 17:45:11.393411   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.393624   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *extensions.Deployment ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.448063   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.448149   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PersistentVolume ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.453552   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.453700   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PodTemplate ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.476405   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.476536   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PersistentVolumeClaim ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.476637   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.476735   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ConfigMap ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492509   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492594   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.LimitRange ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492611   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492665   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ResourceQuota ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492806   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492862   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ServiceAccount ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:12.448461   19698 cacher.go:125] Terminating all watchers from cacher *api.PersistentVolume
W0607 17:45:12.476786   19698 cacher.go:125] Terminating all watchers from cacher *api.PersistentVolumeClaim
W0607 17:45:12.492813   19698 cacher.go:125] Terminating all watchers from cacher *api.ResourceQuota
W0607 17:45:12.492860   19698 cacher.go:125] Terminating all watchers from cacher *api.LimitRange
W0607 17:45:12.493052   19698 cacher.go:125] Terminating all watchers from cacher *api.ServiceAccount
W0607 17:45:12.499419   19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.ServiceAccount ended with: too old resource version: 8902015 (8925724)
W0607 17:45:12.499526   19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.ResourceQuota ended with: too old resource version: 8902015 (8925724)
W0607 17:45:12.499581   19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.LimitRange ended with: too old resource version: 8902015 (8925724)
E0607 17:45:52.439496   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.439634   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.440143   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441236   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *batch.Job ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441066   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *storage.StorageClass ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441336   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Secret ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.456094   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.457264   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Service ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:53.441577   19698 cacher.go:125] Terminating all watchers from cacher *api.Secret
W0607 17:45:53.452317   19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.Secret ended with: too old resource version: 8902015 (8925793)
W0607 17:45:53.457554   19698 cacher.go:125] Terminating all watchers from cacher *api.Service
E0607 17:47:36.116540   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.116631   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ReplicationController ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141317   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141439   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *extensions.Ingress ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141463   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141467   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141521   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *apps.StatefulSet ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141531   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *storage.StorageClass ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:47:37.116969   19698 cacher.go:125] Terminating all watchers from cacher *api.ReplicationController
W0607 17:47:37.141672   19698 cacher.go:125] Terminating all watchers from cacher *apps.StatefulSet
E0607 17:48:26.488736   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.489735   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Namespace ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.501637   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.501656   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.501754   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.ClusterRoleBinding ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.501776   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.ClusterRole ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502128   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502178   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.RoleBinding ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502311   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502357   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *policy.PodDisruptionBudget ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502602   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502654   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.Role ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560101   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560191   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *autoscaling.HorizontalPodAutoscaler ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560404   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560563   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ReplicationController ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560592   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560623   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *batch.CronJob ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560625   19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560717   19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *certificates.CertificateSigningRequest ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:48:27.489933   19698 cacher.go:125] Terminating all watchers from cacher *api.Namespace
W0607 17:48:27.498083   19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.Namespace ended with: too old resource version: 8902015 (8926322)
W0607 17:49:32.064253   19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)
W0607 17:50:42.066257   19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)
I0607 17:51:14.583499   19698 trace.go:61] Trace "Create /apis/extensions/v1beta1/namespaces/search/daemonsets" (started 2017-06-07 17:50:44.581144781 +0200 CEST):
[1.332617ms] [1.332617ms] About to convert to expected version
[1.608209ms] [275.592µs] Conversion done
[1.644553ms] [36.344µs] About to store object in database
"Create /apis/extensions/v1beta1/namespaces/search/daemonsets" [30.00229701s] [30.000652457s] END
W0607 17:51:52.068431   19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)

Xiang Li

Jun 7, 2017, 1:18:14 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@javipolo

When did you restart the API server? Are there more logs like "etcdserver: mvcc: required revision has been compacted" after 17:51? I want to know whether the API server simply prints these errors forever or whether they stop after you upgrade ALL your etcd members.

Davanum Srinivas

Jun 7, 2017, 1:21:06 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

/cc @jbeda

javipolo

Jun 7, 2017, 1:23:55 PM6/7/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

The logs I provided end when the apiserver was restarted, so no, there are no more of those errors after 17:51.

I can provide the restart times of the etcds and apiservers (the host date/time is the same on every server)

etcd restart times

   Active: active (running) since Wed 2017-06-07 17:45:12 CEST; 1h 33min ago
   Active: active (running) since Wed 2017-06-07 17:48:27 CEST; 1h 30min ago
   Active: active (running) since Wed 2017-06-07 17:45:53 CEST; 1h 33min ago
   Active: active (running) since Wed 2017-06-07 17:46:43 CEST; 1h 32min ago
   Active: active (running) since Wed 2017-06-07 17:47:37 CEST; 1h 31min ago

apiserver restart times:

   Active: active (running) since Wed 2017-06-07 17:52:47 CEST; 1h 27min ago
   Active: active (running) since Wed 2017-06-07 17:52:47 CEST; 1h 27min ago
   Active: active (running) since Wed 2017-06-07 17:52:48 CEST; 1h 27min ago
   Active: active (running) since Wed 2017-06-07 17:52:48 CEST; 1h 27min ago
   Active: active (running) since Wed 2017-06-07 17:52:49 CEST; 1h 27min ago
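For reference, these "Active:" lines can be pulled straight from systemd on each node with something like the following; the host and unit names here are just placeholders for our actual servers.

for h in etcd-01 etcd-02 etcd-03 etcd-04 etcd-05; do
  ssh "$h" 'systemctl status etcd | grep "Active:"'
done
for h in kubeapi-01 kubeapi-02 kubeapi-03 kubeapi-04 kubeapi-05; do
  ssh "$h" 'systemctl status kube-apiserver | grep "Active:"'
done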

Yichen Wong

Jun 8, 2017, 3:15:15 AM6/8/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

I have encountered the same problem; it has been troubling me.

kubernetes version: v1.6.4
etcd version: 3.1.8

etcd status:

[root@ip-10-255-2-76 kubernetes]# etcdctl \
> --ca-file=/etc/kubernetes/ssl/ca.pem \
> --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
> --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
> cluster-health
2017-06-08 15:09:07.272311 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-06-08 15:09:07.273036 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 1ccf784fffd9bd65 is healthy: got healthy result from https://10.255.2.174:2379
member 81eb13a8d04dcfee is healthy: got healthy result from https://10.255.2.76:2379
member a05de482e89e5998 is healthy: got healthy result from https://10.255.2.147:2379
cluster is healthy
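
(The command above uses the v2-style flags; the equivalent check through the v3 API would look roughly like this, reusing the same certificate paths and the member endpoints listed above.)

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/kubernetes.pem \
  --key=/etc/kubernetes/ssl/kubernetes-key.pem \
  --endpoints=https://10.255.2.76:2379,https://10.255.2.147:2379,https://10.255.2.174:2379 \
  endpoint health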

kube-apiserver logs:

[root@ip-10-255-2-76 kubernetes]# tail -500f /data/logs/kubernetes/kube-apiserver.INFO
	Log file created at: 2017/06/08 14:32:51
	Running on machine: ip-10-255-2-76
	Binary: Built with gc go1.7.5 for linux/amd64
	Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
	I0608 14:32:51.608111    9597 aws.go:762] Building AWS cloudprovider
	I0608 14:32:51.608413    9597 aws.go:725] Zone not specified in configuration file; querying AWS metadata service
	I0608 14:32:51.776140    9597 tags.go:76] AWS cloud filtering on ClusterID: kubernetes
	E0608 14:32:52.320857    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRole: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterroles?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321118    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://10.255.2.76:6443/api/v1/secrets?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321172    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://10.255.2.76:6443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321214    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://10.255.2.76:6443/api/v1/serviceaccounts?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321249    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://10.255.2.76:6443/api/v1/limitranges?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321279    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://10.255.2.76:6443/api/v1/namespaces?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321317    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.RoleBinding: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/rolebindings?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321352    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.Role: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/roles?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.321384    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://10.255.2.76:6443/api/v1/resourcequotas?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	E0608 14:32:52.367614    9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRoleBinding: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
	I0608 14:32:52.650280    9597 serve.go:79] Serving securely on 10.255.2.76:6443
	I0608 14:32:52.650325    9597 serve.go:94] Serving insecurely on 10.255.2.76:8080
	I0608 14:32:53.671016    9597 trace.go:61] Trace "GuaranteedUpdate etcd3: *api.ConfigMap" (started 2017-06-08 14:32:53.098384831 +0800 CST):
	[32.359241ms] [32.359241ms] initial value restored
	[555.829144ms] [523.469903ms] Transaction prepared
	[572.583965ms] [16.754821ms] Transaction committed
	"GuaranteedUpdate etcd3: *api.ConfigMap" [572.599509ms] [15.544µs] END
	I0608 14:32:53.671102    9597 trace.go:61] Trace "Update /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication" (started 2017-06-08 14:32:53.098317621 +0800 CST):
	[22.218µs] [22.218µs] About to convert to expected version
	[39.51µs] [17.292µs] Conversion done
	[42.677µs] [3.167µs] About to store object in database
	[572.712537ms] [572.66986ms] Object stored in database
	[572.716218ms] [3.681µs] Self-link added
	"Update /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication" [572.763593ms] [47.375µs] END
	I0608 14:32:53.891199    9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-system
	I0608 14:32:53.925799    9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:token-cleaner in kube-system
	I0608 14:32:53.973401    9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public
	I0608 14:32:54.011158    9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-system
	I0608 14:32:54.040216    9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:token-cleaner in kube-system
	I0608 14:32:54.104971    9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public
	I0608 14:42:51.937522    9597 compact.go:159] etcd: compacted rev (404), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	I0608 14:47:51.945296    9597 compact.go:159] etcd: compacted rev (709), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	E0608 14:49:58.232233    9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
	I0608 14:52:51.953504    9597 compact.go:159] etcd: compacted rev (1012), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	I0608 14:57:51.962993    9597 compact.go:159] etcd: compacted rev (1311), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	E0608 14:58:13.284420    9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
	I0608 15:02:51.971159    9597 compact.go:159] etcd: compacted rev (1612), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	I0608 15:07:51.979360    9597 compact.go:159] etcd: compacted rev (1911), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
	E0608 15:07:58.354849    9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted  

ricl

Jul 3, 2017, 11:11:10 PM7/3/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

+1

Sartner

Jul 20, 2017, 7:19:23 PM7/20/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

the same problem
etcd 3.1.7
k8s: 1.6.7

André Cruz

Jul 26, 2017, 9:43:45 AM7/26/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

I also see the same messages in the apiserver log; however, it's rare that I get timeouts on operations invoked against it.

K8s: 1.6.2
etcd: 3.1.10 && 3.0.17 (these messages appeared before I upgraded to 3.1.10 and are still there)

Timothy St. Clair

Jul 26, 2017, 9:51:09 AM7/26/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Hongchao Deng

Jul 26, 2017, 12:18:41 PM7/26/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

OK. So this issue already exists. I can share my story as well.

Basically, we were doing some etcd scale-up and scale-down stress testing. We saw the API server hang and some resources become unusable. I tried a dozen times and the results were erratic:

  • Mostly reads still worked, but writes didn't.
  • Not all resources hang, and not the same resources hang every time. Sometimes it is endpoints, sometimes services, sometimes pods, third party resources, etc.

I have tried to reproduce with a scratch apiserver:

--etcd-servers=http://${lb_service_ip}:2379 --service-cluster-ip-range=10.3.0.0/24

I haven't reproduced it yet, so I suspect it needs some existing data.

Note that in my case it only happened when scaling down from 3 to 1. Scaling down from 2 to 1 didn't trigger it. I also tried scaling down slowly, waiting 60s before removing the second member, and it didn't happen either.
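
(A rough sketch of the kind of scale-down operation involved, for anyone trying to reproduce; the member ID below is a placeholder.)

ETCDCTL_API=3 etcdctl --endpoints=http://${lb_service_ip}:2379 member list
ETCDCTL_API=3 etcdctl --endpoints=http://${lb_service_ip}:2379 member remove 8211f1d0f64f3269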

Xigang Wang

Aug 8, 2017, 11:24:40 PM8/8/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@xiang90
the same problem.

etcd version:

etcd Version: 3.1.0
Git SHA: 8ba2897
Go Version: go1.7.4
Go OS/Arch: linux/amd64

kubernetes version:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T22:55:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

show kube-apiserver logs:

Aug 09 10:50:56 kube-apiserver[62362]: E0809 10:50:56.548757   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:38 kube-apiserver[62362]: E0809 10:51:38.087021   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890132   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890135   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890136   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890191   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915202   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915298   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915311   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915315   62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted

thx

Reinholds Zviedris

Aug 14, 2017, 10:45:43 AM8/14/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Similar problem (see log below).

In short: I'm installing a new K8S master node based on this guide: https://coreos.com/kubernetes/docs/latest/getting-started.html. When trying to start the kubelet service on it, everything starts up, but the apiserver crashes all the time (?). Because of that, the worker node can't register itself and the master node isn't fully working.

etcd version: 3.2.5
kubernetes version 1.6.1_coreos.0

E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:443/api/v1/resourcequotas?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:443/api/v1/secrets?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:443/api/v1/namespaces?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665947 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:443/api/v1/limitranges?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665950 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665950 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:443/api/v1/serviceaccounts?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
[restful] 2017/08/14 14:32:14 log.go:30: [restful/swagger] listing is available at https://192.168.178.155:443/swaggerapi/
[restful] 2017/08/14 14:32:14 log.go:30: [restful/swagger] https://192.168.178.155:443/swaggerui/ is mapped to folder /swagger-ui/
I0814 14:32:14.740919 1 serve.go:79] Serving securely on 0.0.0.0:443
I0814 14:32:14.741036 1 serve.go:94] Serving insecurely on 127.0.0.1:8080
I0814 14:32:25.189208 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:15.188789893 +0000 UTC):
[25.862µs] [25.862µs] About to convert to expected version
[138.138µs] [112.276µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.00036266s] [10.000224522s] END
I0814 14:32:29.307733 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:19.307339406 +0000 UTC):
[50.67µs] [50.67µs] About to convert to expected version
[178.618µs] [127.948µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000361s] [10.000182382s] END
I0814 14:32:43.639855 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:33.639564997 +0000 UTC):
[27.653µs] [27.653µs] About to convert to expected version
[109.672µs] [82.019µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000262566s] [10.000152894s] END

Reinholds Zviedris

Aug 16, 2017, 11:35:37 AM8/16/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Update on the situation: I solved my issue. It turns out that if you are running an etcd3 cluster with TLS, then you have to add the cert, key and CA file locations to kube-apiserver.yaml; otherwise the API server isn't able to connect and store information in etcd. I only found that out after adding --v=4 for verbose logging in kube-apiserver.yaml, at which point I discovered that all the URLs the kube-apiserver tries to access return a 403 error code.
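
(For anyone hitting the same thing, the relevant kube-apiserver flags look roughly like this; the paths and endpoints are examples, adjust them to your layout.)

--etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379
--etcd-cafile=/etc/kubernetes/ssl/etcd-ca.pem
--etcd-certfile=/etc/kubernetes/ssl/etcd-client.pem
--etcd-keyfile=/etc/kubernetes/ssl/etcd-client-key.pem
--v=4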

abel-von

Aug 22, 2017, 9:38:01 PM8/22/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Any progress on this issue? We have also encountered it, and even worse, some update operations failed. The log is:

E0812 19:02:48.471971 24 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0812 19:02:48.536547 24 reflector.go:323] vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *paas.Stack ended with: etcdserver: mvcc: required revision has been compacted

This has happened twice in our environment, and we didn't find a way to reproduce it.

tbchj

Aug 29, 2017, 2:21:36 AM8/29/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

I also have the same problem. Any solution? I checked the logs and found many lines like:


E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted

k8s version: 1.7.2
etcd version: 3.0.15

I also have a strange question.
I created some pods, but etcd recorded nothing except the network config records.
Using "etcdctl ls /", I didn't see any records for the pods I created.
My colleagues are also experiencing this problem and haven't resolved it.
But when I changed the k8s version to 1.3.10, the pod records did exist in etcd (using "etcdctl ls /"). Why?
Need help. Waiting...

Jordan Liggitt

Aug 29, 2017, 11:49:58 AM8/29/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

etcdctl ls / uses the etcd v2 API, which is not what kubernetes 1.6+ uses by default

ETCDCTL_API=3 etcdctl get --prefix / uses the etcd v3 API, which will show you your kubernetes data
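
A concrete example, assuming a TLS-enabled cluster and the default /registry prefix (the certificate paths are illustrative):

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/kubernetes.pem \
  --key=/etc/kubernetes/ssl/kubernetes-key.pem \
  get /registry --prefix --keys-only | head -n 20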

tbchj

Aug 30, 2017, 3:00:42 AM8/30/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@liggitt thank you, it works.
This is the same issue as #43893.

tbchj

Aug 30, 2017, 9:58:59 PM8/30/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

By the way, after I changed the apiserver's --storage-backend to etcd2, the error "E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted" disappeared.
I wonder if there is some relation between them.

Jordan Liggitt

Aug 30, 2017, 10:03:56 PM8/30/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

Event compaction is perfectly normal, and only exists in the etcd v3 API. Attempting to watch from a compacted revision returns a 410 error, which API clients should respond to by doing a full relist to get the current resourceVersion.
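
(You can see this behavior directly by watching from a deliberately stale resourceVersion; the address below assumes the insecure port visible in the logs earlier in this thread.)

# The stream returns an ERROR event carrying a 410-style "too old resource version"
# status; a well-behaved client reacts by relisting to obtain a fresh resourceVersion.
curl "http://127.0.0.1:8080/api/v1/pods?watch=true&resourceVersion=1"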

Kubernetes Submit Queue

Sep 1, 2017, 2:15:39 PM9/1/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

[MILESTONENOTIFIER] Milestone Labels Incomplete

@hongchaodeng @javipolo @xiang90

Action required: This issue requires label changes. If the required changes are not made within 6 days, the issue will be moved out of the v1.8 milestone.

kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup'].
priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].

Additional instructions available here

Kubernetes Submit Queue

Sep 3, 2017, 4:08:42 AM9/3/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

[MILESTONENOTIFIER] Milestone Labels Incomplete

@hongchaodeng @javipolo @xiang90

Action required: This issue requires label changes. If the required changes are not made within 5 days, the issue will be moved out of the v1.8 milestone.

Kubernetes Submit Queue

Sep 4, 2017, 4:16:29 AM9/4/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

[MILESTONENOTIFIER] Milestone Labels Incomplete

@hongchaodeng @javipolo @xiang90

Action required: This issue requires label changes. If the required changes are not made within 4 days, the issue will be moved out of the v1.8 milestone.

javipolo

Sep 4, 2017, 11:41:33 AM9/4/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@xiang90 I cannot label this issue, but the correct labels should be 'kind/bug' and 'priority/important-soon'

Can you label the issue or tell me how I should do it?
Thanks

Xiang Li

Sep 4, 2017, 12:21:38 PM9/4/17
to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

@javipolo @liggitt

There are two different issues mentioned in this thread:

  1. timeout
  2. mvcc: required revision has been compacted

    2 is normal. We should change the API server's error logging to a warning or even debug-level logging.

    Can someone provide a way to reproduce the timeout issue? I have not seen this behavior in our clusters.

    javipolo

    Sep 4, 2017, 12:35:16 PM9/4/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @xiang90 in my case, just by restarting the etcd servers I can randomly trigger the issue....

    Kubernetes Submit Queue

    Sep 5, 2017, 4:10:45 AM9/5/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Labels Incomplete

    @hongchaodeng @javipolo @xiang90

    Action required: This issue requires label changes. If the required changes are not made within 3 days, the issue will be moved out of the v1.8 milestone.

    kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup'].
    priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].

    Additional instructions available here

    javipolo

    Sep 5, 2017, 10:46:35 AM9/5/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    /kind bug
    /priority important-soon

    Kubernetes Submit Queue

    Sep 5, 2017, 10:47:34 AM9/5/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Labels Complete

    @hongchaodeng @javipolo @xiang90

    Issue label settings:

    • sig/api-machinery: Issue will be escalated to these SIGs if needed.
    • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
    • kind/bug: Fixes a bug discovered during the current release.
    Additional instructions available here

    Kubernetes Submit Queue

    Sep 7, 2017, 2:10:41 PM9/7/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Labels Complete

    @hongchaodeng @javipolo @xiang90

    Issue label settings:

    • sig/api-machinery: Issue will be escalated to these SIGs if needed.
    • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
    • kind/bug: Fixes a bug discovered during the current release.
    Additional instructions available here The commands available for adding these labels are documented here

    Davanum Srinivas

    Sep 18, 2017, 1:26:44 PM9/18/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @xiang90 @hongchaodeng @javipolo So is there any PR for this? If not, can we move it out of the 1.8 milestone, please?

    Mehdy Bohlool

    Sep 19, 2017, 1:44:00 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @xiang90 @hongchaodeng @javipolo this is a daily and friendly ping from release team. Is this still relevant to 1.8? If yes, who is actively working on it?

    Xiang Li

    Sep 19, 2017, 1:48:45 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    We should move this out of 1.8. There is no existing PR for it.

    Mehdy Bohlool

    Sep 19, 2017, 1:49:46 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @xiang90 thanks. Removing the milestone.

    Daniel Smith

    Sep 19, 2017, 1:49:59 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    It's not clear that there's any actual problem here.

    Daniel Smith

    Sep 19, 2017, 1:56:01 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I take that back. OP seems to have a problem. Note that we haven't attempted running Kubernetes against etcd 3.1.x or up. Some of the other commenters on this issue may have hit the fact that the etcd backend version changed to 3 from 2.

    cc @jpbetz

    Xiang Li

    Sep 19, 2017, 3:57:32 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @lavalamp @jpbetz

    The original question is about the logging line

    E0607 17:45:11.447234     367 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
    

    I explained that this is probably just normal. The compaction problem is recovered from automatically, so it requires no human intervention. It is not worth an error level.

    There are other potential etcd hanging problems raised in this thread. I did not respond to them here (or take a close look) since there is too much noise.

    Joe Betz

    Sep 19, 2017, 5:34:02 PM9/19/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    If I'm reading this right, watch chan error: etcdserver: mvcc: required revision has been compacted might be a red herring, and there might be a variety of unrelated issues being attributed to it here.

    And, as @zxpower observed, when upgrading to etcd 3 with TLS, if -v=4 verbose logging is not enabled in kube-apiserver.yaml, it can be difficult to diagnose the problem correctly.

    How about we update this issue's title/description to better guide developers who believe their issue is the same as this one to first check a few things, e.g.: "If you believe you've encountered this issue, note that watch chan error: etcdserver: mvcc: required revision has been compacted can be logged for a variety of unrelated reasons and might be a red herring. Please first check the connectivity between the kube-apiserver and etcd, enabling -v=4 logging in kube-apiserver.yaml. Be sure to include your kubernetes and etcd versions and the steps one would need to take to reproduce the issue."

    ?

    Maxim Ivanov

    Oct 20, 2017, 6:18:38 AM10/20/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I suspect it is a deadlock in the etcd client that the Kubernetes apiserver is shipped with. We see requests to some or all apiservers hanging after an etcd cluster roll.

    When the apiserver hangs, we see the number of requests to etcd piling up:

    $ curl https://apiserver/debug/pprof/goroutine?debug=2 | grep -F 'registry.(*Store).'  | wc -l
    499
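
    (Slightly expanded, the same dump can be saved and grepped further; the clientv3 pattern below is a guess at the vendored package path and may need adjusting.)

    curl -sk https://apiserver/debug/pprof/goroutine?debug=2 > goroutines.txt
    # stacks stuck in the storage layer (same count as above)
    grep -cF 'registry.(*Store).' goroutines.txt
    # where the vendored etcd client goroutines are parked
    grep -n 'clientv3' goroutines.txt | head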
    

    The problem is made worse by the fact that since 1.7.x the apiserver checks serviceaccount tokens in etcd, so even if a request would otherwise be served from cache, the mandatory round trip to etcd makes it time out and fail.

    Kubernetes 1.7.x ships etcd client version 3.1.5; there have been numerous client-side improvements since then, some of them addressing deadlocks in the balancer code, which is very much in use when etcd servers are rolled and change their availability status.

    Here are some relevant issues:
    coreos/etcd#7842
    coreos/etcd#8711
    coreos/etcd#8710

    @liggitt, how feasible is it to update the bundled etcd client version in Kube 1.7.x and 1.8.x to >= 3.2.9?

    Timothy St. Clair

    Oct 20, 2017, 10:27:44 AM10/20/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I could rev the client to 3.2.x, but the server side of upstream has to stay at 3.1.10 due to the supportable upgrade path.

    @hongchaodeng @xiang90 - any known issues with client skew testing?

    If we want to do this, we need to do it now.
    /cc @kubernetes/sig-cluster-lifecycle-misc

    Lucas Käldström

    Oct 20, 2017, 3:42:27 PM10/20/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Clayton Coleman

    Oct 21, 2017, 5:44:22 PM10/21/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention
    Openshift 3.6 and 3.7 run against both 3.1 and 3.2 using the 3.2 client.

    The only issues we've seen are the load balancer rollover and hanging watches against 3.1 servers when more than two thousand watches are in flight for a single apiserver.

    On Oct 20, 2017, at 3:41 PM, Lucas Käldström <notifi...@github.com>
    wrote:

    cc @kubernetes/sig-scalability-feature-requests
    <https://github.com/orgs/kubernetes/teams/sig-scalability-feature-requests>




    Xiang Li

    Oct 24, 2017, 1:12:13 AM10/24/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    The only issue we've seen are the load balancer rollover and hanging
    watches against 3.1 servers when more than two thousand watches are in
    flight for a single apiserver.

    I believe there are some breakages in either the etcd client or gRPC (the versions we vendored) or both. We are actively testing and fixing this in etcd master. We will probably backport the fixes to the 3.2 client.

    Xiang Li

    Oct 24, 2017, 2:55:16 AM10/24/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @jpbetz

    It would be great if we could get coreos/etcd#8711 done to rule out etcd issues.

    Joe Betz

    Oct 24, 2017, 4:09:58 PM10/24/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    cc @wojtek-t

    Oliver Beattie

    Oct 29, 2017, 10:33:33 AM10/29/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Not to add to the noise, but we've encountered this issue in production, and it ended up leading to a complete cluster outage of all ≈300 nodes (through a very unfortunate series of events.)

    I have gathered all the relevant logs from our 3 k8s masters and 9 etcd nodes. There may not be anything of additional interest there, but if you would like to see them, please let me know and I can share them privately.

    Timothy St. Clair

    Oct 30, 2017, 11:06:39 PM10/30/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @obeattie I'm sooo sorry. I'm updating the client tomorrow, and going to poke folks about getting the next rev in line for release.
    /cc @luxas @roberthbailey @jbeda

    Davanum Srinivas

    Oct 31, 2017, 7:00:01 AM10/31/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Clayton Coleman

    Nov 1, 2017, 11:59:19 PM11/1/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I think this error message:

    E0607 17:45:11.447234 367 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted

    Is actually caused by the etcd3 storage watch code in kubernetes not properly handling the expiration of the revision and restarting the watch from the RV reported by the error. We should be able to transparently restart the watch without logging that error. It is not the hanging error described later in the thread.

    Clayton Coleman

    Nov 2, 2017, 12:01:24 AM11/2/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Spawned #54974

    Mike Metral

    Nov 4, 2017, 11:16:05 PM11/4/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I'm hitting this error on 1.7.5 and etcd 3.1.8, but not due to an etcd rollout; it simply started occurring on my cluster, to the point that it's affecting usability. Are there any known workarounds?

    Pierre Fersing

    Nov 5, 2017, 6:57:07 AM11/5/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I'm testing a workaround that seems to work so far for the hanging etcd client: point kube-apiserver at localhost as the etcd endpoint instead of the 3 etcd servers. It works for me because I have 3 master nodes, each with one kube-apiserver and one etcd server.
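
    (Concretely, the change is just the --etcd-servers flag on each master; the addresses below are illustrative.)

    # before: every apiserver pointed at all three members
    #   --etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379
    # workaround: each apiserver talks only to its co-located etcd member
    --etcd-servers=https://localhost:2379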

    Timothy St. Clair

    Nov 8, 2017, 12:34:07 PM11/8/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    PSA - we are waiting on a 3.2.10 client update.

    Joe Betz

    Nov 16, 2017, 5:08:34 PM11/16/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    3.2.10 is available

    Lucas Käldström

    Nov 17, 2017, 6:19:55 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    From what I understand, what will happen now is that we'll update

    • the built-in etcd Go client from 3.1.x at HEAD to 3.2.10
    • gRPC from v1.3.0 to v1.7.3, which will require significant regens of autogenerated files and might have other side-effects

    Then we'll figure out a 3.1.x server version to recommend for v1.9. Currently we're at 3.1.10 at HEAD

    cc @timothysc -- right?

    Kubernetes Submit Queue

    Nov 17, 2017, 6:20:11 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Issue Labels Incomplete

    @hongchaodeng @javipolo @xiang90

    Action required: This issue requires label changes. If the required changes are not made within 2 days, the issue will be moved out of the v1.9 milestone.

    kind: Must specify exactly one of kind/bug, kind/cleanup or kind/feature.

    Help

    Kubernetes Submit Queue

    Nov 17, 2017, 6:22:00 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Issue Current

    @hongchaodeng @javipolo @xiang90

    Issue Labels
    • sig/api-machinery sig/cluster-lifecycle sig/scalability: Issue will be escalated to these SIGs if needed.
    • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
    • kind/bug: Fixes a bug discovered during the current release.

    Lucas Käldström

    Nov 17, 2017, 6:22:20 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    cc @kubernetes/kubernetes-release-managers as this is most probably a show-stopper for v1.9; we will likely consider blocking the release on fixing this issue.

    Joe Betz

    Nov 17, 2017, 6:24:17 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @luxas That's the basic set of steps planned, yes. Note that we're planning to merge the work to the k8s 1.10 branch and then cherry-pick it to a 1.9.x release once it proves stable; we're not targeting the initial 1.9 release.

    I'll be picking up the engineering work on most of this. I'll link to an issue explaining this in more detail soon.

    Anthony Yeh

    Nov 17, 2017, 6:45:25 PM11/17/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @jpbetz Thanks for clarifying. Sounds like fun for @mbohlool when he gets back from vacation.

    Kubernetes Submit Queue

    Nov 20, 2017, 6:55:29 PM11/20/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Issue Needs Attention

    @hongchaodeng @javipolo @xiang90 @kubernetes/sig-api-machinery-misc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-scalability-misc

    Action required: During code slush, issues in the milestone should be in progress.
    If this issue is not being actively worked on, please remove it from the milestone.
    If it is being worked on, please add the status/in-progress label so it can be tracked with other in-flight issues.

    Note: If this issue is not resolved or labeled as priority/critical-urgent by Wed, Nov 22 it will be moved out of the v1.9 milestone.

    Issue Labels
    • sig/api-machinery sig/cluster-lifecycle sig/scalability: Issue will be escalated to these SIGs if needed.
    • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
    • kind/bug: Fixes a bug discovered during the current release.
    Help

    Kubernetes Submit Queue

    Nov 22, 2017, 3:12:09 AM11/22/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Kubernetes Submit Queue

    Nov 22, 2017, 6:28:44 PM11/22/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    [MILESTONENOTIFIER] Milestone Removed From Issue

    Important: Code freeze is in effect and only issues with priority/critical-urgent may remain in the v1.9 milestone.

    jianglingxia

    Nov 23, 2017, 3:54:42 AM11/23/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I also have the same problem. I want to know what the cause of the problem is. @xiang90
    k8s version is 1.6.9, thanks
    The api-server log looks like this:
    I1123 03:11:02.243886 1185 trace.go:61] Trace "Update /api/v1/namespaces/admin/pods/f1a32f2d-a43a-4a74-ac35-a4936fb070be-r2qsd/status" (started 2017-11-23 03:11:01.710184949 +0800 CST):
    [69.492µs] [69.492µs] About to convert to expected version
    [384.168µs] [314.676µs] Conversion done
    [390.729µs] [6.561µs] About to store object in database
    [533.50445ms] [533.113721ms] Object stored in database
    [533.511425ms] [6.975µs] Self-link added
    "Update /api/v1/namespaces/admin/pods/f1a32f2d-a43a-4a74-ac35-a4936fb070be-r2qsd/status" [533.65373ms] [142.305µs] END
    E1123 03:11:14.142108 1185 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
    I1123 03:13:58.495456 1185 trace.go:61] Trace "List /api/v1/events" (started 2017-11-23 03:13:57.985830247 +0800 CST):
    [9.582µs] [9.582µs] About to List from storage
    [190.568834ms] [190.559252ms] Listing from storage done
    [276.924895ms] [86.356061ms] Self-linking done
    [509.573188ms] [232.648293ms] Writing http response done (5721 items)
    "List /api/v1/events" [509.574659ms] [1.471µs] END
    I1123 03:14:09.477836 1185 compact.go:159] etcd: compacted rev (6740640), endpoints ([http://90.169.0.165:2379])
    I1123 03:19:09.546562 1185 compact.go:159] etcd: compacted rev (6746029), endpoints ([http://90.169.0.165:2379])
    I1123 03:24:09.615500 1185 compact.go:159] etcd: compacted rev (6751463), endpoints ([http://90.169.0.165:2379])
    E1123 03:24:51.219757 1185 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
    I1123 03:29:10.332547 1185 trace.go:61] Trace "GuaranteedUpdate etcd3: *extensions.DaemonSet" (started 2017-11-23 03:29:09.750180607 +0800 CST):
    [143.417µs] [143.417µs] initial value restored
    [494.675µs] [351.258µs] Transaction prepared
    [582.174725ms] [581.68005ms] Transaction committed

    Scott Miller

    Dec 7, 2017, 9:58:45 PM12/7/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Any update on this? We've had to implement a brute-force restart of kube-apiserver on a regular basis, as we roll the etcd credentials frequently (every couple of days) in a deployment with short-lived TLS credentials from Vault. We weren't sure what was causing the apiserver to become unresponsive, but every symptom in this issue rings true. K8S 1.7.8, etcd 3.2.6.
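
    (The brute-force restart is presumably nothing fancier than a scheduled unit restart; the schedule and unit name below are placeholders tied to how often the credentials roll.)

    # /etc/cron.d/restart-kube-apiserver
    0 3 * * * root systemctl restart kube-apiserver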

    Joe Betz

    Dec 8, 2017, 12:38:26 AM12/8/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @sgmiller v3.2.11 has just been released and contains multiple fixes in the etcd client to gRPC. We plan to upgrade kubernetes to use the new etcd client. See #56114 for details.

    I should warn that the bulk of the reports found in this (lengthy) issue don't provide sufficient detail or analysis to say for certain whether they all have the same root cause, or whether the gRPC fixes will resolve them. But they are closely enough related that we'd like to push out these changes and get feedback on which (if any) of these types of issues remain.

    If you have any more details you can provide about the specific issues you've encountered, please let us know.

    Clayton Coleman

    Dec 12, 2017, 3:55:21 PM12/12/17
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Note that #57061 is very likely grpc/grpc-go#1005, which was fixed after v1.0.4. Users on Kube 1.6 or 1.7 would be broken against etcd 3.2 servers at very large loads.
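
    If you want to check what a given tree actually vendors, a quick grep of the dependency pin works (this assumes a 1.6/1.7-era checkout that still records its pins in Godeps/Godeps.json):

    # prints the pinned revision/tag of the vendored grpc-go in a kubernetes checkout
    grep -A 3 'google.golang.org/grpc"' Godeps/Godeps.json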

    rayterrill

    unread,
    Jan 5, 2018, 5:43:33 PM1/5/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Is it possible to run into this issue on a new Kubernetes install with 1.9.1?

    Stuck here:
    sudo kubeadm init --pod-network-cidr=10.244.0.0/16
    [init] Using Kubernetes version: v1.9.1
    [init] Using Authorization modes: [Node RBAC]
    [preflight] Running pre-flight checks.
    [WARNING FileExisting-crictl]: crictl not found in system path
    [preflight] Starting the kubelet service
    [certificates] Generated ca certificate and key.
    [certificates] Generated apiserver certificate and key.
    [certificates] apiserver serving cert is signed for DNS names [popkubem kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.31.29.60]
    [certificates] Generated apiserver-kubelet-client certificate and key.
    [certificates] Generated sa key and public key.
    [certificates] Generated front-proxy-ca certificate and key.
    [certificates] Generated front-proxy-client certificate and key.
    [certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
    [kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
    [controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
    [controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
    [controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
    [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
    [init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
    [init] This might take a minute or longer if the control plane images have to be pulled.
    It looks like the API server is failing to fully deploy, and I don't see an etcd container. Should I see one?

    sudo docker logs 7ac8aa056296
    I0105 22:38:06.981611 1 server.go:121] Version: v1.9.1
    I0105 22:38:07.271031 1 feature_gate.go:184] feature gates: map[Initializers:true]
    I0105 22:38:07.271056 1 initialization.go:90] enabled Initializers feature as part of admission plugin setup
    I0105 22:38:07.283529 1 master.go:225] Using reconciler: master-count
    W0105 22:38:07.322097 1 genericapiserver.go:342] Skipping API batch/v2alpha1 because it has no resources.
    W0105 22:38:07.332187 1 genericapiserver.go:342] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
    W0105 22:38:07.333085 1 genericapiserver.go:342] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
    W0105 22:38:07.345359 1 genericapiserver.go:342] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
    [restful] 2018/01/05 22:38:07 log.go:33: [restful/swagger] listing is available at https://172.31.29.60:6443/swaggerapi
    [restful] 2018/01/05 22:38:07 log.go:33: [restful/swagger] https://172.31.29.60:6443/swaggerui/ is mapped to folder /swagger-ui/
    [restful] 2018/01/05 22:38:08 log.go:33: [restful/swagger] listing is available at https://172.31.29.60:6443/swaggerapi
    [restful] 2018/01/05 22:38:08 log.go:33: [restful/swagger] https://172.31.29.60:6443/swaggerui/ is mapped to folder /swagger-ui/
    I0105 22:38:10.984334 1 serve.go:89] Serving securely on [::]:6443
    I0105 22:38:10.988383 1 apiservice_controller.go:112] Starting APIServiceRegistrationController
    I0105 22:38:10.988395 1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
    I0105 22:38:10.988492 1 available_controller.go:262] Starting AvailableConditionController
    I0105 22:38:10.988500 1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
    I0105 22:38:10.989071 1 crd_finalizer.go:242] Starting CRDFinalizer
    I0105 22:38:10.989563 1 controller.go:84] Starting OpenAPI AggregationController
    I0105 22:38:11.059043 1 naming_controller.go:274] Starting NamingConditionController
    I0105 22:38:11.088902 1 crdregistration_controller.go:110] Starting crd-autoregister controller
    I0105 22:38:11.088966 1 controller_utils.go:1019] Waiting for caches to sync for crd-autoregister controller
    I0105 22:38:11.089024 1 customresource_discovery_controller.go:152] Starting DiscoveryController
    I0105 22:38:25.045005 1 trace.go:76] Trace[723418120]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:11.044239793 +0000 UTC m=+4.207731430) (total time: 14.000729847s):
    Trace[723418120]: [14.000729847s] [14.000687312s] END
    I0105 22:38:35.047389 1 trace.go:76] Trace[1319983526]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:25.047018697 +0000 UTC m=+18.210510289) (total time: 10.000340856s):
    Trace[1319983526]: [10.000340856s] [10.000275818s] END
    I0105 22:38:45.044971 1 trace.go:76] Trace[200541705]: "Create /api/v1/nodes" (started: 2018-01-05 22:38:11.043822246 +0000 UTC m=+4.207313825) (total time: 34.001110155s):
    Trace[200541705]: [4.000724928s] [4.000618881s] About to store object in database
    Trace[200541705]: [34.001110155s] [30.000385227s] END
    I0105 22:38:45.050026 1 trace.go:76] Trace[1283603501]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:35.049574339 +0000 UTC m=+28.213065973) (total time: 10.000418685s):
    Trace[1283603501]: [10.000418685s] [10.000173571s] END
    I0105 22:38:45.986463 1 trace.go:76] Trace[295087653]: "Create /api/v1/namespaces" (started: 2018-01-05 22:38:11.985283044 +0000 UTC m=+5.148774636) (total time: 34.001141453s):
    Trace[295087653]: [4.000683598s] [4.000621727s] About to store object in database
    Trace[295087653]: [34.001141453s] [30.000457855s] END
    E0105 22:38:45.987123 1 client_ca_hook.go:78] Timeout: request did not complete within allowed duration
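
    In case anyone else gets stuck at the same point, these are the checks I'm using to look for the etcd static pod (paths assume kubeadm's default layout; the container ID is a placeholder):

    # the static pod manifest kubeadm reported writing
    ls /etc/kubernetes/manifests/
    cat /etc/kubernetes/manifests/etcd.yaml

    # is an etcd container running, or crash-looping?
    sudo docker ps -a | grep etcd
    sudo docker logs <etcd-container-id>    # placeholder ID

    # the kubelet launches the static pods, so its journal shows why etcd didn't start
    sudo journalctl -u kubelet --no-pager | tail -n 50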

    Denis Mikhaylov

    unread,
    Feb 12, 2018, 8:57:54 AM2/12/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    We see something similar when doing a hard poweroff of the VM that one of the etcd nodes is running on. K8s version 1.9.1.
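
    This is roughly how we check the cluster after the poweroff (a sketch; the endpoints are placeholders for our three members, and TLS flags depend on your setup):

    ETCDCTL_API=3 etcdctl \
      --endpoints=https://etcd-1:2379,https://etcd-2:2379,https://etcd-3:2379 \
      endpoint health
    # the powered-off member reports unhealthy, the other two still answer,
    # yet the apiserver keeps timing out until it re-establishes its connection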

    Xiang Li

    unread,
    Feb 12, 2018, 12:09:59 PM2/12/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    Closed #47131.

    Xiang Li

    unread,
    Feb 12, 2018, 12:10:04 PM2/12/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    This problem should be solved, or at least mitigated, by #57160. The PR bumps both gRPC and the etcd client to fix the timeout problem caused by connection resets and balancing.

    I am closing out this issue. If anyone still sees the timeout problem in a release (after 1.10) that includes #57160, please create a new issue with reproduction steps.

    Denis Mikhaylov

    unread,
    Feb 14, 2018, 3:24:49 AM2/14/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @xiang90 any chance this gets backported to 1.9.x?

    Xiang Li

    unread,
    Feb 14, 2018, 12:03:17 PM2/14/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    @notxcain

    I do not manage the releases. @jpbetz might have an answer for you.

    Joe Betz

    unread,
    Feb 14, 2018, 1:11:02 PM2/14/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    I agree, #57160 should resolve this. Unfortunately, the PR introduces a lot of change. It bumps the versions of multiple major dependencies (etcd, docker, containerd, grpc, ...). I do not believe it is well suited for backport to stable kubernetes branches at this time. After 1.10 is released and this has had some time to bake in production, I'd be open to revisiting this, but for now let's limit it to k8s 1.10.

    Andrew Woodward

    unread,
    Jun 15, 2018, 1:11:10 PM6/15/18
    to kubernetes/kubernetes, k8s-mirror-api-machinery-misc, Team mention

    We've recently updated to etcd 3.1.13 and are running Kubernetes 1.6.13. Since we moved onto the v3 etcd store we appear to have been hit hard by this. We are working our way towards 1.10.x, but we're a few months from getting there, especially since the apiserver lockups are distracting from moving forward. I see #56114 should improve the problem; I was wondering whether we would get most of the benefit by introducing a gRPC proxy from 3.2.11 (or newer) on the apiserver host, proxying back to the etcd servers.

    Digging through https://kubernetes.io/docs/imported/release/notes/ I see 'Upgrade to etcd client 3.2.13 and grpc 1.7.5 to improve HA etcd cluster stability. (#57480, ...)' and 'The supported etcd server version is 3.1.12, as compared to 3.0.17 in v1.9 (#60988)', so I see strong evidence that the newer client can talk to our etcd server.

    It seems viable to switch to the gRPC proxy to avoid this. Looking for confirmation that the proxy would use the load balancer as updated in coreos/etcd#8840.
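
    Concretely, the setup we have in mind looks something like this (a rough sketch only; hostnames and ports are invented, and I haven't verified every flag against 3.2.11):

    # run the etcd gRPC proxy (etcd 3.2.11 or newer) locally on each apiserver host
    etcd grpc-proxy start \
      --endpoints=https://etcd-1:2379,https://etcd-2:2379,https://etcd-3:2379 \
      --listen-addr=127.0.0.1:23790

    # then point the apiserver at the local proxy instead of the cluster members
    kube-apiserver ... --etcd-servers=http://127.0.0.1:23790 ...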
