@javipolo: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-misc.
In response to this:
@kubernetes/sig-api-machinery-misc I guess :)
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@javipolo Can you provide more logging, perhaps a 10-minute window? The "compacted" error is expected, since the connection between etcd and the API server was cut off. But it should disappear once etcd is back and the resync has finished.
Yep. Before, I only provided the ERROR part of the logs; now I'm including the INFO part as well. The logs run from 16:50 until the restart. Around 17:45 is when we did the rolling update of etcd. If you need anything else, just ask and I'll try to provide it :) (thanks!)
Log file created at: 2017/06/07 16:23:48
Running on machine: kubeapi-05
Binary: Built with gc go1.7.5 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
.
.
.
.
I0607 16:50:59.279231 19698 trace.go:61] Trace "List /api/v1/namespaces/search/events" (started 2017-06-07 16:50:58.733629042 +0200 CEST):
[12.69µs] [12.69µs] About to List from storage
[280.765661ms] [280.752971ms] Listing from storage done
[293.367569ms] [12.601908ms] Self-linking done
[545.53202ms] [252.164451ms] Writing http response done (4300 items)
"List /api/v1/namespaces/search/events" [545.534815ms] [2.795µs] END
E0607 17:45:11.393411 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.393624 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *extensions.Deployment ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.448063 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.448149 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PersistentVolume ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.453552 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.453700 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PodTemplate ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.476405 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.476536 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.PersistentVolumeClaim ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.476637 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.476735 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ConfigMap ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492509 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492594 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.LimitRange ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492611 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492665 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ResourceQuota ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:11.492806 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:11.492862 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ServiceAccount ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:12.448461 19698 cacher.go:125] Terminating all watchers from cacher *api.PersistentVolume
W0607 17:45:12.476786 19698 cacher.go:125] Terminating all watchers from cacher *api.PersistentVolumeClaim
W0607 17:45:12.492813 19698 cacher.go:125] Terminating all watchers from cacher *api.ResourceQuota
W0607 17:45:12.492860 19698 cacher.go:125] Terminating all watchers from cacher *api.LimitRange
W0607 17:45:12.493052 19698 cacher.go:125] Terminating all watchers from cacher *api.ServiceAccount
W0607 17:45:12.499419 19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.ServiceAccount ended with: too old resource version: 8902015 (8925724)
W0607 17:45:12.499526 19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.ResourceQuota ended with: too old resource version: 8902015 (8925724)
W0607 17:45:12.499581 19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.LimitRange ended with: too old resource version: 8902015 (8925724)
E0607 17:45:52.439496 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.439634 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.440143 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441236 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *batch.Job ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441066 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *storage.StorageClass ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.441336 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Secret ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:45:52.456094 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:45:52.457264 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Service ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:45:53.441577 19698 cacher.go:125] Terminating all watchers from cacher *api.Secret
W0607 17:45:53.452317 19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.Secret ended with: too old resource version: 8902015 (8925793)
W0607 17:45:53.457554 19698 cacher.go:125] Terminating all watchers from cacher *api.Service
E0607 17:47:36.116540 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.116631 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ReplicationController ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141317 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141439 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *extensions.Ingress ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141463 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:47:36.141467 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141521 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *apps.StatefulSet ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:47:36.141531 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *storage.StorageClass ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:47:37.116969 19698 cacher.go:125] Terminating all watchers from cacher *api.ReplicationController
W0607 17:47:37.141672 19698 cacher.go:125] Terminating all watchers from cacher *apps.StatefulSet
E0607 17:48:26.488736 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.489735 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.Namespace ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.501637 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.501656 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.501754 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.ClusterRoleBinding ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.501776 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.ClusterRole ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502128 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502178 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.RoleBinding ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502311 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502357 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *policy.PodDisruptionBudget ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.502602 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.502654 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *rbac.Role ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560101 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560191 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *autoscaling.HorizontalPodAutoscaler ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560404 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560563 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *api.ReplicationController ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560592 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560623 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *batch.CronJob ended with: etcdserver: mvcc: required revision has been compacted
E0607 17:48:26.560625 19698 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0607 17:48:26.560717 19698 reflector.go:323] k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *certificates.CertificateSigningRequest ended with: etcdserver: mvcc: required revision has been compacted
W0607 17:48:27.489933 19698 cacher.go:125] Terminating all watchers from cacher *api.Namespace
W0607 17:48:27.498083 19698 reflector.go:323] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: watch of *api.Namespace ended with: too old resource version: 8902015 (8926322)
W0607 17:49:32.064253 19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)
W0607 17:50:42.066257 19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)
I0607 17:51:14.583499 19698 trace.go:61] Trace "Create /apis/extensions/v1beta1/namespaces/search/daemonsets" (started 2017-06-07 17:50:44.581144781 +0200 CEST):
[1.332617ms] [1.332617ms] About to convert to expected version
[1.608209ms] [275.592µs] Conversion done
[1.644553ms] [36.344µs] About to store object in database
"Create /apis/extensions/v1beta1/namespaces/search/daemonsets" [30.00229701s] [30.000652457s] END
W0607 17:51:52.068431 19698 storage_extensions.go:127] third party resource sync failed: the server cannot complete the requested operation at this time, try again later (get thirdpartyresources.extensions)
When did you restart the API server? Are there more logs like "etcdserver: mvcc: required revision has been compacted" after 17:51? I want to know whether the API server simply prints these errors forever, or whether they stop after you upgrade ALL your etcd members.
/cc @jbeda
The logs I provided end when we restarted the apiserver, so no, there are no more of those errors after 17:51.
I can provide the restart times of the etcds and apiservers (the date/time is the same on every host):
etcd restart times
Active: active (running) since Wed 2017-06-07 17:45:12 CEST; 1h 33min ago
Active: active (running) since Wed 2017-06-07 17:48:27 CEST; 1h 30min ago
Active: active (running) since Wed 2017-06-07 17:45:53 CEST; 1h 33min ago
Active: active (running) since Wed 2017-06-07 17:46:43 CEST; 1h 32min ago
Active: active (running) since Wed 2017-06-07 17:47:37 CEST; 1h 31min ago
apiserver restart times:
Active: active (running) since Wed 2017-06-07 17:52:47 CEST; 1h 27min ago
Active: active (running) since Wed 2017-06-07 17:52:47 CEST; 1h 27min ago
Active: active (running) since Wed 2017-06-07 17:52:48 CEST; 1h 27min ago
Active: active (running) since Wed 2017-06-07 17:52:48 CEST; 1h 27min ago
Active: active (running) since Wed 2017-06-07 17:52:49 CEST; 1h 27min ago
I have encountered the same problem, and it has been troubling me.
kubernetes version: v1.6.4
etcd version: 3.1.8
etcd status:
[root@ip-10-255-2-76 kubernetes]# etcdctl \
> --ca-file=/etc/kubernetes/ssl/ca.pem \
> --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
> --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
> cluster-health
2017-06-08 15:09:07.272311 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-06-08 15:09:07.273036 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 1ccf784fffd9bd65 is healthy: got healthy result from https://10.255.2.174:2379
member 81eb13a8d04dcfee is healthy: got healthy result from https://10.255.2.76:2379
member a05de482e89e5998 is healthy: got healthy result from https://10.255.2.147:2379
cluster is healthy
kube-apiserver logs:
[root@ip-10-255-2-76 kubernetes]# tail -500f /data/logs/kubernetes/kube-apiserver.INFO
Log file created at: 2017/06/08 14:32:51
Running on machine: ip-10-255-2-76
Binary: Built with gc go1.7.5 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0608 14:32:51.608111 9597 aws.go:762] Building AWS cloudprovider
I0608 14:32:51.608413 9597 aws.go:725] Zone not specified in configuration file; querying AWS metadata service
I0608 14:32:51.776140 9597 tags.go:76] AWS cloud filtering on ClusterID: kubernetes
E0608 14:32:52.320857 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRole: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterroles?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321118 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://10.255.2.76:6443/api/v1/secrets?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321172 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://10.255.2.76:6443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321214 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://10.255.2.76:6443/api/v1/serviceaccounts?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321249 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://10.255.2.76:6443/api/v1/limitranges?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321279 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://10.255.2.76:6443/api/v1/namespaces?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321317 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.RoleBinding: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/rolebindings?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321352 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.Role: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/roles?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.321384 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://10.255.2.76:6443/api/v1/resourcequotas?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
E0608 14:32:52.367614 9597 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRoleBinding: Get https://10.255.2.76:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=0: dial tcp 10.255.2.76:6443: getsockopt: connection refused
I0608 14:32:52.650280 9597 serve.go:79] Serving securely on 10.255.2.76:6443
I0608 14:32:52.650325 9597 serve.go:94] Serving insecurely on 10.255.2.76:8080
I0608 14:32:53.671016 9597 trace.go:61] Trace "GuaranteedUpdate etcd3: *api.ConfigMap" (started 2017-06-08 14:32:53.098384831 +0800 CST):
[32.359241ms] [32.359241ms] initial value restored
[555.829144ms] [523.469903ms] Transaction prepared
[572.583965ms] [16.754821ms] Transaction committed
"GuaranteedUpdate etcd3: *api.ConfigMap" [572.599509ms] [15.544µs] END
I0608 14:32:53.671102 9597 trace.go:61] Trace "Update /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication" (started 2017-06-08 14:32:53.098317621 +0800 CST):
[22.218µs] [22.218µs] About to convert to expected version
[39.51µs] [17.292µs] Conversion done
[42.677µs] [3.167µs] About to store object in database
[572.712537ms] [572.66986ms] Object stored in database
[572.716218ms] [3.681µs] Self-link added
"Update /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication" [572.763593ms] [47.375µs] END
I0608 14:32:53.891199 9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-system
I0608 14:32:53.925799 9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:token-cleaner in kube-system
I0608 14:32:53.973401 9597 storage_rbac.go:227] created role.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public
I0608 14:32:54.011158 9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-system
I0608 14:32:54.040216 9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:token-cleaner in kube-system
I0608 14:32:54.104971 9597 storage_rbac.go:257] created rolebinding.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public
I0608 14:42:51.937522 9597 compact.go:159] etcd: compacted rev (404), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
I0608 14:47:51.945296 9597 compact.go:159] etcd: compacted rev (709), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
E0608 14:49:58.232233 9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
I0608 14:52:51.953504 9597 compact.go:159] etcd: compacted rev (1012), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
I0608 14:57:51.962993 9597 compact.go:159] etcd: compacted rev (1311), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
E0608 14:58:13.284420 9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
I0608 15:02:51.971159 9597 compact.go:159] etcd: compacted rev (1612), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
I0608 15:07:51.979360 9597 compact.go:159] etcd: compacted rev (1911), endpoints ([https://10.255.2.147:2379 https://10.255.2.76:2379 https://10.255.2.174:2379])
E0608 15:07:58.354849 9597 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
+1
the same problem
etcd 3.1.7
k8s: 1.6.7
I also see the same messages in the apiserver log; however, I only rarely get timeouts on operations against it.
K8s: 1.6.2
etcd: 3.1.10 && 3.0.17 (these messages appeared before I upgraded to 3.1.10 and are still there)
/assign @hongchaodeng @xiang90
OK. So this issue already exists. I can share my story as well.
Basically, we were doing some etcd scale-up and scale-down stress testing. We saw the API server hang and some resources become unusable. I tried a dozen times and the results were inconsistent:
I have tried to reproduce with scratch apiserver:
--etcd-servers=http://${lb_service_ip}:2379 --service-cluster-ip-range=10.3.0.0/24
Haven't reproduced it yet, so I suspect reproducing it requires some existing data.
Note that in my case it happened only when scaling down from 3 to 1; scaling down from 2 to 1 didn't trigger it. I also tried scaling down slowly, waiting 60s before removing the second member, and it didn't happen then either.
@xiang90
the same problem.
etcd Version: 3.1.0
Git SHA: 8ba2897
Go Version: go1.7.4
Go OS/Arch: linux/amd64
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T22:55:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Aug 09 10:50:56 kube-apiserver[62362]: E0809 10:50:56.548757 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:38 kube-apiserver[62362]: E0809 10:51:38.087021 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890132 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890135 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890136 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.890191 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915202 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915298 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915311 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
Aug 09 10:51:55 kube-apiserver[62362]: E0809 10:51:55.915315 62362 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
thx
Similar problem (see log below).
In short: I'm installing a new K8s master node based on this guide: https://coreos.com/kubernetes/docs/latest/getting-started.html. When I try to start the kubelet service on it, everything starts up, but the apiserver keeps crashing (?). Because of that, the worker node can't register itself and the master node isn't fully working either.
etcd version: 3.2.5
kubernetes version 1.6.1_coreos.0
E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:443/api/v1/resourcequotas?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:443/api/v1/secrets?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665943 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:443/api/v1/namespaces?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665947 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:443/api/v1/limitranges?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665950 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0814 14:32:14.665950 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:443/api/v1/serviceaccounts?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
[restful] 2017/08/14 14:32:14 log.go:30: [restful/swagger] listing is available at https://192.168.178.155:443/swaggerapi/
[restful] 2017/08/14 14:32:14 log.go:30: [restful/swagger] https://192.168.178.155:443/swaggerui/ is mapped to folder /swagger-ui/
I0814 14:32:14.740919 1 serve.go:79] Serving securely on 0.0.0.0:443
I0814 14:32:14.741036 1 serve.go:94] Serving insecurely on 127.0.0.1:8080
I0814 14:32:25.189208 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:15.188789893 +0000 UTC):
[25.862µs] [25.862µs] About to convert to expected version
[138.138µs] [112.276µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.00036266s] [10.000224522s] END
I0814 14:32:29.307733 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:19.307339406 +0000 UTC):
[50.67µs] [50.67µs] About to convert to expected version
[178.618µs] [127.948µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000361s] [10.000182382s] END
I0814 14:32:43.639855 1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-08-14 14:32:33.639564997 +0000 UTC):
[27.653µs] [27.653µs] About to convert to expected version
[109.672µs] [82.019µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000262566s] [10.000152894s] END
Update to the situation: I solved my issue. It turns out that if you are running an etcd3 cluster with TLS, then in the kube-apiserver.yaml file you should add the cert, key, and CA file locations; otherwise the API server isn't able to connect and store information in etcd. I only found that out by adding --v=4 for verbose logging in kube-apiserver.yaml, and then I discovered that all the URLs that kube-apiserver tries to access return a 403 error code.
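For anyone else hitting this, the flags in question look roughly like the following. This is only a sketch: the file paths and endpoints are placeholders for wherever your etcd client certificates actually live.

```
kube-apiserver \
  --etcd-servers=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  --etcd-cafile=/etc/kubernetes/ssl/etcd-ca.pem \
  --etcd-certfile=/etc/kubernetes/ssl/etcd-client.pem \
  --etcd-keyfile=/etc/kubernetes/ssl/etcd-client-key.pem \
  --v=4
```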
Any progress on this issue? We have also encountered it, and even worse, some update operations failed. The log is:
```
E0812 19:02:48.471971 24 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
W0812 19:02:48.536547 24 reflector.go:323] vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: watch of *paas.Stack ended with: etcdserver: mvcc: required revision has been compacted
```
This has happened twice in our environment, and we didn't find a way to reproduce it.
I also have the same question. Any solution? I checked the logs and found many lines like:
E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
k8s version: 1.7.2
etcd version: 3.0.15
I also have a strange question.
I created some pods, but etcd recorded nothing except the network config record.
Using "etcdctl ls /", I didn't see any records for the pods I created.
My colleagues are also experiencing this problem and haven't resolved it.
But when I changed the k8s version to 1.3.10, etcd did have the pod records (using "etcdctl ls /"). Why?
Need help. Waiting...
etcdctl ls / uses the etcd v2 API, which is not what Kubernetes 1.6+ uses by default.
ETCDCTL_API=3 etcdctl get --prefix / uses the etcd v3 API, which will show you your Kubernetes data.
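For example, to list just the keys Kubernetes stores under its default /registry prefix (a usage sketch assuming the default prefix; add your TLS flags as needed):

```
ETCDCTL_API=3 etcdctl get --prefix --keys-only /registry/pods
```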
By the way, after I changed the apiserver "storage-backend" parameter to etcd2, the error "E0812 19:02:48.536491 24 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted" disappeared.
I wonder whether there is some relation between them.
Event compaction is perfectly normal, and only exists in the etcd v3 API. Attempting to watch from a compacted version returns a 410 error, which API clients should respond to by doing a full relist to get the current resourceVersion.
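For client authors, the usual way to get that relist behavior for free is to use client-go's informer machinery rather than a raw watch: the reflector re-lists and re-watches automatically when a watch ends with an expired resourceVersion. A minimal sketch (assuming a recent client-go and a kubeconfig at the default location; this is not code from this issue):

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (placeholder for however you
	// normally configure access).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The shared informer's reflector handles list+watch, including the full
	// relist needed after a "resourceVersion too old" / 410 response.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			fmt.Println("pod added:", obj.(*corev1.Pod).Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block; the reflector keeps the watch alive and relists as needed
}
```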
[MILESTONENOTIFIER] Milestone Labels Incomplete
@hongchaodeng @javipolo @xiang90
Action required: This issue requires label changes. If the required changes are not made within 6 days, the issue will be moved out of the v1.8 milestone.
kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup'].
priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].
@xiang90 I cannot label this issue, but the correct labels should be 'kind/bug' and 'priority/important-soon'
Can you label the issue or tell me how should I do it?
Thanks
There are two different issues mentioned in this thread:
Issue 2 is normal. We should change the API server error logging to make it a warning or even debug-level logging.
Can someone provide a way to reproduce the timeout issue? I have not seen this behavior in our clusters.
@xiang90 in my case, just restarting the etcd servers can randomly trigger the issue ....
/kind bug
/priority important-soon
[MILESTONENOTIFIER] Milestone Labels Complete
@hongchaodeng @javipolo @xiang90
Issue label settings:
@xiang90 @hongchaodeng @javipolo So is there any PR for this? If not, can we move it out of the 1.8 milestone, please?
@xiang90 @hongchaodeng @javipolo This is a daily and friendly ping from the release team. Is this still relevant to 1.8? If yes, who is actively working on it?
We should move this out of 1.8. There is no existing PR for it.
@xiang90 thanks. Removing the milestone.
It's not clear that there's any actual problem here.
I take that back. OP seems to have a problem. Note that we haven't attempted running Kubernetes against etcd 3.1.x or up. Some of the other commenters on this issue may have hit the fact that the etcd backend version changed from 2 to 3.
cc @jpbetz
The original question is about the log line
E0607 17:45:11.447234 367 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
I explained that this is probably normal. The compaction problem is recovered from automatically, so it requires no human interaction and does not warrant an error level.
There are other potential etcd hanging problems raised in this thread. I did not respond to them here (or take a close look), since there is too much noise.
If I'm reading this right, "watch chan error: etcdserver: mvcc: required revision has been compacted" might be a red herring, and there might be a variety of unrelated issues being attributed to it here.
And, as @zxpower observed, when upgrading to etcd 3 with TLS, if -v=4 verbose logging is not enabled in kube-apiserver.yaml, it can be difficult to diagnose the problem correctly.
How about we update this issue's title/description to better guide developers who believe their issue is the same as this one to first check a few things, e.g.: "If you believe you've encountered this issue, note that 'watch chan error: etcdserver: mvcc: required revision has been compacted' can be logged for a variety of unrelated reasons and might be a red herring. Please first check the connectivity between the kube-apiserver and etcd, enabling -v=4 logging in kube-apiserver.yaml. Be sure to include your Kubernetes and etcd versions and the steps one would need to take to reproduce the issue."?
I suspect it is a deadlock in the etcd client that the Kubernetes apiserver ships with. We see requests to some or all apiservers hanging after an etcd cluster roll.
When the apiserver hangs, we see the number of requests to etcd piling up:
$ curl https://apiserver/debug/pprof/goroutine?debug=2 | grep -F 'registry.(*Store).' | wc -l
499
The problem is made worse by the fact that since 1.7.x the apiserver checks serviceaccount tokens in etcd, so even if a request would end up being served from cache, the mandatory round trip to etcd makes it time out and fail.
Kubernetes 1.7.x ships etcd client version 3.1.5; there have been numerous client-side improvements since then, some of them addressing deadlocks in the balancer code, which is very much in use when etcd servers are rolled and change their availability status.
Here are some relevant issues:
coreos/etcd#7842
coreos/etcd#8711
coreos/etcd#8710
@liggitt, how feasible is it to update the bundled etcd client version in Kube 1.7.x and 1.8.x to >= 3.2.9?
I could rev the client to 3.2.x, but the server side of upstream has to stay at 3.1.10 due to the supportable upgrade path.
@hongchaodeng @xiang90 - any known issues with client skew testing?
If we want to do this, we need to do it now.
/cc @kubernetes/sig-cluster-lifecycle-misc
The only issues we've seen are the load-balancer rollover, and hanging watches against 3.1 servers when more than two thousand watches are in flight for a single apiserver.
I believe there are some breakages in either the etcd client or gRPC (the version we vendored), or both. We are actively testing and fixing this in etcd master. We will probably backport the fixes to the 3.2 client.
cc @wojtek-t
Not to add to the noise, but we've encountered this issue in production, and it ended up leading to a complete cluster outage of all ≈300 nodes (through a very unfortunate series of events.)
I have gathered all the relevant logs from our 3 k8s master and 9 etcd nodes. There may not be anything of additional interest there, but if you would like to see them please let me know and I can share them privately.
@obeattie I'm sooo sorry. I'm updating the client tomorrow, and going to poke folks about getting the next rev in line for release.
/cc @luxas @roberthbailey @jbeda
I think this error message:
E0607 17:45:11.447234 367 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
is actually caused by the etcd3 storage watch code in Kubernetes not properly handling the expiration of the revision and restarting the watch from the RV reported by the error. We should be able to transparently restart the watch without logging that error. It is not the hanging error described later in the thread.
Spawned #54974
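To illustrate the recovery described above, here is a minimal etcd clientv3 sketch (not the apiserver's actual storage code; the endpoint and key prefix are placeholders) that resumes a watch from the CompactRevision reported when the requested revision has been compacted, instead of treating it as a fatal error:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // older releases import github.com/coreos/etcd/clientv3
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	rev := int64(1) // placeholder; a real watcher resumes from the revision of its last list
	for {
		ctx, cancel := context.WithCancel(context.Background())
		wch := cli.Watch(ctx, "/registry/", clientv3.WithPrefix(), clientv3.WithRev(rev))
		for resp := range wch {
			if resp.CompactRevision != 0 {
				// The revision we asked for was compacted away. Instead of
				// surfacing an error, resume from the oldest revision etcd
				// still has (a real client would re-list first).
				rev = resp.CompactRevision
				break
			}
			if err := resp.Err(); err != nil {
				log.Printf("watch error, restarting: %v", err)
				break
			}
			for _, ev := range resp.Events {
				log.Printf("%s %s mod=%d", ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
				rev = ev.Kv.ModRevision + 1
			}
		}
		cancel()
	}
}
```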
I'm hitting this error on 1.7.5 and etcd 3.1.8, but not due to an etcd rollout; it simply started occurring on my cluster, to the point that it's affecting usability. Are there any known workarounds?
I'm testing a workaround that seems to work so far for the hanging etcd client: tell kube-apiserver to use localhost as the etcd endpoint instead of the 3 etcd servers. It works for me because I have 3 master nodes, each with one kube-apiserver and one etcd server.
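Concretely, that amounts to pointing each apiserver only at its co-located etcd member, e.g. (a sketch; use whatever scheme and port your etcd actually listens on):

```
--etcd-servers=https://127.0.0.1:2379
```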
PSA - we are waiting on a 3.2.10 client update.
3.2.10 is available
From what I understand, what will happen now is that we'll update
Then we'll figure out a 3.1.x server version to recommend for v1.9. Currently we're at 3.1.10 at HEAD.
cc @timothysc -- right?
[MILESTONENOTIFIER] Milestone Issue Labels Incomplete
@hongchaodeng @javipolo @xiang90
Action required: This issue requires label changes. If the required changes are not made within 2 days, the issue will be moved out of the v1.9 milestone.
kind: Must specify exactly one of kind/bug, kind/cleanup, or kind/feature.
[MILESTONENOTIFIER] Milestone Issue Current
@hongchaodeng @javipolo @xiang90
Issue labels:
sig/api-machinery, sig/cluster-lifecycle, sig/scalability: Issue will be escalated to these SIGs if needed.
priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
kind/bug: Fixes a bug discovered during the current release.
cc @kubernetes/kubernetes-release-managers as this is most probably a show-stopper for v1.9; we will most probably consider blocking the release on fixing this issue.
@luxas That's the basic set of steps planned, yes. Note that we're planning to merge the work to the k8s 1.10 branch and then cherry-pick it to a 1.9.x release once it proves stable; we're not targeting the initial 1.9 release.
I'll be picking up the engineering work on most of this. I'll link to an issue explaining this in more detail soon.
[MILESTONENOTIFIER] Milestone Issue Needs Attention
@hongchaodeng @javipolo @xiang90 @kubernetes/sig-api-machinery-misc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-scalability-misc
Action required: During code slush, issues in the milestone should be in progress.
If this issue is not being actively worked on, please remove it from the milestone.
If it is being worked on, please add the status/in-progress label so it can be tracked with other in-flight issues.
Note: If this issue is not resolved or labeled as priority/critical-urgent by Wed, Nov 22 it will be moved out of the v1.9 milestone.
sig/api-machinery, sig/cluster-lifecycle, sig/scalability: Issue will be escalated to these SIGs if needed.
priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
kind/bug: Fixes a bug discovered during the current release.
[MILESTONENOTIFIER] Milestone Removed From Issue
@hongchaodeng @javipolo @xiang90 @kubernetes/sig-api-machinery-misc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-scalability-misc
Important: Code freeze is in effect and only issues with priority/critical-urgent may remain in the v1.9 milestone.
I also have the same question. I want to know what the cause of the problem is. @xiang90
k8s version is 1.6.9. Thanks.
The apiserver log looks like this:
I1123 03:11:02.243886 1185 trace.go:61] Trace "Update /api/v1/namespaces/admin/pods/f1a32f2d-a43a-4a74-ac35-a4936fb070be-r2qsd/status" (started 2017-11-23 03:11:01.710184949 +0800 CST):
[69.492µs] [69.492µs] About to convert to expected version
[384.168µs] [314.676µs] Conversion done
[390.729µs] [6.561µs] About to store object in database
[533.50445ms] [533.113721ms] Object stored in database
[533.511425ms] [6.975µs] Self-link added
"Update /api/v1/namespaces/admin/pods/f1a32f2d-a43a-4a74-ac35-a4936fb070be-r2qsd/status" [533.65373ms] [142.305µs] END
E1123 03:11:14.142108 1185 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
I1123 03:13:58.495456 1185 trace.go:61] Trace "List /api/v1/events" (started 2017-11-23 03:13:57.985830247 +0800 CST):
[9.582µs] [9.582µs] About to List from storage
[190.568834ms] [190.559252ms] Listing from storage done
[276.924895ms] [86.356061ms] Self-linking done
[509.573188ms] [232.648293ms] Writing http response done (5721 items)
"List /api/v1/events" [509.574659ms] [1.471µs] END
I1123 03:14:09.477836 1185 compact.go:159] etcd: compacted rev (6740640), endpoints ([http://90.169.0.165:2379])
I1123 03:19:09.546562 1185 compact.go:159] etcd: compacted rev (6746029), endpoints ([http://90.169.0.165:2379])
I1123 03:24:09.615500 1185 compact.go:159] etcd: compacted rev (6751463), endpoints ([http://90.169.0.165:2379])
E1123 03:24:51.219757 1185 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
I1123 03:29:10.332547 1185 trace.go:61] Trace "GuaranteedUpdate etcd3: *extensions.DaemonSet" (started 2017-11-23 03:29:09.750180607 +0800 CST):
[143.417µs] [143.417µs] initial value restored
[494.675µs] [351.258µs] Transaction prepared
[582.174725ms] [581.68005ms] Transaction committed
Any update on this? We've had to implement a regular brute-force restart of kube-apiserver, as we roll the etcd credentials frequently (every couple of days) in a deployment with short-lived TLS credentials from Vault. We weren't sure what was causing the apiserver to become unresponsive, but every symptom in this issue rings true. K8s 1.7.8, etcd 3.2.6.
@sgmiller v3.2.11 has just been released and contains multiple fixes in the etcd client to gRPC. We plan to upgrade kubernetes to use the new etcd client. See #56114 for details.
I should warn that the bulk of the reports found in this (lengthy) issue don't provide sufficient detail or analysis to say for certain whether they all have the same root cause, or whether the gRPC fixes will resolve them. But they are closely enough related that we'd like to push out these changes and get feedback on what (if any) of these types of issues remain.
If you have any more details you can provide about the specific issues you've encountered, please let us know.
Note that #57061 is very likely grpc/grpc-go#1005, which was fixed after v1.0.4. Users on Kube 1.6 or 1.7 would be broken against etcd 3.2 servers at very large loads.
Is it possible to run into this issue on a new Kubernetes install with 1.9.1?
Stuck here:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.9.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [popkubem kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.31.29.60]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
It looks like the API server is failing to fully deploy, and I don't see an etcd container. Should I?
sudo docker logs 7ac8aa056296
I0105 22:38:06.981611 1 server.go:121] Version: v1.9.1
I0105 22:38:07.271031 1 feature_gate.go:184] feature gates: map[Initializers:true]
I0105 22:38:07.271056 1 initialization.go:90] enabled Initializers feature as part of admission plugin setup
I0105 22:38:07.283529 1 master.go:225] Using reconciler: master-count
W0105 22:38:07.322097 1 genericapiserver.go:342] Skipping API batch/v2alpha1 because it has no resources.
W0105 22:38:07.332187 1 genericapiserver.go:342] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W0105 22:38:07.333085 1 genericapiserver.go:342] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
W0105 22:38:07.345359 1 genericapiserver.go:342] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
[restful] 2018/01/05 22:38:07 log.go:33: [restful/swagger] listing is available at https://172.31.29.60:6443/swaggerapi
[restful] 2018/01/05 22:38:07 log.go:33: [restful/swagger] https://172.31.29.60:6443/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2018/01/05 22:38:08 log.go:33: [restful/swagger] listing is available at https://172.31.29.60:6443/swaggerapi
[restful] 2018/01/05 22:38:08 log.go:33: [restful/swagger] https://172.31.29.60:6443/swaggerui/ is mapped to folder /swagger-ui/
I0105 22:38:10.984334 1 serve.go:89] Serving securely on [::]:6443
I0105 22:38:10.988383 1 apiservice_controller.go:112] Starting APIServiceRegistrationController
I0105 22:38:10.988395 1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I0105 22:38:10.988492 1 available_controller.go:262] Starting AvailableConditionController
I0105 22:38:10.988500 1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I0105 22:38:10.989071 1 crd_finalizer.go:242] Starting CRDFinalizer
I0105 22:38:10.989563 1 controller.go:84] Starting OpenAPI AggregationController
I0105 22:38:11.059043 1 naming_controller.go:274] Starting NamingConditionController
I0105 22:38:11.088902 1 crdregistration_controller.go:110] Starting crd-autoregister controller
I0105 22:38:11.088966 1 controller_utils.go:1019] Waiting for caches to sync for crd-autoregister controller
I0105 22:38:11.089024 1 customresource_discovery_controller.go:152] Starting DiscoveryController
I0105 22:38:25.045005 1 trace.go:76] Trace[723418120]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:11.044239793 +0000 UTC m=+4.207731430) (total time: 14.000729847s):
Trace[723418120]: [14.000729847s] [14.000687312s] END
I0105 22:38:35.047389 1 trace.go:76] Trace[1319983526]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:25.047018697 +0000 UTC m=+18.210510289) (total time: 10.000340856s):
Trace[1319983526]: [10.000340856s] [10.000275818s] END
I0105 22:38:45.044971 1 trace.go:76] Trace[200541705]: "Create /api/v1/nodes" (started: 2018-01-05 22:38:11.043822246 +0000 UTC m=+4.207313825) (total time: 34.001110155s):
Trace[200541705]: [4.000724928s] [4.000618881s] About to store object in database
Trace[200541705]: [34.001110155s] [30.000385227s] END
I0105 22:38:45.050026 1 trace.go:76] Trace[1283603501]: "Create /api/v1/namespaces/default/events" (started: 2018-01-05 22:38:35.049574339 +0000 UTC m=+28.213065973) (total time: 10.000418685s):
Trace[1283603501]: [10.000418685s] [10.000173571s] END
I0105 22:38:45.986463 1 trace.go:76] Trace[295087653]: "Create /api/v1/namespaces" (started: 2018-01-05 22:38:11.985283044 +0000 UTC m=+5.148774636) (total time: 34.001141453s):
Trace[295087653]: [4.000683598s] [4.000621727s] About to store object in database
Trace[295087653]: [34.001141453s] [30.000457855s] END
E0105 22:38:45.987123 1 client_ca_hook.go:78] Timeout: request did not complete within allowed duration
We see something similar when doing a hard power-off of a VM on which one of the etcd nodes is running. K8s version 1.9.1.
Closed #47131.
This problem should be solved, or at least mitigated, by #57160. The PR bumps both gRPC and the etcd client to fix the timeout problem caused by connection resets and balancing.
I am closing out this issue. If anyone still sees the timeout problem in a release (after 1.10) that includes #57160, please create a new issue with reproduction steps.
@xiang90 any chance this gets backported to 1.9.x?
I agree, #57160 should resolve this. Unfortunately, the PR introduces a lot of change. It bumps the versions of multiple major dependencies (etcd, docker, containerd, grpc, ...). I do not believe it is well suited for backport to stable kubernetes branches at this time. After 1.10 is released and this has had some time to bake in production, I'd be open to revisiting this, but for now let's limit it to k8s 1.10.
We've recently updated to etcd 3.1.13 and have Kube on 1.6.13. Since we moved onto the v3 etcd store, we appear to have been hit hard by this. We are working our way towards 1.10.x but are a few months from getting there, especially since the apiserver lockups are distracting from moving forward. I see that #56114 should improve the problem; I was wondering if we would get most of the benefit by introducing a gRPC proxy from 3.2.11 (or newer) onto the host with the apiserver, proxying back to the etcd servers.
Digging through https://kubernetes.io/docs/imported/release/notes/ I see 'Upgrade to etcd client 3.2.13 and grpc 1.7.5 to improve HA etcd cluster stability. (#57480, ...)' and 'The supported etcd server version is 3.1.12, as compared to 3.0.17 in v1.9 (#60988)', so there is strong evidence that the newer client can talk to our etcd server.
It seems viable to switch to the gRPC proxy to avoid this. I'm looking for confirmation that the proxy would use the load balancer as updated in coreos/etcd#8840.
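For anyone evaluating that route, the etcd gRPC proxy can be run next to each apiserver along these lines (a sketch only; the endpoints and listen address are placeholders, and a TLS-secured cluster needs the proxy's certificate flags as well). The apiserver's --etcd-servers would then point at the proxy's listen address.

```
etcd grpc-proxy start \
  --endpoints=https://etcd-1:2379,https://etcd-2:2379,https://etcd-3:2379 \
  --listen-addr=127.0.0.1:23790
```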