Kubernetes Upgrade Caused a Crash in PV Controller (Controller Manager)


krma...@gmail.com

May 10, 2018, 7:10:54 PM
to Kubernetes developer/contributor discussion
The scenario was as follows:

We have three masters in HA configuration and each master has a haproxy which is distributing load to all apiservers. While trying to upgrade from 1.7.14 to 1.9.7 we saw the following crash.

Apr 27 20:00:23 aaaa hyperkube[109319]: E0427 20:00:23.874600  109319 goroutinemap.go:165] Operation for "stateful-0[2763a3db-4a49-11e8-9a0c-1418775ac502]" failed. No retries permitted until 2018-04-27 20:02:25.874559943 +0000 GMT m=+5485.318306295 (durationBeforeRetry 2m2s). Error: "recovered from panic \"runtime error: invalid memory address or nil pointer dereference\". (err=<nil>) Call stack:\n/workspace/anago-v1.9.7-beta.0.111+dd5e1a2978fd0b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72\n/workspace/anago-v1.9.7-beta.0.111+dd5e1a2978fd0b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:155\n/usr/local/go/src/runtime/asm_amd64.s:509\n/usr/local/go/src/runtime/panic.go:491\n/usr/local/go/src/runtime/panic.go:63\n/usr/local/go/src/runtime/signal_unix.go:367\n/workspace/anago-v1.9.7-beta.0.111+dd5e1a2978fd0b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go:1347\n/workspace/anago-v1.9.7-beta.0.111+dd5e1a2978fd0b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go:1272\n/workspace/anago-v1.9.7-beta.0.111+dd5e1a2978fd0b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/goroutinemap/goroutinemap.go:130\n/usr/local/go/src/runtime/asm_amd64.s:2337\n"


Since reclaimPolicy was a new field in 1.9.7, it seems the PV controller running 1.9.7 code was at this time talking to a 1.7.14 apiserver and got a StorageClass in which the reclaimPolicy field was not defaulted.

This is my current understanding; correct me if I am wrong.
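
A minimal Go sketch of the suspected failure mode, for illustration only. The types below are simplified stand-ins, not the real Kubernetes API types, and `policyUnsafe` and `tryPolicy` are hypothetical names. It assumes the field is an optional pointer that stays nil when the serving apiserver does not default it, and it imitates the "recovered from panic" wrapping seen in the log above.

```go
package main

import "fmt"

// ReclaimPolicy and StorageClass are simplified stand-ins for the API types.
type ReclaimPolicy string

const ReclaimDelete ReclaimPolicy = "Delete"

type StorageClass struct {
	Name string
	// Optional field: nil if the apiserver that served the object
	// did not know about the field or did not default it.
	ReclaimPolicy *ReclaimPolicy
}

// policyUnsafe dereferences the pointer without a nil check,
// which is the kind of access suspected at the crashing line.
func policyUnsafe(sc *StorageClass) ReclaimPolicy {
	return *sc.ReclaimPolicy
}

// tryPolicy runs policyUnsafe and turns a panic into an error,
// loosely imitating the panic recovery visible in the log.
func tryPolicy(sc *StorageClass) (p ReclaimPolicy, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic %q", fmt.Sprint(r))
		}
	}()
	return policyUnsafe(sc), nil
}

func main() {
	// Object as an older apiserver might return it: field unset.
	old := &StorageClass{Name: "from-older-apiserver"}
	_, err := tryPolicy(old)
	fmt.Println("old apiserver:", err)

	// Object as a newer apiserver would return it: field defaulted.
	d := ReclaimDelete
	p, err := tryPolicy(&StorageClass{Name: "from-newer-apiserver", ReclaimPolicy: &d})
	fmt.Println("new apiserver:", p, err)
}
```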

Questions:

1: Does defaulting occur only in the apiserver, or does it also occur in the controller-manager client? If it did, this would not have happened.
2: Is upgrading masters one by one not a supported path?
3: How does GKE handle HA master upgrades in this scenario?
4: What can I do to avoid this situation, or what is the recommended way to upgrade in this kind of scenario?

Mayank


Tim St. Clair

May 11, 2018, 9:50:02 AM
to krma...@gmail.com, Kubernetes developer/contributor discussion
Skipped upgrades, e.g. 1.7.x -> 1.9.y, have never been supported.

You need to upgrade through 1.7.x->1.8.y->1.9.z due to potential
api object conversion issues.

Cheers,
Tim
> --
> You received this message because you are subscribed to the Google Groups
> "Kubernetes developer/contributor discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kubernetes-de...@googlegroups.com.
> To post to this group, send email to kuberne...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kubernetes-dev/0afb0cb3-25e7-4a5c-8026-f6a1c72c0eab%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Cheers,
Timothy St. Clair

“Do all the good you can. By all the means you can. In all the ways
you can. In all the places you can. At all the times you can. To all
the people you can. As long as ever you can.”

Jordan Liggitt

May 11, 2018, 9:52:02 AM
to Tim St. Clair, krma...@gmail.com, Kubernetes developer/contributor discussion
That's fair, though the same issue exists in the 1.8 controller (dereference of a new field in 1.8):

https://github.com/kubernetes/kubernetes/blob/release-1.8/pkg/controller/volume/persistentvolume/pv_controller.go#L1285


Daniel Smith

May 11, 2018, 1:38:46 PM
to Jordan Liggitt, Tim St. Clair, Mayank, Kubernetes developer/contributor discussion
(Note that the default setup doesn't have this problem because all controller-managers talk only to their local apiserver (over localhost) and apiserver defaults the field. This is not ideal since it prevents running apiservers and controller managers at any ratio other than 1:1.)

We, as a project, could generally take one of two stances here. The extreme form of each position is:

1) Document: in an HA environment, the only safe upgrade path is first kube-apiserver, then controllers, then nodes. (And anyone running webhook admission controllers has to be ready for anything, as webhooks are potentially both clients and part of the core control plane.)
2) Add a rule: clients MUST be backwards compatible by one version. (Under this rule, there is a bug in controller-manager.)

I think the best thing to do is something in the middle, where we both define a control plane upgrade order and define the backwards compatibility requirements for each component. We already do this explicitly for kubelets, which may be up to two versions old and may NOT be newer than the control plane.

I think the required upgrade order has to be something like:

1. Webhooks (which are now required to be backwards compatible by at least one version)
2. kube-apiserver
3. Any aggregated apiservers
4. kube-controller-manager
5. cloud-controller-manager

Under this rule, the bug report in the OP should be directed to whomever did the control plane rollout.
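
As a rough illustration of how such rules could be checked before a rollout, here is a Go sketch. It is not an existing Kubernetes tool; the `Version` type and both functions are hypothetical. It encodes the kubelet rule stated above (at most two minor versions old, never newer than the control plane) and assumes, per the proposed ordering, that a controller-manager must not be newer than any apiserver it can reach.

```go
package main

import "fmt"

// Version is a hypothetical major.minor pair; patch levels are ignored.
type Version struct{ Major, Minor int }

func (v Version) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// checkKubeletSkew enforces the kubelet rule: at most two minor
// versions older than the apiserver, and never newer.
func checkKubeletSkew(apiserver, kubelet Version) error {
	if apiserver.Major != kubelet.Major {
		return fmt.Errorf("major version mismatch: apiserver %s, kubelet %s", apiserver, kubelet)
	}
	switch d := apiserver.Minor - kubelet.Minor; {
	case d < 0:
		return fmt.Errorf("kubelet %s is newer than apiserver %s", kubelet, apiserver)
	case d > 2:
		return fmt.Errorf("kubelet %s is more than two versions behind apiserver %s", kubelet, apiserver)
	}
	return nil
}

// checkControllerSkew assumes the proposed ordering: the controller-manager
// is upgraded only after every apiserver it may talk to.
func checkControllerSkew(apiservers []Version, cm Version) error {
	for _, a := range apiservers {
		if cm.Major > a.Major || (cm.Major == a.Major && cm.Minor > a.Minor) {
			return fmt.Errorf("controller-manager %s is newer than apiserver %s", cm, a)
		}
	}
	return nil
}

func main() {
	// Mid-upgrade HA setup similar to the one described in the OP:
	// a load balancer may route a 1.9 controller-manager to a 1.7 apiserver.
	apiservers := []Version{{1, 7}, {1, 9}, {1, 9}}
	fmt.Println(checkControllerSkew(apiservers, Version{1, 9}))
	fmt.Println(checkKubeletSkew(Version{1, 9}, Version{1, 7}))
}
```

Here the first check reports a violation because one apiserver is still older than the controller-manager, while the kubelet check passes.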

If a roll-back is needed, it'd generally have to happen in the reverse order, which makes things very complicated. To allow roll-back and roll-forward to follow the same order, I think we'd have to adopt a pretty extreme form of position 2). Alternatively, we can take the approach we take with kubelet today: you don't roll kubelet version forward until you're certain you're not going to roll the control plane back. But this means you'd want to progress through those 5 steps slowly.

In general, approach 1) means pain for cluster operators and/or distro providers, and approach 2) means pain for controller / client authors (and a bigger test matrix).

Approach 2) is more consistent with the general Kubernetes "self-healing" philosophy, but (IMO) it requires more coordination from more people to get right (the upgrade has a bug if any of 1000 different people make a mistake; the OP has provided a great example of such a bug) and is therefore less likely to work in practice. It is hard to deterministically improve our testing around running with version-skewed control planes: even the 5 (incomplete) steps I gave already have 5! (120) orderings that perhaps should be tested separately; we'd have to use a randomized approach.



krma...@gmail.com

May 15, 2018, 2:42:10 AM
to Kubernetes developer/contributor discussion
Thanks Daniel for your detailed thoughts. 

In this case, do you see any potential issue with always doing both client-side and server-side defaulting? If we had client-side defaulting in the 1.9 controller manager talking to a 1.7 apiserver, this could have been avoided. Do you know of any pitfalls with that approach? Agreed, we cannot control what the webhooks and admission controllers will do, but at least we can provide this for core Kubernetes.
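
Roughly what I have in mind, as a Go sketch rather than actual controller-manager code. The types are simplified stand-ins and `setDefaults` is a hypothetical helper; it assumes "Delete" is the value the server would have defaulted to for a StorageClass.

```go
package main

import "fmt"

// Simplified stand-ins for the API types.
type ReclaimPolicy string

const ReclaimDelete ReclaimPolicy = "Delete"

type StorageClass struct {
	Name          string
	ReclaimPolicy *ReclaimPolicy // nil if the server did not default it
}

// setDefaults is a hypothetical client-side defaulting pass: it fills
// only fields that are unset and never overrides a value from the server.
func setDefaults(sc *StorageClass) {
	if sc.ReclaimPolicy == nil {
		p := ReclaimDelete // assumed to match the server-side default
		sc.ReclaimPolicy = &p
	}
}

func main() {
	// Object as an older apiserver might return it.
	sc := &StorageClass{Name: "from-older-apiserver"}
	setDefaults(sc)
	fmt.Println(*sc.ReclaimPolicy) // safe to dereference after defaulting
}
```

One pitfall I can think of: after this pass the client can no longer tell "unset" from "explicitly set to the default", and if it writes the object back it may persist its own idea of the default.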
A few more questions:

- What is the "OP" you refer to?
- You mention that we don't roll kubelet forward unless we are sure we won't roll back the control plane. Why? And are these Kubernetes upgrade rules written down somewhere?

My main challenge is to keep upgrading clusters seamlessly while keeping the business running. We should, IMO, optimize for the cluster operators ;-) (approach 2 or whatever it takes). So far we have done three upgrades, and every time we have seen one surprise or another. Here are some challenges we face, including others we think we will face:

- Testing to make sure our customers won't see a container or pod restart.
- Moving from alpha to beta or stable versions of workload objects.
- Avoiding unexpected behavior; for example, we saw unexpected evictions because we forgot an admission flag.
- client-go upgrades related to this, and mock generation for these clients.
- Changes in the behavior of existing objects, like Secrets becoming read-only by default. (I am not saying this is not the right thing to do at all, just gathering my thoughts here.)
- Fixing things that move from annotations to fields.
- etcd2 to etcd3 migration.
- Deprecation of ThirdPartyResource.
- AffinityInAnnotations is no longer supported, so we have to remove that and other such feature flags. Ideally it should become a no-op and then, a release later, become unsupported. (I know I jumped a version, so you might already be doing this.)
- Flags like --api-server are no longer supported. The path should be the same: the flag becomes a no-op and is later removed. I know that would mean people would not know that the flag doesn't work and would be in for surprises. There are tradeoffs.
- And many more that I need to recollect.
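
For the flag case, the "no-op first, remove later" path could look roughly like the Go sketch below. It uses only the standard `flag` package; the flag names are illustrative and `parseArgs` is a hypothetical function, not how any Kubernetes component actually parses its flags.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs still accepts a deprecated flag so old command lines keep
// working, ignores its value, and returns a warning when it is used.
func parseArgs(args []string) (kubeconfig string, warnings []string, err error) {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch

	fs.StringVar(&kubeconfig, "kubeconfig", "", "path to a kubeconfig file")
	// Deprecated: parsed but never read, i.e. a no-op for one release.
	_ = fs.String("api-server", "", "DEPRECATED: ignored, use --kubeconfig")

	if err = fs.Parse(args); err != nil {
		return "", nil, err
	}
	// Visit only walks flags that were explicitly set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "api-server" {
			warnings = append(warnings, "--api-server is deprecated and ignored; use --kubeconfig")
		}
	})
	return kubeconfig, warnings, nil
}

func main() {
	kc, warns, err := parseArgs([]string{"--api-server=http://127.0.0.1:8080", "--kubeconfig=/etc/kube/config"})
	fmt.Println(kc, warns, err)
}
```

The warning addresses the tradeoff mentioned above: the flag no longer does anything, but the operator is told so instead of being surprised.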

I am not saying that providing a seamless upgrade or backward compatibility is easy, but we need to stress the importance of this area. Maybe we could have a working group or SIG that focuses just on this aspect.


Cheers
Mayank
PS: I am happy to move this conversation to an issue where we can capture more discussion and thoughts.