Feedback needed - building custom controller loops for fun and profit


Clayton Coleman

Sep 7, 2016, 6:42:26 PM
to kubernetes-dev
TL;DR: If you've wanted to write a custom controller in Kube (for an existing resource or a new third party resource), try this out and give feedback on whether it is useful.

Controllers are a fundamental part of Kubernetes (they're the logic loops that ensure all the declarative APIs make it into reality) but for a long time they have been very difficult to write.  We want to start taking steps to make it easier to build simple integrations with Kubernetes and prototype new declarative patterns (like pet set controllers and extensions to services / ingress).  

We're looking for feedback on a prototype implementation of a "script controller" - a simple command in kubectl / oc (the openshift cli) that makes it easy to solve problems like:

  I want to do Y on every resource of type X

where X is any resource, third party or otherwise, and Y is anything.  What is anything?  Some examples are:

* Ensure every namespace gets a Quota and LimitRange (so that users can't use unlimited cluster resources)
* Ensure every service is registered in my cloud / private / zone file DNS provider
* Send an email alert every time a node transitions to "NotReady"
* On an NFS server, add an NFS export anytime a PVC is created and bind to that PVC (i.e. a custom dynamic provisioner)
* Delete any pod that has restarted more than 100 times and send an email to the owner
* Anytime a pod fails scheduling, write an annotation on the deployment that launched the pod indicating when and why.

In Kubernetes we say controllers are usually level-driven (they depend not on seeing the change happen, but on whatever the current state is) and that they maintain some invariant (if X is true, Y should also be true).  Sometimes you also need to keep an external system in sync - if you crash, have a bug, or someone messes around with the external system you need to be able to reconcile those changes.  This is a lot more complicated than just "watch" - it's a reconciliation loop, and as we've shown over the last few years it's tricky to get right.
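The level-driven idea above can be sketched as a single reconcile pass: compare the full desired and actual state, then converge, regardless of which individual events were seen or missed. This is a hypothetical stand-in using files of names, one per line (a real controller would list objects from the API server and the external system instead):

```shell
#!/bin/bash
# Level-driven reconcile sketch: converge actual state toward desired state
# by diffing complete listings, not by reacting to individual events.
reconcile() {
    desired=$1   # file listing names that should exist
    actual=$2    # file listing names that currently exist

    # create whatever is desired but missing (the invariant: X implies Y)
    comm -23 <(sort "$desired") <(sort "$actual") | while read -r name; do
        echo "create $name"
    done
    # remove whatever exists but is no longer desired
    comm -13 <(sort "$desired") <(sort "$actual") | while read -r name; do
        echo "delete $name"
    done
}
```

Running this periodically (or on every resync) is what makes the loop robust to crashes and missed watch events: the outcome depends only on current state.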


## So?

We've created a prototype "observe" command on the OpenShift CLI that works with Kubernetes or OpenShift to explore how we can make custom controllers easy / useful.  It's available as an image "openshift/observe:latest" so you can run it on a local machine, or you can download the 1.3.0-rc1 release binary of OpenShift.  This is tracking "Allow admins to implement controller loops from the CLI (IFTTT)", and this is step one: getting feedback.

The goal of the "observe" command is to help you build simple controller loops and help you do them correctly.  That is:

1. Make it easy to run a script once for every X in the system, and to re-run that script when X changes
2. Make it easy to get data off X - you shouldn't have to jump through 'jq' hoops if you want to fetch a service IP or annotation value
3. Make it *possible* to write a correct reconciliation loop against an external system, and help guide you through the gotchas
4. Be friendly for simple integrations, possible to write complex integrations, and eventually be able to tell you when you need to go write some Go code.  Not every problem can be solved here.

If this describes a problem you've had that you've cobbled together a bunch of scripts for, please give the new command a try and see if it helps / hurts / is amazing for your use case, and give us feedback.


## Try it out

Get it locally - download the latest Origin client release binaries or run:

    docker run --entrypoint /bin/bash -it openshift/observe:latest
    # copy in a kubeconfig for your cluster or login using `oc config` or `oc login`
    $ oc observe -h

Watch everything:

    # for every service, and any time a service changes, print out info
    $ oc observe --all-namespaces services

Add something to do:

    # for every service, and any time a service changes, echo
    $ oc observe --all-namespaces services -- echo

You'll see that this prints out namespace and name for each one as arguments 1 and 2 to echo.  If you create / delete a service in the background, you'll see it show up in this list (the update, at least).

Add the service IP to the output:

    $ oc observe --all-namespaces services -a '{ .spec.clusterIP }' -- echo

We've used '-a' to add a JSONPath-style template that is evaluated against each object; its output becomes the last argument to the command.

To turn this into something practical, create a new script called 'record.sh' in the current directory and make it executable:

    $ cat record.sh
    #!/bin/sh
    echo $1 $2 $3 >> services

    $ oc observe --all-namespaces services -a '{ .spec.clusterIP }' -- ./record.sh

All services and their IPs will be recorded in that local file.  You can extend that to anything you can do with bash.
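One caveat worth building in from the start: observe re-runs the command for the current state of every service, so a resync would duplicate lines in that file. A possible extension of record.sh makes the append idempotent (wrapped in a function here purely for illustration; as a script, the namespace, name, and clusterIP arrive as positional arguments):

```shell
#!/bin/bash
# record.sh, extended to be idempotent: only append a service entry if this
# exact line isn't already in the "services" file, so resyncs are harmless.
record() {
    line="$1 $2 $3"
    grep -qxF -- "$line" services 2>/dev/null || echo "$line" >> services
}
```

Handlers in these loops generally need to be safe to run repeatedly for the same object.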

The more complex case is handling deletions.  Say you want to create an ingress for every service, but if the service gets deleted you want to delete the ingress.  To properly cleanup, we need to know the ingresses that were created this way.

    $ cat create.sh
    #!/bin/sh
    echo '{"kind":"Ingress","apiVersion":"extensions/v1beta1","metadata":{"name":"'"$2"'"}, ...}' | kubectl create -f - --namespace $1
    kubectl annotate ingress/$2 fromservice=true
    
    $ cat names.sh
    #!/bin/sh
    kubectl get ingress --all-namespaces --template '{{ range .items }}{{ if eq (or .metadata.annotations.fromservice "") "true" }}{{ .metadata.namespace }}/{{ .metadata.name }}{{"\n"}}{{ end }}{{ end }}'

    $ cat delete.sh
    #!/bin/sh
    kubectl delete ingress $2 --namespace=$1

    $ oc observe --all-namespaces services --delete ./delete.sh --names=./names.sh -- ./create.sh

The first script creates an ingress with the same name as the service and sets an annotation.  The second walks every ingress and outputs namespace/name for any that have the annotation fromservice=true (note that a plain Go template comparison is actually not enough here - you have to guard against the annotation being missing or empty, because eq will error otherwise).

The combination of those allows the observer to detect that a service has been deleted while it was not running - any ingress that has the annotation was created by a service, and since they match names, that must mean that a service was deleted.  If a user deletes a service directly, we'll get the watch notification - but not if we crashed, or on initial sync.
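As a rough illustration of what the --names contract gives observe to work with (file-based stand-ins here; observe does this internally against the live object list), the startup diff amounts to:

```shell
#!/bin/bash
# Deletion-detection sketch: anything our scripts previously created
# (the output of ./names.sh) that no longer has a matching live object
# must have been deleted while we weren't watching.
detect_deletions() {
    known=$1     # names our scripts created, e.g. "default/web"
    current=$2   # namespace/name of objects that still exist
    # print lines of $known that have no exact match in $current
    grep -vxF -f "$current" "$known"
}
```

Each name this emits gets handed to the --delete script, which is how the loop catches deletions that happened during a crash or before startup.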

This reconciliation is tricky to get right - but observe is able to use the exact same pattern and code that Kubernetes uses to ensure we only fix critical reconcile bugs once.

There are other options around failure modes, retries, metrics endpoints, and restart behavior.  Please see the help for more.


## Please give feedback

The sorts of feedback that are really useful are:

* Can you build something useful with this?
* Does this simplify scripting that you were already doing?
* Is the command obvious / intuitive if you know your way around kube?
* What sort of gotchas have you hit doing this yourself, and what can we do to make this simpler?
* Would you use this to build third party resources?
* What else do you need?
* How much do you hate JSONPath and Go templates in this use case?

We've tried to base this on real working controllers we've seen people trying to build - our hope is that we can make this a generally useful bit of glue code for all kubernetes clusters (eventually making it into kubectl once we sort out the use cases).


Thanks!

Maru Newby

Sep 30, 2016, 1:26:23 AM
to Clayton Coleman, kubernetes-dev
I've used your complex example as the basis for syncing /etc/hosts on
a master with node IPs retrieved from an OpenShift cluster, to work
around https://github.com/kubernetes/kubernetes/issues/22063:

https://github.com/openshift/origin/pull/11061/files#diff-4f467ba061fec5cb2baef98fb4497c5c

(This is hopefully moot very soon given justinsb's
https://github.com/kubernetes/kubernetes/pull/33718)

I naively assumed the result of 'names' would be used to filter
events, but re-reading the example it's clear that --names is used in
determining when to call the deletion script rather than when to call
the addition script. For my use case it was necessary to have the add
script filter out hostname+ip pairs that already existed in
/etc/hosts, otherwise every node heartbeat added another hosts entry.
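That filtering can be sketched roughly as an idempotent add script that replaces any stale entry for a hostname instead of blindly appending on every heartbeat. (HOSTS_FILE and the name/IP argument order are assumptions of this sketch, not what the linked PR does.)

```shell
#!/bin/bash
# Idempotent hosts-file sync sketch: at most one entry per hostname,
# updated in place, so repeated node events don't accumulate lines.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

sync_host() {
    name=$1
    ip=$2
    tmp=$(mktemp)
    # keep every line except an existing entry for this hostname
    grep -v -- " $name\$" "$HOSTS_FILE" > "$tmp" || true
    echo "$ip $name" >> "$tmp"
    mv "$tmp" "$HOSTS_FILE"
}
```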

Node events happen frequently due to heartbeats, so there is more
activity from this simple watch than I was expecting. Is it
preferable to filter events at the client side vs implementing
filtering on the server side (e.g. only send me events representing
changes I care about)? I imagine it's a tradeoff between network
traffic vs increasing the cost of server-side processing.

Clayton Coleman

Sep 30, 2016, 10:39:56 AM
to Maru Newby, kubernetes-dev
The challenge is that server-side filtering is limited today (we need to reach consensus on field filtering), and client-side filtering can be more expressive.  I'd like to add support on both sides - one wrinkle is determining when objects cross in and out of a filter, which we've solved server-side by synthesizing delete events for those transitions, but which we don't do for client-side filtering today.

So basically yes: filtering at least equivalent to what "kubectl get" supports should be enabled, and field support should be possible, but for now you'd need to do it in the loop.  I suspect client-side filtering in Go would be able to ensure that even very rapidly changing resource types are still practical to use in a single-threaded bash loop.
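One way to do it in the loop today is a small wrapper that caches the last value seen per object and only runs the real handler when the watched field actually changed. (The cache location and the namespace/name/value argument order are assumptions of this sketch.)

```shell
#!/bin/bash
# Change-filter sketch: suppress heartbeat noise by skipping events
# whose interesting field matches the last value we recorded.
filter_changed() {
    ns=$1; name=$2; value=$3
    cache="${CACHE_DIR:-/tmp}/observe-$ns-$name"
    if [ -f "$cache" ] && [ "$(cat "$cache")" = "$value" ]; then
        return 1          # unchanged: caller should skip the handler
    fi
    echo "$value" > "$cache"
    return 0              # changed (or first sighting): run the handler
}
```

A handler script would then gate its real work on this, along the lines of `filter_changed "$1" "$2" "$3" && ./real-handler.sh "$@"`.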
