OLM from operatorhub.io for Kubernetes (without OpenShift)


David Macháček

Nov 2, 2020, 3:02:05 PM
to Operator Framework
Hello,
what is the level of maturity of OLM on vanilla Kubernetes? We are trying to deploy several operators from operatorhub.io on IBM Cloud Kubernetes (v1.18.10_1531). Currently v0.16.1 of OLM is recommended. Does this mean OLM is some sort of beta version (since it is < 1.0)?
I am struggling to make OLM work properly (pods keep restarting), and some operators do not seem to work correctly and require plenty of additional configuration (like manually changing InstallPlanTypes, or sometimes an operator requires its own namespace), although I know they work properly on OpenShift.
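(For illustration, the kind of additional configuration meant here looks roughly like the following, assuming "InstallPlanTypes" refers to the Subscription's installPlanApproval field; all names are placeholders:)

kubectl create namespace example-operator   # placeholder namespace for the operator
kubectl apply -f - <<EOF
# Hypothetical sketch: give the operator its own namespace via an OperatorGroup
# and require manual approval of its InstallPlans.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: example-operatorgroup
  namespace: example-operator
spec:
  targetNamespaces:
    - example-operator            # operator only watches its own namespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator-sub
  namespace: example-operator
spec:
  channel: stable                 # placeholder channel
  name: example-operator          # placeholder package name from the catalog
  source: operatorhubio-catalog
  sourceNamespace: olm
  installPlanApproval: Manual     # InstallPlans must be approved by hand
EOF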
Is there any official statement describing the current level of operator support on "vanilla" Kubernetes?
If there is, and operators are not really production-ready for k8s, my organization is ready to migrate to OpenShift.

Daniel Messer

Nov 4, 2020, 5:30:35 AM
to David Macháček, Operator Framework
Hi David,

thanks for reaching out. We do ship and test OLM upstream, and it is a priority for us that OLM works on vanilla Kubernetes just as well as it does on OpenShift. In fact, OLM itself does not carry OpenShift specifics. Some metadata structures are interpreted only on OpenShift (at the moment), and some features stem from use cases in typical OpenShift environments (enterprise deployments). From an OLM perspective, the main difference at runtime is OpenShift's more stringent security model.

OLM being < v1.0.0 is more an attribute of its API stability and longevity guarantees than of any kind of maturity. So far OLM only releases features that have gone through QE, and we don't have beta or release candidates yet. Because the Operator ecosystem and the Kubernetes extensibility model are ever-changing, we have not yet cut an OLM v1.0.0 release, since we keep *adding* APIs. We haven't removed or significantly broken any API yet, with the exception of OperatorSource (which really was only used on OpenShift with Quay.io's appregistry service). The more the API landscape stabilizes, the closer we move to a stable release. If we did it now, we might have to bump to 2.0.0 very soon to follow semver rules, due to the Operator API being introduced.

I wonder whether your experience is with OLM itself or with the Operators on OperatorHub.io. Are the OLM pods restarting?

/D



--
Daniel Messer

Product Manager Operator Framework & Quay

Red Hat OpenShift

David Macháček

Nov 5, 2020, 5:54:53 AM
to Operator Framework
Thank you for your answer.

I work with IBM Cloud Kubernetes (v1.18.10_1531). But since I struggled to make the operators work there (they don't seem to register any newly deployed Subscriptions), I started to use the OLM provided by operatorhub.io itself:
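(For reference, the operatorhub.io install of OLM v0.16.1 is typically done with the upstream release script, roughly like this; treat it as a sketch rather than the exact command used here:)

curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.16.1/install.sh | bash -s v0.16.1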

The actual behaviour is that operatorhubio-catalog keeps restarting (I added -w to watch the progress):
k get pods -n olm -w
NAME                                READY   STATUS             RESTARTS   AGE
catalog-operator-7b4788ffb5-x5x9d   1/1     Running            0          6m15s
olm-operator-678d76b95c-27hs2       1/1     Running            0          6m15s
operatorhubio-catalog-lgbsx         0/1     CrashLoopBackOff   4          6m15s
packageserver-644b549954-gw9x4      1/1     Running            0          6m1s
packageserver-644b549954-p7sgw      1/1     Running            0          6m1s
operatorhubio-catalog-lgbsx         0/1     Running            5          6m21s
operatorhubio-catalog-lgbsx         1/1     Running            5          6m27s
operatorhubio-catalog-lgbsx         0/1     OOMKilled          5          6m50s
operatorhubio-catalog-lgbsx         0/1     CrashLoopBackOff   5          6m53s

Description:
k describe pod operatorhubio-catalog-lgbsx -n olm
Name:         operatorhubio-catalog-lgbsx
Namespace:    olm
Priority:     0
Node:         10.85.150.103/10.85.150.103
Start Time:   Thu, 05 Nov 2020 10:22:24 +0100
Labels:       olm.catalogSource=operatorhubio-catalog
Annotations:  kubernetes.io/psp: ibm-privileged-psp
Status:       Running
IP:           172.30.43.69
IPs:
  IP:  172.30.43.69
Containers:
  registry-server:
    Container ID:   containerd://1bfa5cb4b127c752c7686df440a756f265a39f68fd270146bde0ff1d363ffe33
    Image:          quay.io/operatorhubio/catalog:latest
    Image ID:       quay.io/operatorhubio/catalog@sha256:eae33bfed2fd562037020a60eae68a6063c85ba8fe01c1da9edb1e030c0eeb73
    Port:           50051/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 05 Nov 2020 10:28:45 +0100
      Finished:     Thu, 05 Nov 2020 10:29:13 +0100
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:        10m
      memory:     50Mi
    Liveness:     exec [grpc_health_probe -addr=localhost:50051] delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:    exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rkkbr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-rkkbr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rkkbr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     
Events:
  Type     Reason     Age                     From                    Message
  ----     ------     ----                    ----                    -------
  Normal   Scheduled  8m1s                    default-scheduler       Successfully assigned olm/operatorhubio-catalog-lgbsx to 10.85.150.103
  Normal   Pulling    7m59s                   kubelet, 10.85.150.103  Pulling image "quay.io/operatorhubio/catalog:latest"
  Normal   Pulled     7m52s                   kubelet, 10.85.150.103  Successfully pulled image "quay.io/operatorhubio/catalog:latest"
  Warning  Unhealthy  7m9s                    kubelet, 10.85.150.103  Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "8c7547c490369652cb84386bfd1fd70c9daab8197d6c4090163b23de09c84b7c": OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown
  Warning  Unhealthy  6m32s                   kubelet, 10.85.150.103  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  Unhealthy  6m32s                   kubelet, 10.85.150.103  Readiness probe failed:
  Normal   Started    6m13s (x3 over 7m51s)   kubelet, 10.85.150.103  Started container registry-server
  Warning  Unhealthy  5m31s (x2 over 7m10s)   kubelet, 10.85.150.103  Liveness probe errored: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded
  Warning  Unhealthy  5m29s (x3 over 6m36s)   kubelet, 10.85.150.103  Readiness probe errored: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded
  Normal   Pulled     5m (x3 over 7m9s)       kubelet, 10.85.150.103  Container image "quay.io/operatorhubio/catalog:latest" already present on machine
  Normal   Created    5m (x4 over 7m52s)      kubelet, 10.85.150.103  Created container registry-server
  Warning  BackOff    2m58s (x11 over 6m32s)  kubelet, 10.85.150.103  Back-off restarting failed container

Logs:
k logs operatorhubio-catalog-lgbsx -n olm
time="2020-11-05T09:28:47Z" level=info msg="Keeping server open for infinite seconds" database=/database/index.db port=50051
time="2020-11-05T09:28:47Z" level=info msg="serving registry" database=/database/index.db port=50051

Node top (the problem does not look like node-level memory pressure, since I am starting the pods on 10.85.150.103):
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
10.85.150.103   1620m        41%    1159Mi          8%        
10.85.150.117   686m         17%    12638Mi         94%       
10.85.150.120   748m         19%    11146Mi         83%       
10.85.150.122   1051m        26%    11969Mi         89%   
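(Note that the OOMKilled status above comes from the registry container's own 100Mi memory limit shown in the pod description, not from node memory pressure; a quick sketch for pulling the limit and the last termination reason:)

kubectl -n olm get pod -l olm.catalogSource=operatorhubio-catalog \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.limits.memory}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'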

Thanks, David
On Wednesday, November 4, 2020 at 11:30:35 AM UTC+1, dme...@redhat.com wrote:

Kevin Rizza

Nov 5, 2020, 8:02:04 AM
to David Macháček, Operator Framework
Hey David,

So that's the pod that actually serves OLM the content that is currently available to install on the cluster. It gets created and resolved by OLM itself when a resource called a CatalogSource is added to your install namespace.
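For reference, the operatorhubio-catalog that the upstream OLM install creates is defined by a CatalogSource roughly like this (a sketch; the exact manifest on your cluster may differ slightly):

kubectl apply -f - <<EOF
# Sketch of the default community CatalogSource; OLM turns this into the
# operatorhubio-catalog registry pod and its Service on port 50051.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators
  publisher: OperatorHub.io
EOF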

Based on the events you posted, it looks like the problem is that the liveness/readiness probes are failing on the pod, which implies that the pod itself is not responding in a healthy way -- that pod should actually be pretty low-latency and low-memory, so if something is going on there it could be networking-related. Those pods aren't backed by a Deployment; the catalog-operator pod in that namespace is what manages them. Could you take a look at the logs from the catalog-operator pod and post them, to see if there is anything obvious in there?
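Something along these lines should grab them (assuming the default deployment name from the upstream install):

kubectl -n olm logs deployment/catalog-operator --tail=200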

Thanks,
Kevin


David Macháček

Nov 5, 2020, 8:33:39 AM
to Operator Framework
I tried to deploy two operators:
Davids-MacBook-Air:kong david.m...@ibm.com$ k get subscription --all-namespaces
NAMESPACE          NAME                        PACKAGE                  SOURCE                  CHANNEL
operators          my-elastic-cloud-eck        elastic-cloud-eck        operatorhubio-catalog   stable   <-- https://operatorhub.io/operator/elastic-cloud-eck
strimzi-operator   my-strimzi-kafka-operator   strimzi-kafka-operator   operatorhubio-catalog   stable <-- https://operatorhub.io/operator/strimzi-kafka-operator
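(For reference, the Subscriptions can be dumped to see their full spec; based on the columns above, the strimzi one should look roughly like this, though the actual manifest may differ:)

kubectl -n strimzi-operator get subscription my-strimzi-kafka-operator -o yaml
# which should show a spec along these lines:
#   spec:
#     channel: stable
#     name: strimzi-kafka-operator
#     source: operatorhubio-catalog
#     sourceNamespace: olm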

Logs from catalog-operator:
time="2020-11-05T13:28:08Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/strimzi-operator/subscriptions/my-strimzi-kafka-operator
time="2020-11-05T13:28:08Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/strimzi-operator/subscriptions/my-strimzi-kafka-operator
time="2020-11-05T13:28:08Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/strimzi-operator/subscriptions/my-strimzi-kafka-operator
time="2020-11-05T13:28:09Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/operators/subscriptions/my-elastic-cloud-eck
time="2020-11-05T13:28:09Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/operators/subscriptions/my-elastic-cloud-eck
time="2020-11-05T13:28:09Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/operators/subscriptions/my-elastic-cloud-eck
time="2020-11-05T13:28:09Z" level=info msg="state.Key.Namespace=olm state.Key.Name=operatorhubio-catalog state.State=TRANSIENT_FAILURE"
time="2020-11-05T13:28:09Z" level=info msg="considered csvs without properties annotation during resolution: [elastic-cloud-eck.v1.2.1]"
time="2020-11-05T13:28:09Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
time="2020-11-05T13:28:09Z" level=info msg="considered csvs without properties annotation during resolution: [strimzi-cluster-operator.v0.20.0 strimzi-cluster-operator.v0.19.0]"
time="2020-11-05T13:28:09Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
E1105 13:28:09.975081       1 queueinformer_operator.go:290] sync "strimzi-operator" failed: constraints not satisfiable: strimzi-kafka-operator requires at least one of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, strimzi-kafka-operator is mandatory, gvkunique/kafka.strimzi.io/v1alpha1/KafkaRebalance permits at most 1 of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0 is mandatory
I1105 13:28:09.975572       1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"strimzi-operator", UID:"f079185a-af58-4b19-a047-d06c4da0dc52", APIVersion:"v1", ResourceVersion:"3889096", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: strimzi-kafka-operator requires at least one of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, strimzi-kafka-operator is mandatory, gvkunique/kafka.strimzi.io/v1alpha1/KafkaRebalance permits at most 1 of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0 is mandatory
time="2020-11-05T13:28:10Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/strimzi-operator/subscriptions/my-strimzi-kafka-operator
time="2020-11-05T13:28:10Z" level=info msg="considered csvs without properties annotation during resolution: [elastic-cloud-eck.v1.2.1]"
time="2020-11-05T13:28:10Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
time="2020-11-05T13:28:10Z" level=info msg="considered csvs without properties annotation during resolution: [strimzi-cluster-operator.v0.20.0 strimzi-cluster-operator.v0.19.0]"
time="2020-11-05T13:28:10Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
E1105 13:28:10.617315       1 queueinformer_operator.go:290] sync "strimzi-operator" failed: constraints not satisfiable: @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0 is mandatory, strimzi-kafka-operator is mandatory, strimzi-kafka-operator requires at least one of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, gvkunique/kafka.strimzi.io/v1alpha1/KafkaRebalance permits at most 1 of @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0
I1105 13:28:10.617672       1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"strimzi-operator", UID:"f079185a-af58-4b19-a047-d06c4da0dc52", APIVersion:"v1", ResourceVersion:"3889096", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0 is mandatory, strimzi-kafka-operator is mandatory, strimzi-kafka-operator requires at least one of @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0, gvkunique/kafka.strimzi.io/v1alpha1/KafkaRebalance permits at most 1 of @existing/strimzi-operator//strimzi-cluster-operator.v0.19.0, @existing/strimzi-operator//strimzi-cluster-operator.v0.20.0
...and then this syncing/reconciliation loop starts again.

It seems that IP 172.21.124.28 is unreachable. It belongs to the operatorhubio-catalog Service, so I don't understand how that is possible:
k get svc -n olm -o wide
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
operatorhubio-catalog              ClusterIP   172.21.124.28   <none>        50051/TCP   5m34s
packageserver-service              ClusterIP   172.21.207.36   <none>        5443/TCP    4m52s
v1-packages-operators-coreos-com   ClusterIP   172.21.146.74   <none>        443/TCP     4m53s
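(One way to double-check reachability of that Service from inside the cluster, as a rough sketch; this assumes the busybox image's nc supports -z/-v:)

kubectl -n olm run netcheck --rm -it --restart=Never --image=busybox -- \
  nc -vz -w 3 172.21.124.28 50051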

I work as a cloud architect at IBM in Prague and cooperate closely with Red Hat solution architects in Czechia, but unfortunately they are all specialized in OpenShift, and no one around here has ever tried to deploy operators on plain k8s in production, especially on IBM Cloud :-(

Thanks very much!
David

On Thursday, November 5, 2020 at 2:02:04 PM UTC+1, kri...@redhat.com wrote:

David Macháček

Nov 5, 2020, 9:31:27 AM
to Operator Framework
I also tried to fix the strimzi operator mentioned in the logs, which seems to be failing:
NAME                               DISPLAY                       VERSION   REPLACES                           PHASE
strimzi-cluster-operator.v0.19.0   Strimzi                       0.19.0                                       Replacing
strimzi-cluster-operator.v0.20.0   Strimzi                       0.20.0    strimzi-cluster-operator.v0.19.0   Failed

so I deleted the original CSV, its InstallPlan, and eventually even the Subscription, and tried to redeploy it.
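(Roughly, the cleanup was along these lines; the exact commands and resource names may have differed:)

kubectl -n strimzi-operator delete csv strimzi-cluster-operator.v0.20.0 strimzi-cluster-operator.v0.19.0
kubectl -n strimzi-operator delete installplan --all
kubectl -n strimzi-operator delete subscription my-strimzi-kafka-operator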

But the problem keeps appearing, with the following logs from catalog-operator:
time="2020-11-05T14:25:25Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/operators/subscriptions/my-strimzi-kafka-operator
time="2020-11-05T14:25:26Z" level=info msg="considered csvs without properties annotation during resolution: [elastic-cloud-eck.v1.2.1]"
time="2020-11-05T14:25:26Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
E1105 14:25:26.228978       1 queueinformer_operator.go:290] sync "operators" failed: constraints not satisfiable: strimzi-kafka-operator is mandatory, strimzi-kafka-operator has a dependency without any candidates to satisfy it
I1105 14:25:26.229882       1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"operators", UID:"ca6e37dc-cbd2-49e1-aa88-ac2503cc6d68", APIVersion:"v1", ResourceVersion:"1506920", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: strimzi-kafka-operator is mandatory, strimzi-kafka-operator has a dependency without any candidates to satisfy it
time="2020-11-05T14:25:26Z" level=info msg="considered csvs without properties annotation during resolution: [elastic-cloud-eck.v1.2.1]"
time="2020-11-05T14:25:26Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.21.124.28:50051: i/o timeout\"" catalog="{operatorhubio-catalog olm}"
E1105 14:25:26.819099       1 queueinformer_operator.go:290] sync "operators" failed: constraints not satisfiable: strimzi-kafka-operator has a dependency without any candidates to satisfy it, strimzi-kafka-operator is mandatory
I1105 14:25:26.819515       1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"operators", UID:"ca6e37dc-cbd2-49e1-aa88-ac2503cc6d68", APIVersion:"v1", ResourceVersion:"1506920", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: strimzi-kafka-operator has a dependency without any candidates to satisfy it, strimzi-kafka-operator is mandatory

On Thursday, November 5, 2020 at 2:33:39 PM UTC+1, David Macháček wrote:

David Macháček

Nov 5, 2020, 9:40:09 AM
to Operator Framework
I am closing the conversation, since this is more likely an IKS problem; I will continue to investigate through internal IBM channels. Thanks very much to all.

On Thursday, November 5, 2020 at 3:31:27 PM UTC+1, David Macháček wrote: