Sidecar container breaks non-root policy

Benedek Koncz

Mar 31, 2022, 3:23:54 PM
to Postgres Operator
Hi there,

I am trying to create a new postgres cluster with the help of PGO in a kuma service-mesh (https://kuma.io), but I run into a problem I can not find a workaround for. 

In the service mesh - in a given namespace - a sidecar container is injected automatically into every pod. The sidecar acts as a network proxy to provide resiliency for all service-to-service communication. During pod initialisation the repo-host pod (and consequently the postgres pods as well) fails with CreateContainerConfigError. If you check the events, you can see that there is a problem with the running user's privileges.

 container's runAsUser breaks non-root policy (pod: "db-repo-host-0_backend(82803fd5-34f0-4f16-9bf8-3b3b241a6d39)

The init container related to the network proxy sidecar (kuma-init) tries to run as root, but at the pod level the runAsNonRoot property is set to true, which in turn results in a policy break. (According to the K8s documentation I thought the container-level settings would simply override the pod-level settings https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context, but apparently not.)

I tried to find a way to modify the pod-level security context for the cluster instance pods, but I could not find any. I have also tried to annotate the instances at spec.instances[].metadata.annotations with kuma.io/sidecar-injection: disabled, but that seems to have no effect either.
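
For reference, this is roughly what I tried (trimmed down to the relevant part of my PostgresCluster spec; the instance name is just from my setup):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: db
  namespace: backend
spec:
  instances:
  - name: main
    metadata:
      annotations:
        # ask Kuma not to inject its sidecar into the instance pods
        kuma.io/sidecar-injection: disabled
    # ...rest of the instance set (replicas, dataVolumeClaimSpec, etc.) unchanged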

Any tips?

Thank you,
Benedek

Ben Blattberg

Mar 31, 2022, 4:08:34 PM
to Postgres Operator, Benedek Koncz
Hello,
I was going to point you to an issue I found on Kuma's github, but I see you're already commenting there. I do see that there was a bug which was supposedly fixed 2 years ago: https://github.com/kumahq/kuma/pull/631 -- I assume you're using a version of Kuma that has that fix, but just wanted to double-check. This seems like a problem that Kuma would run into frequently and so would have had to solve, so I'm curious if that fix isn't working.
-Ben

Benedek Koncz

Apr 1, 2022, 7:42:46 AM
to Postgres Operator, Ben Blattberg, Benedek Koncz

Hi Ben,

Thanks for the quick reply, I really appreciate it 😊. 

Yes, we are running the latest release, 1.5.0, as of today. The fix introduced in PR #631 is exactly what causes the problem now: kuma-init sets the running user to 0, which collides with the pod security context of the repo-host pod, which has both runAsNonRoot=true and allowPrivilegeEscalation=false. The relevant part of the pod's YAML from the live system:

    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true

This problem is limited to the repo-host pod. The postgres database instance pods start without a problem; their pod-level security context is simply:

  securityContext:
    fsGroup: 26
    runAsNonRoot: true

Here is the full definition of the repo-host from the system:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kuma.io/builtindns: enabled
    kuma.io/builtindnsport: "15053"
    kuma.io/envoy-admin-port: "9901"
    kuma.io/mesh: default
    kuma.io/sidecar-injected: "true"
    kuma.io/sidecar-uid: "5678"
    kuma.io/transparent-proxying: enabled
    kuma.io/transparent-proxying-inbound-port: "15006"
    kuma.io/transparent-proxying-inbound-v6-port: "15010"
    kuma.io/transparent-proxying-outbound-port: "15001"
    kuma.io/virtual-probes: enabled
    kuma.io/virtual-probes-port: "9000"
  creationTimestamp: "2022-04-01T07:58:47Z"
  generateName: db-repo-host-
  labels:
    controller-revision-hash: db-repo-host-6d45c7cc98
    postgres-operator.crunchydata.com/cluster: db
    postgres-operator.crunchydata.com/data: pgbackrest
    postgres-operator.crunchydata.com/pgbackrest: ""
    postgres-operator.crunchydata.com/pgbackrest-dedicated: ""
    statefulset.kubernetes.io/pod-name: db-repo-host-0
  name: db-repo-host-0
  namespace: backend
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: db-repo-host
    uid: 1d78c7d0-85f2-4eea-bee4-5aa97f84d208
  resourceVersion: "8076067"
  uid: 4583d684-5c76-4876-afa6-d33790a5a85c
spec:
  automountServiceAccountToken: false
  containers:
  - command:
    - /usr/sbin/sshd
    - -D
    - -e
    env:
    - name: LD_PRELOAD
      value: /usr/lib64/libnss_wrapper.so
    - name: NSS_WRAPPER_PASSWD
      value: /tmp/nss_wrapper/postgres/passwd
    - name: NSS_WRAPPER_GROUP
      value: /tmp/nss_wrapper/postgres/group
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 2022
      timeoutSeconds: 1
    name: pgbackrest
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/ssh
      name: ssh
      readOnly: true
    - mountPath: /pgbackrest/repo1
      name: repo1
    - mountPath: /etc/pgbackrest/conf.d
      name: pgbackrest-config
    - mountPath: /tmp
      name: tmp
  - args:
    - run
    - --log-level=info
    - --concurrency=2
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: KUMA_CONTROL_PLANE_CA_CERT
      value: |
        -----BEGIN CERTIFICATE-----
        somecert
        -----END CERTIFICATE-----
    - name: KUMA_CONTROL_PLANE_URL
      value: https://kuma-control-plane.kuma:5678
    - name: KUMA_DATAPLANE_DRAIN_TIME
      value: 30s
    - name: KUMA_DATAPLANE_MESH
      value: default
    - name: KUMA_DATAPLANE_NAME
      value: $(POD_NAME).$(POD_NAMESPACE)
    - name: KUMA_DATAPLANE_RUNTIME_TOKEN_PATH
      value: /var/run/secrets/kubernetes.io/serviceaccount/token
    - name: KUMA_DNS_CORE_DNS_BINARY_PATH
      value: coredns
    - name: KUMA_DNS_CORE_DNS_EMPTY_PORT
      value: "15054"
    - name: KUMA_DNS_CORE_DNS_PORT
      value: "15053"
    - name: KUMA_DNS_ENABLED
      value: "true"
    - name: KUMA_DNS_ENVOY_DNS_PORT
      value: "15055"
    image: docker.io/kumahq/kuma-dp:1.5.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 12
      httpGet:
        path: /ready
        port: 9901
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    name: kuma-sidecar
    readinessProbe:
      failureThreshold: 12
      httpGet:
        path: /ready
        port: 9901
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 50m
        memory: 64Mi
    securityContext:
      runAsGroup: 5678
      runAsUser: 5678
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: db-repo-host-0
  initContainers:
  - command:
    - bash
    - -c
    - mkdir -p /pgbackrest/repo1/log
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    name: pgbackrest-log-dir
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /pgbackrest/repo1
      name: repo1
    - mountPath: /tmp
      name: tmp
  - command:
    - bash
    - -c
    - NSS_WRAPPER_SUBDIR=postgres CRUNCHY_NSS_USERNAME=postgres CRUNCHY_NSS_USER_DESC="postgres"
      /opt/crunchy/bin/nss_wrapper.sh
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    name: nss-wrapper-init
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
  - args:
    - --redirect-outbound-port
    - "15001"
    - --redirect-inbound=true
    - --redirect-inbound-port
    - "15006"
    - --redirect-inbound-port-v6
    - "15010"
    - --kuma-dp-uid
    - "5678"
    - --exclude-inbound-ports
    - ""
    - --exclude-outbound-ports
    - ""
    - --verbose
    - --skip-resolv-conf
    - --redirect-all-dns-traffic
    - --redirect-dns-port
    - "15053"
    command:
    - /usr/bin/kumactl
    - install
    - transparent-proxy
    image: docker.io/kumahq/kuma-init:1.5.0
    imagePullPolicy: IfNotPresent
    name: kuma-init
    resources:
      limits:
        cpu: 100m
        memory: 50M
      requests:
        cpu: 10m
        memory: 10M
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
      runAsGroup: 0
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  nodeName: aks-computepool-31459603-vmss000018
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 26
    runAsNonRoot: true
  serviceAccount: default
  serviceAccountName: default
  subdomain: db-pods
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: db
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: db
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: ssh
    projected:
      defaultMode: 32
      sources:
      - configMap:
          name: db-ssh-config
      - secret:
          name: db-ssh
  - name: repo1
    persistentVolumeClaim:
      claimName: db-repo1
  - name: pgbackrest-config
    projected:
      defaultMode: 420
      sources:
      - configMap:
          items:
          - key: pgbackrest_repo.conf
            path: pgbackrest_repo.conf
          - key: config-hash
            path: config-hash
          name: db-pgbackrest-config
  - emptyDir:
      sizeLimit: 16Mi
    name: tmp

Cheers,

Benedek

Benedek Koncz

Apr 1, 2022, 7:43:15 AM
to Postgres Operator, Ben Blattberg, Benedek Koncz
Hi,

Sorry, I was mistaken in my previous message: BOTH the postgres and pgbackrest pods are affected by the problem. They cannot pass initialisation if the kuma-init container is injected by Kuma.

Bests,
Benedek

Ben Blattberg wrote the following (Thursday, March 31, 2022, 22:08:34 UTC+2):

Koncz Benedek

Apr 20, 2022, 6:38:41 AM
to Postgres Operator, Ben Blattberg

Hi Ben,

 

The problem is very much on Kuma's side (see Issue 1 and Issue 2), but they are working on a solution: https://github.com/kumahq/kuma/issues/3925. In the meantime I have come up with a workaround: I created a dynamic admission controller - a mutating webhook - which modifies the kuma-init container's security context by explicitly setting runAsNonRoot to false (overriding the pod-level setting). This avoids the init error, and the database can start.
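
The patch my webhook returns is essentially this (simplified sketch; in the real webhook I locate the kuma-init entry by name rather than hard-coding the init-container index):

# JSON Patch (written as YAML) applied to the Pod by the mutating webhook
- op: add
  path: /spec/initContainers/2/securityContext/runAsNonRoot
  value: false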

 

However, PGO still cannot run in a Kuma service mesh because of another error:

Warning  FailedToGenerateKumaDataplane  10s (x22 over 3m9s)  k8s.kuma.io/dataplane-generator  Failed to generate Kuma Dataplane: unable to translate a Pod into a Dataplane: A service that selects pod db-main-t9sl-0 was found, but it doesn't match any container ports.

 

The network sidecar cannot be set up properly because, as far as I understand, the Services created by PGO have either a selector (that selects the pod) or ports assigned to them, but never both. Can this be solved somehow? My understanding of what Kuma expects is sketched below.
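
If it helps, this is the kind of object I believe Kuma is looking for: a Service whose selector matches the primary pod and whose port maps to a container port on it. The name, selector and ports below are purely illustrative, not something PGO creates today:

apiVersion: v1
kind: Service
metadata:
  name: db-primary-mesh      # illustrative name only
  namespace: backend
spec:
  selector:
    # illustrative; in practice it would have to match labels carried only by the primary instance pod
    postgres-operator.crunchydata.com/cluster: db
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432         # Kuma checks that this maps to a containerPort declared on the pod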


Best Regards,


Benedek

Chris Bandy

May 3, 2022, 1:33:29 PM
to Koncz Benedek, Postgres Operator, Ben Blattberg
Benedek,

It sounds like Kuma is looking for a Service with a selector that matches the Postgres primary instance. We don't have a Service for that. Instead, we've configured Patroni to use Endpoints for its elections because it avoids some split-brain failure scenarios. The "{cluster}-primary" headless Service has ports and is an alias to the Endpoints managed by Patroni.
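
Roughly speaking, it is a selector-less headless Service, something like this (simplified sketch, not the exact object PGO generates):

apiVersion: v1
kind: Service
metadata:
  name: db-primary
  namespace: backend
spec:
  clusterIP: None      # headless
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432
  # no selector here on purpose: Patroni writes the matching Endpoints itself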

Is there any other way to indicate to Kuma how this Pod should join the mesh?

-- Chris