Hi Ben,
Thanks for the quick reply, I really appreciate it 😊.
Yes, we are running the latest release, 1.5.0, as of today. A fix introduced in PR #631 is causing the problem right now: kuma-init sets its running user to 0, which collides with the repo-host pod's security context, where both runAsNonRoot=true and allowPrivilegeEscalation=false are enforced. The relevant part of the pod's YAML from the live system:
securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
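With kuma-init requesting runAsUser: 0, the kubelet refuses to start the container with an error along the lines of:
Error: container has runAsNonRoot and image will run as root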
This problem is limited to the repo-host pod. The postgres database pod instances start without a problem; there, the pod-level security context is simply:
securityContext:
  fsGroup: 26
  runAsNonRoot: true
Here is the full definition of the repo-host pod from the live system:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kuma.io/builtindns: enabled
    kuma.io/builtindnsport: "15053"
    kuma.io/envoy-admin-port: "9901"
    kuma.io/mesh: default
    kuma.io/sidecar-injected: "true"
    kuma.io/sidecar-uid: "5678"
    kuma.io/transparent-proxying: enabled
    kuma.io/transparent-proxying-inbound-port: "15006"
    kuma.io/transparent-proxying-inbound-v6-port: "15010"
    kuma.io/transparent-proxying-outbound-port: "15001"
    kuma.io/virtual-probes: enabled
    kuma.io/virtual-probes-port: "9000"
  creationTimestamp: "2022-04-01T07:58:47Z"
  generateName: db-repo-host-
  labels:
    controller-revision-hash: db-repo-host-6d45c7cc98
    postgres-operator.crunchydata.com/cluster: db
    postgres-operator.crunchydata.com/data: pgbackrest
    postgres-operator.crunchydata.com/pgbackrest: ""
    postgres-operator.crunchydata.com/pgbackrest-dedicated: ""
    statefulset.kubernetes.io/pod-name: db-repo-host-0
  name: db-repo-host-0
  namespace: backend
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: db-repo-host
    uid: 1d78c7d0-85f2-4eea-bee4-5aa97f84d208
  resourceVersion: "8076067"
  uid: 4583d684-5c76-4876-afa6-d33790a5a85c
spec:
  automountServiceAccountToken: false
  containers:
  - command:
    - /usr/sbin/sshd
    - -D
    - -e
    env:
    - name: LD_PRELOAD
      value: /usr/lib64/libnss_wrapper.so
    - name: NSS_WRAPPER_PASSWD
      value: /tmp/nss_wrapper/postgres/passwd
    - name: NSS_WRAPPER_GROUP
      value: /tmp/nss_wrapper/postgres/group
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 2022
      timeoutSeconds: 1
    name: pgbackrest
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/ssh
      name: ssh
      readOnly: true
    - mountPath: /pgbackrest/repo1
      name: repo1
    - mountPath: /etc/pgbackrest/conf.d
      name: pgbackrest-config
    - mountPath: /tmp
      name: tmp
  - args:
    - run
    - --log-level=info
    - --concurrency=2
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: KUMA_CONTROL_PLANE_CA_CERT
      value: |
        -----BEGIN CERTIFICATE-----
        somecert
        -----END CERTIFICATE-----
    - name: KUMA_CONTROL_PLANE_URL
      value: https://kuma-control-plane.kuma:5678
    - name: KUMA_DATAPLANE_DRAIN_TIME
      value: 30s
    - name: KUMA_DATAPLANE_MESH
      value: default
    - name: KUMA_DATAPLANE_NAME
      value: $(POD_NAME).$(POD_NAMESPACE)
    - name: KUMA_DATAPLANE_RUNTIME_TOKEN_PATH
      value: /var/run/secrets/kubernetes.io/serviceaccount/token
    - name: KUMA_DNS_CORE_DNS_BINARY_PATH
      value: coredns
    - name: KUMA_DNS_CORE_DNS_EMPTY_PORT
      value: "15054"
    - name: KUMA_DNS_CORE_DNS_PORT
      value: "15053"
    - name: KUMA_DNS_ENABLED
      value: "true"
    - name: KUMA_DNS_ENVOY_DNS_PORT
      value: "15055"
    image: docker.io/kumahq/kuma-dp:1.5.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 12
      httpGet:
        path: /ready
        port: 9901
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    name: kuma-sidecar
    readinessProbe:
      failureThreshold: 12
      httpGet:
        path: /ready
        port: 9901
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 50m
        memory: 64Mi
    securityContext:
      runAsGroup: 5678
      runAsUser: 5678
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: db-repo-host-0
  initContainers:
  - command:
    - bash
    - -c
    - mkdir -p /pgbackrest/repo1/log
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    name: pgbackrest-log-dir
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /pgbackrest/repo1
      name: repo1
    - mountPath: /tmp
      name: tmp
  - command:
    - bash
    - -c
    - NSS_WRAPPER_SUBDIR=postgres CRUNCHY_NSS_USERNAME=postgres CRUNCHY_NSS_USER_DESC="postgres"
      /opt/crunchy/bin/nss_wrapper.sh
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1
    imagePullPolicy: IfNotPresent
    name: nss-wrapper-init
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
  - args:
    - --redirect-outbound-port
    - "15001"
    - --redirect-inbound=true
    - --redirect-inbound-port
    - "15006"
    - --redirect-inbound-port-v6
    - "15010"
    - --kuma-dp-uid
    - "5678"
    - --exclude-inbound-ports
    - ""
    - --exclude-outbound-ports
    - ""
    - --verbose
    - --skip-resolv-conf
    - --redirect-all-dns-traffic
    - --redirect-dns-port
    - "15053"
    command:
    - /usr/bin/kumactl
    - install
    - transparent-proxy
    image: docker.io/kumahq/kuma-init:1.5.0
    imagePullPolicy: IfNotPresent
    name: kuma-init
    resources:
      limits:
        cpu: 100m
        memory: 50M
      requests:
        cpu: 10m
        memory: 10M
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
      runAsGroup: 0
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  nodeName: aks-computepool-31459603-vmss000018
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 26
    runAsNonRoot: true
  serviceAccount: default
  serviceAccountName: default
  subdomain: db-pods
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: db
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: db
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: ssh
    projected:
      defaultMode: 32
      sources:
      - configMap:
          name: db-ssh-config
      - secret:
          name: db-ssh
  - name: repo1
    persistentVolumeClaim:
      claimName: db-repo1
  - name: pgbackrest-config
    projected:
      defaultMode: 420
      sources:
      - configMap:
          items:
          - key: pgbackrest_repo.conf
            path: pgbackrest_repo.conf
          - key: config-hash
            path: config-hash
          name: db-pgbackrest-config
  - emptyDir:
      sizeLimit: 16Mi
    name: tmp
Cheers,
Benedek
Hi Ben,
The problem is very much on Kuma's side (see Issue 1, Issue 2), but they are working on a solution: https://github.com/kumahq/kuma/issues/3925. In the meantime I have come up with a workaround: a dynamic admission controller (a mutating webhook) that modifies the kuma-init container's security context by explicitly setting runAsNonRoot to false, overriding the pod-level setting. This avoids the init error, and the database can start.
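The essence of the mutating patch is a JSONPatch along these lines (kuma-init being the third entry, index 2, in the initContainers of the pod above; a more robust webhook would locate the container by name instead of hardcoding the index):
[
  {
    "op": "add",
    "path": "/spec/initContainers/2/securityContext/runAsNonRoot",
    "value": false
  }
]
Since a container-level securityContext takes precedence over the pod-level one, this is enough to get kuma-init past the runAsNonRoot check.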
However, PGO still cannot run in a Kuma service mesh, because of another error:
Warning FailedToGenerateKumaDataplane 10s (x22 over 3m9s) k8s.kuma.io/dataplane-generator Failed to generate Kuma Dataplane: unable to translate a Pod into a Dataplane: A service that selects pod db-main-t9sl-0 was found, but it doesn't match any container ports.
The network sidecar cannot be set up properly because, as far as I understand, the services created by PGO select the pod, but their target ports don't match any port declared on the pod's containers (see my sketch below). Can this be solved somehow?
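For illustration, what I think the pod is missing is a declared container port that the service's target port could match, e.g. on the pgbackrest container (2022 being the sshd port the liveness probe already checks; the port name is just an example):
ports:
- containerPort: 2022
  name: ssh
  protocol: TCP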
Best Regards,
Benedek