cc @kubernetes/sig-storage-bugs
I think including rbd here is out of scope, but it is a very unfortunate bug indeed.
I am working on kubernetes/features#278 - /usr/bin/rbd (and similar tools) could run in containers. On GKE you would run a daemonset with all Ceph utilities and you wouldn't need anything on the nodes or the master(s).
@jsafrane Would that incorporate dynamic storage provisioning for the controller manager as well?
@luxas, probably not in alpha, but in the end yes, no /usr/bin/rbd on controller-manager host.
The hyperkube version of controller-manager includes a GlusterFS client (albeit a very out-of-date one), so why the disparity with Ceph? I understand that it can't and shouldn't support all the different storage provisioners, but I think it is a reasonable expectation to at least support the most common ones.
As I understand it, the efforts being made at https://github.com/kubernetes-incubator/external-storage will eventually result in provisioner support in a clean and nicely separated fashion. There is currently nothing for Ceph RBD (see kubernetes-incubator/external-storage#99) but there is preliminary support for CephFS.
On a related note, the recent Kubernetes Community Meeting 20170615 featured a 20-minute demo of another provisioner in that project: Local Persistent Storage by Michelle Au (@msau42). That's expected to be included in Kubernetes 1.7 with more features planned for 1.8 and 1.9, so stuff is happening.
I was confused here: why was rbd needed in the controller-manager container? The failure log I found was from the kubelet, so it seems we need to put the rbd binary into the kubelet image rather than the controller-manager?
E0705 08:31:20.223470 31869 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/rbd/9f3c21b9-615a-11e7-aea9-525400852aca-rbdpd\" (\"9f3c21b9-615a-11e7-aea9-525400852aca\")" failed. No retries permitted until 2017-07-05 08:33:20.223442369 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/rbd/9f3c21b9-615a-11e7-aea9-525400852aca-rbdpd" (spec.Name: "rbdpd") pod "9f3c21b9-615a-11e7-aea9-525400852aca" (UID: "9f3c21b9-615a-11e7-aea9-525400852aca") with: rbd: failed to modprobe rbd error:executable file not found in $PATH
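For the error above, the kubelet is the component failing: it shells out to modprobe and the rbd CLI on the node (or inside the hyperkube container). A rough, hedged sketch of what to check there (package names differ per distribution):

# run on the node where the pod is scheduled
command -v modprobe rbd || echo "modprobe and/or rbd not in PATH; install ceph-common"
lsmod | grep rbd || sudo modprobe rbd   # confirm the rbd kernel module can be loaded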
hi, there are two parts:
* Volume Provisioning: the RBD provisioner in controller-manager needs access to the rbd binary to create a new image in the Ceph cluster for your PVC. external-storage is moving volume provisioners out-of-tree, and there will be a separate RBD provisioner container image with the rbd utility included (kubernetes-incubator/external-storage#200); after that, controller-manager will not need access to the rbd binary anymore.
* Volume Attach/Detach: kubelet needs access to the rbd binary to attach (rbd map) and detach (rbd unmap) the RBD image on the node. If kubelet is running on the host, the host needs the rbd utility installed (the ceph-common package on most Linux distributions).
Ah, I see, I was not using dynamic provisioning, so there is no need to update controller-manager. But most people now use hyperkube, and all Kubernetes services use the same image, so if we can put the rbd binary into the hyperkube image, then it should work. Thanks @cofyc
@cofyc another question: if I run Kubernetes directly on the host (not in a container), then once I install the rbd utility, Ceph + Kubernetes should work for both controller-manager and kubelet, right?
We solved this by creating our own image that contains the kube-controller-manager binary together with ceph-common.
Dockerfile:
FROM ubuntu:16.04
ARG KUBERNETES_VERSION=v1.6.4
ENV DEBIAN_FRONTEND=noninteractive \
    container=docker \
    KUBERNETES_DOWNLOAD_ROOT=https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_VERSION}/bin/linux/amd64 \
    KUBERNETES_COMPONENT=kube-controller-manager
RUN echo 'deb http://download.ceph.com/debian-kraken xenial main' > /etc/apt/sources.list.d/download_ceph_com_debian_kraken.list
RUN set -x \
    && apt-get update \
    && apt-get install -y --allow-unauthenticated \
        ceph-common=11.2.0-1xenial \
        curl \
    && curl -L ${KUBERNETES_DOWNLOAD_ROOT}/${KUBERNETES_COMPONENT} -o /usr/bin/${KUBERNETES_COMPONENT} \
    && chmod +x /usr/bin/${KUBERNETES_COMPONENT} \
    && apt-get purge -y --auto-remove \
        curl \
    && rm -rf /var/lib/apt/lists/*
Note that we are requesting a specific version of the ceph-common package (11.2.0)
Feel free to use/adapt for your needs
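If you go this route, building and publishing the image is the usual docker workflow; a minimal sketch (the registry and tag are placeholders, adjust KUBERNETES_VERSION to your cluster):

# build from the Dockerfile above and push to your own registry
docker build --build-arg KUBERNETES_VERSION=v1.6.4 -t <your-registry>/kube-controller-manager-ceph:v1.6.4 .
docker push <your-registry>/kube-controller-manager-ceph:v1.6.4

Then point your controller-manager manifest (or systemd unit) at that image.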
@gyliu513 Yes
hi, guys,
Now you can avoid using a customized kube-controller image: the external-storage out-of-tree RBD provisioner is merged, and you can use it instead. Here is the guide:
1. Deploy the standalone rbd-provisioner controller:
Note: Currently v0.1.0 is the latest version; you can always check for the newest version here.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: rbd-provisioner
namespace: kube-system
spec:
replicas: 1
template:
metadata:
labels:
app: rbd-provisioner
spec:
containers:
- name: rbd-provisioner
image: "quay.io/external_storage/rbd-provisioner:v0.1.0"
2. Then configure the storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rbd
provisioner: ceph.com/rbd
parameters:
monitors: <ceph monitors addresses>
pool: <pool to use>
adminId: <admin id>
adminSecretNamespace: <admin id secret namespace>
adminSecretName: <admin id secret name>
userId: <user id>
userSecretName: <user id secret name>
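Note that the storage class references two secrets that must already exist: the admin secret in adminSecretNamespace, and the user secret, which is looked up in the namespace of each PVC when the volume is mounted. A hedged sketch of creating them with kubectl (secret names and Ceph client IDs are placeholders; ceph auth get-key prints the key without a trailing newline):

kubectl -n kube-system create secret generic <admin id secret name> --type="kubernetes.io/rbd" \
  --from-literal=key="$(ceph auth get-key client.admin)"
kubectl create secret generic <user id secret name> --type="kubernetes.io/rbd" \
  --from-literal=key="$(ceph auth get-key client.<user id>)"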
Now you can create a PVC using rbd as the storageClassName, e.g.:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ceph-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: rbd
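To confirm the claim actually attaches and mounts on a node (the part handled by kubelet rather than the provisioner), a throwaway pod such as the following sketch can be used (all names are arbitrary):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test-pod
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ceph-pvc
EOF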
See also kubernetes-incubator/external-storage#206 kubernetes-incubator/external-storage#200.
cc @sbezverk @gyliu513 @Platzii @ianchakeres @kongslund @jingxu97 @thanodnl @v1k0d3n
@cofyc While using this external provisioner, I got an error in the provisioner container:
kubectl logs rbd-provisioner-1825796386-01nqz -n kube-system |more
I0718 02:09:24.062514 1 main.go:70] Creating RBD provisioner with identity: 1e7aa80f-6b5e-11e7-a22f-faa5ad76f27a
I0718 02:09:24.064650 1 controller.go:407] Starting provisioner controller 1e7bfb96-6b5e-11e7-a22f-faa5ad76f27a!
E0718 02:09:24.066793 1 reflector.go:201] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:411: Failed to list *v1.PersistentVolumeClaim: User "system:serviceaccount:kube-syst
em:default" cannot list persistentvolumeclaims at the cluster scope. (get persistentvolumeclaims)
E0718 02:09:24.066796 1 reflector.go:201] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:412: Failed to list *v1.PersistentVolume: User "system:serviceaccount:kube-system:de
fault" cannot list persistentvolumes at the cluster scope. (get persistentvolumes)
The issue may be caused by RBAC.
So I think the deployment should be modified like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: rbd-provisioner
namespace: kube-system
spec:
replicas: 1
template:
metadata:
labels:
app: rbd-provisioner
spec:
containers:
- name: rbd-provisioner
image: "quay.io/external_storage/rbd-provisioner:v0.1.0
"
serviceAccountName: persistent-volume-binder ### add service account here
Yes, if the default ServiceAccount does not have enough permissions to access the apiserver, you can add your own ServiceAccount for rbd-provisioner, which is the recommended way, especially in production.
Also, the namespace in the guide example can be changed if you don't want to deploy it in the kube-system namespace.
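For reference, a minimal way to create such a ServiceAccount and bind it to the built-in persistent-volume-binder cluster role (assuming the provisioner runs in kube-system and the deployment sets serviceAccountName accordingly) could be:

kubectl -n kube-system create serviceaccount rbd-provisioner
kubectl create clusterrolebinding rbd-provisioner \
  --clusterrole=system:controller:persistent-volume-binder \
  --serviceaccount=kube-system:rbd-provisioner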
@cofyc
Why is the key not recognized by the system?
kubectl logs rbd-provisioner-4059846714-blk2s -n kube-system
E0718 06:44:50.606634 1 goroutinemap.go:166] Operation for "provision-kube-system/ceph-pvc[47e83dbe-6b84-11e7-8632-00505682fcc1]" failed. No retries permitted until 2017-07-18 06:45:54.606613353 +0000 UTC (durationBeforeRetry 1m4s). Error: failed to create rbd image: exit status 22, command output: rbd: image format 1 is deprecated
2017-07-18 06:44:50.604709 7f83a44f2d80 -1 auth: failed to decode key 'QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo=='
2017-07-18 06:44:50.604719 7f83a44f2d80 0 librados: client.admin initialization error (22) Invalid argument
rbd: couldn't connect to the cluster!
hi, @zhangqx2010
$ echo QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo== | base64 -d
AQAPHF9Z1MM/BxAA6RhGKnXfaqjTQ7Z3jgLCsQ==
base64: invalid input
The base64-encoded string you provided, QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo==, is invalid. It seems to have an extra = character at the end.
$ echo QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo= | base64 -d
AQAPHF9Z1MM/BxAA6RhGKnXfaqjTQ7Z3jgLCsQ==
Please remove it and retry.
What's your k8s version (kubectl version)? In my local 1.6.4 environment, if the base64-encoded string is invalid, the apiserver does not accept it.
$ cat <<EOF > t.yaml
apiVersion: v1
kind: Secret
metadata:
name: test-secret
type: "kubernetes.io/rbd"
data:
key: QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo==
EOF
$ kubectl apply -f t.yaml
Error from server (BadRequest): error when creating "t.yaml": Secret in version "v1" cannot be handled as a Secret: [pos 92]: json: error decoding base64 binary 'QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo==': illegal base64 data at input byte 56
You can use the kube-system/persistent-volume-binder serviceaccount with the rbd provisioner, although it still lacks the events/get permission (which does not seem to be critical for rbd-provisioner, though).
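Whether a given service account actually has a permission can be checked directly against the apiserver; a quick sketch:

kubectl auth can-i list persistentvolumes \
  --as=system:serviceaccount:kube-system:persistent-volume-binder
kubectl auth can-i get events -n default \
  --as=system:serviceaccount:kube-system:persistent-volume-binder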
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T01:48:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
That was a typo in my first try. After fixing the key, the issue still exists.
cat ceph.client.admin.keyring
[client.admin]
key = AQAPHF9Z1MM/BxAA6RhGKnXfaqjTQ7Z3jgLCsQ==
caps mds = "allow *"
caps mon = "allow *"
caps osd = "allow *"
echo 'AQAPHF9Z1MM/BxAA6RhGKnXfaqjTQ7Z3jgLCsQ==' | base64
QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo=
W0718 08:10:53.418428 1 rbd_util.go:71] failed to create rbd image, output 2017-07-18 08:10:53.386902 7ff38f0fad80 -1 did not load config file, using default settings.
rbd: image format 1 is deprecated
2017-07-18 08:10:53.416330 7ff38f0fad80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2017-07-18 08:10:53.416431 7ff38f0fad80 -1 auth: failed to decode key 'QVFBUEhGOVoxTU0vQnhBQTZSaEdLblhmYXFqVFE3WjNqZ0xDc1E9PQo='
2017-07-18 08:10:53.416446 7ff38f0fad80 0 librados: client.admin initialization error (22) Invalid argument
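Note that the rejected key above still decodes to a key with a trailing newline (that is where the Qo= ending comes from): echo appends a newline before piping into base64. A sketch of two ways to encode the key without it (the client name is whichever Ceph user the secret is for):

echo -n 'AQAPHF9Z1MM/BxAA6RhGKnXfaqjTQ7Z3jgLCsQ==' | base64   # -n suppresses the newline
ceph auth get-key client.admin | base64                        # get-key prints the bare key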
@cofyc
After successfully using your kubernetes-incubator/external-storage#200, the PV is created as claimed.
#kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b3a244da-7111-11e7-bb10-00505682fcc1 2Gi RWO Delete Bound default/dy-rbd-c-1 rbd-dynamic 11m
pvc-b3ab0cd5-7111-11e7-bb10-00505682fcc1 1Gi RWO Delete Bound default/dy-rbd-c-2 rbd-dynamic 11m
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
dy-rbd-c-1 Bound pvc-b3a244da-7111-11e7-bb10-00505682fcc1 2Gi RWO rbd-dynamic 11m
dy-rbd-c-2 Bound pvc-b3ab0cd5-7111-11e7-bb10-00505682fcc1 1Gi RWO rbd-dynamic 11m
But my deployment of one pod gets errors:
Unable to mount volumes for pod "gocd-server-2-197958991-s1z14_default(b39c4442-7111-11e7-bb10-00505682fcc1)": timeout expired waiting for volumes to attach/mount for pod "default"/"gocd-server-2-197958991-s1z14". list of unattached/unmounted volumes=[dy-rbd-1 dy-rbd-2]
The provisioner logs:
#kubectl logs rbd-provisioner-2785693406-7s3r0 -f
I0725 08:17:33.112058 1 provision.go:110] successfully created rbd image "kubernetes-dynamic-pvc-b5713380-7111-11e7-b630-46fbafb56e36"
I0725 08:17:33.112083 1 controller.go:801] volume "pvc-b3a244da-7111-11e7-bb10-00505682fcc1" for claim "default/dy-rbd-c-1" created
I0725 08:17:33.218255 1 provision.go:110] successfully created rbd image "kubernetes-dynamic-pvc-b5816b74-7111-11e7-b630-46fbafb56e36"
I0725 08:17:33.218305 1 controller.go:801] volume "pvc-b3ab0cd5-7111-11e7-bb10-00505682fcc1" for claim "default/dy-rbd-c-2" created
I0725 08:17:33.564263 1 controller.go:818] volume "pvc-b3a244da-7111-11e7-bb10-00505682fcc1" for claim "default/dy-rbd-c-1" saved
I0725 08:17:33.564283 1 controller.go:854] volume "pvc-b3a244da-7111-11e7-bb10-00505682fcc1" provisioned for claim "default/dy-rbd-c-1"
I0725 08:17:33.763928 1 controller.go:818] volume "pvc-b3ab0cd5-7111-11e7-bb10-00505682fcc1" for claim "default/dy-rbd-c-2" saved
I0725 08:17:33.763946 1 controller.go:854] volume "pvc-b3ab0cd5-7111-11e7-bb10-00505682fcc1" provisioned for claim "default/dy-rbd-c-2"
I0725 08:18:03.136684 1 leaderelection.go:204] stopped trying to renew lease to provision for pvc default/dy-rbd-c-1, timeout reached
I0725 08:18:03.151513 1 leaderelection.go:204] stopped trying to renew lease to provision for pvc default/dy-rbd-c-2, timeout reached
What could cause this mount timeout?
I noticed that the fstype is not specified anywhere, but when I tried to add fsType: ext4 to the storage class, I saw another error:
E0725 08:09:14.913472 1 goroutinemap.go:166] Operation for "provision-default/dy-rbd-c-1[81e6c6c7-7110-11e7-bb10-00505682fcc1]" failed. No retries permitted until 2017-07-25 08:09:15.913462177 +0000 UTC (durationBeforeRetry 1s). Error: invalid option "fsType" for ceph.com/rbd provisioner
There is no need to add fsType: ext4 to the storage class; please do not add it.
Have you checked the userId and the secret key in userSecretName?
You can execute the rbd command on your minion nodes manually, e.g. rbd ls -m <ceph-monitor-addrs> -p <your-pool> --id <userId> --key=<ceph secret key of userId>, to make sure the node has the rbd utility installed and can access the Ceph cluster with the user ID and user secret you provided.
As for the RBD plugin in kubelet, it simply calls the rbd utility with the Ceph cluster information you provided to map the image onto the host.
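Roughly, the operations the kubelet RBD plugin performs on the node look like the following sketch (all values are placeholders; the real code also formats the device on first use before mounting it under the kubelet volumes directory):

rbd map <pool>/<image-name> -m <mon-addrs> --id <userId> --key=<user secret key>   # prints e.g. /dev/rbd0
mount /dev/rbd0 /var/lib/kubelet/...                                               # into the pod's volume dir
rbd unmap /dev/rbd0                                                                # on detach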
@cofyc
Yes, you are right. I typed the wrong pool for the ceph auth add command. The problem was solved after correcting this. Thanks for helping!
I have done as suggested by @farcaller and am running my rbd-provisioner under the service account persistent-volume-binder.
The PVC now creates a volume and binds. However, I see errors about lack of access to events in the rbd-provisioner log:
E0809 10:26:06.901024 1 controller.go:682] Error watching for provisioning success, can't provision for claim "default/dbvolclaim": User "system:serviceaccount:kube-system:persistent-volume-binder" cannot list events in the namespace "default". (get events)
Also, the pod cannot mount the volume that was created:
kubectl describe po mysql-3673113032-7k059
Name: mysql-3673113032-7k059
Namespace: default
Node: knode22.robm.ammeon.com/10.168.170.22
Start Time: Wed, 09 Aug 2017 11:26:08 +0100
Labels: app=mysql
pod-template-hash=3673113032
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"mysql-3673113032","uid":"277b204b-7ced-11e7-add2-00163e371bd3","...
Status: Pending
IP:
Controllers: ReplicaSet/mysql-3673113032
Containers:
mysql:
Container ID:
Image: mysql:5.6
Image ID:
Port: 3306/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
MYSQL_ROOT_PASSWORD: password
Mounts:
/var/lib/mysql from mysql-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-702kp (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: dbvolclaim
ReadOnly: false
default-token-702kp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-702kp
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
1m 1m 2 default-scheduler Warning FailedScheduling PersistentVolumeClaim is not bound: "dbvolclaim" (repeated 4 times)
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned mysql-3673113032-7k059 to knode22.robm.ammeon.com
1m 1m 1 kubelet, knode22.robm.ammeon.com Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-702kp"
1m 20s 8 kubelet, knode22.robm.ammeon.com Warning FailedMount MountVolume.SetUp failed for volume "pvc-276d654d-7ced-11e7-add2-00163e371bd3" : rbd: image kubernetes-dynamic-pvc-277d877d-7ced-11e7-b9a9-5e25ff659549 is locked by other nodes
Is it reasonable to assume that because the provisioner cannot access events, it is never informed that the PV was successfully created, and hence never unlocks the PV ready for use?
@rtmie
You can create a dedicated service account for this provisioner, e.g. rbd-provisioner, then bind this ServiceAccount to the cluster role system:controller:persistent-volume-binder.
When you deploy the provisioner pod, use the ServiceAccount you just created.
apiVersion: extensions/v1beta1
kind: Deployment
...
spec:
serviceAccountName: rbd-provisioner
...
@zhangqx2010 thanks for the suggestion. However, to me this is creating another SA with the same permissions as persistent-volume-binder.
In any case I tried it (I hope I have configured it correctly!):
kind: ServiceAccount
apiVersion: v1
metadata:
name: rbd-provisioner
namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rbd-provisioner
subjects:
- kind: ServiceAccount
name: rbd-provisioner
namespace: kube-system
roleRef:
kind: ClusterRole
name: system:controller:persistent-volume-binder
apiGroup: rbac.authorization.k8s.io
with a similar result in the deployment:
kubectl get po
NAME READY STATUS RESTARTS AGE
mysql-3673113032-1021n 0/1 ContainerCreating 0 3m
kubectl describe po mysql-3673113032-1021n
Name: mysql-3673113032-1021n
Namespace: default
Node: knode22.robm.ammeon.com/10.168.170.22
Start Time: Thu, 10 Aug 2017 09:58:37 +0100
Labels: app=mysql
pod-template-hash=3673113032
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"mysql-3673113032","uid":"184ff164-7daa-11e7-add2-00163e371bd3","...
Status: Pending
IP:
Controllers: ReplicaSet/mysql-3673113032
from default-token-702kp (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: dbvolclaim
ReadOnly: false
default-token-702kp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-702kp
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 3m 2 default-scheduler Warning FailedScheduling PersistentVolumeClaim is not bound: "dbvolclaim" (repeated 4 times)
3m 3m 1 default-scheduler Normal Scheduled Successfully assigned mysql-3673113032-1021n to knode22.robm.ammeon.com
3m 3m 1 kubelet, knode22.robm.ammeon.com Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-702kp"
1m 1m 1 kubelet, knode22.robm.ammeon.com Warning FailedMount Unable to mount volumes for pod "mysql-3673113032-1021n_default(185505a6-7daa-11e7-add2-00163e371bd3)": timeout expired waiting for volumes to attach/mount for pod "default"/"mysql-3673113032-1021n". list of unattached/unmounted volumes=[mysql-persistent-storage]
1m 1m 1 kubelet, knode22.robm.ammeon.com Warning FailedSync Error syncing pod
3m 1m 9 kubelet, knode22.robm.ammeon.com Warning FailedMount MountVolume.SetUp failed for volume "pvc-1844c146-7daa-11e7-add2-00163e371bd3" : rbd: image kubernetes-dynamic-pvc-1854ce34-7daa-11e7-b463-ba9a43d562ef is locked by other nodes
I need more information. Did you use the PVC that you mentioned above? Please also show your storage class.
Hi @zhangqx2010 ,
Storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rbd
provisioner: ceph.com/rbd
parameters:
monitors: 10.168.170.99:6789
adminId: admin
adminSecretName: ceph-secret
adminSecretNamespace: kube-system
pool: kubernetes
userId: kube
userSecretName: ceph-secret-user
imageFormat: "2"
imageFeatures: layering
PVC and PV Status
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
dbvolclaim Bound pvc-0df7359a-7ddf-11e7-add2-00163e371bd3 5Gi RWO rbd 19h
kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0df7359a-7ddf-11e7-add2-00163e371bd3 5Gi RWO Delete Bound default/dbvolclaim rbd 19h
I think the problem is on the node. I have just found an RBD error in the kubelet log:
Aug 10 16:14:53 knode22.robm.ammeon.com kubelet[540]: I0810 16:14:53.456634 540 rbd_util.go:141] lock list output "2017-08-10 16:14:53.450147 7fb1cb3ba7c0 -1 auth: failed to decode key 'XXXXXXXXXXXXXXXXXXXXXXXX\n'\n2017-08-10 16:14:53.450196 7fb1cb3ba7c0 0 librados: client.kube initialization error (22) Invalid argument\nrbd: couldn't connect to the cluster!\n"
The key, which I have X'ed out, is read correctly from the secret and matches the Ceph user setup. However, I don't like the look of the \n.
I updated the Ceph auth for user kube:
sudo ceph auth get client.kube
exported keyring for client.kube
[client.kube]
key = XXXXXXXXXXXXXXXXXXXXXXXXX
caps mds = "allow * pool=kubernetes"
caps mon = "allow r"
caps osd = "allow * pool=kubernetes"
Now I get a different error on the kubelet. Can anyone provide the correct Ceph auth caps?
lock list output "2017-08-11 15:47:59.852439 7f01ddf547c0 0 librados: client.kube authentication error (1) Operation not permitted\nrbd: couldn't connect to the cluster!\n"
mon="allow r"
(read mon to find osd)
osd="allow class-read object_prefix rbd_children, allow rwx pool=kubernetes"
(read rbd_children prefix, full access to kubernetes pool)
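Applied with the ceph CLI, that suggestion looks roughly like this (client name and pool as used above in this thread):

ceph auth caps client.kube \
  mon 'allow r' \
  osd 'allow class-read object_prefix rbd_children, allow rwx pool=kubernetes'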
@farcaller - thanks for that. All good now!
@cofyc Thanks for the rbd image, it works wonders here :)
However, I am stuck with a timeout problem similar to the one @rtmie had.
My Kubernetes version is 1.6.7, installed by kubeadm.
PVC:
NAME       STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
ceph-pvc   Bound     pvc-a6abf36b-8ef9-11e7-a959-02000a1ba70c   5Gi        RWO           rbd            23m
PV:
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM              STORAGECLASS   REASON    AGE
pvc-a6abf36b-8ef9-11e7-a959-02000a1ba70c   5Gi        RWO           Delete          Bound     default/ceph-pvc   rbd                      23m
Logs from the rbd-image:
I0901 09:41:47.382496 1 main.go:84] Creating RBD provisioner with identity: ceph.com/rbd
I0901 09:41:47.384600 1 controller.go:407] Starting provisioner controller c5c0a291-8ef9-11e7-9f54-8efcc95066fd!
I0901 09:41:47.387744 1 controller.go:1068] scheduleOperation[lock-provision-default/ceph-pvc[a6abf36b-8ef9-11e7-a959-02000a1ba70c]]
I0901 09:41:47.399585 1 leaderelection.go:156] attempting to acquire leader lease...
I0901 09:41:47.408363 1 leaderelection.go:178] successfully acquired lease to provision for pvc default/ceph-pvc
I0901 09:41:47.408480 1 controller.go:1068] scheduleOperation[provision-default/ceph-pvc[a6abf36b-8ef9-11e7-a959-02000a1ba70c]]
I0901 09:41:47.480116 1 provision.go:110] successfully created rbd image "kubernetes-dynamic-pvc-c5c5a17f-8ef9-11e7-9f54-8efcc95066fd"
I0901 09:41:47.480189 1 controller.go:801] volume "pvc-a6abf36b-8ef9-11e7-a959-02000a1ba70c" for claim "default/ceph-pvc" created
I0901 09:41:47.485847 1 controller.go:818] volume "pvc-a6abf36b-8ef9-11e7-a959-02000a1ba70c" for claim "default/ceph-pvc" saved
I0901 09:41:47.485890 1 controller.go:854] volume "pvc-a6abf36b-8ef9-11e7-a959-02000a1ba70c" provisioned for claim "default/ceph-pvc"
I0901 09:41:49.415265 1 leaderelection.go:198] stopped trying to renew lease to provision for pvc default/ceph-pvc, task succeeded
Test pod in pending state:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
25m 24m 7 default-scheduler Warning FailedScheduling [SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "ceph-pvc", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "ceph-pvc", which is unexpected.]
24m 24m 1 default-scheduler Normal Scheduled Successfully assigned test-pod to cepf-slave-curious-tiger
22m 2m 10 kubelet, cepf-slave-curious-tiger Warning FailedMount Unable to mount volumes for pod "test-pod_default(a72f09cd-8ef9-11e7-a959-02000a1ba70c)": timeout expired waiting for volumes to attach/mount for pod "default"/"test-pod". list of unattached/unmounted volumes=[pvc]
22m 2m 10 kubelet, cepf-slave-curious-tiger Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"test-pod". list of unattached/unmounted volumes=[pvc]
Any ideas?
@Demonsthere could you provide more details about your issue? How you set up your pod, PVC and PV (e.g. the yaml files) would be helpful. Also, if you have the kubelet log on the node, we could take a look to debug.
Also, can you take a look at the kubelet logs on the node where the pod is instantiated and look for anything related to Ceph or RBD?
I encountered the same issue.
kube-controller-manager logs
Sep 4 15:25:36 bj-xg-oam-kubernetes-001 kube-controller-manager: W0904 15:25:36.032128 13211 rbd_util.go:364] failed to create rbd image, output
Sep 4 15:25:36 bj-xg-oam-kubernetes-001 kube-controller-manager: W0904 15:25:36.032201 13211 rbd_util.go:364] failed to create rbd image, output
Sep 4 15:25:36 bj-xg-oam-kubernetes-001 kube-controller-manager: W0904 15:25:36.032252 13211 rbd_util.go:364] failed to create rbd image, output
Sep 4 15:25:36 bj-xg-oam-kubernetes-001 kube-controller-manager: E0904 15:25:36.032276 13211 rbd.go:317] rbd: create volume failed, err: failed to create rbd image: fork/exec /usr/bin/rbd: invalid argument, command output:
@jingxu97 I am using the yamls from https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd
controller:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:latest"
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      # serviceAccountName: rbd-provisioner
pvc:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: rbd
secrets:
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-admin
  namespace: kube-system
type: "kubernetes.io/rbd"
data:
  key: {{ ceph_key_admin | b64encode }}
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
type: "kubernetes.io/rbd"
data:
  key: {{ ceph_key_user | b64encode }}
storageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: {{ ceph_monitor_list }}
  pool: k8s-test
  adminId: k8s-admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  userId: k8s-user
  userSecretName: ceph-secret-user
  imageFormat: "2"
  imageFeatures: layering
@rootsongjc can you share the K8S deployment for the rbd-provisioner?
@Demonsthere that looks like a problem on your Ceph server. Have you tried creating an RBD volume with the Ceph client tools and mounting it?
@rtmie Please check this link: https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd
I have not tried it personally, but it should work.
@sbezverk thanks, I do not have any issues with my setup, just answering someone else.
@rtmie I was able to mount the volume created by k8s (rbd map) manually
rbd map k8s-test/kubernetes-dynamic-pvc-2519acf4-8f12-11e7-9da6-4e6002ec91dd --id k8s-user /dev/rbd1
@rootsongjc Maybe your Secret key isn't base64-encoded?
Hi,
I'm running into the same error. Is there a workaround?
Feb 05 20:05:53 minikube kubelet[3704]: E0205 20:05:53.570497 3704 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/[10.97.152.94:6790 10.104.200.62:6790 10.111.171.163:6790]:k8s-dynamic-pvc-ea65c0b0-0a99-11e8-a29b-0800272637bb-ea742ba1-0a99-11e8-9a4d-0242ac110004\"" failed. No retries permitted until 2018-02-05 20:07:55.570463699 +0000 UTC m=+40710.307964873 (durationBeforeRetry 2m2s). Error: "MountVolume.WaitForAttach failed for volume \"pvc-ea65c0b0-0a99-11e8-a29b-0800272637bb\" (UniqueName: \"kubernetes.io/rbd/[10.97.152.94:6790 10.104.200.62:6790 10.111.171.163:6790]:k8s-dynamic-pvc-ea65c0b0-0a99-11e8-a29b-0800272637bb-ea742ba1-0a99-11e8-9a4d-0242ac110004\") pod \"prometheus-sample-metrics-prom-0\" (UID: \"38ed9d87-0aa9-11e8-a29b-0800272637bb\") : **error: executable file not found in $PATH, rbd output:** "
Kubernetes: v1.9.1 (Minikube)
OS: Linux minikube 4.9.13 #1 SMP Thu Oct 19 17:14:00 UTC 2017 x86_64 GNU/Linux
Thanks in Advance.
@bamb00 I found that the Linux kernel required for Ceph is at least 4.10 on Ubuntu. You also need the libceph and rbd kernel modules enabled.
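A quick, hedged way to check both on a node:

uname -r                                              # kernel version
lsmod | grep -E 'rbd|libceph' || sudo modprobe rbd    # are the modules loaded / loadable?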
Hi everybody,
After following this guide: http://docs.ceph.com/docs/master/start/kube-helm/
And then trying so many different things in this thread, I'm now stuck here:
Mar 5 17:28:04 ip-172-25-37-183 kubelet[2495]: E0305 17:28:04.279492 2495 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/[172.25.37.183:6789]:kubernetes-dynamic-pvc-7b423f8a-209a-11e8-a595-f29dbaa32808\"" failed. No retries permitted until 2018-03-05 17:28:12.279449171 +0000 UTC m=+16066.929917280 (durationBeforeRetry 8s). Error: "MountVolume.WaitForAttach failed for volume \"pvc-7b39bd6a-209a-11e8-a55b-02e13b5c0864\" (UniqueName: \"kubernetes.io/rbd/[172.25.37.183:6789]:kubernetes-dynamic-pvc-7b423f8a-209a-11e8-a595-f29dbaa32808\") pod \"mypod\" (UID: \"804e2b5a-209a-11e8-a55b-02e13b5c0864\") : error: exit status 1, rbd output: 2018-03-05 17:28:04.271178 7f05573bad40 -1 did not load config file, using default settings.\n2018-03-05 17:28:04.276321 7f05573bad40 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n2018-03-05 17:28:04.277661 7f05573bad40 0 librados: client.admin authentication error (1) Operation not permitted\nrbd: couldn't connect to the cluster!\n
I've manually installed the correct versions of ceph on each node of my k8s cluster – a little cumbersome, and weirdly, not documented anywhere (does anyone know why this isn't written down on docs.ceph.com?).
Then I had to use the service IP address in my Storage Class instead of the hostname.
And finally, it seems there are errors with authentication. I'm pretty sure I've created my userSecret correctly:
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "pvc-ceph-client-key",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/secrets/pvc-ceph-client-key",
"uid": "0c1a93eb-2080-11e8-a55b-02e13b5c0864",
"resourceVersion": "18641341",
"creationTimestamp": "2018-03-05T14:18:16Z"
},
"data": {
"key": "QVFDeVVKMWF3ODdHS2hBQVBSc3NrRHYrMThnSVl0T3B1Qnlyb3c9PQo="
},
"type": "kubernetes.io/rbd"
}
Any pointers?
I have a problem; the PVC events show:
Warning ProvisioningFailed 55m (x6 over 1h) ceph.com/rbd rbd-provisioner-5b89b9bb7c-56jgz c4f87b20-2112-11e8-82cb-0242ac110005 (combined from similar events): Failed to provision volume with StorageClass "fast": failed to create rbd image: exit status 110, command output: 2018-03-06 09:40:33.361766 7fedad8c7d80 -1 did not load config file, using default settings.
rbd: image format 1 is deprecated
2018-03-06 09:40:33.411473 7fedad8c7d80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2018-03-06 09:45:33.411867 7fedad8c7d80 0 monclient(hunting): authenticate timed out after 300
2018-03-06 09:45:33.411971 7fedad8c7d80 0 librados: client.admin authentication error (110) Connection timed out
rbd: couldn't connect to the cluster!
Normal ExternalProvisioning 3m (x1839 over 2h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
First, I create the deployment in a non-RBAC environment:
[root@controller:/home/ubuntu/rbd]$ cat rbd-provisioner.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: rbd-provisioner
labels:
app: rbd-provisioner
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: rbd-provisioner
template:
metadata:
labels:
app: rbd-provisioner
spec:
containers:
- name: rbd-provisioner
image: "quay.io/external_storage/rbd-provisioner:latest"
imagePullPolicy: Never
env:
- name: PROVISIONER_NAME
value: ceph.com/rbd
Then the pod is running:
[root@controller:/etc/ceph]$ kubectl get po
NAME READY STATUS RESTARTS AGE
rbd-provisioner-5b89b9bb7c-56jgz 1/1 Running 0 2h
Second, I create a storage class:
[root@controller:/home/ubuntu/rbd]$ cat storageclass-rbd.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: fast
provisioner: ceph.com/rbd
parameters:
monitors: 192.168.1.115:6789
adminId: admin
adminSecretName: ceph-secret
adminSecretNamespace: rbd
pool: rbd
userId: admin
userSecretName: ceph-secret
[root@controller:/home/ubuntu/rbd]$ kubectl get sc
NAME PROVISIONER AGE
fast ceph.com/rbd 2h
Third, I create a PVC:
[root@controller:/home/ubuntu/rbd]$ cat ceph-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: ceph-claim-sc
namespace: rbd
spec:
storageClassName: fast
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
[root@controller:/home/ubuntu/rbd]$ kubectl get pvc -n rbd
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-claim-sc Pending fast 2h
[root@controller:/etc/ceph]$ kubectl describe pvc ceph-claim-sc -n rbd
Name: ceph-claim-sc
Namespace: rbd
StorageClass: fast
Status: Pending
Volume:
Labels:
Annotations: control-plane.alpha.kubernetes.io/leader={"holderIdentity":"c4f87b20-2112-11e8-82cb-0242ac110005","leaseDurationSeconds":15,"acquireTime":"2018-03-06T08:13:17Z","renewTime":"2018-03-06T09:45:18Z","lea...
volume.beta.kubernetes.io/storage-provisioner=ceph.com/rbd
Finalizers: []
Capacity:
Access Modes:
Events:
Type Reason Age From Message
Warning ProvisioningFailed 55m (x6 over 1h) ceph.com/rbd rbd-provisioner-5b89b9bb7c-56jgz c4f87b20-2112-11e8-82cb-0242ac110005 (combined from similar events): Failed to provision volume with StorageClass "fast": failed to create rbd image: exit status 110, command output: 2018-03-06 09:40:33.361766 7fedad8c7d80 -1 did not load config file, using default settings.
rbd: image format 1 is deprecated
2018-03-06 09:40:33.411473 7fedad8c7d80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2018-03-06 09:45:33.411867 7fedad8c7d80 0 monclient(hunting): authenticate timed out after 300
2018-03-06 09:45:33.411971 7fedad8c7d80 0 librados: client.admin authentication error (110) Connection timed out
rbd: couldn't connect to the cluster!
Normal ExternalProvisioning 3m (x1839 over 2h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
the rbd-provisioner pod log is:
deploy.txt
Thanks, everyone; I look forward to your reply!
@jianglingxia did you create ceph-secret in your namespace?
Yes, thanks for your reply. I found the problem: the Ceph cluster version was not consistent with the ceph-common version on the minions. Thanks very much!
Warning ProvisioningFailed 1m ceph.com/rbd rbd-provisioner-bc956f5b4-r2vg4 01c62837-4db5-11e8-b4c7-0a580af4040f Failed to provision volume with StorageClass "ceph-storage": failed to create rbd image: exit status 2, command output: 2018-05-02 05:58:08.149335 7f58a112ad80 -1 did not load config file, using default settings.
2018-05-02 05:58:08.213998 7f58a112ad80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
rbd: error opening pool rdb: (2) No such file or directory
Normal Provisioning 56s (x2 over 1m) ceph.com/rbd rbd-provisioner-bc956f5b4-r2vg4 01c62837-4db5-11e8-b4c7-0a580af4040f External provisioner is provisioning volume for claim "harbor/adminserver-config-harbor-harbor-adminserver-0"
Warning ProvisioningFailed 56s ceph.com/rbd rbd-provisioner-bc956f5b4-r2vg4 01c62837-4db5-11e8-b4c7-0a580af4040f Failed to provision volume with StorageClass "ceph-storage": failed to create rbd image: exit status 2, command output: 2018-05-02 05:58:15.320918 7fbd961cdd80 -1 did not load config file, using default settings.
2018-05-02 05:58:15.386747 7fbd961cdd80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
rbd: error opening pool rdb: (2) No such file or directory
Normal ExternalProvisioning 7s (x4 over 13s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
I have this error; how do I solve it?
Solved.
Closed #38923.
/reopen
@feresberbeche: you can't re-open an issue/PR unless you authored it or you are assigned to it.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
hi, guys,
Now you can avoid using a customized kube-controller image: the external-storage out-of-tree RBD provisioner is merged, you can use it instead. Here is the guide: 1. Deploy the standalone rbd-provisioner controller.
Ok, it can work. But what about several PVCs/PVs already provisioned by the kubernetes.io/rbd provisioner? How can I switch these volumes to the control of the external-storage controller? What is a clear upgrade path from 1.11 to 1.12?
But what about several PVCs/PVs already provisioned by the kubernetes.io/rbd provisioner? How can I switch these volumes to the control of the external-storage controller?
There is no automatic way, but you can do it manually, e.g. by updating the PV's spec.claimRef field to point to the new claim.
What is a clear upgrade path from 1.11 to 1.12?
No extra action is required for RBD provisioning.
The external provisioner is great for provisioning volumes, but if you want to resize those volumes, that is handled by the in-tree volume plugin (kube-controller-manager):
Error expanding volume "volume-name" of plugin kubernetes.io/rbd : rbd info failed, error: executable file not found in $PATH
So if you want to resize volumes, you still need to have the rbd binary in the kube-controller-manager image.
hi, there are two parts:
* Volume Provisioning: Currently, if you want dynamic provisioning, RBD provisioner in `controller-manager` needs to access `rbd` binary to create new image in ceph cluster for your PVC. [external-storage](https://github.com/kubernetes-incubator/external-storage) plans to move volume provisioners from in-tree to out-of-tree, there will be a separated RBD provisioner container image with `rbd` utility included ([kubernetes-incubator/external-storage#200](https://github.com/kubernetes-incubator/external-storage/issues/200)), then `controller-manager` do not need access `rbd` binary anymore.
* Volume Attach/Detach: `kubelet` needs to access `rbd` binary to attach (`rbd map`) and detach (`rbd unmap`) RBD image on node. If `kubelet` is running on the host, host needs to install `rbd` utility (install `ceph-common` package on most Linux distributions).
This is a very useful and detailed explanation.
I installed Kubernetes 1.13.2 with Kubespray.
For the first part (Volume Provisioning), the kubespray deployment uses the external-storage/rbd-provisioner container, so it works fine. I checked the logs and Ceph created the volume OK.
For the second part (Volume Attach/Detach), it failed because the rbd binary was not found.
My kubelet is running on the CoreOS host (not inside docker/rkt) because, due to security, the kubespray kubelet_deployment_type default is 'host'. I could NOT install ceph-common when setting up these nodes. However, I could invoke the 'rbd' binary mounted from the ceph/ceph container.
I still got an error stating that rbd could not find ceph.conf on these nodes. Where do I get this .conf dynamically?
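As far as I know, rbd mainly needs ceph.conf to discover the monitors, and the kubelet RBD plugin already passes them on the command line with -m, so the "did not load config file" message is often only a warning. If you still want a config file on the nodes, a minimal hedged sketch is the following (monitor addresses are placeholders; how you distribute the keyring is a separate question):

sudo mkdir -p /etc/ceph
sudo tee /etc/ceph/ceph.conf >/dev/null <<'EOF'
[global]
mon_host = <mon1-addr>:6789,<mon2-addr>:6789
EOF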