Unable to see osd after adding extra disks to cluster


foobar2...@gmail.com

Jun 26, 2020, 5:56:57 PM
to rook-dev
Hi all,

I have been having issues after adding storage to the k8s/rook-ceph nodes. I have three nodes, each configured with 2 disks, one of which I just added to an existing rook-ceph cluster. I'll focus on the first node to avoid any clutter.


cephCluster

    storage:
      config: null
      nodes:
        - config: null
          devices:
            - config:
                osdsPerDevice: "1"
              name: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789
            - config:
                osdsPerDevice: "1"
              name: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol9876543210
          name: ip-10-1-1-10
          resources: {}
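
(As a sanity check — sketch only, the by-id paths are the ones from my config above and output will differ per node — the new volumes can be verified on the node itself:)

# Confirm the by-id symlinks from the CephCluster spec resolve to the new NVMe devices
ls -l /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789 \
      /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol9876543210

# List block devices with the fields Rook's discovery also cares about
lsblk --output NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT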




A second LVM volume is created on each node, and "rook-ceph-osd-prepare" finds the disks and seems to complete successfully:

 2020-06-26 21:34:44.202512 I | cephcmd: desired devices to configure osds: [{Name:/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol9876543210 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2020-06-26 21:34:44.203482 I | rookcmd: starting Rook v1.3.6 with arguments '/rook/rook ceph osd provision'
2020-06-26 21:34:44.203496 I | rookcmd: flag values: --cluster-id=09bc231d-8cba-4a47-ba1c-c123123dd456, --data-device-filter=, --data-device-path-filter=, --data-devices=/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol9876543210:1:::,/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789:1:::, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=DEBUG, --metadata-device=, --node-name=ip-10-1-1-10, --operator-image=, --osd-database-size=0, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
[...]
2020-06-26 21:34:44.288032 I | cephosd: creating and starting the osds
2020-06-26 21:34:44.288051 D | cephosd: desiredDevices are [{Name:/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol9876543210 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2020-06-26 21:34:44.288058 D | cephosd: context.Devices are [0xc0006c2900 0xc0001f07e0 0xc0001ee360 0xc000162480 0xc000338000 0xc0003386c0]
2020-06-26 21:34:44.288063 D | exec: Running command: lsblk /dev/nvme0n1p3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-06-26 21:34:44.289392 D | exec: Running command: ceph-volume inventory --format json /dev/nvme0n1p3
2020-06-26 21:34:44.736160 I | cephosd: skipping device "nvme0n1p3": ["Insufficient space (<5GB)"].
2020-06-26 21:34:44.736180 I | cephosd: skipping device "nvme0n1p1" because it contains a filesystem "ext4"
2020-06-26 21:34:44.736183 I | cephosd: skipping device "nvme0n1p4" because it contains a filesystem "xfs"
2020-06-26 21:34:44.736186 I | cephosd: skipping device "nvme0n1p2" because it contains a filesystem "vfat"
2020-06-26 21:34:44.736189 I | cephosd: skipping 'dm' device "dm-0"
2020-06-26 21:34:44.736191 I | cephosd: skipping 'dm' device "dm-1"
2020-06-26 21:34:44.743309 I | cephosd: configuring osd devices: {"Entries":{}}
2020-06-26 21:34:44.743322 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2020-06-26 21:34:44.743328 D | exec: Running command: ceph-volume lvm list  --format json
2020-06-26 21:34:44.927170 I | cephosd: osdInfo has 1 elements. [{Name:osd-data-6739dcc3-1c8f-4857-8bc6-c5d5030330a9 Path:/dev/ceph-f49fe69f-6b5b-44ed-9682-e82642a6233f/osd-data-6739dcc3-1c8f-4857-8bc6-c5d5030330a9 Tags:{OSDFSID:9588da0e-ab05-452e-b3d2-a3a8f7e1fef3 Encrypted:0 ClusterFSID:aeffed0b-b91c-444a-a5d4-377ee004a4f9} Type:block}]
2020-06-26 21:34:44.927185 I | cephosd: 1 ceph-volume lvm osd devices configured on this node
2020-06-26 21:34:44.927197 I | cephosd: devices = [{ID:0 Cluster:ceph UUID:9588da0e-ab05-452e-b3d2-a3a8f7e1fef3 DevicePartUUID: BlockPath:/dev/ceph-f49fe69f-6b5b-44ed-9682-e82642a6233f/osd-data-6739dcc3-1c8f-4857-8bc6-c5d5030330a9 MetadataPath: SkipLVRelease:false Location: LVBackedPV:false CVMode:lvm Store:bluestore}]
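
(For completeness, this is roughly how I pull the same prepare log per node; the label selector is my assumption of what Rook applies, and the container name "provision" is taken from the pod events below:)

# List the osd-prepare pods, one per node (label selector assumed)
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare -o wide

# Provision container log for this node's prepare pod
kubectl -n rook-ceph logs rook-ceph-osd-prepare-ip-10-1-1-10-7rvpg -c provision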




When I describe the "rook-ceph-osd-prepare" pod, I get the following events:

Events:
  Type     Reason          Age                From                   Message
  ----     ------          ----               ----                   -------
  Normal   Scheduled       <unknown>          default-scheduler      Successfully assigned rook-ceph/rook-ceph-osd-prepare-ip-10-1-1-10-7rvpg to ip-10-1-1-10
  Normal   Created         10m                kubelet, ip-10-1-1-10  Created container copy-bins
  Normal   Started         10m                kubelet, ip-10-1-1-10  Started container copy-bins
  Normal   Pulled          10m                kubelet, ip-10-1-1-10  Container image "ceph/ceph:v14.2.9" already present on machine
  Normal   Created         10m                kubelet, ip-10-1-1-10  Created container provision
  Normal   Started         10m                kubelet, ip-10-1-1-10  Started container provision
  Normal   Pulled          10m (x2 over 10m)  kubelet, ip-10-1-1-10  Container image "rook/ceph:v1.3.6" already present on machine
  Normal   SandboxChanged  10m                kubelet, ip-10-1-1-10  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          10m                kubelet, ip-10-1-1-10  Error: cannot find volume "rook-binaries" to mount into container "copy-bins"


  I get the same warning for all three nodes.
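
(A rough way to check whether the prepare job on a node even defines the "rook-binaries" volume the warning complains about; the job name is my guess based on the pod name:)

# Volumes defined in the osd-prepare job spec for this node
kubectl -n rook-ceph get job rook-ceph-osd-prepare-ip-10-1-1-10 -o yaml | grep -A 3 'volumes:'

# Full description of the prepare pod, including the copy-bins container's volume mounts
kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-ip-10-1-1-10-7rvpg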

  So my situation is that I have 3 active OSDs, plus three supposedly configured OSDs from the new storage, but the new ones do not appear to be accessible by the rook-ceph cluster.

+----+---------------+-------+-------+--------+---------+--------+---------+-----------+
| id |       host    |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ip-10-1-1-10  | 1327M | 48.6G |    0   |     0   |    0   |     0   | exists,up |
| 1  | ip-10-1-1-11  | 1327M | 48.6G |    0   |     0   |    0   |     0   | exists,up |
| 2  | ip-10-1-1-12  | 1327M | 48.6G |    0   |     0   |    0   |     0   | exists,up |
+----+---------------+-------+-------+--------+---------+--------+---------+-----------+
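
(The table above looks like `ceph osd status` output; for reference, a sketch of running it and `ceph osd tree` through the toolbox, assuming the standard rook-ceph-tools deployment from the Rook docs is installed:)

# Find the toolbox pod (label assumes the standard toolbox manifest)
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')

# Compare what Ceph itself reports with what the prepare jobs claim to have configured
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph osd status
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph osd tree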



  I am out of ideas on how to fix this. Could someone help?

  Thanks!! :)

Alexander Trost

Jun 26, 2020, 6:49:52 PM
to foobar2...@gmail.com, rook-dev
Check the rook-ceph-operator logs to see if it is still waiting for the prepare job to complete on one of the nodes and/or has encountered any issues.
Also, if the `Pod sandbox changed` events keep happening, there might be something wrong with the node(s) and/or the application; check the logs of all the containers, including the init containers, of the affected Pod(s).
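
Something along these lines (the deployment and pod names assume the default rook-ceph namespace and the names from your events above):

# Operator log, filtered for osd-related messages
kubectl -n rook-ceph logs deploy/rook-ceph-operator | grep -i osd | tail -n 50

# All containers of the prepare pod on one node, init containers included
kubectl -n rook-ceph logs rook-ceph-osd-prepare-ip-10-1-1-10-7rvpg --all-containers=true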

Kind regards
Alexander Trost


foobar2...@gmail.com

Jun 28, 2020, 4:02:53 AM
to rook-dev
Hi Alexander,

Thank you so much for your answer. It led me back to reading the documentation and made me realise I might be setting up rook-ceph the wrong way. I think I should be looking at a PVC-based cluster rather than adding disks to a Kubernetes node and then to Rook-Ceph.
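
For anyone who finds this thread later, the direction I am looking at is roughly the storageClassDeviceSets part of the CephCluster spec. A minimal sketch based on the Rook on-PVC examples follows; the set name, count, size and the gp2 storage class are placeholders, and the fragment needs to be merged into a full CephCluster manifest rather than applied on its own.

# Write an illustrative storage fragment for an on-PVC cluster to a file for later merging
cat <<'EOF' > cephcluster-storage-fragment.yaml
storage:
  storageClassDeviceSets:
    - name: set1
      count: 3
      portable: false
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            resources:
              requests:
                storage: 50Gi
            storageClassName: gp2
            volumeMode: Block
            accessModes:
              - ReadWriteOnce
EOF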

Thanks again!