Hi Adam,
On Thu, 16 Jun 2022, Adam J. McPartlan wrote:
> The VM is heavy on the IOs. I am assuming (stabbing in the dark) the disk
> is not fast enough to cope or the DRDB replication is sluggish.
If there is no networking issue, DRBD stays in sync regardless of the
performance of your disks or the I/O load of the VMs.
> It's not affecting the cluster and a gnt-cluster verify reports nothing
> unusual...
>
> on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
Please provide the output of "gnt-cluster verify-disks". This should
detect the degraded DRBD disks.
Normally a cron job runs every 5 minutes on the master node, which should
repair degraded DRBD disks. You may run "ganeti-watcher" by hand to see
if that helps.
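If you want to see at a glance which resources are degraded before running
the repair, here is a minimal sketch (not a Ganeti tool) that parses
/proc/drbd-style output on stdin. It assumes the DRBD 8.4 field layout
shown above and treats a resource as degraded if its connection state is
not Connected, or if it reports out-of-sync blocks (oos != 0):

```shell
# Hypothetical helper, assuming DRBD 8.4 /proc/drbd layout:
# flag resources that are disconnected or have out-of-sync blocks.
drbd_degraded() {
    awk '
      /^ *[0-9]+: cs:/ { minor = $1; cs = $2 }   # resource header line
      /oos:/ {
          oos = ""
          for (i = 1; i <= NF; i++)
              if ($i ~ /^oos:/) oos = substr($i, 5)
          if (oos != "" && (cs != "cs:Connected" || oos != "0"))
              print "degraded: " minor " " cs " oos=" oos
      }'
}
# typical invocation on a node:
#   drbd_degraded < /proc/drbd
```

Run against your quoted output, this would flag both resource 0 (WFConnection
with oos) and resource 7 (WFConnection, even though oos is 0).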
If this does not help (which I assume), you have hit a case that Ganeti
cannot detect and repair reliably.
> cat /proc/drbd
> version: 8.4.11 (api:1/proto:86-101)
> srcversion: FC3433D849E3B88C1E7B55C
> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
> ns:0 nr:0 dw:1453831088 dr:1501319775 al:56559455 bm:0 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:f oos:183001276
...
> 7: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
> ns:0 nr:73400320 dw:73400320 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
> wo:f oos:0
These two DRBD resources (0 and 7) are degraded: both are stuck waiting
for a connection (cs:WFConnection), and resource 0 additionally reports
out-of-sync data (oos:183001276). For completeness, can you also provide
the content of /proc/drbd from the secondary node of the instance with
/dev/drbd0?
Resource 7 must belong to another instance; track that instance down and
repair it, too.
Last but not least, to actually fix the problem: "gnt-instance
activate-disks <inst>" should do it.
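In shell terms, the repair and check steps might look like this (a sketch,
run on the master node; "<inst>" stays a placeholder for your instance's
name, and the exact reconnect behaviour depends on your DRBD/Ganeti
versions):

```shell
# Sketch of the repair sequence on the master node.
gnt-cluster verify-disks            # detect degraded DRBD disks
gnt-instance activate-disks <inst>  # re-activate disks, triggering a reconnect
cat /proc/drbd                      # verify: cs:Connected and oos:0 when done
```

If the resource reconnects, you should see it move through SyncTarget/
SyncSource back to Connected as the resync finishes.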
Thanks, Sascha.