On 08/21/2017 07:38 PM, Georg Faerber wrote:
> Hi all,
>
> Environment
> -----------
> - Debian jessie
> - drbd-utils 8.9.5-1~bpo8+1
> - ganeti 2.15.2-1~bpo8+1
> - kernel 3.16.43-2+deb8u3
> - Small two node cluster, DRBD uses a dedicated, directly connected
> 1Gb interface (no PL, no buffer overruns, latency <1ms; it seems
> everything is fine regarding the network)
> - ~60 instances
>
> Problem
> -------
> I'm seeing the following: After a cluster reboot (shutdown all the
> instances, reboot the nodes, start all instances), the DRBD resource of
> the instance which was added the last to the cluster gets stuck (the
> following logs are produced by only one node, the ganeti master):
>
> /proc/drbd
> 19: cs:WFBitMapT ro:Secondary/Primary ds:Outdated/UpToDate C r---d- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:21096
>
[snip]
Hello there,
This kernel patch could be related:
https://patchwork.kernel.org/patch/5189451/
However, it is only included in kernel 3.19 and later.
Applying this patch on top of Debian's linux 3.16.43 and rebooting a few
times suggests that it might actually solve the problem.
Unfortunately, living with a patched, diverged kernel is not really viable.
If someone else tests and verifies this too, it would perhaps be worth
reporting it to Debian.
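For anyone who wants to check their own nodes after a reboot, here is a
minimal sketch that lists DRBD minors sitting in a bitmap-exchange
connection state (WFBitMapS or WFBitMapT), as in the /proc/drbd line quoted
above. The function name bitmap_wait_minors is made up for illustration and
is not part of DRBD or Ganeti, and the parsing assumes the 8.x /proc/drbd
layout where each device line starts with "<minor>: cs:<state>".

```python
import re

# DRBD connection states during which the bitmap is being exchanged.
BITMAP_STATES = {"WFBitMapS", "WFBitMapT"}

# Matches the start of a device line, e.g. "19: cs:WFBitMapT ro:...".
# Assumption: DRBD 8.x /proc/drbd layout.
_DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def bitmap_wait_minors(proc_drbd_text):
    """Return {minor: connection_state} for devices in a bitmap-exchange state."""
    found = {}
    for line in proc_drbd_text.splitlines():
        match = _DEVICE_LINE.match(line)
        if match and match.group(2) in BITMAP_STATES:
            found[int(match.group(1))] = match.group(2)
    return found


if __name__ == "__main__":
    # Reads the live status file; only works on a node with DRBD loaded.
    with open("/proc/drbd") as handle:
        print(bitmap_wait_minors(handle.read()))
```

A single reading cannot tell a stuck resource from one that is merely
passing through the state, so it probably makes sense to run this twice, a
minute or so apart, and only treat a minor as stuck if it shows up both
times.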
Greetings