DRBD Stalled


ampictage

Apr 13, 2012, 11:58:29 AM
to ganeti
Wondering if anyone else has had this problem:

I have a cluster of 5 ganeti hosts (xen with dom0_mem=4096M), but I
cannot add more than 16 drbd disks per host; once the 17th instance is
created, it stalls on sync with short read errors and never finishes.

I have tried multiple versions of drbd/kernel/bios. I have another
2-node cluster with the same hardware, software, and kernel versions,
using the same model switch for the drbd transport, that does not have
this problem.

I added 2 hosts to this cluster recently, and when I tested them
before adding them to the cluster, I was able to create more than 20
VMs without issue.

Here is the only log output from drbd, including the output from
drbdsetup 17 disconnect, which I used to terminate the disk creation:


block drbd17: Starting worker thread (from cqueue/2 [261])
block drbd17: disk( Diskless -> Attaching )
block drbd17: No usable activity log found.
block drbd17: Method to ensure write ordering: barrier
block drbd17: max_segment_size ( = BIO size ) = 32768
block drbd17: drbd_bm_resize called with capacity == 419430400
block drbd17: resync bitmap: bits=52428800 words=819200
block drbd17: size = 200 GB (209715200 KB)
block drbd17: Writing the whole bitmap, size changed
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: recounting of set bits took additional 2 jiffies
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: disk( Attaching -> Inconsistent )
block drbd17: Barriers not supported on meta data device - disabling
block drbd17: conn( StandAlone -> Unconnected )
block drbd17: Starting receiver thread (from drbd17_worker [21794])
block drbd17: receiver (re)started
block drbd17: conn( Unconnected -> WFConnection )
block drbd17: Handshake successful: Agreed network protocol version 94
block drbd17: Peer authenticated using 16 bytes of 'md5' HMAC
block drbd17: conn( WFConnection -> WFReportParams )
block drbd17: Starting asender thread (from drbd17_receiver [21799])
block drbd17: data-integrity-alg: <not-used>
block drbd17: drbd_sync_handshake:
block drbd17: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: peer 4829B58EB3A8FE8D:0000000000000004:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: uuid_compare()=-2 by rule 20
block drbd17: Becoming sync target due to disk states.
block drbd17: Writing the whole bitmap, full sync required after drbd_sync_handshake.
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd17: conn( WFBitMapT -> WFSyncUUID )
block drbd17: helper command: /bin/true before-resync-target minor-17
block drbd17: helper command: /bin/true before-resync-target minor-17 exit code 0 (0x0)
block drbd17: conn( WFSyncUUID -> SyncTarget )
block drbd17: Began resync as SyncTarget (will sync 209715200 KB [52428800 bits set]).
block drbd17: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
block drbd17: short read expecting header on sock: r=-512
block drbd17: meta connection shut down by peer.
block drbd17: asender terminated
block drbd17: Terminating asender thread
block drbd17: Connection closed
block drbd17: conn( Disconnecting -> StandAlone )
block drbd17: receiver terminated
block drbd17: Terminating receiver thread
block drbd17: disk( Inconsistent -> Diskless )
block drbd17: drbd_bm_resize called with capacity == 0
block drbd17: worker terminated
block drbd17: Terminating worker thread


Any help would be appreciated.

Iustin Pop

Apr 13, 2012, 2:48:41 PM
to gan...@googlegroups.com
On Fri, Apr 13, 2012 at 08:58:29AM -0700, ampictage wrote:
> Wondering if anyone else has had this problem:
>
> I have a cluster of 5 ganeti hosts (xen with dom0_mem=4096M), but I

Just curious, why do you give dom0 so much memory?

This log shows a very well-behaved secondary: it started the resync as
SyncTarget, and then the primary shut down the connection.

Could you give details about the kernel log on the primary too?

thanks,
iustin

Simon Deziel

Apr 13, 2012, 2:59:32 PM
to gan...@googlegroups.com
On 12-04-13 11:58 AM, ampictage wrote:
> Wondering if anyone else has had this problem:
>
> I have a cluster of 5 ganeti hosts (xen with dom0_mem=4096M), but I
> cannot add more than 16 drbd disks per host; once the 17th instance is
> created, it stalls on sync with short read errors and never finishes.

By default, the drbd module only allows creating 32 devices, which you
might have reached with 16 instances (each instance takes 2 DRBD
devices: data & metadata). Did you increase the minor_count when
loading the drbd module? See
http://docs.ganeti.org/ganeti/master/html/install.html#installing-drbd
for details.
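A minimal sketch of what that looks like, assuming a modprobe.d-style
setup (paths and exact option names may differ by distro; the
usermode_helper setting is what the Ganeti guide suggests):

```shell
# Build the drbd module options line; minor_count caps how many DRBD
# minors the module can expose (the default is 32).
DRBD_OPTS="options drbd minor_count=128 usermode_helper=/bin/true"
echo "$DRBD_OPTS"
# On the node itself (as root) you would then persist it and reload:
#   echo "$DRBD_OPTS" > /etc/modprobe.d/drbd.conf
#   rmmod drbd && modprobe drbd
#   cat /sys/module/drbd/parameters/minor_count   # expect 128
```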

Simon

ampictage

Apr 17, 2012, 11:34:38 AM
to ganeti
DRBD is loaded with minor_count=128.
I gave 4GB to dom0 because it initially had only 512MB, but I ran into
out-of-memory errors when trying to add drbd instances. This was in
production and I did not have time to test, so I erred on the side of
too much memory rather than too little.

Source server dmesg log:

block drbd1: Starting worker thread (from cqueue/1 [258])
block drbd1: disk( Diskless -> Attaching )
block drbd1: No usable activity log found.
block drbd1: Method to ensure write ordering: barrier
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 419430400
block drbd1: resync bitmap: bits=52428800 words=819200
block drbd1: size = 200 GB (209715200 KB)
block drbd1: Writing the whole bitmap, size changed
block drbd1: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd1: recounting of set bits took additional 2 jiffies
block drbd1: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> Inconsistent )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [19944])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: Peer authenticated using 16 bytes of 'md5' HMAC
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [19949])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd1: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd1: uuid_compare()=0 by rule 10
block drbd1: No resync, but 52428800 bits in bitmap!
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
block drbd1: role( Secondary -> Primary ) disk( Inconsistent -> UpToDate )
block drbd1: Forced to consider local data as UpToDate!
block drbd1: Creating new current UUID
block drbd1: drbd_sync_handshake:
block drbd1: self 8B26969D7F9BD80D:0000000000000004:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd1: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd1: uuid_compare()=2 by rule 30
block drbd1: Becoming sync source due to disk states.
block drbd1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
block drbd1: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd1: conn( Connected -> WFBitMapS )
block drbd1: conn( WFBitMapS -> SyncSource )
block drbd1: Began resync as SyncSource (will sync 209715200 KB [52428800 bits set]).
block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> TearDown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Connection closed
block drbd1: conn( TearDown -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: role( Primary -> Secondary )
block drbd1: conn( WFConnection -> Disconnecting )
block drbd1: Discarding network configuration.
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
block drbd1: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
block drbd1: drbd_bm_resize called with capacity == 0
block drbd1: worker terminated
block drbd1: Terminating worker thread

In the logs everything looks fine, but the sync just stalls and never
recovers, even given time (5-10 minutes). No dropped packets or any
other indication of errors in the ring buffer.
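(For anyone hitting something similar, the stall is easy to spot in
/proc/drbd; the sample status line below is illustrative, not from my
nodes:)

```shell
# Pull the connection state out of a /proc/drbd status line; on a live
# node you would run: grep 'cs:' /proc/drbd
sample=' 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r----'
echo "$sample" | grep -o 'cs:[A-Za-z]*'
# If the oos: (out-of-sync) counter on the following line of /proc/drbd
# never shrinks between checks, the resync is stalled.
```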

Here is another bit of information: I can add more than 17 drbd
instances to nodes when the instances are not doing anything. I added
20 pvm instances that just hung in kickstart (meaning the instance was
started but was just waiting for input) without any issue on 2 nodes
(50/50 primary on each node) that were recently added/updated.

I am stumped.

On Apr 13, 11:59 am, Simon Deziel <simon.dez...@gmail.com> wrote:
> On 12-04-13 11:58 AM, ampictage wrote:
>
> > Wondering if anyone else has had this problem:
>
> > I have a cluster of 5 ganeti hosts (xen with dom0_mem=4096M), but I
> > cannot add more than 16 drbd disks per host; once the 17th instance is
> > created, it stalls on sync with short read errors and never finishes.
>
> By default, the drbd module only allows creating 32 devices, which you
> might have reached with 16 instances (each instance takes 2 DRBD
> devices: data & metadata). Did you increase the minor_count when
> loading the drbd module? See http://docs.ganeti.org/ganeti/master/html/install.html#installing-drbd

Iustin Pop

Apr 17, 2012, 11:45:20 AM
to gan...@googlegroups.com
On Tue, Apr 17, 2012 at 08:34:38AM -0700, ampictage wrote:
> DRBD is loaded with minor_count=128
> I gave 4GB to dom0 because it initially had only 512MB, but I ran into
> out-of-memory errors when trying to add drbd instances. This was in
> production and I did not have time to test, so I erred on the side of
> too much memory rather than too little.

Ack. For Xen, 1Gbe-network machines work fine for us with 1024MB of
dom0 RAM; 10Gbe machines need 2048MB. FYI.
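(That is just the dom0_mem boot parameter; a hypothetical grub entry,
with paths and kernel names that will vary per distro:)

```shell
# Illustrative grub kernel line for the hypervisor on a 1Gbe node:
#   kernel /boot/xen.gz dom0_mem=1024M
```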

In between the above message and the one below, what happens? This log
looks like the secondary disconnected itself, but the secondary's own
log looked like the primary disconnected. Was this the point where the
stall happened and you disconnected it?

Me too. I haven't seen DRBD behave like this before. Just to confirm:
this log was with which version of DRBD? And which kernel?

If you create just one or two instances and put heavy traffic through
them, does DRBD still hang?

iustin

ampictage

Apr 19, 2012, 11:38:02 AM
to ganeti
Hey all,

OK, I have fixed the problem with a suggestion from the drbd list:
upgrading to 8.3.12 solved the stalling issue.

As a side note, found 2 other issues when diagnosing:

Broadcom NICs using the bnx2 driver: I had to disable all checksum
offloading because it caused network errors under high network load.
ethtool -K <eth#> rx off tx off sg off tso off
(Thanks Iustin for suggesting to test this.)
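(Spelled out per feature, in case it helps anyone searching later;
eth1 is a placeholder interface name:)

```shell
# Print the per-feature ethtool commands for review; run them as root
# against the actual DRBD-transport interface. eth1 is a placeholder.
IFACE=eth1
for feat in rx tx sg tso; do
    echo "ethtool -K $IFACE $feat off"
done
# Verify the resulting offload state afterwards with: ethtool -k "$IFACE"
```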

I had to increase the number of loopback devices, because once I could
create the drbd instances, xen would not start them due to not having
a free loop device to attach the drive. I added 'options loop
max_loop=256' to modprobe.conf.
The error reported was: Error: Device 5632 (vbd) could not be
connected. /etc/xen/scripts/block failed; error detected.
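(The exact line, for reference; on newer distros it would go in a file
under /etc/modprobe.d/ instead of /etc/modprobe.conf:)

```shell
# /etc/modprobe.conf -- raise the loop device limit; takes effect the
# next time the loop module is loaded
options loop max_loop=256
```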

Thanks for the input!

Iustin Pop

Apr 19, 2012, 11:41:34 AM
to gan...@googlegroups.com
On Thu, Apr 19, 2012 at 08:38:02AM -0700, ampictage wrote:
> Hey all,
>
> OK, I have fixed the problem with a suggestion from the drbd list:
> upgrading to 8.3.12 solved the stalling issue.

I see, good to know.

> As a side note, found 2 other issues when diagnosing:
>
> Broadcom NICs using the bnx2 driver: I had to disable all checksum
> offloading because it caused network errors under high network load.
> ethtool -K <eth#> rx off tx off sg off tso off
> (Thanks Iustin for suggesting to test this.)

If I did, then you're welcome, but I thought someone else did :)

> I had to increase the number of loopback devices, because once I could
> create the drbd instances, xen would not start them due to not having
> a free loop device to attach the drive. I added 'options loop
> max_loop=256' to modprobe.conf.
> The error reported was: Error: Device 5632 (vbd) could not be
> connected. /etc/xen/scripts/block failed; error detected.

That is strange. When using DRBD block devices, Xen shouldn't need
loop devices at all; those are only used for file-based storage.

Good luck with Ganeti!

iustin
