drbd resync speed dropped to 80 kb/s


tschend

Feb 22, 2013, 7:09:28 AM
to ganeti
Hi,

yesterday we had an outage of one node on our three node test cluster.

We are using debian squeeze with ganeti 2.6.2

we did the following:

gnt-node failover broken.node
gnt-node modify --offline yes broken.node

Then we wanted to fix drbd relationships:

gnt-node evacuate -s broken.node

This generated a lot of jobs, and some finished pretty fast.

Now there are a few left, and it seems they are capped at 80 kb/s.


Here is the output of /proc/drbd on both servers:

Note: skipping offline node(s): broken.node
------------------------------------------------
node: node1
[=>..................] sync'ed: 12.5% (26892/30720)M
finish: 76:29:37 speed: 80 (240) want: 80 K/sec
[==>.................] sync'ed: 19.7% (24692/30720)M
finish: 166:49:46 speed: 40 (380) want: 40 K/sec
[=>..................] sync'ed: 10.9% (27404/30720)M
finish: 87:42:15 speed: 84 (208) want: 80 K/sec
[>...................] sync'ed: 6.0% (38532/40960)M
finish: 182:40:52 speed: 44 (152) want: 40 K/sec
[=>..................] sync'ed: 10.1% (36860/40960)M
finish: 124:30:21 speed: 80 (260) want: 80 K/sec
[========>...........] sync'ed: 46.8% (10912/20480)M
finish: 73:43:10 speed: 40 (612) want: 40 K/sec
[>....................] sync'ed: 1.9% (60308/61452)M
finish: 203:42:57 speed: 80 (72) want: 80 K/sec

return code = 0
------------------------------------------------
node: node1
[>....................] sync'ed: 1.9% (60308/61452)M
finish: 171:33:01 speed: 80 (72) K/sec
[========>...........] sync'ed: 46.8% (10912/20480)M
finish: 51:43:58 speed: 40 (612) K/sec
[>...................] sync'ed: 6.0% (38532/40960)M
finish: 182:40:52 speed: 40 (152) K/sec
[=>..................] sync'ed: 10.1% (36860/40960)M
finish: 104:50:49 speed: 80 (260) K/sec
[=>..................] sync'ed: 12.5% (26892/30720)M
finish: 76:29:37 speed: 80 (240) K/sec
[==>.................] sync'ed: 19.7% (24692/30720)M
finish: 166:49:46 speed: 40 (380) K/sec
[=>..................] sync'ed: 10.9% (27404/30720)M
finish: 77:57:33 speed: 80 (208) K/sec

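To read output like the above at a glance, the actual and wanted rates can be pulled out with a small awk sketch. The sample line is inlined here so the snippet is self-contained; on a node you would feed it /proc/drbd instead:

```shell
# Extract actual and wanted resync rates (KiB/s) from /proc/drbd-style
# "speed: X (Y) want: Z" lines, keyed on the field labels so the line
# prefix does not matter.
printf '%s\n' "finish: 87:42:15 speed: 84 (208) want: 80 K/sec" |
  awk '{ for (i = 1; i <= NF; i++) {
           if ($i == "speed:") a = $(i+1)
           if ($i == "want:")  w = $(i+1)
         }
         print "actual=" a " want=" w }'
# → actual=84 want=80
```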
This is very strange. We have no high load on the boxes. No iowait.

Here are the disk parameters from the gnt-cluster info:

Default disk parameters:
- blockdev:
- diskless:
- drbd:
    c-delay-target: 1
    c-fill-target: 0
    c-max-rate: 211440
    c-min-rate: 8096
    c-plan-ahead: 20
    data-stripes: 1
    disk-barriers: n
    disk-custom:
    dynamic-resync: True
    meta-barriers: False
    meta-stripes: 1
    metavg: xenvg
    net-custom:
    resync-rate: 101440
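For the record, these defaults can also be changed cluster-wide rather than per device. This is only a sketch, not something from this thread: the -D disk-parameter syntax is assumed from the Ganeti 2.6 gnt-cluster manpage, and the command is printed here as a dry run so it can be reviewed before actually running it:

```shell
# Dry run: print (don't execute) a command that would raise the dynamic
# controller's minimum rate cluster-wide. The value is illustrative.
echo 'gnt-cluster modify -D "drbd:c-min-rate=102400"'
```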

Any suggestions?

Regards
Thomas

Iustin Pop

Feb 22, 2013, 7:12:53 AM
to gan...@googlegroups.com
This is strange. It looks like somehow drbd "forgot" the correct sync
rate.

You can reset it manually via "drbdsetup /dev/drbdX syncer -r 100m" (I
hope this is the correct syntax).

regards,
iustin

tschend

Feb 22, 2013, 7:28:04 AM
to ganeti
Hi Iustin,

I was crawling through the ganeti and drbd docs.

Maybe there is a version issue.

We are running drbd 8.3.11

The gnt-cluster manpage shows:

c-min-rate
Minimum resync speed for the dynamic resync speed controller. [KiB/s]

but the drbd docs say:

c-min-rate min_rate
A node that is primary and sync-source has to schedule application IO
requests and resync IO requests. The min_rate tells DRBD use only up
to min_rate for resync IO and to dedicate all other available IO
bandwidth to application requests.

Note: The value 0 has a special meaning. It disables the limitation of
resync IO completely, which might slow down application IO
considerably. Set it to a value of 1, if you prefer that resync IO
never slows down application IO.

Note: Although the name might suggest that it is a lower bound for the
dynamic resync speed controller, it is not. If the DRBD-proxy buffer
is full, the dynamic resync speed controller is free to lower the
resync speed down to 0, completely independent of the c-min-rate
setting.

min_rate defaults to 4096 (4 MiB/s), and KiB/s is its default unit.

I am using all the defaults on this cluster so far.

The drbd changelog has no entry about such an "issue".

I will try to solve this with drbdsetup now.

Regards
Thomas

Iustin Pop

Feb 22, 2013, 7:35:15 AM
to gan...@googlegroups.com
Oh sorry. I didn't see that you're using dynamic resync, and assumed
static resync.

Sorry, I don't know how DRBD behaves w.r.t. sync prioritisation when
using dynamic resync…

iustin

tschend

Feb 22, 2013, 7:38:59 AM
to ganeti
Looks like the correct syntax, but it does not speed up the sync.

Regards
Thomas


tschend

Feb 22, 2013, 7:43:57 AM
to ganeti
Just one small update: I did a drbdsetup /dev/drbd3 -m to set the min
rate. I increased it from 8096 to 211440, but that did not change the
resync speed.

drbdsetup /dev/drbd3 show
disk {
    size 41943040s; # bytes
    on-io-error detach;
    fencing dont-care _is_default;
    max-bio-bvecs 0 _is_default;
}
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 2048 _is_default;
    max-buffers 2048 _is_default;
    unplug-watermark 128 _is_default;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 0 _is_default; # bytes
    rcvbuf-size 0 _is_default; # bytes
    ko-count 0 _is_default;
    cram-hmac-alg "md5";
    shared-secret "5fa4373805b190f79649625a7b2ab9ca8b8898d5";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri consensus;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
    on-congestion block _is_default;
    congestion-fill 0s _is_default; # byte
    congestion-extents 127 _is_default;
}
syncer {
    rate 102400k; # bytes/second
    after -1 _is_default;
    al-extents 257;
    on-no-data-accessible io-error _is_default;
    c-plan-ahead 20; # 1/10 seconds
    c-delay-target 1; # 1/10 seconds
    c-fill-target 0s _is_default; # bytes
    c-max-rate 211440k; # bytes/second
    c-min-rate 211440k; # bytes/second
}
protocol C;
_this_host {
    device minor 3;
    disk "/dev/xenvg/02418866-b50f-4a94-bea6-c295d7c2b3a0.disk0_data";
    meta-disk "/dev/xenvg/02418866-b50f-4a94-bea6-c295d7c2b3a0.disk0_meta" [ 0 ];
    address ipv4 10.0.120.230:11003;
}
_remote_host {
    address ipv4 10.0.120.232:11003;
}


Regards
Thomas



Iustin Pop

Feb 22, 2013, 7:52:43 AM
to gan...@googlegroups.com
On Fri, Feb 22, 2013 at 04:38:59AM -0800, tschend wrote:
> Looks like the correct syntax but does not speed up the sync.

No, because that syntax only applies to static sync, which I thought you
were using.

iustin

tschend

Feb 22, 2013, 12:05:15 PM
to ganeti
Hi,

so disabling dynamic sync

drbdsetup /dev/drbdX -p 0

and setting the static sync speed to 100m works:

drbdsetup /dev/drbdX -r 100m

I hope that helps someone in the future.
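A sketch rolling both steps into one pass, for anyone hitting this later. It assumes DRBD 8.3's syncer subcommand (as in Iustin's earlier suggestion) and that minors 0-3 are the resyncing devices, and it only prints the commands so they can be checked before piping to sh:

```shell
# Dry run: for each resyncing minor, print the command that disables the
# dynamic controller (c-plan-ahead 0) and sets a 100m static rate.
for minor in 0 1 2 3; do
  echo "drbdsetup /dev/drbd${minor} syncer -p 0 -r 100m"
done
```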

Maybe it would be good not to set the dynamic syncer to true by
default.

I am also interested if someone else had this problem.

Regards
Thomas

Iustin Pop

Feb 25, 2013, 3:42:43 AM
to gan...@googlegroups.com
On Fri, Feb 22, 2013 at 09:05:15AM -0800, tschend wrote:
> Hi,
>
> so disabling dynamic sync
>
> drbdsetup /dev/drbdX -p 0
>
> and setting static sync speed to 100m works
>
> drbdsetup /dev/drbdX -r 100m
>
> I hope that helps someone in the future.
>
> Maybe not setting the dynamic syncer to true as a default would be
> good.

Hmm. Was it set to true by default? I thought you explicitly enabled it.
In constants.py, it is:

DISK_LD_DEFAULTS = {
  LD_DRBD8: {
    LDP_RESYNC_RATE: CLASSIC_DRBD_SYNC_SPEED,
    LDP_BARRIERS: _autoconf.DRBD_BARRIERS,
    LDP_NO_META_FLUSH: _autoconf.DRBD_NO_META_FLUSH,
    LDP_DEFAULT_METAVG: DEFAULT_VG,
    LDP_DISK_CUSTOM: "",
    LDP_NET_CUSTOM: "",
    LDP_DYNAMIC_RESYNC: False,
                        ^^^^^

So it is strange…

Lari Hotari

Feb 26, 2013, 7:05:53 AM
to gan...@googlegroups.com, tschend

Have you tested the speed of your network with iperf or similar?

This is not directly related to resync speed, but I've seen ethernet offload settings cause problems on some hardware, especially with virtualization (Linux bridge).

Try turning off some of the offload settings for your network card:

For example:
/sbin/ethtool --offload eth0 gso off tso off sg off gro off
/sbin/ethtool --offload br0 gso off tso off sg off gro off

quote from RHEL KVM documentation:
" If you experience low performance with the para-virtualized network drivers, verify the setting for the GSO and TSO features on the host system. The para-virtualized network drivers require that the GSO and TSO options are disabled for optimal performance."
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch10s04.html

Another reference: http://serverfault.com/questions/272483/why-is-tcp-accept-performance-so-bad-under-xen/272496#272496

I'm not sure at all if this helps, but I've seen some really strange network performance problems when offload settings are enabled.
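To rule the replication link in or out, a quick iperf check between the two DRBD peers would help. A sketch, using the peer addresses from the drbdsetup show output earlier in the thread and iperf 2 syntax; the commands are printed here as a dry run since each half must be run on a different node:

```shell
# Dry run: print the server/client halves of a link benchmark.
echo "iperf -s"                       # run on the sync target (10.0.120.232)
echo "iperf -c 10.0.120.232 -t 30"    # run on the sync source (10.0.120.230)
```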


Regards,

Lari