DRBD speed not going higher than ~95MB/s


John N.

May 1, 2016, 5:57:15 PM
to ganeti
Hello,

I am running ganeti 2.12 on Debian 8 (official packages from the Debian repo) on new servers with enterprise SSD disks and 10 Gbit/s Nexus switches, but for some reason I haven't found yet, DRBD synchronisation does not go higher than ~95 MB/s.

No matter what DRBD option I change (max-buffers, max-epoch-time, resync-rate, etc.), it does not get past the 100 MB/s barrier, which seems really weird to me, as if something else were imposing this limit.
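For scale, here is a quick conversion of the link speeds mentioned in this thread, assuming decimal units and ignoring Ethernet/IP/TCP overhead: ~95 MB/s is close to what a single 1 Gbit/s link can carry and under a tenth of a 10 Gbit/s link, which is why the ceiling looks artificial. The helper names are illustrative only.

```python
# Rough line-rate arithmetic for the numbers quoted in this thread.
# Assumes decimal units (1 Gbit/s = 10**9 bit/s, 1 MB/s = 10**6 byte/s)
# and ignores framing/protocol overhead.

def gbit_to_mbyte_per_s(gbit: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units)."""
    return gbit * 1e9 / 8 / 1e6

def link_utilisation(observed_mb_s: float, link_gbit: float) -> float:
    """Fraction of the raw link capacity used by an observed transfer rate."""
    return observed_mb_s / gbit_to_mbyte_per_s(link_gbit)

if __name__ == "__main__":
    print(f"10 Gbit/s ~ {gbit_to_mbyte_per_s(10):.0f} MB/s")
    print(f"1 Gbit/s  ~ {gbit_to_mbyte_per_s(1):.0f} MB/s")
    print(f"95 MB/s on 10 Gbit/s uses {link_utilisation(95, 10):.1%} of the link")
```

With these assumptions, 10 Gbit/s is about 1250 MB/s and 1 Gbit/s about 125 MB/s.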

Any ideas what else I should check or try? Or does Ganeti enforce such a limit somewhere else?

Regards
John

Lucas, Sascha

May 2, 2016, 3:14:33 AM
to ganeti
Hi John,

John N. wrote:

> I haven't found yet DRBD synchronisation does not go higher than ~95 MB/s.

> No matter what DRBD option I change (max-buffers, max-epoch-time, resync-rate, etc.) it does not go over that barrier of 100 MB/s ...

I can confirm that there is a magic barrier around 100MB/s (sometimes 120MB/s). I've verified that it is neither the disk nor the network: I created a VM with 3 disks and got 3x100MB/s (on spinning disks/RAID).

Because I do not care to make a single disk sync faster, I never investigated further. But I assume the following:

The dynamic resync controller might be active. I'm just quoting what I wrote in a different thread:

In the DRBD docs regarding the dynamic resync controller[1] I read: "It is enabled by default with 8.4, and disabled by default with 8.3. To explicitly enable or disable, set c-plan-ahead to 20 (enable) or 0 (disable)."
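The rule quoted from the LINBIT post can be written as a small decision function. This is a simplified sketch of that rule only, not DRBD code: the function name is hypothetical, and it assumes any positive c-plan-ahead means "enabled", as the "20 (enable)" example suggests.

```python
from typing import Optional, Tuple

def resync_controller_active(drbd_version: Tuple[int, int],
                             c_plan_ahead: Optional[int]) -> bool:
    """Apply the enable/disable rule quoted from the DRBD docs.

    drbd_version: (major, minor), e.g. (8, 4).
    c_plan_ahead: the explicitly configured value, or None when the option
    is not set (for example, when it is absent from `drbdsetup show`).
    """
    if c_plan_ahead is not None:
        # An explicit setting wins: 0 disables the controller, a positive value enables it.
        return c_plan_ahead > 0
    # Not set: fall back to the version default (enabled with 8.4, disabled with 8.3).
    return drbd_version >= (8, 4)

# Assuming a DRBD 8.4 module and no c-plan-ahead in `drbdsetup show`:
print(resync_controller_active((8, 4), None))  # True
```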

Looking at the resulting DRBD config with "drbdsetup show /dev/drbdX" shows that c-plan-ahead=0 is not set. So I must conclude that the dynamic resync controller is active? But looking at Ganeti, I got the impression that it is disabled:

Default disk parameters:
drbd:
...
dynamic-resync: False

What do you think? Thanks, Sascha.

[1] https://blogs.linbit.com/p/443/drbd-sync-rate-controller-2/


John N.

May 2, 2016, 3:55:40 AM
to ganeti
Hi Sascha

Thanks to you, I now know I'm not crazy :) and that there really is an upper limit of 100 MB/s defined somewhere.

Yesterday I also tried setting c-plan-ahead to 0 to disable the dynamic resync controller, without success. As you correctly mentioned, when checking the config of a DRBD device with drbdsetup show, you can't find any trace of the c-plan-ahead parameter. So it looks like this parameter exists but is not used.

It looks like these parameters exist but the Ganeti team did not implement them... In that case, I suppose only someone from the Ganeti team could help us. Does anyone from the Google Ganeti team read this? Could you please comment on this?

Nowadays with 10Gbit/s interfaces it would really be nice to have faster resync rates with DRBD...

Regards
John

Benjamin Redling

May 2, 2016, 4:17:37 AM
to gan...@googlegroups.com
Hi,

On 2016-05-02 09:55, John N. wrote:
> Thanks to you now I don't think I am crazy :) and that there really is an
> upper limit somewhere defined of 100 MB/s.
[...]
> Nowadays with 10Gbit/s interfaces it would really be nice to have faster
> resync rates with DRBD...

Highly interesting even without 10Gbit/s:
I have link aggregation on my wish-list for our ganeti setup, and
performance-wise investing any time in that wouldn't make much sense
considering this issue.

Is nobody on the list who uses either a fast network or link aggregation
having this issue?

Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321

Phil Regnauld

May 2, 2016, 4:23:58 AM
to gan...@googlegroups.com
Benjamin Redling (benjamin.rampe) writes:
>
> Highly interesting even without 10Gbit/s:
> I have link aggregation on my wish-list for our ganeti setup and
> perfomance-wise investing any time in that wouldn't make much sense
> considering this issue.
>
> Nobody on the list using either using a fast network or link aggregation
> having this issue?

What type of link aggregation? If you are doing LACP, you probably won't see
the throughput of a single DRBD connection exceed the capacity of one member link,
due to the way hashing is done on source/destination addresses (L2 and
sometimes L3). Overall, it will help if you have multiple DRBD syncs
going on.
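Phil's point can be illustrated with a simplified layer-2 style hash: the member link is picked from the source and destination MAC addresses only, so every frame of one DRBD connection between two nodes lands on the same link. This is a sketch under that assumption, not the exact algorithm of the Linux bonding driver or of Cisco switches (real implementations differ and may also hash L3/L4 fields); the MAC addresses are made up.

```python
def pick_member_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Choose a bond member from a MAC pair: (src XOR dst) modulo link count."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_links

node_a = "52:54:00:00:00:01"  # hypothetical MAC of the primary node
node_b = "52:54:00:00:00:02"  # hypothetical MAC of one secondary
node_c = "52:54:00:00:00:03"  # hypothetical MAC of another secondary

# Every frame of the A<->B resync hashes to the same member link ...
print(pick_member_link(node_a, node_b, 2))  # 1
# ... while a sync to a different peer may land on the other link.
print(pick_member_link(node_a, node_c, 2))  # 0
```

Because the choice depends only on the address pair, one resync stays within a single member's bandwidth; under this model, aggregation helps only when several syncs to different peers run at once.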

Cheers,
Phil

John N.

May 2, 2016, 4:36:43 AM
to ganeti

On Monday, May 2, 2016 at 10:17:37 AM UTC+2, Benjamin Redling wrote:
> Nobody on the list using either using a fast network or link aggregation
> having this issue?

I am using LACP (802.3ad) with two 10 Gbit/s fiber SFP+ links per ganeti node, going to two different Cisco Nexus switches, and I cannot exceed this 100 MB/s DRBD sync limit despite excellent links and enterprise SSD disks.

J.
 

Simon Deziel

May 2, 2016, 9:52:36 AM
to gan...@googlegroups.com
Hi John,

I believe you want to tune the "resync-rate" cluster parameter. IIRC it
is set to 100MB/s by default.

HTH,
Simon

John N.

May 2, 2016, 11:43:41 AM
to ganeti
Hi Simon,

I have resync-rate set to 1250000, which should match a 10 Gbit/s connection, but unfortunately that does not change anything.
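A quick unit check on that value, assuming Ganeti's resync-rate is expressed in KiB/s (its default of 61440 would then be 60 MiB/s): 1250000 KiB/s works out to roughly 10.24 Gbit/s, so the setting is in line with a 10 Gbit/s link. The helper names are illustrative.

```python
KIB = 1024  # assumption: resync-rate is given in KiB/s

def kib_s_to_mib_s(rate_kib_s: int) -> float:
    """KiB/s -> MiB/s."""
    return rate_kib_s / 1024

def kib_s_to_gbit_s(rate_kib_s: int) -> float:
    """KiB/s -> Gbit/s (decimal gigabits)."""
    return rate_kib_s * KIB * 8 / 1e9

print(kib_s_to_mib_s(61440))               # Ganeti default: 60.0
print(round(kib_s_to_gbit_s(1250000), 2))  # John's value: 10.24
```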

Regards
J.

Simon Deziel

May 2, 2016, 11:54:58 AM
to gan...@googlegroups.com
On 2016-05-02 11:43 AM, John N. wrote:
> I have the resync-rate set to 1250000 which should match for a 10Gbit/s
> connection but that does not change anything unfortunately.

Oh right, that only changes the resync rate (duh!). I think the initial
sync speed is tunable; I just haven't needed this since I always use
"--no-wait-for-sync" when creating new instances.

John N.

May 4, 2016, 4:20:00 AM
to ganeti
I think this is a Ganeti issue that needs to be tackled by the Ganeti dev team at Google, so I have posted an issue here:

https://code.google.com/p/ganeti/issues/detail?id=1176

Cheers
J.

Viktor Bachraty

May 4, 2016, 5:53:14 AM
to ganeti

> It looks like these parameters are there but that the ganeti team did not implement them... In this case only someone of the ganeti team could help us I suppose. Anyone from google ganeti team read this? Could you please comment about this?

We haven't needed to tune single-device resync performance; in fact, we were throttling the DRBD resync-rate to distribute throughput evenly between multiple devices. The mentioned parameters are implemented. Maybe this would help? http://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige

The parameters are used here:

The default value for c-plan-ahead is 20, but it is unused as dynamic disk resync seems to be False by default:

Defaults in a more readable form (lib/_constants.py):
DISK_DT_DEFAULTS = {
    "diskless": {},
    "file": {},
    "sharedfile": {},
    "plain": {"stripes": 1},
    "blockdev": {},
    "drbd": {
        "c-delay-target": 1,
        "c-fill-target": 0,
        "c-max-rate": 61440,
        "c-min-rate": 4096,
        "c-plan-ahead": 20,
        "data-stripes": 1,
        "disk-barriers": "n",
        "disk-custom": "",
        "dynamic-resync": False,
        "meta-barriers": False,
        "meta-stripes": 1,
        "metavg": "xenvg",
        "net-custom": "",
        "protocol": "C",
        "resync-rate": 61440,
    },
    "rbd": {"access": "kernelspace", "pool": "rbd"},
    "ext": {"access": "kernelspace"},
    "gluster": {"access": "kernelspace", "host": "127.0.0.1", "port": 24007, "volume": "gv0"},
}


Lucas, Sascha

May 4, 2016, 7:31:25 AM
to gan...@googlegroups.com
Hi Viktor,

Viktor Bachraty wrote:
> Maybe this would be of help? http://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige

As I read the above, "c-plan-ahead 0" solves the slow resync speed, right? Ganeti does not set c-plan-ahead by default:

$ drbdsetup show /dev/drbd1 | grep -c c-plan
0

So far so good. But according to [1] the dynamic resync controller "is enabled by default with 8.4" and "set c-plan-ahead to .. 0 (to) disable".

Doesn't this mean that, if Ganeti does not set c-plan-ahead=0, the dynamic resync controller is active by default, and accordingly caps the speed at ~100MB/s?

If so, the following cluster parameter is misleading:

Default disk parameters:
drbd:
...
dynamic-resync: False

Viktor Bachraty

May 4, 2016, 8:20:30 AM
to ganeti
Hi Sascha,


On Wednesday, May 4, 2016 at 12:31:25 PM UTC+1, sascha wrote:
> Hi Viktor,
>
> Viktor Bachraty wrote:
> >  Maybe this would be of help? http://serverfault.com/questions/740311/drbd-terrible-sync-performance-on-10gige
>
> As I read the above "c-plan-ahead 0" solves the slow resync speed. Right? Ganeti does not set c-plan-ahead per default:

Yes, that's how I understand it (haven't tried though).

> $ drbdsetup show /dev/drbd1 | grep -c c-plan
> 0
>
> So far so good. But according to [1] the dynamic resync controller "is enabled by default with 8.4" and "set c-plan-ahead to .. 0 (to) disable".
>
> Doesn't this mean, if Ganeti does not set c-plan-ahead=0 that the dynamic resync controller is active per default? And accordingly drops speed at ~100MB/s?
>
> If so, the following cluster parameter is misleading:

Right, the 'dynamic-resync' cluster parameter name is ambiguous and can be misleading. According to the code that the first git link I've posted points to, it should rather be named 'configure-dynamic-resync'. Unless it's True, Ganeti won't change the c-* values, so the DRBD defaults will apply. To make Ganeti disable the dynamic resync controller, you would have to set 'dynamic-resync=True' and 'c-plan-ahead=0' (note that if you don't set the other c- values, Ganeti will use its own defaults, which may differ from DRBD's defaults).
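The behaviour described here can be modelled in a few lines. This is a hypothetical sketch, not Ganeti's code: the function and dictionary names are invented, and the default values are the c-* entries from the DISK_DT_DEFAULTS excerpt earlier in the thread. An empty result stands for "Ganeti passes no c-* values, so DRBD's own defaults apply".

```python
# c-* defaults copied from the DISK_DT_DEFAULTS "drbd" entry quoted earlier.
GANETI_C_DEFAULTS = {
    "c-plan-ahead": 20,
    "c-fill-target": 0,
    "c-delay-target": 1,
    "c-max-rate": 61440,
    "c-min-rate": 4096,
}

def effective_controller_settings(disk_params: dict) -> dict:
    """Return the c-* values Ganeti would hand to DRBD (hypothetical model).

    disk_params holds only the parameters the administrator set explicitly.
    """
    if not disk_params.get("dynamic-resync", False):
        # dynamic-resync=False: Ganeti does not touch c-*, DRBD defaults apply.
        return {}
    # dynamic-resync=True: Ganeti's own defaults fill in anything not set.
    settings = dict(GANETI_C_DEFAULTS)
    settings.update({k: v for k, v in disk_params.items() if k in GANETI_C_DEFAULTS})
    return settings

# Setting c-plan-ahead alone has no effect while dynamic-resync is False ...
print(effective_controller_settings({"c-plan-ahead": 0}))  # {}
# ... only together with dynamic-resync=True is the value passed on.
print(effective_controller_settings({"dynamic-resync": True, "c-plan-ahead": 0})["c-plan-ahead"])  # 0
```

The first call is consistent with John's earlier report that setting c-plan-ahead to 0 on its own made no difference.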

Viktor

John N.

May 4, 2016, 9:28:04 AM
to ganeti
Hi Viktor,


On Wednesday, May 4, 2016 at 2:20:30 PM UTC+2, Viktor Bachraty wrote:
> Right, the 'dynamic-resync' cluster parameter name is ambiguous and can be misleading. According to the code that the first git link I've posted points to, it should be rather named 'configure-dynamic-resync'. Unless it's True, Ganeti won't change c-* values, so DRDB defaults will apply. To make Ganeti disable the dynamic resync controller you would have to set 'dynamic-resync=True' and 'c-plan-ahead=0' (note that if you don't set the other c- values, Ganeti will use it's own defaults which may differ from DRBD's defaults).

Now it all makes sense, thanks for explaining. The naming of the 'dynamic-resync' parameter is really confusing, as you say. I set this parameter to True and finally got past the 100 MB/s barrier, reaching 175 MB/s with all the other parameters left at their defaults (except for net-custom). Unfortunately, setting dynamic-resync=True together with c-plan-ahead=0 does not work, as you can see below from what happens during a gnt-instance add:

Wed May  4 15:21:36 2016  - WARNING: Device creation failed
Failure: command execution error:
Can't create block device <DRBD8(hosts=fa17d3b8-e5e5-4675-af34-8d41fb381ae7/4-ccfee5ec-2b0c-4e15-b32f-2293542445d0/4, port=11004, backend=<LogicalVolume(/dev/ffzgvg/6367d27a-5516-4e11-8da2-90f8118f4afc.disk0_data, not visible, size=20480m)>, metadev=<LogicalVolume(/dev/ffzgvg/6367d27a-5516-4e11-8da2-90f8118f4afc.disk0_meta, not visible, size=128m)>, visible as /dev/disk/0, size=20480m)> on node node1.domain.com for instance instance5.domain.com: Error while executing backend function: Can't execute ''A value of 0 for c-plan-ahead disables the dynamic sync speed controller at DRBD level. If you want to disable it, please set the dynamic-resync disk parameter to False.'': not found ([Errno 2] No such file or directory)

Regards
J.

Viktor Bachraty

May 4, 2016, 10:13:07 AM
to gan...@googlegroups.com
Hi John,

Thanks for testing. This sounds like a bug. It's a bit unfortunate that c-plan-ahead is used both for tuning the dynamic sync controller and for turning it off; that was probably the root cause of dynamic-resync being interpreted differently in different places in the code. Could you please update issue 1176 with your findings?
Thanks,

Viktor

John N.

May 4, 2016, 10:30:21 AM
to ganeti

On Wednesday, May 4, 2016 at 4:13:07 PM UTC+2, Viktor Bachraty wrote:

> Thanks for testing.  This sounds like a bug - it's a bit unfortunate that c-plan-ahead is used for both tuning the dynamic sync controller and also turning it off - that was probably the root cause that caused misinterpreting dynamic-resync in different places of the code.  Could you please update issue 1176  with your findings ?
> Thanks,


Done! Thanks again to you for helping out.
 
Regards
J.

George K.

May 6, 2016, 7:01:25 AM
to gan...@googlegroups.com
This is what we use at GRNET to get more than 250 MB/s on 10 Gbit Ethernet:
# gnt-group info XXX | grep net-custom
      net-custom: --max-buffers 36k --sndbuf-size 1024k --rcvbuf-size 2048k
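To put these net-custom values in concrete terms, here is a small parser that expands them. It assumes binary suffixes (k = 1024), byte units for the two socket buffer sizes, and that max-buffers counts 4 KiB pages (the usual page size; check the DRBD documentation for your version). The helper names are illustrative.

```python
import shlex

SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # assumption: binary multipliers
PAGE_SIZE = 4096  # assumption: max-buffers is counted in 4 KiB pages

def parse_number(token: str) -> int:
    """Expand '36k' -> 36864, '2048k' -> 2097152, '100' -> 100."""
    token = token.lower()
    if token[-1] in SUFFIXES:
        return int(token[:-1]) * SUFFIXES[token[-1]]
    return int(token)

def parse_net_custom(value: str) -> dict:
    """Turn '--opt value --opt value ...' into {opt: number}.

    Assumes every option is followed by exactly one value.
    """
    args = shlex.split(value)
    return {flag.lstrip("-"): parse_number(raw)
            for flag, raw in zip(args[0::2], args[1::2])}

opts = parse_net_custom("--max-buffers 36k --sndbuf-size 1024k --rcvbuf-size 2048k")
print(opts)  # {'max-buffers': 36864, 'sndbuf-size': 1048576, 'rcvbuf-size': 2097152}
# Upper bound on receive-side buffer memory per device, under the page-size assumption:
print(opts["max-buffers"] * PAGE_SIZE // 1024 ** 2, "MiB")  # 144 MiB
```

Under these assumptions the send and receive socket buffers are 1 MiB and 2 MiB.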

hope it helps

John N.

May 6, 2016, 8:11:36 AM
to ganeti

On Friday, May 6, 2016 at 1:01:25 PM UTC+2, George K. wrote:
> That's what we use at GRNET to get more than 250Mb/sec on 10Gbit ethernet:
> # gnt-group info XXX | grep net-custom
>       net-custom: --max-buffers 36k --sndbuf-size 1024k --rcvbuf-size 2048k
>
> hope it helps

I will try raising my max-buffers and playing with sndbuf-size and rcvbuf-size. Are you actually using jumbo frames of 9000 bytes or the standard 1500 bytes?

George K.

May 6, 2016, 8:35:36 AM
to gan...@googlegroups.com
these nodes are currently using 1500

John N.

May 6, 2016, 9:30:06 AM
to ganeti
Thanks George, your net-custom settings are amazing! With them I also managed to get 250 MB/s, exactly like you. Interesting that you don't set max-epoch-size at all, even though the ganeti wiki recommends it under performance tuning. I think the recommendations on the ganeti wiki should be revised for 10 Gbit/s networks.

Regards
J