Creating instances takes a very long time using drbd disk-template


Bukar

Nov 29, 2016, 2:51:01 PM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

I have a two-node cluster, both nodes being DL380 Gen9 servers with the same configuration. They are both running Ubuntu 16.04.1 on RAID 5, with Ganeti 2.15 installed and KVM as the hypervisor.

 

gnt-node list

Node     DTotal DFree MTotal MNode MFree Pinst Sinst

host1     1.7T  1.4T  23.4G  1.4G 21.9G     1     0

host2      1.7T  1.4T  23.4G  270M 22.9G     0     1

 

I have tried creating instances with a DRBD disk of 150GB, 200GB and now 300GB. The last time, it took me 5 hours 30 minutes to create an instance with a 150GB disk, and the 300GB instance is estimated at 12 hours, as can be seen below:

root@host1:~# gnt-instance add -t drbd -o noop     -s 300G -B minmem=5120M,maxmem=8192M     -n host1.:host2.     --no-start --no-name-check --no-ip-check    clientA

Tue Nov 29 18:24:31 2016 * creating instance disks...

Tue Nov 29 18:24:34 2016 adding instance clientA to cluster config

Tue Nov 29 18:24:34 2016 adding disks to cluster config

Tue Nov 29 18:24:34 2016  - INFO: Waiting for instance clientA to sync disks

Tue Nov 29 18:24:35 2016  - INFO: - device disk/0:  0.10% done, 12h 8m 9s remaining (estimated)

Tue Nov 29 18:25:35 2016  - INFO: - device disk/0:  0.20% done, 12h 7m 10s remaining (estimated)

Tue Nov 29 18:26:35 2016  - INFO: - device disk/0:  0.30% done, 12h 3m 58s remaining (estimated)

Tue Nov 29 18:27:36 2016  - INFO: - device disk/0:  0.50% done, 11h 54m 12s remaining (estimated)

Tue Nov 29 18:28:36 2016  - INFO: - device disk/0:  0.60% done, 12h 4m 15s remaining (estimated)

Tue Nov 29 18:29:36 2016  - INFO: - device disk/0:  0.70% done, 12h 53m 25s remaining (estimated)

Tue Nov 29 18:30:36 2016  - INFO: - device disk/0:  0.90% done, 12h 30m 6s remaining (estimated)

Tue Nov 29 18:31:36 2016  - INFO: - device disk/0:  1.00% done, 12h 12m 58s remaining (estimated)

Tue Nov 29 18:32:37 2016  - INFO: - device disk/0:  1.10% done, 12h 0m 21s remaining (estimated)

Tue Nov 29 18:33:37 2016  - INFO: - device disk/0:  1.30% done, 11h 53m 26s remaining (estimated)

Tue Nov 29 18:34:37 2016  - INFO: - device disk/0:  1.40% done, 11h 42m 46s remaining (estimated)

Tue Nov 29 18:35:37 2016  - INFO: - device disk/0:  1.50% done, 12h 1m 25s remaining (estimated)

Tue Nov 29 18:36:37 2016  - INFO: - device disk/0:  1.70% done, 12h 12m 44s remaining (estimated)

Tue Nov 29 18:37:38 2016  - INFO: - device disk/0:  1.80% done, 12h 3m 31s remaining (estimated)

Tue Nov 29 18:38:38 2016  - INFO: - device disk/0:  1.90% done, 12h 27m 44s remaining (estimated)

 

I am using 3 physical network ports on the server, one of which I configured as a bridge interface for the instances.

I think this is not normal, but I have configured about 7 different scenarios, with fresh installations of the OS using both Ubuntu 14.04 and 16.04 and different sets of hard drives, with almost the same result as regards the time it takes to create an instance. However, if I create an instance with the plain disk template, it takes less than a minute for 300GB. That is why I feel I am probably getting something wrong somewhere.

 

gnt-cluster info

Cluster name: ngcluster.

Cluster UUID: 459ea9d6-bef8-49fa-b0c1-2a685f1099bf

Creation time: 2016-11-29 15:46:49

Modification time: 2016-11-29 17:03:18

Master node: host1.

Architecture (this node): 64bits (x86_64)

Tags: (none)

Default hypervisor: kvm

Enabled hypervisors: kvm

Hypervisor parameters:

  kvm:

    acpi: True

    boot_order: disk

    cdrom2_image_path:

    cdrom_disk_type:

    cdrom_image_path:

    cpu_cores: 0

    cpu_mask: all

    cpu_sockets: 0

    cpu_threads: 0

    cpu_type:

    disk_aio: threads

    disk_cache: default

    disk_type: paravirtual

    floppy_image_path:

    initrd_path:

    kernel_args: ro

    kernel_path:

    keymap:

    kvm_extra:

    kvm_flag:

    kvm_path: /usr/bin/kvm

    machine_version:

    mem_path:

    migration_bandwidth: 32

    migration_caps:

    migration_downtime: 30

    migration_mode: live

    migration_port: 8102

    nic_type: paravirtual

    reboot_behavior: reboot

    root_path: /dev/vda1

    security_domain:

    security_model: none

    serial_console: True

    serial_speed: 38400

    soundhw:

    spice_bind:

    spice_image_compression:

    spice_ip_version: 0

    spice_jpeg_wan_compression:

    spice_password_file:

    spice_playback_compression: True

    spice_streaming_video:

    spice_tls_ciphers: HIGH:-DES:-3DES:-EXPORT:-DH

    spice_use_tls: False

    spice_use_vdagent: True

    spice_zlib_glz_wan_compression:

    usb_devices:

    usb_mouse:

    use_chroot: False

    use_localtime: False

    user_shutdown: False

    vga:

    vhost_net: False

    virtio_net_queues: 1

    vnc_bind_address: 0.0.0.0

    vnc_password_file:

    vnc_tls: False

    vnc_x509_path:

    vnc_x509_verify: False

    vnet_hdr: True

OS-specific hypervisor parameters:

OS parameters:

Hidden OSes:

Blacklisted OSes:

Cluster parameters:

  candidate pool size: 10

  maximal number of jobs running simultaneously: 20

  maximal number of jobs simultaneously tracked by the scheduler: 25

  mac prefix: aa:00:00

  master netdev: eno2

  master netmask: 32

  use external master IP address setup script: False

  lvm volume group: ngganeti

  lvm reserved volumes: (none)

  drbd usermode helper: /bin/true

  file storage path: /srv/ganeti/file-storage

  shared file storage path: /srv/ganeti/shared-file-storage

  gluster storage path: /var/run/ganeti/gluster

  maintenance of node health: False

  uid pool:

  default instance allocator: hail

  default instance allocator parameters:

  primary ip version: 4

  preallocation wipe disks: False

  OS search path: /srv/ganeti/os, /usr/local/lib/ganeti/os, /usr/lib/ganeti/os, /usr/share/ganeti/os

  ExtStorage Providers search path: /srv/ganeti/extstorage, /usr/local/lib/ganeti/extstorage, /usr/lib/ganeti/extstorage, /usr/share/ganeti/extstorage

  enabled disk templates: drbd, plain

  install image:

  instance communication network:

  zeroing image:

  compression tools:

    - gzip

    - gzip-fast

    - gzip-slow

  enabled user shutdown: False

Default node parameters:

  cpu_speed: 1

  exclusive_storage: False

  oob_program:

  ovs: False

  ovs_link:

  ovs_name: switch1

  spindle_count: 1

  ssh_port: 22

Default instance parameters:

  default:

    always_failover: False

    auto_balance: True

    maxmem: 128

    minmem: 128

    spindle_use: 1

    vcpus: 1

Default nic parameters:

  default:

    link: br0

    mode: bridged

    vlan:

Default disk parameters:

  blockdev:

  diskless:

  drbd:

    c-delay-target: 1

    c-fill-target: 0

    c-max-rate: 61440

    c-min-rate: 4096

    c-plan-ahead: 20

    data-stripes: 1

    disk-barriers: n

    disk-custom:

    dynamic-resync: False

    meta-barriers: False

    meta-stripes: 1

    metavg: ngganeti

    net-custom:

    protocol: C

    resync-rate: 61440

  ext:

    access: kernelspace

  file:

  gluster:

    access: kernelspace

    host: 127.0.0.1

    port: 24007

    volume: gv0

  plain:

    stripes: 1

  rbd:

    access: kernelspace

    pool: rbd

  sharedfile:

Instance policy - limits for instances:

  bounds specs:

    - max/0:

        cpu-count: 8

        disk-count: 16

        disk-size: 1048576

        memory-size: 32768

        nic-count: 8

        spindle-use: 12

      min/0:

        cpu-count: 1

        disk-count: 1

        disk-size: 1024

        memory-size: 128

        nic-count: 1

        spindle-use: 1

  std:

    cpu-count: 1

    disk-count: 1

    disk-size: 1024

    memory-size: 128

    nic-count: 1

    spindle-use: 1

  allowed disk templates: drbd, plain

  vcpu-ratio: 4

  spindle-ratio: 32

Data collectors:

  cpu-avg-load:

    active: True

    interval: 5.000s

  diskstats:

    active: True

    interval: 5.000s

  drbd:

    active: True

    interval: 5.000s

  inst-status-xen:

    active: True

    interval: 5.000s

  lv:

    active: True

    interval: 5.000s

  xen-cpu-avg-load:

    active: True

    interval: 5.000s

______________________

As can be seen in the gnt-cluster verify output below, I am presently having issues with SSH communication between the nodes; could that be the issue?

 

root@host1:~# gnt-cluster verify

Submitted jobs 76, 77

Waiting for job 76 ...

Tue Nov 29 18:21:22 2016 * Verifying cluster config

Tue Nov 29 18:21:22 2016 * Verifying cluster certificate files

Tue Nov 29 18:21:22 2016 * Verifying hypervisor parameters

Tue Nov 29 18:21:22 2016 * Verifying all nodes belong to an existing group

Waiting for job 77 ...

Tue Nov 29 18:21:23 2016 * Verifying group 'default'

Tue Nov 29 18:21:23 2016 * Gathering data (2 nodes)

Tue Nov 29 18:21:23 2016 * Gathering information about nodes (2 nodes)

Tue Nov 29 18:21:25 2016 * Gathering disk information (2 nodes)

Tue Nov 29 18:21:25 2016 * Verifying configuration file consistency

Tue Nov 29 18:21:25 2016   - ERROR: node: The public key file '/var/lib/ganeti/ganeti_pub_keys' does not exist. Consider running 'gnt-cluster renew-crypto --new-ssh-keys [--no-ssh-key-check]' to fix this.

Tue Nov 29 18:21:25 2016 * Verifying node status

Tue Nov 29 18:21:25 2016   - ERROR: node host1.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host1.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host2.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host2.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016 * Verifying instance status

Tue Nov 29 18:21:25 2016 * Verifying orphan volumes

Tue Nov 29 18:21:25 2016 * Verifying N+1 Memory redundancy

Tue Nov 29 18:21:25 2016 * Other Notes

Tue Nov 29 18:21:25 2016 * Hooks Results

 

Kindly assist; I need to figure out what I have been doing wrong. I would really appreciate it if I could find a way to configure DRBD to work as it should.

 

Thank you

Iustin Pop

Nov 29, 2016, 3:39:36 PM
to gan...@googlegroups.com, lord...@gmail.com, b.ca...@pobox.com
On 2016-11-29 11:51:01, Bukar wrote:
>
>
> I have a two node cluster both on dl380 gen 9 servers, with the same
> configuration. They are both running Ubuntu 16.04.1 OS on raid 5 with
> ganeti version 2.15 installed and kvm as the hypervisor.
>
>
>
> *gnt-node list *
>
> Node DTotal DFree MTotal MNode MFree Pinst Sinst
>
> host1 1.7T 1.4T 23.4G 1.4G 21.9G 1 0
>
> host2 1.7T 1.4T 23.4G 270M 22.9G 0 1
>
>
>
> I have tried creating instances with a drbd disk of 150GB, 200GB and now
> 300GB the list time it took me to create an instance with 150GB was 5 hours
> 30 minutes. And for the instance with 300GB 12 hours as can be seen below:
>
> *root@host1:~# gnt-instance add -t drbd -o noop -s 300G -B
> minmem=5120M,maxmem=8192M -n host1.:host2. --no-start
> --no-name-check --no-ip-check clientA*
>
> Tue Nov 29 18:24:31 2016 * creating instance disks...
>
> Tue Nov 29 18:24:34 2016 adding instance clientA to cluster config
>
> Tue Nov 29 18:24:34 2016 adding disks to cluster config
>
> Tue Nov 29 18:24:34 2016 - INFO: Waiting for instance clientA to sync disks
>
> Tue Nov 29 18:24:35 2016 - INFO: - device disk/0: 0.10% done, 12h 8m 9s
> remaining (estimated)
>
> Tue Nov 29 18:38:38 2016 - INFO: - device disk/0: 1.90% done, 12h 27m 44s
> remaining (estimated)
>
>
>
> I am using 3 physical network ports on the server, one of which I
> configured as a bridge interface for the instances
>
> I think it is not a normal, but I have configured about 7 different
> scenarios, with fresh installation of the OS using both Ubuntu 14.04 and
> 16.04 and different set of hard drives but with almost the same result as
> regards to the time it takes to create an instance. Although, if I try
> creating an instance with plain disk template, it takes less than a minute
> to create 300GB. That is why I feel probably I am getting it wrong
> somewhere.

The plain disk template does no heavy disk write, so it's kind of
expected.

A few questions:

- What is the network speed? You mention 3 NICs, but of which type?
- What's the network bandwidth that you can achieve between primary and
secondary, on the IPs used for replication?
- What is the disk usage on both the primary and secondary ("iostat -dkx
/dev/sd? 15 50", let it run and see the values at the end)?
- What are the node parameters (gnt-node info) for primary and
secondary?

regards,
iustin

candlerb

Nov 29, 2016, 4:10:18 PM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

I have tried creating instances with a drbd disk of 150GB, 200GB and now 300GB the list time it took me to create an instance with 150GB was 5 hours 30 minutes.  And for the instance with 300GB 12 hours as can be seen below:


One thing is that the drbd replication rate is intentionally limited:

# gnt-cluster info | grep -i rate
    c-max-rate: 61440
    c-min-rate: 4096
    resync-rate: 61440

But at 60MB/sec, your 300GB should take 5120 seconds = 1h 25m

So: (1) check the output of "cat /proc/drbd". It will show you the resync as it happens and the rate achieved.

(2) The resync may be slowed down if there are many other disk writes going on at the same time (which also need to be replicated). It may also be slowed down if you only have 100Mbps connectivity between your servers; that would limit you to about 12MB/sec.

It's probably a good idea to monitor the utilisation on your network ports, e.g. with a tool like Cacti, LibreNMS, etc.

(3) If, when you're creating a new instance, you add the flag "--no-wait-for-sync", then the syncing will take place in the background (there's an example command below point (4)). Of course you'll have no failover resilience until it has completed, but at least you can keep using your cluster.

(4) RAID5 generally gives very poor write performance. If you have a controller card with a working battery backup unit it may be acceptable.

Having said that, syncing should be quite efficient even on RAID5, but it does require that you are writing in full RAID stripes, and that your RAID array isn't doing lots of small writes at the same time. Small writes totally kill RAID5 performance.  If that's the problem then you'll probably want to change to RAID10.
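To illustrate (3): the flag simply goes onto the command you already use. For example, reusing the parameters from your original post (purely illustrative):

# gnt-instance add -t drbd -o noop -s 300G -B minmem=5120M,maxmem=8192M \
    -n host1.:host2. --no-start --no-name-check --no-ip-check \
    --no-wait-for-sync clientA

The job then returns as soon as the disks are created, and the initial resync carries on in the background; /proc/drbd shows its progress.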

I'm afraid there's no magic wand here: you need to measure your disk I/O and network I/O figures to work out what's going on.

Regards,

Brian.

Bukar

Nov 30, 2016, 9:02:48 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

- What is the network speed? You mention 3 NICs, but of which type?
           They are all Gigabit ports

- What's the network bandwidth that you can achieve between primary and
  secondary, on the IPs used for replication?
        
       

root@host2:~# iperf -c 10.10.13.91

------------------------------------------------------------

Client connecting to 10.10.13.91, TCP port 5001

TCP window size: 85.0 KByte (default)

------------------------------------------------------------

[  3] local 10.10.13.94 port 47682 connected with 10.10.13.91 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  78.9 MBytes  66.0 Mbits/sec

 

One of the culprits is here!! I noticed the switch ports I was using were configured at a speed of 100Mbps; I had to change them to 1000Mbps since they are all Gigabit ports.

 

See the output after the changes below:

------------------------------------------------------------

Client connecting to 10.10.13.91, TCP port 5001

TCP window size: 85.0 KByte (default)

------------------------------------------------------------

[  3] local 10.10.13.94 port 47684 connected with 10.10.13.91 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec

 

Afterwards there was a huge improvement (better than fivefold), as creating a 300GB instance now takes 2 hours 15 minutes. WOW!!

 


- What is the disk usage on both the primary and secondary ("iostat -dkx
  /dev/sd? 15 50", let it run and see the values at the end)?

              

root@host1:~# iostat -dkx

Linux 4.4.0-47-generic (host1.)       11/30/2016      _x86_64_        (16 CPU)

 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.12   496.84   33.58  301.60 14264.50  3231.96   104.40     0.50    1.48    7.21    0.85   0.32  10.56

dm-0              0.00     0.00   11.47  791.23   446.66  3164.93     9.00    19.44   24.21    3.02   24.52   0.04   3.20

dm-1              0.00     0.00    0.55    0.63     2.52     2.54     8.59     0.00    4.03    3.80    4.23   2.46   0.29

dm-2              0.00     0.00   14.30    0.00 11542.07     0.00  1614.18     0.14    9.63    9.63    0.00   3.87   5.53

dm-3              0.00     0.00    0.29    0.39     1.61    11.15    37.62     0.00    1.94    4.21    0.29   0.10   0.01

drbd0             0.00     0.00   11.44  766.03   445.69  3058.16     9.01    57.84   74.40    3.04   75.47   0.05   4.03

drbd1             0.00     0.00    0.01    0.00     0.40     0.00    63.40     0.00    2.47    2.47    0.00   1.47   0.00

dm-4              0.00     0.00    2.61    0.00  2056.50     0.00  1573.63     0.03   12.23   12.23    0.00   4.33   1.13

dm-5              0.00     0.00    0.05    0.11     0.53    10.00   139.25     0.00    0.27    0.58    0.14   0.11   0.00

 

root@host1:~#

 

root@host2:~# iostat -dkx

Linux 4.4.0-47-generic (host2.)       11/30/2016      _x86_64_        (16 CPU)

 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.00     1.00    1.37  526.82    27.52 10311.46    39.15     0.04    0.08   12.63    0.05   0.03   1.57

dm-0              0.00     0.00    0.01  515.62     0.20  2062.49     8.00     0.02    0.04    6.35    0.04   0.02   1.06

dm-1              0.00     0.00    0.36    0.00     1.64     0.00     9.22     0.00    1.22    1.23    0.00   0.15   0.01

dm-2              0.00     0.00    0.01    9.30     0.25  7531.83  1617.38     0.00    0.41    1.04    0.41   0.41   0.38

dm-3              0.00     0.00    0.19    0.26     1.04     7.28    37.58     0.00    0.32    0.43    0.24   0.04   0.00

dm-4              0.00     0.00    0.01    0.83     0.15   691.06  1657.22     0.00    0.49    6.46    0.45   0.46   0.04

dm-5              0.00     0.00    0.03    0.06     0.29     6.48   158.49     0.00    0.32    0.27    0.35   0.11   0.00

 


- What are the node parameters (gnt-node info) for primary and
  secondary?
        

gnt-node info host1

- Node name: host1

  primary ip: 10.10.13.91

  secondary ip: 10.10.13.92

  master candidate: True

  drained: False

  offline: False

  master_capable: True

  vm_capable: True

  primary for instances:

    - Clientb

    - ClientC

    - ClientA

    - stat

  secondary for instances:

  node parameters:

    cpu_speed: default (1)

    exclusive_storage: default (False)

    oob_program: default ()

    ovs: default (False)

    ovs_link: default ()

    ovs_name: default (switch1)

    spindle_count: default (1)

    ssh_port: default (22)

 

 

gnt-node info host1

- Node name: host1.

  primary ip: 10.10.13.91

  secondary ip: 10.10.13.92

  master candidate: True

  drained: False

  offline: False

  master_capable: True

  vm_capable: True

  primary for instances:

    - ClientA

    - Clientb

    - ClientC

    - stat

  secondary for instances:

  node parameters:

    cpu_speed: default (1)

    exclusive_storage: default (False)

    oob_program: default ()

    ovs: default (False)

    ovs_link: default ()

    ovs_name: default (switch1)

    spindle_count: default (1)

    ssh_port: default (22)


Thanks a lot

Bukar

Nov 30, 2016, 9:30:18 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com
Hello Iustin and Brian,

Thanks a lot for your prompt responses. We really appreciate the support you guys are providing to folks like me.

Kudos to you guys!
#output of cat /proc/drbd

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:42297588 nr:0 dw:42297588 dr:6079980 al:8318 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:5484 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428832 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428800 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

candlerb

Nov 30, 2016, 10:18:04 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com
I noticed the switch ports I was using were configured with the speed of 100mbps I had to change it to 1000Mbps since they are all gigabyte ports.

For future reference: leave all switch ports at "auto negotiation" and you should be fine.

It's only 20-year-old Catalyst switches that don't do auto negotiation properly.
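If you want to double-check what each port actually negotiated, ethtool is enough (the interface name here is just an example; use whichever NIC carries the replication traffic):

# ethtool eno2 | grep -i speed

On a correctly negotiated gigabit link it should report "Speed: 1000Mb/s".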



#output of cat /proc/drbd

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:42297588 nr:0 dw:42297588 dr:6079980 al:8318 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:5484 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428832 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428800 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0



Cool, that shows everything is in sync (between this particular node and its partners).

"cat /proc/drbd" gives the info in a slightly more detailed form.

If you do this while you are creating a new instance you'll get an idea of the replication speed.
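For example, a simple way to keep an eye on it while a disk is resyncing (any polling method works, this is just a convenience):

# watch -n 5 cat /proc/drbd

During a resync each device line gains a progress bar plus an estimated finish time and the current speed in K/sec, which is the number to compare against your network and disk limits.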

Iustin Pop

Nov 30, 2016, 1:50:21 PM
to gan...@googlegroups.com, lord...@gmail.com, b.ca...@pobox.com
On 2016-11-30 06:02:48, Bukar wrote:
>
>
> > - What is the network speed? You mention 3 NICs, but of which type?
> >
> They are all Gigabyte ports
>
> - What's the network bandwidth that you can achieve between primary and
> > secondary, on the IPs used for replication?
> >
>
>
>
> ------------------------------------------------------------
>
> Client connecting to 10.10.13.91, TCP port 5001
>
> TCP window size: 85.0 KByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 10.10.13.94 port 47682 connected with 10.10.13.91 port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 78.9 MBytes 66.0 Mbits/sec
>
> root@host2:~# iperf -c 10.10.13.91
>
>
>
> One of the culprit is here!!, I noticed the switch ports I was using were
> configured with the speed of 100mbps I had to change it to 1000Mbps since
> they are all gigabyte ports.

Good find!

> See the output after the changes below:
>
> ------------------------------------------------------------
>
> Client connecting to 10.10.13.91, TCP port 5001
>
> TCP window size: 85.0 KByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 10.10.13.94 port 47684 connected with 10.10.13.91 port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec

Yes, this is more the expected speed.

> Afterwards there was a 600% improvement as creating an instance of 300GB
> now takes 2hours 15 minutes WOW!!

:)
Unfortunately you've provided the "iostat -dkx", not "iostat -dkx 15
20", so this only shows average values since boot.

> - What are the node parameters (gnt-node info) for primary and
> > secondary?
> >
>
>
> gnt-node info host1
>
> - Node name: host1
>
> primary ip: 10.10.13.91
>
> secondary ip: 10.10.13.92

This is very strange. Your primary and secondary IP are on the same
network, so it looks like you don't use a separate network for the
replication. If this is the case, just don't use a secondary IP.

I would recommend though to use a separate network for the replication.
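(As a sketch only, since this is normally decided at setup time: the replication address is what you pass with "-s" when initialising the cluster and when adding nodes; the 192.168.100.x addresses below are placeholders for a dedicated replication network.)

gnt-cluster init -s 192.168.100.1 <other options> ngcluster.
gnt-node add -s 192.168.100.2 host2.

DRBD traffic then uses the secondary IPs, leaving the primary IPs for management and instance traffic.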

So it would be interesting to still see, now that you solved the
networking problem, what is the remaining bottleneck. A 300GB instance
should in theory finish replicating in 300*1024/100 (125 being max
theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
50 minutes.

I would guess it is storage, but for confirming that we need the iostat
numbers while the disks are replicating.

regards,
iustin

candlerb

unread,
Nov 30, 2016, 2:38:52 PM11/30/16
to ganeti, lord...@gmail.com, b.ca...@pobox.com
On Wednesday, 30 November 2016 18:50:21 UTC, Iustin Pop wrote:

So it would be interesting to still see, now that you solved the
networking problem, what is the remaining bottleneck. A 300GB instance
should in theory finish replicating in 300*1024/100 (125 being max
theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
50 minutes.

Remember that the default ganeti configuration gives drbd a maximum target replication rate of 60MiB/sec (61440 KiB/sec), which makes 5120s or 1h 25min. Presumably this is to avoid fully flooding a 1G network when one VM is replicating.

It's still not achieving even that, so you're right to look at iostat next (make sure to look on both source and destination sides). But if neither side is showing 100% utilisation of any disk, you can try cranking up the drbd limit a bit.
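If the disks turn out not to be the bottleneck and you do want to raise the cap, the DRBD disk parameters can be changed cluster-wide, for example (values are in KiB/sec, so 122880 = 120MiB/sec; pick something your hardware and network can actually sustain):

# gnt-cluster modify -D drbd:resync-rate=122880,c-max-rate=122880

As far as I know the new values are applied when the DRBD devices are set up, so they affect newly created or re-activated disks rather than a resync that is already running.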

Iustin Pop

Nov 30, 2016, 3:23:39 PM
to gan...@googlegroups.com
On 2016-11-30 11:38:52, candlerb wrote:
> On Wednesday, 30 November 2016 18:50:21 UTC, Iustin Pop wrote:
> >
> >
> > So it would be interesting to still see, now that you solved the
> > networking problem, what is the remaining bottleneck. A 300GB instance
> > should in theory finish replicating in 300*1024/100 (125 being max
> > theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
> > 50 minutes.
> >
>
> Remember that the default ganeti configuration gives drbd a maximum target
> replication rate of 60MiB/sec (61440 KiB/sec), which makes 5120s or 1h
> 25min.

Correct, I forgot about that.

> Presumably this is to avoid fully flooding a 1G network when one VM
> is replicating.

Actually, the original reason for choosing this number was twofold:
network speed was one thing, but also the speed of a single SATA HDD
when doing two I/O streams (DRBD metadata + the actual data).

With 10G networks and use of raid, the number should most definitely be
tweaked; I'm sure though people who do run Ganeti beyond experiments
will do that.

> It's still not achieving even that, so you're right to look at iostat next
> (make sure to look on both source and destination sides). But if neither
> side is showing 100% utilisation of any disk, you can try cranking up the
> drbd limit a bit.

My guess is that the secondary will show much higher disk utilisation,
possibly near 100%. Curious to see the actual data.

iustin

Phil Regnauld

Nov 30, 2016, 4:37:01 PM
to gan...@googlegroups.com
Iustin Pop (iustin) writes:
>
> Actually, the way way back reason to choose this number was dual-fold:
> network speed was one thing, but also the speed of a single SATA HDD
> when doing two I/O streams (DRBD metadata + the actual data).
>
> With 10G networks and use of raid, the number should most definitely be
> tweaked; I'm sure though people who do run Ganeti beyond experiments
> will do that.

        ISTR that setting some DRBD-specific parameters to achieve maximum
        sync/resync speed broke instance creation. I'll try and dig up the thread.

Phil Regnauld

Dec 5, 2016, 5:06:27 PM
to gan...@googlegroups.com
Ah:

https://groups.google.com/d/msg/ganeti/9X2J9zaH694/5DA_Ur5dBgAJ

That whole thread is interesting.