Creating instances takes a very long time using drbd disk-template


Bukar

Nov 29, 2016, 2:51:01 PM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

I have a two-node cluster, both nodes being DL380 Gen9 servers with the same configuration. They are both running Ubuntu 16.04.1 on RAID 5, with Ganeti 2.15 installed and KVM as the hypervisor.

 

gnt-node list

Node     DTotal DFree MTotal MNode MFree Pinst Sinst

host1     1.7T  1.4T  23.4G  1.4G 21.9G     1     0

host2      1.7T  1.4T  23.4G  270M 22.9G     0     1

 

I have tried creating instances with a DRBD disk of 150GB, 200GB and now 300GB. The last time, it took me 5 hours 30 minutes to create an instance with a 150GB disk, and the 300GB instance is estimated at 12 hours, as can be seen below:

root@host1:~# gnt-instance add -t drbd -o noop     -s 300G -B minmem=5120M,maxmem=8192M     -n host1.:host2.     --no-start --no-name-check --no-ip-check    clientA

Tue Nov 29 18:24:31 2016 * creating instance disks...

Tue Nov 29 18:24:34 2016 adding instance clientA to cluster config

Tue Nov 29 18:24:34 2016 adding disks to cluster config

Tue Nov 29 18:24:34 2016  - INFO: Waiting for instance clientA to sync disks

Tue Nov 29 18:24:35 2016  - INFO: - device disk/0:  0.10% done, 12h 8m 9s remaining (estimated)

Tue Nov 29 18:25:35 2016  - INFO: - device disk/0:  0.20% done, 12h 7m 10s remaining (estimated)

Tue Nov 29 18:26:35 2016  - INFO: - device disk/0:  0.30% done, 12h 3m 58s remaining (estimated)

Tue Nov 29 18:27:36 2016  - INFO: - device disk/0:  0.50% done, 11h 54m 12s remaining (estimated)

Tue Nov 29 18:28:36 2016  - INFO: - device disk/0:  0.60% done, 12h 4m 15s remaining (estimated)

Tue Nov 29 18:29:36 2016  - INFO: - device disk/0:  0.70% done, 12h 53m 25s remaining (estimated)

Tue Nov 29 18:30:36 2016  - INFO: - device disk/0:  0.90% done, 12h 30m 6s remaining (estimated)

Tue Nov 29 18:31:36 2016  - INFO: - device disk/0:  1.00% done, 12h 12m 58s remaining (estimated)

Tue Nov 29 18:32:37 2016  - INFO: - device disk/0:  1.10% done, 12h 0m 21s remaining (estimated)

Tue Nov 29 18:33:37 2016  - INFO: - device disk/0:  1.30% done, 11h 53m 26s remaining (estimated)

Tue Nov 29 18:34:37 2016  - INFO: - device disk/0:  1.40% done, 11h 42m 46s remaining (estimated)

Tue Nov 29 18:35:37 2016  - INFO: - device disk/0:  1.50% done, 12h 1m 25s remaining (estimated)

Tue Nov 29 18:36:37 2016  - INFO: - device disk/0:  1.70% done, 12h 12m 44s remaining (estimated)

Tue Nov 29 18:37:38 2016  - INFO: - device disk/0:  1.80% done, 12h 3m 31s remaining (estimated)

Tue Nov 29 18:38:38 2016  - INFO: - device disk/0:  1.90% done, 12h 27m 44s remaining (estimated)

 

I am using 3 physical network ports on the server, one of which I configured as a bridge interface for the instances.

I think this is not normal, but I have configured about 7 different scenarios, with fresh installations of the OS using both Ubuntu 14.04 and 16.04 and different sets of hard drives, with almost the same result as regards the time it takes to create an instance. However, if I create an instance with the plain disk template, it takes less than a minute for 300GB. That is why I feel I am probably getting something wrong somewhere.

 

gnt-cluster info

Cluster name: ngcluster.

Cluster UUID: 459ea9d6-bef8-49fa-b0c1-2a685f1099bf

Creation time: 2016-11-29 15:46:49

Modification time: 2016-11-29 17:03:18

Master node: host1.

Architecture (this node): 64bits (x86_64)

Tags: (none)

Default hypervisor: kvm

Enabled hypervisors: kvm

Hypervisor parameters:

  kvm:

    acpi: True

    boot_order: disk

    cdrom2_image_path:

    cdrom_disk_type:

    cdrom_image_path:

    cpu_cores: 0

    cpu_mask: all

    cpu_sockets: 0

    cpu_threads: 0

    cpu_type:

    disk_aio: threads

    disk_cache: default

    disk_type: paravirtual

    floppy_image_path:

    initrd_path:

    kernel_args: ro

    kernel_path:

    keymap:

    kvm_extra:

    kvm_flag:

    kvm_path: /usr/bin/kvm

    machine_version:

    mem_path:

    migration_bandwidth: 32

    migration_caps:

    migration_downtime: 30

    migration_mode: live

    migration_port: 8102

    nic_type: paravirtual

    reboot_behavior: reboot

    root_path: /dev/vda1

    security_domain:

    security_model: none

    serial_console: True

    serial_speed: 38400

    soundhw:

    spice_bind:

    spice_image_compression:

    spice_ip_version: 0

    spice_jpeg_wan_compression:

    spice_password_file:

    spice_playback_compression: True

    spice_streaming_video:

    spice_tls_ciphers: HIGH:-DES:-3DES:-EXPORT:-DH

    spice_use_tls: False

    spice_use_vdagent: True

    spice_zlib_glz_wan_compression:

    usb_devices:

    usb_mouse:

    use_chroot: False

    use_localtime: False

    user_shutdown: False

    vga:

    vhost_net: False

    virtio_net_queues: 1

    vnc_bind_address: 0.0.0.0

    vnc_password_file:

    vnc_tls: False

    vnc_x509_path:

    vnc_x509_verify: False

    vnet_hdr: True

OS-specific hypervisor parameters:

OS parameters:

Hidden OSes:

Blacklisted OSes:

Cluster parameters:

  candidate pool size: 10

  maximal number of jobs running simultaneously: 20

  maximal number of jobs simultaneously tracked by the scheduler: 25

  mac prefix: aa:00:00

  master netdev: eno2

  master netmask: 32

  use external master IP address setup script: False

  lvm volume group: ngganeti

  lvm reserved volumes: (none)

  drbd usermode helper: /bin/true

  file storage path: /srv/ganeti/file-storage

  shared file storage path: /srv/ganeti/shared-file-storage

  gluster storage path: /var/run/ganeti/gluster

  maintenance of node health: False

  uid pool:

  default instance allocator: hail

  default instance allocator parameters:

  primary ip version: 4

  preallocation wipe disks: False

  OS search path: /srv/ganeti/os, /usr/local/lib/ganeti/os, /usr/lib/ganeti/os, /usr/share/ganeti/os

  ExtStorage Providers search path: /srv/ganeti/extstorage, /usr/local/lib/ganeti/extstorage, /usr/lib/ganeti/extstorage, /usr/share/ganeti/extstorage

  enabled disk templates: drbd, plain

  install image:

  instance communication network:

  zeroing image:

  compression tools:

    - gzip

    - gzip-fast

    - gzip-slow

  enabled user shutdown: False

Default node parameters:

  cpu_speed: 1

  exclusive_storage: False

  oob_program:

  ovs: False

  ovs_link:

  ovs_name: switch1

  spindle_count: 1

  ssh_port: 22

Default instance parameters:

  default:

    always_failover: False

    auto_balance: True

    maxmem: 128

    minmem: 128

    spindle_use: 1

    vcpus: 1

Default nic parameters:

  default:

    link: br0

    mode: bridged

    vlan:

Default disk parameters:

  blockdev:

  diskless:

  drbd:

    c-delay-target: 1

    c-fill-target: 0

    c-max-rate: 61440

    c-min-rate: 4096

    c-plan-ahead: 20

    data-stripes: 1

    disk-barriers: n

    disk-custom:

    dynamic-resync: False

    meta-barriers: False

    meta-stripes: 1

    metavg: ngganeti

    net-custom:

    protocol: C

    resync-rate: 61440

  ext:

    access: kernelspace

  file:

  gluster:

    access: kernelspace

    host: 127.0.0.1

    port: 24007

    volume: gv0

  plain:

    stripes: 1

  rbd:

    access: kernelspace

    pool: rbd

  sharedfile:

Instance policy - limits for instances:

  bounds specs:

    - max/0:

        cpu-count: 8

        disk-count: 16

        disk-size: 1048576

        memory-size: 32768

        nic-count: 8

        spindle-use: 12

      min/0:

        cpu-count: 1

        disk-count: 1

        disk-size: 1024

        memory-size: 128

        nic-count: 1

        spindle-use: 1

  std:

    cpu-count: 1

    disk-count: 1

    disk-size: 1024

    memory-size: 128

    nic-count: 1

    spindle-use: 1

  allowed disk templates: drbd, plain

  vcpu-ratio: 4

  spindle-ratio: 32

Data collectors:

  cpu-avg-load:

    active: True

    interval: 5.000s

  diskstats:

    active: True

    interval: 5.000s

  drbd:

    active: True

    interval: 5.000s

  inst-status-xen:

    active: True

    interval: 5.000s

  lv:

    active: True

    interval: 5.000s

  xen-cpu-avg-load:

    active: True

    interval: 5.000s

______________________

As can be seen in the gnt-cluster verify output below, I am presently having issues with SSH communication between the nodes; could that be the issue?

 

root@host1:~# gnt-cluster verify

Submitted jobs 76, 77

Waiting for job 76 ...

Tue Nov 29 18:21:22 2016 * Verifying cluster config

Tue Nov 29 18:21:22 2016 * Verifying cluster certificate files

Tue Nov 29 18:21:22 2016 * Verifying hypervisor parameters

Tue Nov 29 18:21:22 2016 * Verifying all nodes belong to an existing group

Waiting for job 77 ...

Tue Nov 29 18:21:23 2016 * Verifying group 'default'

Tue Nov 29 18:21:23 2016 * Gathering data (2 nodes)

Tue Nov 29 18:21:23 2016 * Gathering information about nodes (2 nodes)

Tue Nov 29 18:21:25 2016 * Gathering disk information (2 nodes)

Tue Nov 29 18:21:25 2016 * Verifying configuration file consistency

Tue Nov 29 18:21:25 2016   - ERROR: node: The public key file '/var/lib/ganeti/ganeti_pub_keys' does not exist. Consider running 'gnt-cluster renew-crypto --new-ssh-keys [--no-ssh-key-check]' to fix this.

Tue Nov 29 18:21:25 2016 * Verifying node status

Tue Nov 29 18:21:25 2016   - ERROR: node host1.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host1.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host2.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016   - ERROR: node host2.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n

Tue Nov 29 18:21:25 2016 * Verifying instance status

Tue Nov 29 18:21:25 2016 * Verifying orphan volumes

Tue Nov 29 18:21:25 2016 * Verifying N+1 Memory redundancy

Tue Nov 29 18:21:25 2016 * Other Notes

Tue Nov 29 18:21:25 2016 * Hooks Results

 

Kindly assist; I need to figure out what I have been doing wrong. I would really appreciate it if I could find a way to configure DRBD to work as it should.

 

Thank you

Iustin Pop

Nov 29, 2016, 3:39:36 PM
to gan...@googlegroups.com, lord...@gmail.com, b.ca...@pobox.com
On 2016-11-29 11:51:01, Bukar wrote:
>
>
> I have a two node cluster both on dl380 gen 9 servers, with the same
> configuration. They are both running Ubuntu 16.04.1 OS on raid 5 with
> ganeti version 2.15 installed and kvm as the hypervisor.
>
>
>
> *gnt-node list *
>
> Node DTotal DFree MTotal MNode MFree Pinst Sinst
>
> host1 1.7T 1.4T 23.4G 1.4G 21.9G 1 0
>
> host2 1.7T 1.4T 23.4G 270M 22.9G 0 1
>
>
>
> I have tried creating instances with a drbd disk of 150GB, 200GB and now
> 300GB the list time it took me to create an instance with 150GB was 5 hours
> 30 minutes. And for the instance with 300GB 12 hours as can be seen below:
>
> *root@host1:~# gnt-instance add -t drbd -o noop -s 300G -B
> minmem=5120M,maxmem=8192M -n host1.:host2. --no-start
> --no-name-check --no-ip-check clientA*
>
> Tue Nov 29 18:24:31 2016 * creating instance disks...
>
> Tue Nov 29 18:24:34 2016 adding instance clientA to cluster config
>
> Tue Nov 29 18:24:34 2016 adding disks to cluster config
>
> Tue Nov 29 18:24:34 2016 - INFO: Waiting for instance clientA to sync disks
>
> Tue Nov 29 18:24:35 2016 - INFO: - device disk/0: 0.10% done, 12h 8m 9s
> remaining (estimated)
>
> Tue Nov 29 18:38:38 2016 - INFO: - device disk/0: 1.90% done, 12h 27m 44s
> remaining (estimated)
>
>
>
> I am using 3 physical network ports on the server, one of which I
> configured as a bridge interface for the instances
>
> I think it is not a normal, but I have configured about 7 different
> scenarios, with fresh installation of the OS using both Ubuntu 14.04 and
> 16.04 and different set of hard drives but with almost the same result as
> regards to the time it takes to create an instance. Although, if I try
> creating an instance with plain disk template, it takes less than a minute
> to create 300GB. That is why I feel probably I am getting it wrong
> somewhere.

The plain disk template does no heavy disk write, so it's kind of
expected.

A few questions:

- What is the network speed? You mention 3 NICs, but of which type?
- What's the network bandwidth that you can achieve between primary and
secondary, on the IPs used for replication?
- What is the disk usage on both the primary and secondary ("iostat -dkx
/dev/sd? 15 50", let it run and see the values at the end)?
- What are the node parameters (gnt-node info) for primary and
secondary?

regards,
iustin

candlerb

Nov 29, 2016, 4:10:18 PM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

I have tried creating instances with a drbd disk of 150GB, 200GB and now 300GB the list time it took me to create an instance with 150GB was 5 hours 30 minutes.  And for the instance with 300GB 12 hours as can be seen below:


One thing is that the drbd replication rate is intentionally limited:

# gnt-cluster info | grep -i rate
    c-max-rate: 61440
    c-min-rate: 4096
    resync-rate: 61440

But at 60MB/sec, your 300GB should take 5120 seconds = 1h 25m

So: (1) check the output of "cat /proc/drbd". It will show you the resync as it happens and the rate achieved.

(2) The resync may be slowed down if there are many other disk writes going on at the same time (which also need to be replicated). It may also be slowed down if you only have 100Mbps connectivity between your servers; that would limit you to about 12MB/sec.

It's probably a good idea to monitor the utilisation on your network ports, e.g. with a tool like Cacti, LibreNMS, etc.

(3) If, when you're creating a new instance, you add the flag "--no-wait-for-sync", then the syncing will take place in the background (there's an example command below point (4)). Of course you'll have no failover resilience until it has completed, but at least you can keep using your cluster.

(4) RAID5 generally gives very poor write performance. If you have a controller card with a working battery backup unit it may be acceptable.

Having said that, syncing should be quite efficient even on RAID5, but it does require that you are writing in full RAID stripes, and that your RAID array isn't doing lots of small writes at the same time. Small writes totally kill RAID5 performance.  If that's the problem then you'll probably want to change to RAID10.
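To illustrate (3): the flag simply goes onto the command you already use. For example, reusing the parameters from your original post (purely illustrative):

# gnt-instance add -t drbd -o noop -s 300G -B minmem=5120M,maxmem=8192M \
    -n host1.:host2. --no-start --no-name-check --no-ip-check \
    --no-wait-for-sync clientA

The job then returns as soon as the disks are created, and the initial resync carries on in the background; /proc/drbd shows its progress.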

I'm afraid there's no magic wand here: you need to measure your disk I/O and network I/O figures to work out what's going on.

Regards,

Brian.

Bukar

Nov 30, 2016, 9:02:48 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com

- What is the network speed? You mention 3 NICs, but of which type?
           They are all Gigabit ports

- What's the network bandwidth that you can achieve between primary and
  secondary, on the IPs used for replication?
        
       

root@host2:~# iperf -c 10.10.13.91

------------------------------------------------------------

Client connecting to 10.10.13.91, TCP port 5001

TCP window size: 85.0 KByte (default)

------------------------------------------------------------

[  3] local 10.10.13.94 port 47682 connected with 10.10.13.91 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  78.9 MBytes  66.0 Mbits/sec

 

One of the culprits is here!! I noticed the switch ports I was using were configured at a speed of 100Mbps; I had to change them to 1000Mbps since they are all Gigabit ports.

 

See the output after the changes below:

------------------------------------------------------------

Client connecting to 10.10.13.91, TCP port 5001

TCP window size: 85.0 KByte (default)

------------------------------------------------------------

[  3] local 10.10.13.94 port 47684 connected with 10.10.13.91 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec

 

Afterwards there was a huge improvement (better than fivefold), as creating a 300GB instance now takes 2 hours 15 minutes. WOW!!

 


- What is the disk usage on both the primary and secondary ("iostat -dkx
  /dev/sd? 15 50", let it run and see the values at the end)?

              

root@host1:~# iostat -dkx

Linux 4.4.0-47-generic (host1.)       11/30/2016      _x86_64_        (16 CPU)

 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.12   496.84   33.58  301.60 14264.50  3231.96   104.40     0.50    1.48    7.21    0.85   0.32  10.56

dm-0              0.00     0.00   11.47  791.23   446.66  3164.93     9.00    19.44   24.21    3.02   24.52   0.04   3.20

dm-1              0.00     0.00    0.55    0.63     2.52     2.54     8.59     0.00    4.03    3.80    4.23   2.46   0.29

dm-2              0.00     0.00   14.30    0.00 11542.07     0.00  1614.18     0.14    9.63    9.63    0.00   3.87   5.53

dm-3              0.00     0.00    0.29    0.39     1.61    11.15    37.62     0.00    1.94    4.21    0.29   0.10   0.01

drbd0             0.00     0.00   11.44  766.03   445.69  3058.16     9.01    57.84   74.40    3.04   75.47   0.05   4.03

drbd1             0.00     0.00    0.01    0.00     0.40     0.00    63.40     0.00    2.47    2.47    0.00   1.47   0.00

dm-4              0.00     0.00    2.61    0.00  2056.50     0.00  1573.63     0.03   12.23   12.23    0.00   4.33   1.13

dm-5              0.00     0.00    0.05    0.11     0.53    10.00   139.25     0.00    0.27    0.58    0.14   0.11   0.00

 

root@host1:~#

 

root@host2:~# iostat -dkx

Linux 4.4.0-47-generic (host2.)       11/30/2016      _x86_64_        (16 CPU)

 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.00     1.00    1.37  526.82    27.52 10311.46    39.15     0.04    0.08   12.63    0.05   0.03   1.57

dm-0              0.00     0.00    0.01  515.62     0.20  2062.49     8.00     0.02    0.04    6.35    0.04   0.02   1.06

dm-1              0.00     0.00    0.36    0.00     1.64     0.00     9.22     0.00    1.22    1.23    0.00   0.15   0.01

dm-2              0.00     0.00    0.01    9.30     0.25  7531.83  1617.38     0.00    0.41    1.04    0.41   0.41   0.38

dm-3              0.00     0.00    0.19    0.26     1.04     7.28    37.58     0.00    0.32    0.43    0.24   0.04   0.00

dm-4              0.00     0.00    0.01    0.83     0.15   691.06  1657.22     0.00    0.49    6.46    0.45   0.46   0.04

dm-5              0.00     0.00    0.03    0.06     0.29     6.48   158.49     0.00    0.32    0.27    0.35   0.11   0.00

 


- What are the node parameters (gnt-node info) for primary and
  secondary?
        

gnt-node info host1

- Node name: host1

  primary ip: 10.10.13.91

  secondary ip: 10.10.13.92

  master candidate: True

  drained: False

  offline: False

  master_capable: True

  vm_capable: True

  primary for instances:

    - Clientb

    - ClientC

    - ClientA

    - stat

  secondary for instances:

  node parameters:

    cpu_speed: default (1)

    exclusive_storage: default (False)

    oob_program: default ()

    ovs: default (False)

    ovs_link: default ()

    ovs_name: default (switch1)

    spindle_count: default (1)

    ssh_port: default (22)

 

 

gnt-node info host1

- Node name: host1.

  primary ip: 10.10.13.91

  secondary ip: 10.10.13.92

  master candidate: True

  drained: False

  offline: False

  master_capable: True

  vm_capable: True

  primary for instances:

    - ClientA

    - Clientb

    - ClientC

    - stat

  secondary for instances:

  node parameters:

    cpu_speed: default (1)

    exclusive_storage: default (False)

    oob_program: default ()

    ovs: default (False)

    ovs_link: default ()

    ovs_name: default (switch1)

    spindle_count: default (1)

    ssh_port: default (22)


Thanks a lot

Bukar

Nov 30, 2016, 9:30:18 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com
Hello Iustin and Brian,

Thanks a lot for your prompt responses. We really appreciate the support you guys are providing to folks like me.

Kudos to you guys!
#output of cat /proc/drbd

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:42297588 nr:0 dw:42297588 dr:6079980 al:8318 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:5484 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428832 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428800 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

candlerb

Nov 30, 2016, 10:18:04 AM
to ganeti, lord...@gmail.com, b.ca...@pobox.com
I noticed the switch ports I was using were configured with the speed of 100mbps I had to change it to 1000Mbps since they are all gigabyte ports.

For future reference: leave all switch ports at "auto negotiation" and you should be fine.

It's only 20-year-old Catalyst switches that don't do auto negotiation properly.
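If you want to double-check what each port actually negotiated, ethtool is enough (the interface name here is just an example; use whichever NIC carries the replication traffic):

# ethtool eno2 | grep -i speed

On a correctly negotiated gigabit link it should report "Speed: 1000Mb/s".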



#output of cat /proc/drbd

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:42297588 nr:0 dw:42297588 dr:6079980 al:8318 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:5484 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428832 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:52428800 nr:0 dw:0 dr:52428800 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0



Cool, that shows everything is in sync (between this particular node and its partners).

"cat /proc/drbd" gives the info in a slightly more detailed form.

If you do this while you are creating a new instance you'll get an idea of the replication speed.
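For example, a simple way to keep an eye on it while a disk is resyncing (any polling method works, this is just a convenience):

# watch -n 5 cat /proc/drbd

During a resync each device line gains a progress bar plus an estimated finish time and the current speed in K/sec, which is the number to compare against your network and disk limits.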

Iustin Pop

Nov 30, 2016, 1:50:21 PM
to gan...@googlegroups.com, lord...@gmail.com, b.ca...@pobox.com
On 2016-11-30 06:02:48, Bukar wrote:
>
>
> > - What is the network speed? You mention 3 NICs, but of which type?
> >
> They are all Gigabyte ports
>
> - What's the network bandwidth that you can achieve between primary and
> > secondary, on the IPs used for replication?
> >
>
>
>
> ------------------------------------------------------------
>
> Client connecting to 10.10.13.91, TCP port 5001
>
> TCP window size: 85.0 KByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 10.10.13.94 port 47682 connected with 10.10.13.91 port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 78.9 MBytes 66.0 Mbits/sec
>
> root@host2:~# iperf -c 10.10.13.91
>
>
>
> One of the culprit is here!!, I noticed the switch ports I was using were
> configured with the speed of 100mbps I had to change it to 1000Mbps since
> they are all gigabyte ports.

Good find!

> See the output after the changes below:
>
> ------------------------------------------------------------
>
> Client connecting to 10.10.13.91, TCP port 5001
>
> TCP window size: 85.0 KByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 10.10.13.94 port 47684 connected with 10.10.13.91 port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec

Yes, this is more the expected speed.

> Afterwards there was a 600% improvement as creating an instance of 300GB
> now takes 2hours 15 minutes WOW!!

:)
Unfortunately you've provided the "iostat -dkx", not "iostat -dkx 15
20", so this only shows average values since boot.

> - What are the node parameters (gnt-node info) for primary and
> > secondary?
> >
>
>
> gnt-node info host1
>
> - Node name: host1
>
> primary ip: 10.10.13.91
>
> secondary ip: 10.10.13.92

This is very strange. Your primary and secondary IP are on the same
network, so it looks like you don't use a separate network for the
replication. If this is the case, just don't use a secondary IP.

I would recommend though to use a separate network for the replication.
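(As a sketch only, since this is normally decided at setup time: the replication address is what you pass with "-s" when initialising the cluster and when adding nodes; the 192.168.100.x addresses below are placeholders for a dedicated replication network.)

gnt-cluster init -s 192.168.100.1 <other options> ngcluster.
gnt-node add -s 192.168.100.2 host2.

DRBD traffic then uses the secondary IPs, leaving the primary IPs for management and instance traffic.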

So it would be interesting to still see, now that you solved the
networking problem, what is the remaining bottleneck. A 300GB instance
should in theory finish replicating in 300*1024/100 (125 being max
theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
50 minutes.

I would guess it is storage, but for confirming that we need the iostat
numbers while the disks are replicating.

regards,
iustin

candlerb

unread,
Nov 30, 2016, 2:38:52 PM11/30/16
to ganeti, lord...@gmail.com, b.ca...@pobox.com
On Wednesday, 30 November 2016 18:50:21 UTC, Iustin Pop wrote:

So it would be interesting to still see, now that you solved the
networking problem, what is the remaining bottleneck. A 300GB instance
should in theory finish replicating in 300*1024/100 (125 being max
theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
50 minutes.

Remember that the default ganeti configuration gives drbd a maximum target replication rate of 60MiB/sec (61440 KiB/sec), which makes 5120s or 1h 25min. Presumably this is to avoid fully flooding a 1G network when one VM is replicating.

It's still not achieving even that, so you're right to look at iostat next (make sure to look on both source and destination sides). But if neither side is showing 100% utilisation of any disk, you can try cranking up the drbd limit a bit.
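If the disks turn out not to be the bottleneck and you do want to raise the cap, the DRBD disk parameters can be changed cluster-wide, for example (values are in KiB/sec, so 122880 = 120MiB/sec; pick something your hardware and network can actually sustain):

# gnt-cluster modify -D drbd:resync-rate=122880,c-max-rate=122880

As far as I know the new values are applied when the DRBD devices are set up, so they affect newly created or re-activated disks rather than a resync that is already running.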

Iustin Pop

Nov 30, 2016, 3:23:39 PM
to gan...@googlegroups.com
On 2016-11-30 11:38:52, candlerb wrote:
> On Wednesday, 30 November 2016 18:50:21 UTC, Iustin Pop wrote:
> >
> >
> > So it would be interesting to still see, now that you solved the
> > networking problem, what is the remaining bottleneck. A 300GB instance
> > should in theory finish replicating in 300*1024/100 (125 being max
> > theoretical throughput, but 100-110 more realistic) = 3072s, i.e. about
> > 50 minutes.
> >
>
> Remember that the default ganeti configuration gives drbd a maximum target
> replication rate of 60MiB/sec (61440 KiB/sec), which makes 5120s or 1h
> 25min.

Correct, I forgot about that.

> Presumably this is to avoid fully flooding a 1G network when one VM
> is replicating.

Actually, the original reason for choosing this number was twofold:
network speed was one thing, but also the speed of a single SATA HDD
when doing two I/O streams (DRBD metadata + the actual data).

With 10G networks and use of raid, the number should most definitely be
tweaked; I'm sure though people who do run Ganeti beyond experiments
will do that.

> It's still not achieving even that, so you're right to look at iostat next
> (make sure to look on both source and destination sides). But if neither
> side is showing 100% utilisation of any disk, you can try cranking up the
> drbd limit a bit.

My guess is that the secondary will show much higher disk utilisation,
possibly near 100%. Curious to see the actual data.

iustin

Phil Regnauld

Nov 30, 2016, 4:37:01 PM
to gan...@googlegroups.com
Iustin Pop (iustin) writes:
>
> Actually, the way way back reason to choose this number was dual-fold:
> network speed was one thing, but also the speed of a single SATA HDD
> when doing two I/O streams (DRBD metadata + the actual data).
>
> With 10G networks and use of raid, the number should most definitely be
> tweaked; I'm sure though people who do run Ganeti beyond experiments
> will do that.

        ISTR that setting some DRBD-specific parameters to achieve maximum
        sync/resync speed broke instance creation. I'll try and dig up the thread.

Phil Regnauld

Dec 5, 2016, 5:06:27 PM
to gan...@googlegroups.com
Ah:

https://groups.google.com/d/msg/ganeti/9X2J9zaH694/5DA_Ur5dBgAJ

That whole thread is interesting.