I have a two-node cluster, both DL380 Gen9 servers with the same configuration. Both are running Ubuntu 16.04.1 on RAID 5, with Ganeti 2.15 installed and KVM as the hypervisor.
gnt-node list
Node DTotal DFree MTotal MNode MFree Pinst Sinst
host1 1.7T 1.4T 23.4G 1.4G 21.9G 1 0
host2 1.7T 1.4T 23.4G 270M 22.9G 0 1
I have tried creating instances with a DRBD disk of 150GB, 200GB and now 300GB. The last time, it took 5 hours 30 minutes to create the instance with the 150GB disk, and the 300GB instance is estimated at about 12 hours, as can be seen below:
root@host1:~# gnt-instance add -t drbd -o noop -s 300G -B minmem=5120M,maxmem=8192M -n host1.:host2. --no-start --no-name-check --no-ip-check clientA
Tue Nov 29 18:24:31 2016 * creating instance disks...
Tue Nov 29 18:24:34 2016 adding instance clientA to cluster config
Tue Nov 29 18:24:34 2016 adding disks to cluster config
Tue Nov 29 18:24:34 2016 - INFO: Waiting for instance clientA to sync disks
Tue Nov 29 18:24:35 2016 - INFO: - device disk/0: 0.10% done, 12h 8m 9s remaining (estimated)
Tue Nov 29 18:25:35 2016 - INFO: - device disk/0: 0.20% done, 12h 7m 10s remaining (estimated)
Tue Nov 29 18:26:35 2016 - INFO: - device disk/0: 0.30% done, 12h 3m 58s remaining (estimated)
Tue Nov 29 18:27:36 2016 - INFO: - device disk/0: 0.50% done, 11h 54m 12s remaining (estimated)
Tue Nov 29 18:28:36 2016 - INFO: - device disk/0: 0.60% done, 12h 4m 15s remaining (estimated)
Tue Nov 29 18:29:36 2016 - INFO: - device disk/0: 0.70% done, 12h 53m 25s remaining (estimated)
Tue Nov 29 18:30:36 2016 - INFO: - device disk/0: 0.90% done, 12h 30m 6s remaining (estimated)
Tue Nov 29 18:31:36 2016 - INFO: - device disk/0: 1.00% done, 12h 12m 58s remaining (estimated)
Tue Nov 29 18:32:37 2016 - INFO: - device disk/0: 1.10% done, 12h 0m 21s remaining (estimated)
Tue Nov 29 18:33:37 2016 - INFO: - device disk/0: 1.30% done, 11h 53m 26s remaining (estimated)
Tue Nov 29 18:34:37 2016 - INFO: - device disk/0: 1.40% done, 11h 42m 46s remaining (estimated)
Tue Nov 29 18:35:37 2016 - INFO: - device disk/0: 1.50% done, 12h 1m 25s remaining (estimated)
Tue Nov 29 18:36:37 2016 - INFO: - device disk/0: 1.70% done, 12h 12m 44s remaining (estimated)
Tue Nov 29 18:37:38 2016 - INFO: - device disk/0: 1.80% done, 12h 3m 31s remaining (estimated)
Tue Nov 29 18:38:38 2016 - INFO: - device disk/0: 1.90% done, 12h 27m 44s remaining (estimated)
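(While this runs, the same resync progress can also be followed directly from DRBD on the primary node; a generic sketch, not specific to Ganeti:
root@host1:~# watch -n 5 cat /proc/drbd
The oos field in that output is the amount of data, in KiB, still out of sync.)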
I am using 3 physical network ports on the server, one of which I configured as a bridge interface for the instances.
I do not think this is normal, but I have tried about 7 different scenarios, with fresh installations of the OS using both Ubuntu 14.04 and 16.04 and different sets of hard drives, and got almost the same result as regards the time it takes to create an instance. However, if I create an instance with the plain disk template, a 300GB disk takes less than a minute. That is why I feel I am probably getting something wrong somewhere.
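For comparison, the plain-template equivalent would look something like this (a sketch assuming the same sizing and backend parameters as the DRBD command above; the instance name is just illustrative, and with -t plain only a primary node is given since nothing is replicated):
root@host1:~# gnt-instance add -t plain -o noop -s 300G -B minmem=5120M,maxmem=8192M -n host1. --no-start --no-name-check --no-ip-check clientA-plain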
gnt-cluster info
Cluster name: ngcluster.
Cluster UUID: 459ea9d6-bef8-49fa-b0c1-2a685f1099bf
Creation time: 2016-11-29 15:46:49
Modification time: 2016-11-29 17:03:18
Master node: host1.
Architecture (this node): 64bits (x86_64)
Tags: (none)
Default hypervisor: kvm
Enabled hypervisors: kvm
Hypervisor parameters:
kvm:
acpi: True
boot_order: disk
cdrom2_image_path:
cdrom_disk_type:
cdrom_image_path:
cpu_cores: 0
cpu_mask: all
cpu_sockets: 0
cpu_threads: 0
cpu_type:
disk_aio: threads
disk_cache: default
disk_type: paravirtual
floppy_image_path:
initrd_path:
kernel_args: ro
kernel_path:
keymap:
kvm_extra:
kvm_flag:
kvm_path: /usr/bin/kvm
machine_version:
mem_path:
migration_bandwidth: 32
migration_caps:
migration_downtime: 30
migration_mode: live
migration_port: 8102
nic_type: paravirtual
reboot_behavior: reboot
root_path: /dev/vda1
security_domain:
security_model: none
serial_console: True
serial_speed: 38400
soundhw:
spice_bind:
spice_image_compression:
spice_ip_version: 0
spice_jpeg_wan_compression:
spice_password_file:
spice_playback_compression: True
spice_streaming_video:
spice_tls_ciphers: HIGH:-DES:-3DES:-EXPORT:-DH
spice_use_tls: False
spice_use_vdagent: True
spice_zlib_glz_wan_compression:
usb_devices:
usb_mouse:
use_chroot: False
use_localtime: False
user_shutdown: False
vga:
vhost_net: False
virtio_net_queues: 1
vnc_bind_address: 0.0.0.0
vnc_password_file:
vnc_tls: False
vnc_x509_path:
vnc_x509_verify: False
vnet_hdr: True
OS-specific hypervisor parameters:
OS parameters:
Hidden OSes:
Blacklisted OSes:
Cluster parameters:
candidate pool size: 10
maximal number of jobs running simultaneously: 20
maximal number of jobs simultaneously tracked by the scheduler: 25
mac prefix: aa:00:00
master netdev: eno2
master netmask: 32
use external master IP address setup script: False
lvm volume group: ngganeti
lvm reserved volumes: (none)
drbd usermode helper: /bin/true
file storage path: /srv/ganeti/file-storage
shared file storage path: /srv/ganeti/shared-file-storage
gluster storage path: /var/run/ganeti/gluster
maintenance of node health: False
uid pool:
default instance allocator: hail
default instance allocator parameters:
primary ip version: 4
preallocation wipe disks: False
OS search path: /srv/ganeti/os, /usr/local/lib/ganeti/os, /usr/lib/ganeti/os, /usr/share/ganeti/os
ExtStorage Providers search path: /srv/ganeti/extstorage, /usr/local/lib/ganeti/extstorage, /usr/lib/ganeti/extstorage, /usr/share/ganeti/extstorage
enabled disk templates: drbd, plain
install image:
instance communication network:
zeroing image:
compression tools:
- gzip
- gzip-fast
- gzip-slow
enabled user shutdown: False
Default node parameters:
cpu_speed: 1
exclusive_storage: False
oob_program:
ovs: False
ovs_link:
ovs_name: switch1
spindle_count: 1
ssh_port: 22
Default instance parameters:
default:
always_failover: False
auto_balance: True
maxmem: 128
minmem: 128
spindle_use: 1
vcpus: 1
Default nic parameters:
default:
link: br0
mode: bridged
vlan:
Default disk parameters:
blockdev:
diskless:
drbd:
c-delay-target: 1
c-fill-target: 0
c-max-rate: 61440
c-min-rate: 4096
c-plan-ahead: 20
data-stripes: 1
disk-barriers: n
disk-custom:
dynamic-resync: False
meta-barriers: False
meta-stripes: 1
metavg: ngganeti
net-custom:
protocol: C
resync-rate: 61440
ext:
access: kernelspace
file:
gluster:
access: kernelspace
host: 127.0.0.1
port: 24007
volume: gv0
plain:
stripes: 1
rbd:
access: kernelspace
pool: rbd
sharedfile:
Instance policy - limits for instances:
bounds specs:
- max/0:
cpu-count: 8
disk-count: 16
disk-size: 1048576
memory-size: 32768
nic-count: 8
spindle-use: 12
min/0:
cpu-count: 1
disk-count: 1
disk-size: 1024
memory-size: 128
nic-count: 1
spindle-use: 1
std:
cpu-count: 1
disk-count: 1
disk-size: 1024
memory-size: 128
nic-count: 1
spindle-use: 1
allowed disk templates: drbd, plain
vcpu-ratio: 4
spindle-ratio: 32
Data collectors:
cpu-avg-load:
active: True
interval: 5.000s
diskstats:
active: True
interval: 5.000s
drbd:
active: True
interval: 5.000s
inst-status-xen:
active: True
interval: 5.000s
lv:
active: True
interval: 5.000s
xen-cpu-avg-load:
active: True
interval: 5.000s
______________________
As can be seen in the gnt-cluster verify output below, I am presently having issues with SSH communication between the nodes; could that be the issue?
root@host1:~# gnt-cluster verify
Submitted jobs 76, 77
Waiting for job 76 ...
Tue Nov 29 18:21:22 2016 * Verifying cluster config
Tue Nov 29 18:21:22 2016 * Verifying cluster certificate files
Tue Nov 29 18:21:22 2016 * Verifying hypervisor parameters
Tue Nov 29 18:21:22 2016 * Verifying all nodes belong to an existing group
Waiting for job 77 ...
Tue Nov 29 18:21:23 2016 * Verifying group 'default'
Tue Nov 29 18:21:23 2016 * Gathering data (2 nodes)
Tue Nov 29 18:21:23 2016 * Gathering information about nodes (2 nodes)
Tue Nov 29 18:21:25 2016 * Gathering disk information (2 nodes)
Tue Nov 29 18:21:25 2016 * Verifying configuration file consistency
Tue Nov 29 18:21:25 2016 - ERROR: node: The public key file '/var/lib/ganeti/ganeti_pub_keys' does not exist. Consider running 'gnt-cluster renew-crypto --new-ssh-keys [--no-ssh-key-check]' to fix this.
Tue Nov 29 18:21:25 2016 * Verifying node status
Tue Nov 29 18:21:25 2016 - ERROR: node host1.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n
Tue Nov 29 18:21:25 2016 - ERROR: node host1.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n
Tue Nov 29 18:21:25 2016 - ERROR: node host2.: ssh communication with node 'host2.': ssh problem: Permission denied (publickey,password).\r\n
Tue Nov 29 18:21:25 2016 - ERROR: node host2.: ssh communication with node 'host1.': ssh problem: Permission denied (publickey,password).\r\n
Tue Nov 29 18:21:25 2016 * Verifying instance status
Tue Nov 29 18:21:25 2016 * Verifying orphan volumes
Tue Nov 29 18:21:25 2016 * Verifying N+1 Memory redundancy
Tue Nov 29 18:21:25 2016 * Other Notes
Tue Nov 29 18:21:25 2016 * Hooks Results
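For reference, the fix suggested in the config-consistency error above would be run from the master node; a sketch (the --no-ssh-key-check option, as named in that message, skips the interactive host key checks while the new keys are generated and distributed):
root@host1:~# gnt-cluster renew-crypto --new-ssh-keys --no-ssh-key-check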
Kindly assist; I need to figure out what I have been doing wrong. I would really appreciate it if I could find a way to configure DRBD to work as it should.
Thank you
I have tried creating instances with a DRBD disk of 150GB, 200GB and now 300GB. The last time, it took 5 hours 30 minutes to create the instance with the 150GB disk, and the 300GB instance is estimated at about 12 hours.
- What is the network speed? You mention 3 NICs, but of which type?
- What's the network bandwidth that you can achieve between primary and
secondary, on the IPs used for replication?
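To answer the bandwidth question, I ran iperf between the nodes on the replication network. The listener side was started on host1 first (a sketch; iperf listens on TCP port 5001 by default, as seen in the output):
root@host1:~# iperf -s
The client run from host2 and its output follow: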
root@host2:~# iperf -c 10.10.13.91
------------------------------------------------------------
Client connecting to 10.10.13.91, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.10.13.94 port 47682 connected with 10.10.13.91 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  78.9 MBytes  66.0 Mbits/sec
One of the culprits is here!! I noticed the switch ports I was using were configured at a speed of 100 Mbps; I had to change them to 1000 Mbps, since they are all gigabit ports.
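The negotiated link speed can also be confirmed from the node side with ethtool (a sketch, assuming the replication traffic goes over eno2, the interface used as the master netdev above); after the change it should report something like "Speed: 1000Mb/s":
root@host1:~# ethtool eno2 | grep -i speed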
See the output after the changes below:
------------------------------------------------------------
Client connecting to 10.10.13.91, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.10.13.94 port 47684 connected with 10.10.13.91 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec
Afterwards there was more than a fivefold improvement, as creating a 300GB instance now takes 2 hours 15 minutes instead of roughly 12 hours. WOW!!
- What is the disk usage on both the primary and secondary ("iostat -dkx
/dev/sd? 15 50", let it run and see the values at the end)?
root@host1:~# iostat -dkx
Linux 4.4.0-47-generic (host1.) 11/30/2016 _x86_64_ (16 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.12 496.84 33.58 301.60 14264.50 3231.96 104.40 0.50 1.48 7.21 0.85 0.32 10.56
dm-0 0.00 0.00 11.47 791.23 446.66 3164.93 9.00 19.44 24.21 3.02 24.52 0.04 3.20
dm-1 0.00 0.00 0.55 0.63 2.52 2.54 8.59 0.00 4.03 3.80 4.23 2.46 0.29
dm-2 0.00 0.00 14.30 0.00 11542.07 0.00 1614.18 0.14 9.63 9.63 0.00 3.87 5.53
dm-3 0.00 0.00 0.29 0.39 1.61 11.15 37.62 0.00 1.94 4.21 0.29 0.10 0.01
drbd0 0.00 0.00 11.44 766.03 445.69 3058.16 9.01 57.84 74.40 3.04 75.47 0.05 4.03
drbd1 0.00 0.00 0.01 0.00 0.40 0.00 63.40 0.00 2.47 2.47 0.00 1.47 0.00
dm-4 0.00 0.00 2.61 0.00 2056.50 0.00 1573.63 0.03 12.23 12.23 0.00 4.33 1.13
dm-5 0.00 0.00 0.05 0.11 0.53 10.00 139.25 0.00 0.27 0.58 0.14 0.11 0.00
root@host1:~#
root@host2:~# iostat -dkx
Linux 4.4.0-47-generic (host2.) 11/30/2016 _x86_64_ (16 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.00 1.37 526.82 27.52 10311.46 39.15 0.04 0.08 12.63 0.05 0.03 1.57
dm-0 0.00 0.00 0.01 515.62 0.20 2062.49 8.00 0.02 0.04 6.35 0.04 0.02 1.06
dm-1 0.00 0.00 0.36 0.00 1.64 0.00 9.22 0.00 1.22 1.23 0.00 0.15 0.01
dm-2 0.00 0.00 0.01 9.30 0.25 7531.83 1617.38 0.00 0.41 1.04 0.41 0.41 0.38
dm-3 0.00 0.00 0.19 0.26 1.04 7.28 37.58 0.00 0.32 0.43 0.24 0.04 0.00
dm-4 0.00 0.00 0.01 0.83 0.15 691.06 1657.22 0.00 0.49 6.46 0.45 0.46 0.04
dm-5 0.00 0.00 0.03 0.06 0.29 6.48 158.49 0.00 0.32 0.27 0.35 0.11 0.00
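Note that without an interval argument iostat only reports averages since boot; to capture the live values the question asked for, the suggested form would be run as something like this (a sketch; the /dev/sd? glob expands to just sda on these nodes, and the final samples of the 50 are the interesting ones):
root@host1:~# iostat -dkx /dev/sda 15 50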
- What are the node parameters (gnt-node info) for primary and
secondary?
gnt-node info host1
- Node name: host1
primary ip: 10.10.13.91
secondary ip: 10.10.13.92
master candidate: True
drained: False
offline: False
master_capable: True
vm_capable: True
primary for instances:
- Clientb
- ClientC
- ClientA
- stat
secondary for instances:
node parameters:
cpu_speed: default (1)
exclusive_storage: default (False)
oob_program: default ()
ovs: default (False)
ovs_link: default ()
ovs_name: default (switch1)
spindle_count: default (1)
ssh_port: default (22)
gnt-node info host1
- Node name: host1.
primary ip: 10.10.13.91
secondary ip: 10.10.13.92
master candidate: True
drained: False
offline: False
master_capable: True
vm_capable: True
primary for instances:
- ClientA
- Clientb
- ClientC
- stat
secondary for instances:
node parameters:
cpu_speed: default (1)
exclusive_storage: default (False)
oob_program: default ()
ovs: default (False)
ovs_link: default ()
ovs_name: default (switch1)
spindle_count: default (1)
ssh_port: default (22)
Thanks a lot
Thanks a lot for your prompt response. We really appreciate the support you guys are providing to folks like me.
Kudos to you guys!
#output of cat /proc/drbd
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:42297588 nr:0 dw:42297588 dr:6079980 al:8318 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:0 nr:0 dw:0 dr:5484 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:52428800 nr:0 dw:0 dr:52428832 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:52428800 nr:0 dw:0 dr:52428800 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
So it would be interesting to still see, now that you have solved the
networking problem, what the remaining bottleneck is. A 300GB instance
should in theory finish replicating in 300*1024 MB / 100 MB/s = 3072 s,
i.e. about 50 minutes (125 MB/s is the theoretical gigabit maximum, but
100-110 MB/s is more realistic).
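One possibly relevant setting is visible in the gnt-cluster info output above: the DRBD disk parameters resync-rate and c-max-rate are both 61440 KiB/s (about 60 MB/s), which by itself would cap the initial sync of a 300GB disk at roughly 300*1024*1024 KiB / 61440 KiB/s ≈ 5120 s, or about 85 minutes. If that turns out to be the remaining limit, the cap can be raised cluster-wide through the DRBD disk parameters; a sketch, assuming the disks and network can sustain the higher rate (value in KiB/s):
root@host1:~# gnt-cluster modify -D drbd:resync-rate=102400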