Performance Tuning in Virtual Lab Environments - Looking for advice


Duncan Wannamaker

Jan 30, 2014, 9:54:38 AM
to quadst...@googlegroups.com

Hello QuadStor supporters!  Thanks for making such an awesome product!

 

We have recently switched from OpenFiler to QuadStor as the iSCSI target of choice in the virtual lab environment we use for VMware training.  The challenge I am having is optimizing the performance of the iSCSI target when running QuadStor as a VM.  I know this is highly dependent on the infrastructure supporting the VMs, so I'll describe things as best I can.

 

We run vCloud Director v5.1 to manage several clusters of ESXi v5.1 hosts.  Each host uses 20GBit InfiniBand uplinks to an Oracle Fabric Interconnect (formerly Xsigo).  This interconnect gives us the ability to virtualize network and storage adapters directly on our hosts.  The environment supports the creation of vApps that contain their own dynamically provisioned portgroups on a distributed vSwitch in vCenter.  Each vApp contains all of the VMs necessary for students to participate in the official VMware courses that we host.  One of the VMs in this vApp is the QuadStor virtual storage appliance.

 

The QuadStor VM instance is configured with:

 

2 x 2.5 GHz vCPUs

4GB of RAM

1 x 8GB system disk on LSI Logic SCSI adapter

1 x 100GB data volume on the PVSCSI (paravirtual SCSI) adapter

2 x E1000 (1GBit) virtual NICs (not bonded) - MTU limited to 1500

CentOS 6.5

QuadStor-Core-3.0.63

QuadStor-itf-3.0.63

 

The VM uses thin provisioning on the back end, but the VMs are consolidated so that the disks are flat, not linked clones.  Due to limitations in the initial build, we are stuck at a 1500-byte MTU, so jumbo frames are not possible at the present time.

 

The problem is the performance numbers we are getting when cloning VMs from local storage on the nested ESXi hypervisor to the shared storage provided by QuadStor.  Here are some numbers I have collected over the past few days after running some tests:

 

This first set of tests was performed with the QuadStor and ESXi VMs thin provisioned on the physical LUN:

 

Test 1: (control) Clone Win2k3 VM from ESXi01 local storage to QuadStor iSCSI LUN (TGT1) using 1GBit vmkernel with QuadStor DeDuplication Enabled = 35 seconds / 41,616 KBps (333 Mbit)

 

Test 2: Clone 1GB Win2k3 VM from ESXi02 local storage to QuadStor iSCSI LUN (TGT1) using 10GBit vmkernel with QuadStor DeDuplication Enabled = 29 seconds / 44,333 KBps (347 Mbit)

 

Test 3: Clone 1GB Win2k3 VM from ESXi01 local storage to QuadStor iSCSI LUN (TGT2) using 1 GBit vmkernel, with QuadStor DeDuplication Disabled = 35 seconds / 44,152 KBps (353 Mbit)

 

Test 4: Clone 1GB Win2k3-1 VM from ESXi02 local storage to QuadStor iSCSI LUN (TGT2) using 10GBit vmkernel, with QuadStor DeDuplication Disabled = 30 seconds / 49,575 KBps (396 Mbit)
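
As a sanity check on the figures above, the KBps-to-Mbit conversion is easy to reproduce (a quick sketch, assuming decimal units, i.e. 1 KB = 1000 bytes, which is what the Test 1 and Test 3 figures imply; the input numbers are just the ones reported above):

```python
def kbps_to_mbit(kbps):
    # Convert a KB/s figure to Mbit/s, assuming decimal units
    # (1 KB = 1000 bytes, 1 Mbit = 1,000,000 bits).
    return kbps * 8 / 1000

# Figures as reported in Tests 1 and 3:
print(round(kbps_to_mbit(41616)))  # 333 Mbit, as in Test 1
print(round(kbps_to_mbit(44152)))  # 353 Mbit, as in Test 3
```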

 

After running these tests, I switched the disk backing for QuadStor and the ESXi hosts to thick lazy zeroed.  I also tried thick provisioning within the nested environment to see whether using only thick provisioning would help:

 

Test 5 - Clone 1GB Win2k3 Thin Provisioned to QuadStor over 1GBit iSCSI: 73 seconds @ 19,007 KBps (152 Mbit)

Test 6 - Clone 1GB Win2k3 Thick Provisioned to QuadStor over 1GBit iSCSI: 60 seconds @ 22,731 KBps (182 Mbit)

 

* Changed the QuadStor NIC to VMXNET3 to try to replicate the earlier times

 

Test 7 - Clone 1GB Win2k3 Thin Provisioned to QuadStor over 1GBit iSCSI: 51 seconds @ 31,325 KBps (251 Mbit)

Test 8 - Clone 1GB Win2k3 Thick Provisioned to QuadStor over 1GBit iSCSI: 37 seconds @ 36,366 KBps (291 Mbit)

 

* Disabled DeDupe on QuadStor

 

Test 9 - Win2k3 Thin Provisioned to QuadStor over 1GBit iSCSI with DeDuplication Disabled: 53 seconds @ 28,014 KBps (224 Mbit)

Test 10 - Win2k3 Thick Provisioned to QuadStor over 1GBit iSCSI with DeDuplication Disabled: 36 seconds @ 36,997 KBps (296 Mbit)

 

 

In all cases, I was only able to get a maximum of about 50MB/sec when writing to the QuadStor LUNs.  Statistics on the SAN show almost 7,000 IOPs are required to clone the 1GB image, which is a lot!  Also, I don't think we are hitting the limits of our SAN LUNs, since I can clone to the nested ESXi host's local storage in about half the time it takes to clone to the iSCSI target.
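
For what it's worth, ~7,000 I/Os for a 1GB clone works out to a fairly large average I/O size (a rough back-of-the-envelope sketch; it assumes a binary 1 GiB image and that 7,000 is the total operation count for the clone, both of which are guesses):

```python
clone_bytes = 1 * 1024**3  # assumed: 1 GiB image (binary units)
total_ios = 7000           # total I/O operations reported by the SAN
avg_io_kb = clone_bytes / total_ios / 1024
print(f"average I/O size: {avg_io_kb:.0f} KB")  # roughly 150 KB per I/O
```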

 

Any advice you guys can provide would be awesome.

 

Thanks,

Duncan

 
