Sorry for the delay. In the meantime I tore down the server and
re-configured it using Ethernet bonding. It worked and, according to
iozone, provided moderately better throughput than the single
connection I got before. Moderately. Measurably. Not significantly.
I tore it down after that and reconfigured again using MPIO, and
funnily enough, this time it worked. I can access the LUN now using
two devices (sdb and sdd), and both Ethernet devices that connect to
iSCSI show traffic.
The weird thing is that, aside from writes, bonding was measurably
faster than MPIO. Does that seem right?
Here's the dmesg, if that lends any clues. Thanks for any input!
--Kyle
==== 156 lines of dmesg follow ====
cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
iscsi: registered transport (cxgb3i)
device-mapper: table: 253:6: multipath: error getting device
device-mapper: ioctl: error adding target to table
device-mapper: table: 253:6: multipath: error getting device
device-mapper: ioctl: error adding target to table
Broadcom NetXtreme II CNIC Driver cnic v2.0.0 (March 21, 2009)
cnic: Added CNIC device: eth0
cnic: Added CNIC device: eth1
cnic: Added CNIC device: eth2
cnic: Added CNIC device: eth3
Broadcom NetXtreme II iSCSI Driver bnx2i v2.0.1e (June 22, 2009)
iscsi: registered transport (bnx2i)
scsi3 : Broadcom Offload iSCSI Initiator
scsi4 : Broadcom Offload iSCSI Initiator
scsi5 : Broadcom Offload iSCSI Initiator
scsi6 : Broadcom Offload iSCSI Initiator
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
bnx2: eth0: using MSIX
ADDRCONF(NETDEV_UP): eth0: link is not ready
bnx2i: iSCSI not supported, dev=eth0
bnx2i: iSCSI not supported, dev=eth0
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive &
transmit flow control ON
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
bnx2: eth2: using MSIX
ADDRCONF(NETDEV_UP): eth2: link is not ready
bnx2i: iSCSI not supported, dev=eth2
bnx2i: iSCSI not supported, dev=eth2
bnx2: eth2 NIC Copper Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
bnx2: eth3: using MSIX
ADDRCONF(NETDEV_UP): eth3: link is not ready
bnx2i: iSCSI not supported, dev=eth3
bnx2i: iSCSI not supported, dev=eth3
bnx2: eth3 NIC Copper Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
eth0: no IPv6 routers present
eth2: no IPv6 routers present
scsi7 : iSCSI Initiator over TCP/IP
scsi8 : iSCSI Initiator over TCP/IP
scsi9 : iSCSI Initiator over TCP/IP
scsi10 : iSCSI Initiator over TCP/IP
Vendor: DGC Model: RAID 5 Rev: 0429
Type: Direct-Access ANSI SCSI revision: 04
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 7693604864 512-byte hdwr sectors (3939126 MB)
sdb: Write Protect is off
sdb: Mode Sense: 7d 00 00 08
SCSI device sdb: drive cache: write through
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 7693604864 512-byte hdwr sectors (3939126 MB)
sdb: Write Protect is off
sdb: Mode Sense: 7d 00 00 08
SCSI device sdb: drive cache: write through
sdb:<5> Vendor: DGC Model: RAID 5 Rev: 0429
Type: Direct-Access ANSI SCSI revision: 04
Vendor: DGC Model: RAID 5 Rev: 0429
Type: Direct-Access ANSI SCSI revision: 04
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 7693604864 512-byte hdwr sectors (3939126 MB)
sdc: test WP failed, assume Write Enabled
sdc: asking for cache data failed
sdc: assuming drive cache: write through
Vendor: DGC Model: RAID 5 Rev: 0429
Type: Direct-Access ANSI SCSI revision: 04
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 7693604864 512-byte hdwr sectors (3939126 MB)
sdc: test WP failed, assume Write Enabled
sde : very big device. try to use READ CAPACITY(16).
sdc: asking for cache data failed
sdc: assuming drive cache: write through
sdc:<5>SCSI device sde: 7693604864 512-byte hdwr sectors (3939126 MB)
sd 8:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
sd 8:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sdc, sector 0
... that repeats a bunch of times ....
Buffer I/O error on device sdc, logical block 0
unable to read partition table
sdd : very big device. try to use READ CAPACITY(16).
sd 8:0:0:0: Attached scsi disk sdc
SCSI device sdd: 7693604864 512-byte hdwr sectors (3939126 MB)
sd 8:0:0:0: Attached scsi generic sg4 type 0
unknown partition table
sd 7:0:0:0: Attached scsi disk sdb
sd 7:0:0:0: Attached scsi generic sg5 type 0
sdd: Write Protect is off
sdd: Mode Sense: 7d 00 00 08
SCSI device sdd: drive cache: write through
sdd : very big device. try to use READ CAPACITY(16).
SCSI device sdd: 7693604864 512-byte hdwr sectors (3939126 MB)
sdd: Write Protect is off
sdd: Mode Sense: 7d 00 00 08
SCSI device sdd: drive cache: write through
sdd: unknown partition table
sd 9:0:0:0: Attached scsi disk sdd
sd 9:0:0:0: Attached scsi generic sg6 type 0
sde: test WP failed, assume Write Enabled
sde: asking for cache data failed
sde: assuming drive cache: write through
sde:<6>sd 10:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sde, sector 0
Buffer I/O error on device sde, logical block 0
sd 10:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sde, sector 0
... that repeats a bunch of times ....
unable to read partition table
sd 10:0:0:0: Attached scsi disk sde
sd 10:0:0:0: Attached scsi generic sg7 type 0
sd 8:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sdc, sector 7693604736
... that repeats a bunch of times, for both sde and sdc ....
device-mapper: multipath emc: version 0.0.3 loaded
device-mapper: multipath emc: long trespass command will be send
device-mapper: multipath emc: honor reservation bit will not be set (default)
device-mapper: multipath: Using dm hw handler module emc for
failover/failback and device management.
device-mapper: multipath emc: long trespass command will be send
device-mapper: multipath emc: honor reservation bit will not be set (default)
device-mapper: multipath: Using dm hw handler module emc for
failover/failback and device management.
device-mapper: multipath emc: long trespass command will be send
device-mapper: multipath emc: honor reservation bit will not be set (default)
device-mapper: multipath: Using dm hw handler module emc for
failover/failback and device management.
device-mapper: multipath emc: long trespass command will be send
device-mapper: multipath emc: honor reservation bit will not be set (default)
device-mapper: multipath: Using dm hw handler module emc for
failover/failback and device management.
device-mapper: multipath emc: emc_pg_init: sending switch-over command
.... non related stuff ....
Buffer I/O error on device sdc, logical block 0
Buffer I/O error on device sdc, logical block 1
Buffer I/O error on device sdc, logical block 2
Buffer I/O error on device sdc, logical block 3
sd 8:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, manual intervention required
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
If you have just a single iSCSI connection/login from the initiator to
the target, then you'll have only one TCP connection, and that means
bonding won't help you at all - you'll only be able to utilize one
link of the bond.
Bonding needs multiple TCP/IP connections to be able to provide more
bandwidth.
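If you want both links carrying iSCSI traffic without bonding, the
usual approach is one session per NIC and letting dm-multipath
aggregate them. Roughly something like the following with open-iscsi
iface bindings - the iface names, NICs and the portal address below
are just placeholders for whatever your setup actually uses:

  # define one iface per NIC and bind each to its ethernet device
  iscsiadm -m iface -I iface-eth2 --op=new
  iscsiadm -m iface -I iface-eth2 --op=update -n iface.net_ifacename -v eth2
  iscsiadm -m iface -I iface-eth3 --op=new
  iscsiadm -m iface -I iface-eth3 --op=update -n iface.net_ifacename -v eth3

  # discover the target through both ifaces, then log in everywhere
  iscsiadm -m discovery -t sendtargets -p 192.168.1.10:3260 \
      -I iface-eth2 -I iface-eth3
  iscsiadm -m node -L all

That gives you two sessions (and two sdX devices per LUN) for
dm-multipath to spread I/O across.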
> I tore it down after that and reconfigured again using MPIO, and
> funnily enough, this time it worked. I can access the LUN now using
> two devices (sdb and sdd), and both Ethernet devices that connect to
> iSCSI show traffic.
>
> The weird thing is that, aside from writes, bonding was measurably
> faster than MPIO. Does that seem right?
>
That seems a bit weird.
How did you configure multipath? Please paste your multipath settings.
-- Pasi
That's what I thought, but I figured it was one of the following three
possibilities:
MPIO was (mis)configured and using more overhead than bonding (a
quick check for that is sketched below),
OR the initiator was firing multiple concurrent requests (which you
say it doesn't; I'll believe you),
OR the SAN was under massively different load between the test runs
(not too likely, but possible; only one other LUN is in use).
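For the first one, I figure I can watch the individual paths during a
run and see whether both actually carry I/O. Something like this is
what I had in mind (sdb and sdd being the two iSCSI paths from the
dmesg above):

  # show how dm-multipath grouped the paths and which are active
  multipath -ll

  # watch per-path utilization while iozone runs
  iostat -xk sdb sdd 2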
> That seems a bit weird.
That's what I thought; otherwise I would have just gone with it.
> How did you configure multipath? Please paste your multipath settings.
>
> -- Pasi
Here's the /etc/multipath.conf. Are there other config options that
you'd need to see?
devnode_blacklist {
        devnode "^sda[0-9]*"
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
devices {
        device {
                vendor                  "EMC "
                product                 "SYMMETRIX"
                path_grouping_policy    multibus
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                path_selector           "round-robin 0"
                features                "0"
                hardware_handler        "0"
                failback                immediate
        }
        device {
                vendor                  "DGC"
                product                 "*"
                path_grouping_policy    group_by_prio
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                hardware_handler        "1 emc"
                features                "1 queue_if_no_path"
                no_path_retry           300
                path_checker            emc_clariion
                failback                immediate
        }
}
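One thing I wasn't sure about: the DGC section doesn't set a
path_selector or rr_min_io, so it should be inheriting the defaults (I
believe the default rr_min_io is 1000 here, which means a lot of I/O
goes down one path before switching). This is what I was thinking of
trying next - the values are untested guesses on my part:

        device {
                vendor                  "DGC"
                product                 "*"
                path_grouping_policy    group_by_prio
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                hardware_handler        "1 emc"
                features                "1 queue_if_no_path"
                no_path_retry           300
                path_checker            emc_clariion
                failback                immediate
                # untested additions: explicit round-robin selector and a
                # smaller rr_min_io so I/O alternates between paths sooner
                path_selector           "round-robin 0"
                rr_min_io               100
        }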
With MPIO are you seeing the same throughput you would see if you only
used one path at a time?
With bonding I saw, on average, 99-100% of the speed of a single path
(worst case 78%).
With MPIO (2 NICs) I saw, on average, 82% of the speed of the single
path (worst case 66%).
With MPIO on one NIC (the second downed with ifconfig), I saw, on
average, 86% of the speed of the single path (worst case 66%).
There were situations where bonding and MPIO both scored slightly
higher than the single path, but that is most likely due to varying
load on the array, since the tests weren't run back to back.
--Kyle