Core-iSCSI and SW-RAID


mage

Jun 22, 2006, 8:45:23 AM
to Core-iSCSI
Hello,

I would like to run SW-RAID1 on top of two iSCSI channels that are
connected to two iSCSI targets. I am able to create the RAID array and
everything looks good. However, if I disconnect one of the links to
the iSCSI targets, the RAID machine locks up until the connection to
that iSCSI target is reestablished. In my view it should keep running
as long as one connection is still available.

Is there a way to force an iSCSI channel to go _automatically_ offline
(like 'initiator-ctl forcechanoffline channel=0') when it loses its
connection to the iSCSI target? I have found no other way to let the
RAID array know that one of the iSCSI disks is no longer available.

Thanks for your help,

Markus

Edwin Clubb

Jun 22, 2006, 1:16:33 PM
to Core-...@googlegroups.com
Markus,

I'm not sure how RAID1 on top of iSCSI is really beneficial. It seems to me that you want
RAID1 (or higher) on the target's DAS drives and a failover mechanism on the initiator.
Core-iscsi currently provides target portal group (TPG) failover, so that if the primary
network connection to the target portal is broken, the session can be restarted on a
second network interface to a different portal on the same target. I have done this for
our systems. To make this work, your target needs to advertise at least two portals on
different subnets, and your initiator must have a network interface on each subnet.

Assuming your target has portals on 192.168.1.100 and 192.168.2.100, and your initiator
has interfaces at 192.168.1.10 and 192.168.2.10, then your channel definition would look
like this:

CHANNEL="0 1 NULL 192.168.1.100 3260 0
AuthMethod=None;MaxRecvDataSegmentLength=65536;ImmediateData=Yes
nopout_timeout=5;tpgfailover=1;tpgfailover_login_retries=5 iqn.target.name"

Core-iscsi will build the initial connection to portal 192.168.1.100 on the 192.168.1.10
interface. If the initiator session fails on this connection and cannot be reestablished
after 5 retries, core-iscsi will try to open a session to the 192.168.2.100 portal on the
192.168.2.10 interface. Note that you must specify NULL for the interface name for this
to work, so that core-iscsi can pick whichever interface it needs. Also note that 5 is
the minimum value core-iscsi allows for the tpgfailover_login_retries parameter, so the
failover process can take a while (about 30 seconds on my systems).
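
To test the failover here, I do roughly the following (a sketch; I'm simulating the
portal failure by downing the interface, and the init script path may differ on your
install):

# Simulate losing the path to the primary portal
ifconfig eth1 down

# Watch core-iscsi retry the login and then fail over to the second portal
tail -f /var/log/messages

# Afterwards, check which portal the session ended up on
init.d/initiator status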

I'm hoping that core-iscsi will eventually support ERL=2 on an MC/S configuration, which
would provide better redundancy (not to mention performance). At the moment you can have
either MC/S with no failover, or TPG failover on SC/S, but not both.

Ed

-------------------------------------
Edwin Clubb
VMTH Computer Services, UC Davis
One Shields Ave
Davis, CA 95616
Voice: 530/752-5996, FAX: 530/752-4743

William Studenmund

Jun 22, 2006, 9:33:09 PM
to Core-...@googlegroups.com

On Jun 22, 2006, at 10:16 AM, Edwin Clubb wrote:

> Markus,
>
> I'm not sure how RAID1 on top of iSCSI is really beneficial. It seems
> to me that you want RAID1 (or higher) on the target's DAS drives and a
> failover mechanism on the initiator. Core-iscsi currently provides
> target portal group (TPG) failover, so that if the

Actually, iSCSI-level RAID can be very beneficial, just against a
different class of failures. You're right that RAID on the DAS drives
will help with disk failure.

iSCSI-level RAID will, however, help with storage box failure. If you
are running an application that can't tolerate losing an entire disk
array, that kind of RAID will help.

Take care,

Bill

mage

Jun 23, 2006, 3:38:29 AM
to Core-iSCSI
Hi Edwin,

Edwin Clubb wrote:


> I'm not sure how RAID1 on top of iSCSI is really beneficial. It seems
> to me that you want RAID1 (or higher) on the target's DAS drives and a
> failover mechanism on the initiator. Core-iscsi currently provides
> target portal group (TPG) failover, so that if the

We have two different targets running at two different locations. I
want to deploy the RAID1 array so that I can shut down one of the
targets without completely disabling the disk.

Yesterday I read about the same problem in the linux-raid newsgroup.
Unfortunately, the solution is not described in much detail:

"With regards to the problem I was having with node failures, at least

with iSCSI the solution was setting a timeout so that a "disk failed"
error was actually returned - by default the iSCSI initiator assumes
any
disconnection errors are network-related and transient, so it simply
stops any IO to the iSCSI target until it reappears. Now that I've
specified a timeout, node "failures" behave as expected and the mirror

goes into degraded mode."

Which timer/timeout is he talking about? I have tested several
settings, but without success.

Greetings,

Markus

Edwin Clubb

Jun 23, 2006, 8:33:24 PM
to Core-...@googlegroups.com
Markus,

I'm just guessing, but he may be referring to a TCP stack timeout setting. It seems to me
that for this to work, the SCSI timeouts would need to be shorter than the network layer
timeouts, so that the RAID layer sees a SCSI error before the initiator notices that
there is a network problem.
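
If you want to experiment with the SCSI side, the per-device command timeout is exposed
through sysfs on 2.6 kernels. A minimal sketch, assuming your iSCSI disks show up as
/dev/sdb and /dev/sdc (whether core-iscsi actually fails the command back to the SCSI
layer when this timer expires is a separate question):

# Show the current SCSI command timeout, in seconds
cat /sys/block/sdb/device/timeout

# Lower it so the SCSI layer gives up before the network layer does
echo 10 > /sys/block/sdb/device/timeout
echo 10 > /sys/block/sdc/device/timeout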

Ed

-------------------------------------

Nicholas A. Bellinger

Jun 24, 2006, 5:31:36 AM
to Core-...@googlegroups.com
Greetings all,

Core-iSCSI gives you two options for handling channel communication failures:

1) Fail all SCSI commands back to SCSI with a non-recoverable status.
This will take RAID elements offline and can currently be done with
init.d/initiator pause + initiator-ctl forcechanoffline. Note that the
channel currently needs to be released with initiator-ctl freechan
after the outstanding I/O operations on the iSCSI LUN in question have
been failed. Looking into pausing/failing tasks per iSCSI LUN might
prove useful.
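
For Markus's channel 0, that sequence would look something like this (a sketch; the
channel= argument for forcechanoffline is taken from his mail, and I am assuming freechan
accepts the same syntax, so check initiator-ctl's usage output):

# Pause the initiator, then force the failed channel offline so that
# outstanding commands are failed back to the SCSI layer (and md kicks
# the member into degraded mode)
init.d/initiator pause
initiator-ctl forcechanoffline channel=0

# Once the LUN's outstanding I/O has been failed, release the channel
initiator-ctl freechan channel=0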

2) Failures of SPC TEST_UNIT_READY requests issued from a user-level
daemon (dm-multipath). These failures are generated by timeouts in the
user-level daemon, or by failures returned to SCSI, so that
device-mapper can perform mirror operations on device-mapper block
devices. The failures come from the struct scsi_device referencing the
major/minor numbers that are passed to 'dmsetup create $DM_BLOCKDEV'.

There is an example online:
http://www.redhat.com/archives/dm-devel/2005-April/msg00013.html
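
For a two-target setup like Markus's, the device-mapper mirror target could be layered
directly over the two iSCSI disks. A minimal sketch (the device names are assumptions,
the region size of 64 sectors and the nosync flag are just example values, and the table
format is the standard dm-raid1 one):

# Table: <start> <sectors> mirror core <#log-args> <region_size> nosync <#devs> <dev> <offset> ...
# blockdev --getsize prints the device size in 512-byte sectors
echo "0 $(blockdev --getsize /dev/sdb) mirror core 2 64 nosync 2 /dev/sdb 0 /dev/sdc 0" | \
        dmsetup create iscsi-mirror

# The mirror then shows up as /dev/mapper/iscsi-mirror
dmsetup status iscsi-mirror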

--nab

Nicholas A. Bellinger

Jun 24, 2006, 6:11:16 AM
to Core-...@googlegroups.com
On Thu, 2006-06-22 at 05:45 -0700, mage wrote:
> Hello,
>
> I would like to run SW-RAID1 on top of two iSCSI channels that are
> connected to two iSCSI targets. I am able to create the RAID array and
> everything looks good. However, if I disconnect one of the links to
> the iSCSI targets, the RAID machine locks up until the connection to
> that iSCSI target is reestablished. In my view it should keep running
> as long as one connection is still available.
>

Feel free to provide some init.d/initiator status output for the
scenarios you have described. This setup is across two channels with
two network portals, correct?

Something else that would be useful to check is the status of the
connections on both channels, plus the obvious bit about subnets:
whether you can still ping the network portal that the initiator node
still has fabric access to.

--nab

--
Nicholas A. Bellinger <n...@kernel.org>

mage

Jun 26, 2006, 4:48:00 AM
to Core-iSCSI
Hi Nicholas,

Nicholas A. Bellinger wrote:


> Feel free to provide some init.d/initiator status output for the
> scenarios you have described. This setup is across two channels with
> two network portals, correct?

I have one iSCSI initiator machine with two network interfaces. One
interface is connected to IP subnet A (192.168.1.0), the other to IP
subnet B (192.168.2.0). One iSCSI target is available on subnet A and
one on subnet B. The two targets are completely autonomous (there is
no connection between them).

I can connect to the iSCSI disks from my initiator machine:

CHANNEL="0 1 eth1 192.168.1.2 3260 0
AuthMethod=None;MaxRecvDataSegmentLength=8192 nopout_timeout=5
iqn.2006-04.de.lewtelnet:storage.test.disk1"
CHANNEL="1 1 eth2 192.168.2.2 3260 0
AuthMethod=None;MaxRecvDataSegmentLength=8192 nopout_timeout=5
iqn.2006-04.de.lewtelnet:storage.test.disk2"

Now I create a RAID array with:

mdadm --create /dev/md0 --level=raid1 --raid-devices=2 /dev/sdb /dev/sdc

Everything looks good:

mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jun 21 14:49:28 2006
     Raid Level : raid1
     Array Size : 3903680 (3.72 GiB 4.00 GB)
    Device Size : 3903680 (3.72 GiB 4.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jun 26 07:49:53 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : bebc73f2:2613674e:cf91b4e1:237389fb
         Events : 0.31060

    Number   Major   Minor   RaidDevice   State
       1       3       65        1        active sync   /dev/sdb
       2      22        1        2        active sync   /dev/sdc

When I now disconnect one of the network interfaces on the initiator
machine, the machine "hangs" until I plug the cable back in. The
initiator logs the following messages:

Jun 26 09:58:49 test1 kernel: iscsi_handle_netif_timeout:1765:
***ERROR*** Detected PHY loss on Network Interface: eth1 for iSCSI CID:
0 on SID: 1908
Jun 26 09:58:49 test1 kernel: iCHANNEL[8] - Performing Cleanup for
failed iSCSI CID: 0 to iqn.2006-04.de.lewtelnet:storage.test.disk1
Jun 26 09:58:49 test1 kernel: iCHANNEL[8] - Decremented iSCSI
connection count to 0 to node:
iqn.2006-04.de.lewtelnet:storage.test.disk1
Jun 26 09:58:49 test1 kernel: iCHANNEL[8] - released iSCSI session to
node: iqn.2006-04.de.lewtelnet:storage.test.disk1
Jun 26 09:58:49 test1 kernel: iSCSI Core Stack[1] - Decremented number
of active iSCSI sessions to 8

What works:
-> If I disable the LUN on one of the target machines, the initiator
complains about SCSI I/O errors and the RAID array then (correctly)
goes into degraded mode.
-> If I disable the iSCSI channel with "initiator-ctl
forcechanoffline", the initiator complains about SCSI I/O errors and
the RAID array then (correctly) goes into degraded mode.

From my point of view, something between the initiator and the SCSI
subsystem is not working the way I expect (or I am unable to configure
it :). It would be a nice feature if I could specify how network
errors are handled, e.g. a timeout after which SCSI I/O errors are
returned.
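
Until something like that exists, the best manual workaround I can see
is to combine the commands that already work (a sketch; spotting the
PHY-loss message in the log still has to happen by hand or via a
log-watching script):

# Force the dead channel offline so outstanding I/O fails back to SCSI
initiator-ctl forcechanoffline channel=0

# Then tell md explicitly that the member disk is gone
mdadm /dev/md0 --fail /dev/sdb
mdadm /dev/md0 --remove /dev/sdb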

Greetings,

Markus
