I've set the replacement timeout high, so the iscsi root system should be
able to recover from the short outage if the iscsi target fails over to
another server:
node.session.timeo.replacement_timeout = 86400
Unfortunately, this doesn't always work. Sometimes the OS will report
filesystem errors and mount the fs read-only. A short time later the iscsi
targets will be reconnected, but the filesystem is already read-only by
then.
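For reference, a rough sketch of how this timeout can be set persistently with iscsiadm (target name and portal taken from the config shown later in this thread; untested here):

```shell
# Update the stored node record so the timeout survives re-login.
iscsiadm -m node -T iqn.2007-11.com.smys:storage.front003 -p 10.40.99.1:3260 \
    -o update -n node.session.timeo.replacement_timeout -v 86400
```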
The logs show (default iscsi transport 724 was used for this test):
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1336006
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1336006
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166993
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166993
<more disk errors>
Apr 21 11:35:26 front003 kernel: ext3_abort called.
Apr 21 11:35:26 front003 kernel: ext3_abort called.
Apr 21 11:35:26 front003 kernel: EXT3-fs error (device sda1):
ext3_journal_start_sb: Detected aborted journal
Apr 21 11:35:26 front003 kernel: EXT3-fs error (device sda1):
ext3_journal_start_sb: Detected aborted journal
Apr 21 11:35:26 front003 kernel: Remounting filesystem read-only
Apr 21 11:35:26 front003 kernel: Remounting filesystem read-only
Apr 21 11:35:36 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:35:36 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:35:36 front003 iscsid: Kernel reported iSCSI connection 1:0
error (1011) state (3)
Apr 21 11:35:40 front003 kernel: connection5:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection5:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection8:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection8:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: connection1:0 is operational after
recovery (1 attempts)
Is there any way to prevent this, so an iscsi root system can recover
gracefully from a short outage?
Niels
Could you send the parts of the log before the fs errors? We want to see
how many conn errors there were, when they occurred, and when the
replacement/recovery timeout fired.
Could you also run
iscsiadm -m node -T target -p ip:port
for the root target and send the output?
And could you run
cat /sys/class/iscsi_session/session1/recovery_tmo
(session1 is the session for the root disk, right? If not, replace it with
whatever number it is) and send that output?
> Apr 21 11:35:40 front003 iscsid: connection1:0 is operational after
> recovery (1 attempts)
>
This is weird, because it only took one recovery attempt, so it looks
like it was a really short outage and should have been back within the
replacement timeout you set.
> Is there any way to prevent this, so an iscsi root system can recover
> gracefully from a short outage?
You are using tgtd.
What caused these disconnections? I guess they were triggered manually by
you, but is it:
1) cabling, switches
2) firewall, routing etc.
3) tgtd restart?
--
Tomasz Chmielewski
http://wpkg.org
Anyway, if it's 3) tgtd restart - it's "by design", and you should
complain on the stgt-devel mailing list. There were some changes in git
lately, but AFAIK it hasn't improved in all areas.
If it's either 1) or 2), there is something fishy here.
Supposing it's 3) - there is a slight race between starting tgtd and
being able to configure targets with tgtadm. Therefore, a "sleep 2s"
would be recommended. The rest is like below: tgtd already listens on
3260, but has no targets configured. Any initiator that connects will be
rejected, hence your immediate fs errors.
One workaround is to block iSCSI traffic with iptables before starting
tgtd and removing the block after all targets are configured.
target              initiator
---------------------------------------------------
not started         trying to reconnect
start tgtd          trying to reconnect
sleep 2s            trying to reconnect
nothing configured  login I/O error - non-fatal
configure target1   conn to target1 OK
no such target      conn to target2 FAIL
                    I/O error to target2
configure target2   too late, fatal, we lost it
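The iptables workaround described above could look roughly like this (the port number and single-rule approach are assumptions, not tested config):

```shell
# Block new iSCSI logins before tgtd comes up; initiators just keep retrying.
iptables -I INPUT -p tcp --dport 3260 -j DROP
tgtd
sleep 2                # let tgtd finish starting and bind port 3260
# ... create all targets and luns with tgtadm here ...
# Then let initiators back in once everything is configured.
iptables -D INPUT -p tcp --dport 3260 -j DROP
```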
In this case the failover is indeed manual, for failover testing. The
failover process basically is:
server1: remove virtual IP
server1: remove luns from tgtd
server1: Make local DRBD device secondary
server2: Make DRBD device primary
server2: add luns to tgtd
server2: Add virtual IP
Since the iscsi initiators connect to the virtual IP, there can be no
network connectivity while the switch is in progress. This should prevent
any race conditions in tgtd.
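As a sketch, that switchover could look like the script below; the DRBD resource name (r0), backing device, tid/lun numbers, and interface are all hypothetical:

```shell
# On server1 (old primary):
ip addr del 10.40.99.1/24 dev eth0               # remove virtual IP
tgtadm --lld iscsi --op delete --mode logicalunit --tid 1 --lun 1
drbdadm secondary r0                             # demote local DRBD resource

# On server2 (new primary):
drbdadm primary r0                               # promote DRBD resource
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
    --backing-store /dev/drbd0
ip addr add 10.40.99.1/24 dev eth0               # add virtual IP
```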
The network trace looks normal, some retries to the VIP while the switch
is in progress, and a RST of the connection once the switch is done. The
initiator reconnects normally after the reset, but the damage has already
been done by then. (And it looks like the problems even start before the
RST of the connection)
Niels
Have you tried doing everything on server1:
server1: remove virtual IP
server1: remove luns from tgtd
server1: add luns to tgtd
server1: Add virtual IP
If it works, try running DRBD in multi-master mode, and do:
server1: remove virtual IP
server1: remove luns from tgtd
server2: add luns to tgtd
server2: Add virtual IP
Actually, I don't think removing luns in tgtd is supported when
initiators are still connected. And tgtd only reacts to a brutal "pkill
-9 tgtd".
> Since the iscsi initiators connect to the virtual IP, there can be no
> network connectivity while the switch is in progress. This should prevent
> any race conditions in tgtd.
Yep. Unless you omitted something important ;)
These were actually the first error messages for that test. The I/O errors
happen almost instantly, and it takes almost 10 seconds after that to
detect the actual connection error. (After which it almost immediately
reconnects). Possibly this has something to do with the way I do the
switchover? (Remove virtual IP address from one server and add it to the
other.)
>
> Could you also run
> iscsiadm -m node -T target -p ip:port
> for the root target and send the output?
#iscsiadm -m node -T iqn.2007-11.com.smys:storage.front003 -p 10.40.99.1:3260
iscsiadm: Config file line 47 too long.
node.name = iqn.2007-11.com.smys:storage.front003
node.tpgt = 1
node.startup = manual
iface.hwaddress = default
iface.iscsi_ifacename = default
iface.net_ifacename = default
iface.transport_name = tcp
node.discovery_address = 10.40.99.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 32
node.session.queue_depth = 32
node.session.auth.authmethod = CHAP
node.session.auth.username = frontend
node.session.auth.password = ********
node.session.auth.username_in = <empty>
node.session.auth.password_in = <empty>
node.session.timeo.replacement_timeout = 86400
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 10.40.99.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 30
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
Some values have changed since the log I sent you:
node.session.cmds_max 128 -> 32
node.session.iscsi.InitialR2T Yes -> No
DefaultTime2Retain 20 -> 86400
DefaultTime2Wait 2 -> 10
>
> And could you run
>
> cat /sys/class/iscsi_session/session1/recovery_tmo
>
> (session1 is the session for the root disk, right? If not, replace it
> with whatever number it is)
>
> and send that output?
#cat /sys/class/iscsi_session/session1/recovery_tmo
86400
>
>> Apr 21 11:35:40 front003 iscsid: connection1:0 is operational after
>> recovery (1 attempts)
>>
>
> This is weird, because it only took one recovery attempt, so it looks
> like it was a really short outage and should have been back within the
> replacement timeout you set.
The outage is probably 15-20 seconds, while the target is switched over to
another server. The actual reconnect is to another identically configured
server/target. The weird thing is the timeline:
11:35:25 I/O errors
11:35:36 connection error detected
11:35:40 reconnected
Niels
I'm using the "offline" patch, which fixes this nicely. I can restart tgtd
without any issues. (The patch is not in git yet though)
>
> Supposing it's 3) - there is a slight race between starting tgtd and
> being able to configure targets with tgtadm. Therefore, a "sleep 2s"
> would be recommended. The rest is like below: tgtd already listens on
> 3260, but has no targets configured. Any initiator that connects will be
> rejected, hence your immediate fs errors.
The "offline" patch fixed this. In this case, tgtd will only respond to
commands once you set it to running.
>
> One workaround is to block iSCSI traffic with iptables before starting
> tgtd and removing the block after all targets are configured.
Removing the IP address should have the same effect. I'll also do some
tests with iptables to see if that helps.
Niels
Huh, that does not make sense. It is also weird that there are no scsi
errors before the block layer ones like here:

> Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
> 1336006

The iscsi layer can only fail commands after getting the connection
error, so you should see

> Apr 21 11:35:36 front003 kernel: connection1:0: iscsi: detected conn error
> (1011)

then something about the recovery/replacement timeout expiring, then
some scsi error messages, and finally the block errors above.
> happen almost instantly, and it takes almost 10 seconds after that to
> detect the actual connection error. (After which it almost immediately
> reconnects). Possibly this has something to do with the way I do the
> switchover? (Remove virtual IP address from one server and add it to the
> other.)
Maybe.
Yes, that's what you would expect, but somehow this isn't happening.
The kernel is a centos dom0 kernel: 2.6.18-53.1.14.el5xen
The connection1 error messages that appear before this failure are from an
earlier test:
Apr 21 11:33:34 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:33:34 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:33:34 front003 iscsid: Kernel reported iSCSI connection 1:0
error (1011) state (3)
Apr 21 11:33:38 front003 iscsid: received iferror -38
Apr 21 11:33:38 front003 iscsid: received iferror -38
Apr 21 11:33:38 front003 iscsid: received iferror -38
Apr 21 11:33:38 front003 iscsid: received iferror -38
Apr 21 11:33:38 front003 iscsid: received iferror -38
Apr 21 11:33:38 front003 iscsid: connection1:0 is operational after
recovery (1 attempts)
In this earlier case the failover worked correctly. (Maybe no commands
queued?)
The full logs for the error I sent earlier:
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1336006
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1336006
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166993
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166993
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166994
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166994
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166995
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166995
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166996
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166996
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166997
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166997
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166998
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166998
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166999
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 166999
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167000
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167000
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167001
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167001
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167002
Apr 21 11:35:25 front003 kernel: Buffer I/O error on device sda1, logical
block 167002
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: lost page write due to I/O error on sda1
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1337070
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1337070
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1337078
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1337078
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1335606
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1335606
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1335710
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sda, sector
1335710
Apr 21 11:35:25 front003 kernel: Aborting journal on device sda1.
Apr 21 11:35:25 front003 kernel: Aborting journal on device sda1.
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sdf, sector
18344
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sdf, sector
18344
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sdf, sector
18520
Apr 21 11:35:25 front003 kernel: end_request: I/O error, dev sdf, sector
18520
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
787520
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
787520
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788224
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788224
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector 0
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector 0
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788240
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788240
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788384
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
788384
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572880
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572880
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572888
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572888
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572944
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1572944
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573000
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573000
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573016
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573016
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573024
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573024
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573328
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573328
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573600
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
1573600
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097168
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097168
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097176
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097176
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097576
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2097576
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2883584
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2883584
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2885008
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
2885008
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407888
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407888
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407904
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407904
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407936
Apr 21 11:35:26 front003 kernel: end_request: I/O error, dev sdj, sector
3407936
Apr 21 11:35:26 front003 kernel: ext3_abort called.
Apr 21 11:35:26 front003 kernel: ext3_abort called.
Apr 21 11:35:26 front003 kernel: EXT3-fs error (device sda1):
ext3_journal_start_sb: Detected aborted journal
Apr 21 11:35:26 front003 kernel: EXT3-fs error (device sda1):
ext3_journal_start_sb: Detected aborted journal
Apr 21 11:35:26 front003 kernel: Remounting filesystem read-only
Apr 21 11:35:26 front003 kernel: Remounting filesystem read-only
Apr 21 11:35:36 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:35:36 front003 kernel: connection1:0: iscsi: detected conn error
(1011)
Apr 21 11:35:36 front003 iscsid: Kernel reported iSCSI connection 1:0
error (1011) state (3)
Apr 21 11:35:40 front003 kernel: connection5:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection5:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection8:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection8:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: received iferror -38
Apr 21 11:35:40 front003 iscsid: connection1:0 is operational after
recovery (1 attempts)
Apr 21 11:35:40 front003 iscsid: Kernel reported iSCSI connection 5:0
error (1011) state (3)
Apr 21 11:35:40 front003 iscsid: Kernel reported iSCSI connection 8:0
error (1011) state (3)
Apr 21 11:35:40 front003 kernel: connection4:0: iscsi: detected conn error
(1011)
Apr 21 11:35:40 front003 kernel: connection4:0: iscsi: detected conn error
(1011)
Apr 21 11:35:41 front003 iscsid: Kernel reported iSCSI connection 4:0
error (1011) state (3)
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: connection5:0 is operational after
recovery (1 attempts)
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: connection4:0 is operational after
recovery (1 attempts)
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: received iferror -38
Apr 21 11:35:44 front003 iscsid: connection8:0 is operational after
recovery (1 attempts)
Niels
I have rewritten the failover to use a killall -9 tgtd, and now everything
works as expected. It looks like the server would still have some
connections active for a short time, even though the IP address was
already removed. (netstat also shows the old connections, even though the
IP address is not configured on the server anymore). In that case, the lun
would be removed while the initiator was using it, which would probably
cause the problems I saw. The actual connection would still be up, but the
data would not be accessible anymore.
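With that finding, the server1 side of the switchover reduces to killing tgtd outright before demoting DRBD (resource name and interface hypothetical, as before):

```shell
ip addr del 10.40.99.1/24 dev eth0   # remove virtual IP
killall -9 tgtd                      # tear down tgtd and all its connections
drbdadm secondary r0                 # demote DRBD resource
```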
>> Since the iscsi initiators connect to the virtual IP, there can be no
>> network connectivity while the switch is in progress. This should
>> prevent
>> any race conditions in tgtd.
>
> Yep. Unless you omitted something important ;)
Indeed, it seems this was an invalid assumption.
Niels