At the storage summit this year, it was decided that iscsi needs to catch up
with the other scsi drivers and add something like FC's dev_loss_tmo. When
this timeout expires, it will cause the iscsi layer to remove the scsi
devices and fail any IO that was queued. Upper layers like dm-multipath
(actually multipathd and dm-multipath work together) then handle the
hotplug removal events generated by the device deletion by removing paths
from the dm-multipath device.
When the session gets logged back in, we kick off a scan and re-find
devices. multipathd and dm-multipath then handle this by adding the
paths back to the multipath device.
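
To make the sequence above concrete, here is a minimal user-space sketch of
the lifecycle as described: the session drops, dev_loss_tmo expires (devices
removed, queued IO failed), then a relogin triggers a rescan that re-adds the
devices. Everything here (struct model_session, the model_* functions, the
state names) is hypothetical and only models the description; it is not the
code from the patch and does not use kernel data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a session, not the kernel's iscsi_cls_session. */
enum model_state { MODEL_LOGGED_IN, MODEL_FAILED, MODEL_DEV_LOST };

struct model_session {
    enum model_state state;
    int ndevices;     /* scsi devices currently exposed to upper layers */
    int queued_io;    /* IO held back while the session is down */
    int failed_io;    /* IO completed with an error */
    int target_luns;  /* LUNs that a rescan would find on the target */
};

/* The transport went down: the session enters recovery. */
void model_conn_error(struct model_session *s)
{
    assert(s != NULL);
    if (s->state == MODEL_LOGGED_IN)
        s->state = MODEL_FAILED;
}

/* dev_loss_tmo fired: fail everything queued and remove the devices. */
void model_dev_loss_expired(struct model_session *s)
{
    assert(s != NULL);
    if (s->state != MODEL_FAILED)
        return;                     /* session recovered in time, or already lost */
    s->failed_io += s->queued_io;   /* queued IO is failed upward */
    s->queued_io = 0;
    s->ndevices = 0;                /* upper layers see hotplug removal events */
    s->state = MODEL_DEV_LOST;
}

/* Relogin succeeded: if the devices were removed, a rescan re-finds them. */
void model_relogin(struct model_session *s)
{
    assert(s != NULL);
    if (s->state == MODEL_DEV_LOST)
        s->ndevices = s->target_luns;   /* scan kicked off after login */
    s->state = MODEL_LOGGED_IN;
}
```

In this model a relogin that happens before dev_loss_tmo fires leaves the
device count untouched, which matches the behavior without the new timeout.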
If you are not using dm-multipath and are mounting the FS on the iscsi
device, then you basically see the same thing as before: IO errors,
followed by the FS getting mounted read only.
This patch will also fix a problem where, if a device is offlined due to
the transport going kaput, scsi_internal_device_unblock() would not set
the device back to running. With this patch the device gets destroyed
and a new one gets added, so we completely bypass that problem like
FC and SAS do (note: it does not address the problem with
iscsi_block_eh races - waiting on the fc discussion).
This patch was made over Linus's tree but can be applied to scsi-misc or
scsi-rc-fixes. I have only done some light testing. I wanted to post
this though, so other driver developers could test it out.
1) What will be the default value for the new timeout?
2) I think this should be at least KERN_WARN(ing):
+ iscsi_cls_session_printk(KERN_INFO, session,
+ "session dev loss timed out after %d secs\n",
+ session->dev_loss_tmo);
3) As FC hardly ever loses packets or suffers from variable delays, it's much easier to time out an FC device than a TCP device (IMHO)
4) With Linux and non-persistent device names, frequent removal and re-adding of devices should be avoided if possible (in other operating systems the same device file may re-appear if the device comes back online)
5) I hope (i.e. I have no idea about it) that outstanding requests for a device that is going to be removed are removed cleanly before as well.
Regards,
Ulrich
>>> Mike Christie <mich...@cs.wisc.edu> wrote on 06.10.2010 at 10:46 in
message <4CAC376C...@cs.wisc.edu>:
Not sure yet. Suggestions?
>
> 2) I think this should be at least KERN_WARN(ing):
> + iscsi_cls_session_printk(KERN_INFO, session,
> + "session dev loss timed out after %d secs\n",
> + session->dev_loss_tmo);
>
> 3) As FC hardly ever loses packets or suffers from variable delays, it's much easier to time out an FC device than a TCP device (IMHO)
>
> 4) With Linux and non-persistent device names, frequent removal and re-adding of devices should be avoided if possible (in other operating systems the same device file may re-appear if the device comes back online)
>
Linux has persistent names in /dev/disk that should be used with iscsi.
We do the initial scan and login in parallel, so you should be using
those names now.
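
As an illustration of why the persistent names help here: the entries under
/dev/disk are symlinks to whatever kernel device node the disk currently has,
so an application can re-resolve the link after a remove and re-add. A
minimal sketch using POSIX realpath(); the helper name
resolve_persistent_name() is made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Resolve a persistent name (for example a symlink under /dev/disk/by-path
 * or /dev/disk/by-id) to the device node it currently points at.
 * Returns 0 on success and -1 if the link cannot be resolved or the
 * result does not fit in the caller's buffer.
 */
int resolve_persistent_name(const char *link, char *out, size_t outlen)
{
    char buf[PATH_MAX];

    if (realpath(link, buf) == NULL)
        return -1;              /* link is gone, e.g. device was removed */
    if (strlen(buf) >= outlen)
        return -1;              /* caller's buffer is too small */
    strcpy(out, buf);
    return 0;
}
```

A caller would pass the persistent link it was configured with and open the
returned path; if the device was removed and re-added under a different sdX
name, resolving the same link again yields the new node.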
> 5) I hope (i.e. I have no idea about it) that outstanding requests for a device that is going to be removed are removed cleanly before as well.
>
What do you mean by cleanly? It should not leak and there should not be
oopses. You will get IO errors like you do when the replacement/recovery
timeout expires, since at this time we cannot execute IO. It is basically
an iscsiadm -m ... -u when the session is down, but the session does not
get destroyed. Instead it stays around and tries to relogin. If it logs
back in, we rescan the session and re-add devices.
Apart from that: good point. I was on the verge of updating
multipath-tools to check/modify recovery_tmo for iSCSI, but
maybe I'll wait for a few days.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
ha...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Was I drinking :) because it is backwards.
> Yet you implement it at totally different angle, leaving
> recovery_tmo untouched.
> I would have expected some integration there ....
> And, in fact, I think it might be more worthwhile to have a 1:1
> mapping between recovery_tmo and dev_loss_tmo and implement a new
> handling for fast_io_fail, which would be quite attractive to iscsi,
> too.
recovery_tmo is actually the same as FC's fast_io_fail. It is like that
upstream and in my patch.
iSCSI's dev_loss_tmo is the same as FC's dev_loss_tmo.
I did not want to change the behavior of recovery_tmo because people are
using it and it works just like FC's fast_io_fail_tmo. So I added
dev_loss_tmo, which has the new behavior that works just like FC's. The
user will then only notice a difference in behavior if they turn it on.
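
A rough sketch of how the two timers relate, as described above:
recovery_tmo plays the role of FC's fast_io_fail_tmo (IO starts failing but
devices stay), and dev_loss_tmo removes the devices. The function and enum
names are hypothetical, and treating a negative value as "timer off" is an
assumption made for this example, not something taken from the patch.

```c
/* What happens to IO at a given point after the session drops (model only). */
enum model_io_policy {
    MODEL_IO_RUNNING,          /* session is up, IO flows normally */
    MODEL_IO_QUEUED,           /* session down, IO is held back */
    MODEL_IO_FAILED_FAST,      /* recovery_tmo expired: IO fails, devices stay */
    MODEL_IO_DEVICES_REMOVED   /* dev_loss_tmo expired: devices are removed */
};

/*
 * down_secs:    seconds since the session dropped, negative if it is up.
 * recovery_tmo: seconds until IO is failed (like FC's fast_io_fail_tmo).
 * dev_loss_tmo: seconds until devices are removed (like FC's dev_loss_tmo).
 * Assumption for this sketch: a negative timeout means the timer is off.
 */
enum model_io_policy model_io_policy(int down_secs, int recovery_tmo,
                                     int dev_loss_tmo)
{
    if (down_secs < 0)
        return MODEL_IO_RUNNING;
    if (dev_loss_tmo >= 0 && down_secs >= dev_loss_tmo)
        return MODEL_IO_DEVICES_REMOVED;
    if (recovery_tmo >= 0 && down_secs >= recovery_tmo)
        return MODEL_IO_FAILED_FAST;
    return MODEL_IO_QUEUED;
}
```

With dev_loss_tmo left off in this model, the result never goes past
MODEL_IO_FAILED_FAST, which corresponds to the existing recovery_tmo-only
behavior.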