This just indicates there is an initial problem. During this time the
iscsi layer will not yet fail the device. Only when the recovery/replacement
timeout fires and you see this
session recovery timed out after %d secs
will the iscsi layer fail IO.
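If it helps to see which timeout is in effect, here is a rough sketch. It assumes the kernel exposes each session's recovery timeout as recovery_tmo under /sys/class/iscsi_session. The function name and the optional sysfs root argument are made up for illustration; the argument is only there so the function can be tried against a fake tree.

```shell
# Print "<session> <recovery timeout in seconds>" for every iSCSI session.
# $1 is a hypothetical sysfs root override for testing (default: /sys).
show_recovery_tmo() {
    root="${1:-/sys}"
    for f in "$root"/class/iscsi_session/session*/recovery_tmo; do
        # Skip the unexpanded glob when no sessions exist.
        [ -r "$f" ] || continue
        session=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$session" "$(cat "$f")"
    done
}
```

Calling show_recovery_tmo with no argument reads the live /sys tree. IO should only start failing once a session has been down for longer than the value printed for it.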
> No udev events as seen by udevadm monitor
>
> 3. Perform sg_readcap and readsize
>
> $ sudo sg_readcap -v /dev/sdc | grep address | sed s'/.*blocks=//'
> read capacity (10) cdb: 25 00 00 00 00 00 00 00 00 00
> read capacity (10): transport: Host_status=0x0f is invalid
> Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
>
I am not sure you are checking for the right thing. Once you see the
session recovery timed out error above, sg_readcap will fail either as
shown above or with
"No such device or address"
So at that point the sg_readcap output and the /sys...size value below
would not match. Even above, it seems that if you printed the output of
sg_readcap you would not see the correct size returned.
> $ cat /sys/block/sdc/device//block/sdc/size
> 4395405168
The capacity reported in /sys is going to stay the same. It would
probably only change if you rescanned the device while the transport is
failed, because this would cause the kernel's read cap cmd to fail.
Instead of the above read cap checks, in 3.6 you can check if the
device state is "transport-offline". If it is, then you can just delete
the device.
Something like this in shell (run as root):
if [ "$(cat /sys/block/sdc/device/state)" = "transport-offline" ]; then
    # The transport has failed for this device, so remove it.
    echo 1 > /sys/block/sdc/device/delete
fi
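The same check can be applied to every sd device instead of just sdc. This is only a sketch: the function name and the optional sysfs root argument are hypothetical, and it assumes the state and delete attributes sit under /sys/block/sdX/device as in the example above.

```shell
# Delete every sd* disk whose SCSI device state is "transport-offline".
# $1 is a hypothetical sysfs root override for testing (default: /sys).
delete_offline_devices() {
    root="${1:-/sys}"
    for dev in "$root"/block/sd*; do
        state_file="$dev/device/state"
        # Skip the unexpanded glob and devices without a state attribute.
        [ -r "$state_file" ] || continue
        if [ "$(cat "$state_file")" = "transport-offline" ]; then
            echo "deleting $(basename "$dev")"
            echo 1 > "$dev/device/delete"
        fi
    done
}
```

Run as root with no argument, this walks the live /sys tree. Devices in any other state are left alone.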
>
> So the size is still the same. This is on kernel 3.6.0 on the target and
> 3.2.14 on the initiator.
>
> Regarding dev_loss_tmo, this has to be implemented at the kernel level
> correct? Since this feature is important to me, I can go ahead and try
> to implement that.
>
Ok. Here is some info on the current status:
Implement iSCSI dev loss support.
Currently if a session is down for longer than replacement/recovery_timeout
seconds, the iscsi layer will unblock the devices and fail IO. Other
transports, like FC and SAS, will do something similar. FC has a
fast_io_fail tmo which will unblock devices and fail IO, then it has a
dev_loss_tmo which will delete the devices accessed through that port.
iSCSI needs to implement dev_loss_tmo behavior, because apps are beginning
to expect this behavior. An initial patch was made here:
http://groups.google.com/group/open-iscsi/msg/031510ab4cecccfd?dmode=source
Since all drivers want this behavior we want to make it common. We need to
change the patch in that link to add a dev_loss_tmo handler callback to the
scsi_transport_template struct, and add some common sysfs and helper
functions to manage the dev_loss_tmo variable.
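
For comparison, the two FC timers mentioned above can be inspected from userspace today. A minimal sketch, assuming the FC transport class exposes fast_io_fail_tmo and dev_loss_tmo on each remote port under /sys/class/fc_remote_ports; the function name and the optional sysfs root argument are hypothetical.

```shell
# Print fast_io_fail_tmo and dev_loss_tmo for every FC remote port.
# $1 is a hypothetical sysfs root override for testing (default: /sys).
show_fc_timeouts() {
    root="${1:-/sys}"
    for rport in "$root"/class/fc_remote_ports/rport-*; do
        # Skip the unexpanded glob when no FC remote ports exist.
        [ -r "$rport/dev_loss_tmo" ] || continue
        printf '%s fast_io_fail_tmo=%s dev_loss_tmo=%s\n' \
            "$(basename "$rport")" \
            "$(cat "$rport/fast_io_fail_tmo")" \
            "$(cat "$rport/dev_loss_tmo")"
    done
}
```

An iSCSI dev_loss_tmo would presumably be exposed in a similar way once the common helpers exist, but that attribute is not there yet.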