You can monitor which sessions have errors by listening on the iscsi
netlink interface. I am not sure if that is very user friendly. You
would have to listen for ISCSI_KEVENT_CONN_ERROR events and filter out
other junk. This does not tell you if a disk has errors though.
For upstream we are working on a more complete solution for
dm-multipath, so it can figure out when sessions/connections have link
problems or if a disk is offlined by the scsi eh. But right now, I do
not think there is any easy way to do what you need.
It is not really that easy, because if the nop times out the iscsi layer
will drop the session and the disk state will not change to offline. The
disk state will only change if the scsi command timer fires and the scsi
eh runs and fails. In this case the disk state will go to offline.
For the nop timeout case and the scsi eh failing case, the iscsi session
state will go to failed, so you could check that instead. That value is in
/sys/class/iscsi_session/session%SID/state
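
A rough sketch of checking that sysfs file from a script could look like
the following. It only assumes the path layout given above. The helper
names (read_session_states, failed_sessions) are made up for
illustration, and the exact spelling/case of the state strings is an
assumption, so the comparison is done case-insensitively.

```python
import glob
import os


def read_session_states(sysfs_root="/sys/class/iscsi_session"):
    """Return {session_dir_name: state_string} for each session found."""
    states = {}
    pattern = os.path.join(sysfs_root, "session*", "state")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                states[name] = f.read().strip()
        except OSError:
            # The session may have been removed between glob() and open().
            continue
    return states


def failed_sessions(states):
    """Names of sessions whose state reads as failed (case assumed unknown)."""
    return [name for name, state in states.items()
            if state.lower() == "failed"]


if __name__ == "__main__":
    current = read_session_states()
    for name, state in current.items():
        print(name, state)
    print("failed:", failed_sessions(current))
```

This only reports the session state. As noted above, it says nothing
about whether the disk itself has been offlined.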
>
> If the network comes back up, how soon will the disk state go to
> 'running' ?
When the iscsi session is dropped due to a nop timeout or the scsi eh
failing, the initiator will basically poll the network every couple of
seconds by trying to reconnect the tcp connection. So the answer depends
on the type of failure. If the initiator is in the middle of a reconnect
attempt when the network comes up, then we could reconnect right away.
If the network layer cannot figure things out, that reconnect could time
out and then the next try would work. And if the network had given us an
error right away when we tried the reconnect, then the next reconnect
attempt would be successful.
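
Since the recovery time depends on the failure type, about all a script
can do is poll for it. Below is a minimal sketch, not something
open-iscsi ships: wait_for_state and read_state are hypothetical
helpers, the 2 second interval just mirrors the "every couple of
seconds" reconnect polling, and the "LOGGED_IN" value and session number
in the usage comment are assumptions.

```python
import time


def read_state(path):
    """Return the stripped contents of a sysfs state file, or None."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def wait_for_state(path, wanted, timeout=120.0, interval=2.0):
    """Poll path until it reads as wanted (case-insensitive).

    Returns True if the state was seen before timeout seconds passed,
    False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = read_state(path)
        if state is not None and state.lower() == wanted.lower():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example use (session number and state string are assumptions):
#   ok = wait_for_state("/sys/class/iscsi_session/session1/state",
#                       "LOGGED_IN")
```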