In setup_full_feature_phase, iscsid calls into the kernel via
start_conn, then sets all the relevant device states to "running" via
session_online_devs. This second step is redundant, since start_conn already
sets the device states to running. Moreover, it can cause tasks to hang
forever: between start_conn and session_online_devs, the kernel could
detect another conn error and block the session again, which quiesces
the device queues. Setting the device state to "running" via sysfs kicks
off a rescan, and if the device queue is quiesced, the rescan will hang.

The iscsid kernel stack trace looks like the following:

[<0>] blk_execute_rq+0x11c/0x170
[<0>] __scsi_execute+0x108/0x270
[<0>] scsi_vpd_inquiry+0x6d/0xc0
[<0>] scsi_get_vpd_size+0x33/0x70
[<0>] scsi_get_vpd_buf+0x25/0xb0
[<0>] scsi_attach_vpd+0x33/0x1a0
[<0>] scsi_rescan_device+0x2a/0x90
[<0>] store_state_field+0x1b0/0x250
[<0>] kernfs_fop_write_iter+0x130/0x1c0
[<0>] new_sync_write+0x10c/0x190
[<0>] vfs_write+0x218/0x2a0
[<0>] ksys_write+0x59/0xd0
[<0>] do_syscall_64+0x3a/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Since iscsid is responsible for recovery from the second conn error but
it is stuck, the relevant device queues will remain quiesced forever.
Tasks attempting I/O on these queues will thus also get stuck.

For these two reasons, remove the call to session_online_devs in
setup_full_feature_phase.

Signed-off-by: Uday Shankar <usha...@purestorage.com>
---
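Note for reviewers: the ordering problem described above can be sketched
with a small toy model. This is only an illustration under stated
assumptions, not the kernel or iscsid code. The struct and every model_*
name are hypothetical, and the blocking rescan is represented by a return
value instead of an actual hang.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one SCSI device; not a kernel structure. */
struct model_dev {
	bool running;        /* device state is "running" */
	bool queue_quiesced; /* device queue is quiesced */
};

enum model_result { MODEL_OK, MODEL_WOULD_HANG };

/* Models start_conn: the kernel sets the device state to running. */
static void model_start_conn(struct model_dev *d)
{
	d->queue_quiesced = false;
	d->running = true;
}

/* Models a conn error: the session is blocked and the queue quiesced. */
static void model_conn_error(struct model_dev *d)
{
	d->queue_quiesced = true;
}

/*
 * Models writing "running" to the sysfs state attribute. The write
 * kicks off a rescan, which issues I/O; on a quiesced queue that I/O
 * never completes, reported here as MODEL_WOULD_HANG.
 */
static enum model_result model_store_state_running(struct model_dev *d)
{
	d->running = true;
	return d->queue_quiesced ? MODEL_WOULD_HANG : MODEL_OK;
}

/* Sequence before this patch: start_conn, then the sysfs write. */
static enum model_result model_old_sequence(struct model_dev *d,
					    bool error_in_between)
{
	model_start_conn(d);
	if (error_in_between)
		model_conn_error(d);
	return model_store_state_running(d);
}

/* Sequence after this patch: start_conn only, no sysfs write. */
static enum model_result model_new_sequence(struct model_dev *d,
					    bool error_in_between)
{
	model_start_conn(d);
	if (error_in_between)
		model_conn_error(d);
	return MODEL_OK; /* nothing in this path can block on the queue */
}
```

In this model the old sequence only reports MODEL_WOULD_HANG when a conn
error lands between the two steps, which matches the window described in
the commit message. The new sequence leaves iscsid free to handle that
second error.
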
usr/initiator.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/usr/initiator.c b/usr/initiator.c
index 56bf38b..6cbdcba 100644
--- a/usr/initiator.c
+++ b/usr/initiator.c
@@ -1068,7 +1068,6 @@ setup_full_feature_phase(iscsi_conn_t *conn)
} else {
session->notify_qtask = NULL;
- session_online_devs(session->hostno, session->id);
mgmt_ipc_write_rsp(c->qtask, ISCSI_SUCCESS);
log_warning("connection%d:%d is operational after recovery "
"(%d attempts)", session->id, conn->id,
--
2.25.1