Re: [PATCH] scsi: libiscsi: Set expecting_cc_ua flag when stop_conn

michael....@oracle.com

Oct 11, 2024, 10:49:02 AM
to Xiang Zhang, ldu...@suse.com, cle...@redhat.com, ames.Bo...@hansenpartnership.com, martin....@oracle.com, james...@broadcom.com, ram.v...@broadcom.com, nja...@marvell.com, open-...@googlegroups.com, linux...@vger.kernel.org, linux-...@vger.kernel.org
CC'ing the fibre channel experts because they might have the same issue.

On 10/11/24 3:18 AM, Xiang Zhang wrote:
> After stop_conn is called, the initiator needs to recover the session and reconnect to the target. The target then rebuilds the session info and marks an ASC_POWERON_RESET unit attention (UA) sense for the SCSI devices belonging to the target (device reset). After recovery, the first SCSI command (scmd) sent to the target gets ASC_POWERON_RESET (UA sense) + SAM_STAT_CHECK_CONDITION (status) in the response.
> According to the SCSI code path "scsi_done --> scsi_complete --> scsi_decide_disposition --> scsi_check_sense", if expecting_cc_ua = 0, an scmd response with ASC_POWERON_RESET (UA sense) ignores the "cmd->retries <= cmd->allowed" check and fails directly. This causes SCSI to return io_error to the upper layer without a retry.

Just want to make sure I understand the problem.

Does the failure only happen with tape or passthrough or if removable is
set?

For commands coming from sd, then scsi_io_completion will end up calling
scsi_io_completion_action and seeing the UNIT_ATTENTION and will retry.
I'm not saying we shouldn't do a fix like you did below. Just want to
make sure I understand the case you describe above.


> If we set expecting_cc_ua=1 in fail_scsi_tasks, SISC will retry the scmd whose response carried ASC_POWERON_RESET. The second request for that scmd can succeed, because the target clears ASC_POWERON_RESET from the device's pending ua_sense_list after the first scmd request.


What does "SISC" stand for?

>
> Signed-off-by: Xiang Zhang <hawkxi...@gmail.com>
> ---
> drivers/scsi/libiscsi.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0fda8905eabd..317e57be32b3 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -629,9 +629,10 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
>  		conn->session->queued_cmdsn--;
>  		/* it was never sent so just complete like normal */
>  		state = ISCSI_TASK_COMPLETED;
> -	} else if (err == DID_TRANSPORT_DISRUPTED)
> +	} else if (err == DID_TRANSPORT_DISRUPTED) {
>  		state = ISCSI_TASK_ABRT_SESS_RECOV;
> -	else
> +		sc->device->expecting_cc_ua = 1;


The failure case can happen with other transports like fibre channel
right? If it's common I think we want this in the core scsi code.

For iscsi, we want to set expecting_cc_ua whenever we call
scsi_block_targets() or whenever we return DID_TRANSPORT_DISRUPTED or
DID_TRANSPORT_FAILFAST.
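
To make the iscsi rule above concrete, here is a minimal user-space sketch, not kernel code. The `sketch_` and `MODEL_` names are hypothetical and exist only for illustration. The two constants are intended to mirror the kernel's host byte values, so check the kernel headers for the authoritative definitions. The sketch only models the "set the flag when we return one of these host bytes" half of the rule and leaves out the scsi_block_targets() case.

```c
#include <stdbool.h>

/* Intended to mirror the kernel's host byte codes; the MODEL_ names are hypothetical. */
#define MODEL_DID_TRANSPORT_DISRUPTED 0x0e
#define MODEL_DID_TRANSPORT_FAILFAST  0x0f

/* Hypothetical stand-in for the one scsi_device field that matters here. */
struct sketch_dev {
	bool expecting_cc_ua;
};

/* True for the host bytes after which the next command may see a UA. */
static bool sketch_host_byte_expects_ua(int host_byte)
{
	return host_byte == MODEL_DID_TRANSPORT_DISRUPTED ||
	       host_byte == MODEL_DID_TRANSPORT_FAILFAST;
}

/*
 * Fail a command with the given host byte: arm the flag when needed and
 * return the value that would be stored in the command's result field
 * (host byte shifted into bits 16..23, as in "err << 16" in the patch).
 */
static int sketch_fail_cmd(struct sketch_dev *dev, int host_byte)
{
	if (sketch_host_byte_expects_ua(host_byte))
		dev->expecting_cc_ua = true;
	return host_byte << 16;
}
```

Keeping the decision in one helper is the point of the sketch: every place that fails commands with one of these host bytes would arm the flag the same way, instead of only __fail_scsi_task.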

FC developers, I'm not sure if that's the case for you. For example, if
your driver called fc_remote_port_delete -> scsi_block_targets but then
the issue is resolved quickly, like for a quick cable pull, and you
called fc_remote_port_add, could there be cases where you did not get an
I_T Nexus loss/reset type of issue?

Or is it the case that anytime an FC driver calls fc_remote_port_delete
you will expect a UA after calling fc_remote_port_add again?

kernel test robot

Oct 12, 2024, 10:42:12 AM
to Xiang Zhang, ldu...@suse.com, cle...@redhat.com, michael....@oracle.com, ames.Bo...@hansenpartnership.com, martin....@oracle.com, ll...@lists.linux.dev, oe-kbu...@lists.linux.dev, open-...@googlegroups.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, Xiang Zhang
Hi Xiang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on jejb-scsi/for-next linus/master v6.12-rc2 next-20241011]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Xiang-Zhang/scsi-libiscsi-Set-expecting_cc_ua-flag-when-stop_conn/20241011-161915
base: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link: https://lore.kernel.org/r/20241011081807.65027-1-hawkxiang.cpp%40gmail.com
patch subject: [PATCH] scsi: libiscsi: Set expecting_cc_ua flag when stop_conn
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241012/202410122213...@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241012/202410122213...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <l...@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410122213...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/scsi/libiscsi.c:634:3: warning: variable 'sc' is uninitialized when used here [-Wuninitialized]
     634 |                 sc->device->expecting_cc_ua = 1;
         |                 ^~
   drivers/scsi/libiscsi.c:618:22: note: initialize the variable 'sc' to silence this warning
     618 |         struct scsi_cmnd *sc;
         |                             ^
         |                              = NULL
   1 warning generated.


vim +/sc +634 drivers/scsi/libiscsi.c

   610	
   611	/*
   612	 * session back and frwd lock must be held and if not called for a task that
   613	 * is still pending or from the xmit thread, then xmit thread must be suspended
   614	 */
   615	static void __fail_scsi_task(struct iscsi_task *task, int err)
   616	{
   617		struct iscsi_conn *conn = task->conn;
   618		struct scsi_cmnd *sc;
   619		int state;
   620	
   621		if (cleanup_queued_task(task))
   622			return;
   623	
   624		if (task->state == ISCSI_TASK_PENDING) {
   625			/*
   626			 * cmd never made it to the xmit thread, so we should not count
   627			 * the cmd in the sequencing
   628			 */
   629			conn->session->queued_cmdsn--;
   630			/* it was never sent so just complete like normal */
   631			state = ISCSI_TASK_COMPLETED;
   632		} else if (err == DID_TRANSPORT_DISRUPTED) {
   633			state = ISCSI_TASK_ABRT_SESS_RECOV;
 > 634			sc->device->expecting_cc_ua = 1;
   635		} else
   636			state = ISCSI_TASK_ABRT_TMF;
   637	
   638		sc = task->sc;
   639		sc->result = err << 16;
   640		scsi_set_resid(sc, scsi_bufflen(sc));
   641		iscsi_complete_task(task, state);
   642	}
   643	
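
The warning reduces to dereferencing a pointer before it has been assigned. The sketch below is a stand-alone illustration with hypothetical `demo_` types, not the libiscsi code. It shows the ordering that avoids the warning: the command pointer is taken from the task before any branch touches it. Initializing the pointer to NULL, as the compiler note suggests, would only silence the warning and leave a NULL dereference behind.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the structures in the listing. */
struct demo_device {
	int expecting_cc_ua;
};

struct demo_cmnd {
	struct demo_device *device;
	int result;
};

struct demo_task {
	struct demo_cmnd *sc;
};

/*
 * disrupted_code plays the role of DID_TRANSPORT_DISRUPTED so the sketch
 * does not depend on kernel headers.
 */
static void demo_fail_task(struct demo_task *task, int err, int disrupted_code)
{
	/* Assign first: every branch below may dereference sc. */
	struct demo_cmnd *sc = task->sc;

	if (err == disrupted_code)
		sc->device->expecting_cc_ua = 1;

	sc->result = err << 16;
}
```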

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Mike Christie

Oct 14, 2024, 11:35:07 AM
to 张翔, ldu...@suse.com, cle...@redhat.com, ames.Bo...@hansenpartnership.com, martin....@oracle.com, james...@broadcom.com, ram.v...@broadcom.com, nja...@marvell.com, open-...@googlegroups.com, linux...@vger.kernel.org, linux-...@vger.kernel.org
On 10/12/24 2:55 AM, 张翔 wrote:
>> For commands coming from sd, then scsi_io_completion will end up calling
>> scsi_io_completion_action and seeing the UNIT_ATTENTION and will retry.
>> I'm not saying we shouldn't do a fix like you did below. Just want to
>> make sure I understand the case you describe above.
>
> For commands coming from sd, scsi_complete calls scsi_decide_disposition to get an "enum scsi_disposition". scsi_decide_disposition sees SAM_STAT_CHECK_CONDITION and calls scsi_check_sense, which then sees the UNIT_ATTENTION. If expecting_cc_ua == 1, scsi_check_sense returns NEEDS_RETRY and scsi_complete will retry.


For sd, scsi_decide_disposition will return SUCCESS. scsi_complete will call
scsi_finish_command. In there we call the upper layer done callback, sd_done,
and it will return 0 as there are no good bytes. scsi_io_completion will
initially complete 0 bytes. If there are retries left then we call
scsi_io_completion_action which sees the UA and will retry.
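
The two retry paths discussed in this thread can be summarized in a small user-space model. This is a sketch under stated assumptions, not the kernel implementation. All `model_` / `MODEL_` names are hypothetical, the input is assumed to be a response that already carries CHECK CONDITION with a UNIT ATTENTION sense key, and `upper_layer_retries_ua` stands for a submitter like sd whose completion path retries on UA. Whether other submitters (tape, passthrough) lack such a retry is the open question above, so the "false" case is an assumption.

```c
#include <stdbool.h>

enum model_outcome { MODEL_RETRY, MODEL_IO_ERROR };

/*
 * Outcome for a command whose response carried a UNIT ATTENTION.
 *
 * expecting_cc_ua:        the device flag the patch proposes to set
 * upper_layer_retries_ua: assumption, true for sd-like submitters
 * retries/allowed:        retry budget of the command
 */
static enum model_outcome model_ua_outcome(bool expecting_cc_ua,
					   bool upper_layer_retries_ua,
					   int retries, int allowed)
{
	bool retries_left = retries < allowed;

	/* Mid-layer path: an expected UA is turned into a retry. */
	if (expecting_cc_ua && retries_left)
		return MODEL_RETRY;

	/* Upper-layer path: the completion action sees the UA and retries. */
	if (upper_layer_retries_ua && retries_left)
		return MODEL_RETRY;

	/* Nobody retried, so the error reaches the submitter. */
	return MODEL_IO_ERROR;
}
```

In this model, `model_ua_outcome(false, true, 0, 5)` is a retry (the sd case described above), while `model_ua_outcome(false, false, 0, 5)` is an I/O error, which is the case the patch is trying to cover by setting the flag.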

Mike Christie

Oct 14, 2024, 11:40:00 AM
to Xiang Zhang, ldu...@suse.com, cle...@redhat.com, ames.Bo...@hansenpartnership.com, martin....@oracle.com, open-...@googlegroups.com, linux...@vger.kernel.org
On 10/14/24 12:36 AM, Xiang Zhang wrote:
> After stop_conn is called, the initiator needs to recover the session and reconnect to the target. The target then rebuilds the session info and marks an ASC_POWERON_RESET unit attention (UA) sense for the SCSI devices belonging to the target (device reset). After recovery, the first SCSI command (scmd) sent to the target gets ASC_POWERON_RESET (UA sense) + SAM_STAT_CHECK_CONDITION (status) in the response.
> When the command's response arrives, it follows the SCSI call chain "scsi_done --> scsi_complete --> scsi_decide_disposition --> scsi_check_sense". If expecting_cc_ua = 0, an scmd response with ASC_POWERON_RESET (UA sense) makes scsi_complete ignore the "cmd->retries <= cmd->allowed" check and fail directly. This causes SCSI to return io_error to the upper layer without a retry.
> If we set expecting_cc_ua=1 in fail_scsi_tasks, scsi_complete will retry the scmd whose response carried ASC_POWERON_RESET. The second request for that scmd can succeed, because the target clears ASC_POWERON_RESET from the device's pending ua_sense_list after the first scmd request.
>
> Signed-off-by: Xiang Zhang <hawkxi...@gmail.com>
> ---
> V1 -> V2: Fix the "variable 'sc' is uninitialized" build warning (Reported-by: kernel test robot <l...@intel.com>).
> ---
> drivers/scsi/libiscsi.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0fda8905eabd..f6bfe0c4f8a4 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -621,6 +621,7 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
>  	if (cleanup_queued_task(task))
>  		return;
>  
> +	sc = task->sc;
>  	if (task->state == ISCSI_TASK_PENDING) {
>  		/*
>  		 * cmd never made it to the xmit thread, so we should not count
> @@ -629,12 +630,12 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
>  		conn->session->queued_cmdsn--;
>  		/* it was never sent so just complete like normal */
>  		state = ISCSI_TASK_COMPLETED;
> -	} else if (err == DID_TRANSPORT_DISRUPTED)
> +	} else if (err == DID_TRANSPORT_DISRUPTED) {
>  		state = ISCSI_TASK_ABRT_SESS_RECOV;
> -	else
> +		sc->device->expecting_cc_ua = 1;
> +	} else
>  		state = ISCSI_TASK_ABRT_TMF;
>  
> -	sc = task->sc;
>  	sc->result = err << 16;
>  	scsi_set_resid(sc, scsi_bufflen(sc));
>  	iscsi_complete_task(task, state);


This should be fixed in a common way like I mentioned in the other thread.