Kernel panic: Hung task on unbind session

ajhu...@gmail.com

Jan 21, 2021, 9:52:27 AM
to open-iscsi
Hi Folks,

I am looking at a kernel panic due to a hung task and could use some help understanding whether this is a known issue.  Kernel version is 4.14.63.

Here is a complete stack trace of the hung kworker task.

crash> bt 106700
PID: 106700  TASK: ffff885eb22ebe80  CPU: 8   COMMAND: "kworker/u32:0"
 #0 [ffffc900550ebab8] __schedule at ffffffff815f0b78
 #1 [ffffc900550ebb50] schedule at ffffffff815f1248
 #2 [ffffc900550ebb58] schedule_timeout at ffffffff815f4fe6
 #3 [ffffc900550ebbf8] wait_for_completion at ffffffff815f1cf0
 #4 [ffffc900550ebc48] flush_workqueue at ffffffff8108ec66
 #5 [ffffc900550ebce8] drain_workqueue at ffffffff8108ef84
 #6 [ffffc900550ebd10] destroy_workqueue at ffffffff81091ce5
 #7 [ffffc900550ebd30] scsi_host_dev_release at ffffffffa0095ced [scsi_mod]
 #8 [ffffc900550ebd48] device_release at ffffffff81453c90
 #9 [ffffc900550ebd68] kobject_put at ffffffff815d8130
#10 [ffffc900550ebd88] iscsi_session_release at ffffffffa0aebf88 [scsi_transport_iscsi]
#11 [ffffc900550ebda8] device_release at ffffffff81453c90
#12 [ffffc900550ebdc8] kobject_put at ffffffff815d8130
#13 [ffffc900550ebde8] device_release at ffffffff81453c90
#14 [ffffc900550ebe08] kobject_put at ffffffff815d8130
#15 [ffffc900550ebe28] scsi_remove_target at ffffffffa00a3e92 [scsi_mod]
#16 [ffffc900550ebe70] __iscsi_unbind_session at ffffffffa0aecd8d [scsi_transport_iscsi]
#17 [ffffc900550ebe98] process_one_work at ffffffff8108f62a
#18 [ffffc900550ebed8] worker_thread at ffffffff8108f84b
#19 [ffffc900550ebf10] kthread at ffffffff8109536a
#20 [ffffc900550ebf50] ret_from_fork at ffffffff816001ef

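After poking around in the kdump, I've discovered that the worker thread that called __iscsi_unbind_session did so for a work item that came from the same workqueue that is being destroyed at the top of the stack. My understanding of workqueues is that this isn't allowed: destroy_workqueue drains and flushes the queue, which means waiting for every work item on it to finish, including the one that is doing the waiting. That work item can never complete, so the result is a hung task.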
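To make the suspected self-deadlock concrete, here is a minimal user-space sketch in C. It is not kernel code: `model_wq`, `model_worker`, and `model_drain` are hypothetical names invented for this illustration, and the model only assumes that a drain waits until the queue has no pending or in-flight items. The real kernel would block forever in the stuck case; the model returns false instead so the outcome can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a workqueue (not the kernel struct). */
struct model_wq {
	int in_flight;	/* work items currently executing */
	int pending;	/* work items queued but not yet started */
};

/* Hypothetical stand-in for the calling worker's context. */
struct model_worker {
	struct model_wq *running_on;	/* queue whose item we are executing, or NULL */
};

/*
 * Simulate draining wq from the context of `self`.
 * Each step either retires one in-flight item or starts one pending item.
 * If the caller is itself an in-flight item on wq, that one item can never
 * retire while the caller waits, so the queue never becomes empty.
 * Returns true if the queue empties within max_steps, false otherwise.
 */
bool model_drain(struct model_wq *wq, const struct model_worker *self,
		 int max_steps)
{
	assert(wq != NULL && self != NULL);

	/* Number of in-flight items that cannot finish: the caller's own. */
	int blocked = (self->running_on == wq) ? 1 : 0;

	for (int step = 0; step < max_steps; step++) {
		if (wq->pending + wq->in_flight == 0)
			return true;		/* drained */
		if (wq->in_flight > blocked) {
			wq->in_flight--;	/* another item finishes */
		} else if (wq->pending > 0) {
			wq->pending--;		/* start the next queued item */
			wq->in_flight++;
		} else {
			return false;		/* only the blocked caller remains */
		}
	}
	return wq->pending + wq->in_flight == 0;
}
```

In this model, draining a queue from one of its own work items never succeeds, while draining the same queue from any other context does. That matches the shape of the stack above, where the kworker running the unbind work ends up in destroy_workqueue for the queue it was dispatched from.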

Here we can see where the __iscsi_unbind_session work is queued to a SCSI workqueue:

static int
iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
{
.
.
.
	case ISCSI_UEVENT_UNBIND_SESSION:
		session = iscsi_session_lookup(ev->u.d_session.sid);
		if (session)
			scsi_queue_work(iscsi_session_to_shost(session),
					&session->unbind_work);	/* <--- unbind work queued to scsi work queue */
		else
			err = -EINVAL;
		break;

Here we can see that this puts the work item onto Scsi_Host->work_q:

int scsi_queue_work(struct Scsi_Host *shost, struct work_struct *work)
{
	if (unlikely(!shost->work_q)) {
		shost_printk(KERN_ERR, shost,
			"ERROR: Scsi host '%s' attempted to queue scsi-work, "
			"when no workqueue created.\n", shost->hostt->name);
		dump_stack();

		return -EINVAL;
	}

	return queue_work(shost->work_q, work);	/* <--- Work item goes into Scsi_Host->work_q */
}

Here we can see the scsi_host_dev_release routine destroying the Scsi_Host->work_q:

static void scsi_host_dev_release(struct device *dev)
{
	struct Scsi_Host *shost = dev_to_shost(dev);
	struct device *parent = dev->parent;

	scsi_proc_hostdir_rm(shost->hostt);

	/* Wait for functions invoked through call_rcu(&shost->rcu, ...) */
	rcu_barrier();

	if (shost->tmf_work_q)
		destroy_workqueue(shost->tmf_work_q);
	if (shost->ehandler)
		kthread_stop(shost->ehandler);
	if (shost->work_q)
		destroy_workqueue(shost->work_q);	/* <--- Destroying Scsi_Host->work_q */

I did some searching and couldn't locate a similar stack trace. Does anyone know if this is a known issue?

If it is not a known issue, any ideas as to what would normally keep the Scsi_Host device from being removed inline in this call stack? This happened on two hosts within minutes of each other, after they started disconnecting from 2 targets. I believe the unbind session was kicked off by an iscsiadm command to terminate the session, but other than that nothing out of the ordinary was going on.

Thanks in advance, 
Adam

The Lee-Man

Feb 19, 2021, 3:02:56 PM
to open-iscsi
Yes, you bring up a good point in my opinion. I do not know this code well, but it seems like UNBIND_SESSION could never work.

Mike Christie? Chris Leech?