Question about iscsi session block


Zhengyuan Liu

Feb 15, 2022, 10:49:19 AM
to linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com, bob...@oracle.com
Hi, all

We have a production server that uses multipath + iSCSI to attach storage
from a storage server. The server has two NICs; each carries about 20 iSCSI
sessions, and each session includes about 50 iSCSI devices (yes, that is
roughly 2*20*50 = 2000 iSCSI block devices on the server). The problem is
that once a NIC fails, it takes far too long (nearly 80s) for multipath to
switch over to the remaining good NIC, because all iSCSI devices on the
faulted NIC first have to be blocked. The call path is shown below:

void iscsi_block_session(struct iscsi_cls_session *session)
{
        queue_work(iscsi_eh_timer_workq, &session->block_work);
}

__iscsi_block_session() -> scsi_target_block() -> target_block() ->
  device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
  blk_mq_quiesce_queue() -> synchronize_rcu()

All sessions and all devices are processed sequentially, and we have traced
that each synchronize_rcu() call takes about 80ms, so the total cost is about
80s (80ms * 20 * 50). That is longer than the application can tolerate and it
may interrupt service.

So my question is: can we optimize this procedure to reduce the time spent
blocking all iSCSI devices? I'm not sure whether increasing max_active on
iscsi_eh_timer_workq to improve concurrency is a good idea.
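
To make that idea concrete, here is a rough, untested sketch (the call site,
workqueue name, and flags are my assumptions about how iscsi_eh_timer_workq
is allocated, not a proposed patch):

/*
 * Hypothetical sketch only: allocate the iSCSI EH workqueue with a larger
 * max_active so per-session block_work items can run concurrently instead
 * of strictly one at a time.
 */
iscsi_eh_timer_workq = alloc_workqueue("iscsi_eh",
                                       WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
                                       num_online_cpus());
if (!iscsi_eh_timer_workq)
        return -ENOMEM;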

Thanks in advance.

Donald Williams

Feb 15, 2022, 11:25:59 AM
to open-...@googlegroups.com, linux...@vger.kernel.org, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com, bob...@oracle.com
Hello, 
Something else to check is your MPIO configuration. I have seen this same
symptom when the Linux MPIO feature "queue_if_no_path" was enabled.

From the /etc/multipath.conf file, these are the lines showing it enabled:

    failback                immediate
    features                "1 queue_if_no_path"

Also, in the past some versions of Linux multipathd would wait a very long time before moving all I/O to the remaining path.

 Regards,
Don
 


Mike Christie

Feb 15, 2022, 11:31:52 AM
to Zhengyuan Liu, linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com
We need a patch so that the unblock call waits for/cancels/flushes the block
call; otherwise the two can end up running in parallel.

I'll send a patchset later today so you can test it.

Zhengyuan Liu

Feb 15, 2022, 8:28:26 PM
to Mike Christie, linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com
I'd be glad to test once you post the patchset.

Thank you, Mike.

michael....@oracle.com

Feb 15, 2022, 9:19:26 PM
to Zhengyuan Liu, linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com
I forgot I did this recently :)

commit 7ce9fc5ecde0d8bd64c29baee6c5e3ce7074ec9a
Author: Mike Christie <michael....@oracle.com>
Date: Tue May 25 13:18:09 2021 -0500

scsi: iscsi: Flush block work before unblock

We set the max_active iSCSI EH works to 1, so all work is going to execute
in order by default. However, userspace can now override this in sysfs. If
max_active > 1, we can end up with the block_work on CPU1 and
iscsi_unblock_session running the unblock_work on CPU2 and the session and
target/device state will end up out of sync with each other.

This adds a flush of the block_work in iscsi_unblock_session.


It was merged in 5.14.
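
The core of the change is roughly the following (a sketch of the idea only,
not the verbatim upstream diff):

/*
 * Wait for any queued or running block_work before the unblock work
 * touches the session and target/device state, so the two cannot race.
 */
static void __iscsi_unblock_session(struct work_struct *work)
{
        struct iscsi_cls_session *session =
                        container_of(work, struct iscsi_cls_session,
                                     unblock_work);

        /* Make sure a racing block_work has fully finished first. */
        flush_work(&session->block_work);

        /* ... existing unblock logic (clear state, unblock the target, ...) ... */
}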

Ulrich Windl

Feb 16, 2022, 5:12:54 AM
to open-iscsi
>>> Donald Williams <don.e.w...@gmail.com> wrote on 15.02.2022 at 17:25 in
message <CAK3e-EZbJMDHkozGiz8LnMNA...@mail.gmail.com>:
> Hello,
> Something else to check is your MPIO configuration. I have seen this
> same symptom when the linux MPIO feature "queue_if_no_path" was enabled
>
> From the /etc/multipath.conf file showing it enabled.
>
> failback immediate
> features "1 queue_if_no_path"

Yes, the actual config is interesting. Especially when using MD-RAID you typically do not want "1 queue_if_no_path", but if the application can't handle I/O errors, one might want it.
For an FC SAN featuring ALUA we use:
...
polling_interval 5
max_polling_interval 20
path_selector "service-time 0"
...
path_checker "tur"
...
fast_io_fail_tmo 5
dev_loss_tmo 600

The logs are helpful, too. For example (there were some paths remaining all the time):
Cable was unplugged:
Feb 14 12:56:05 h16 kernel: qla2xxx [0000:41:00.0]-500b:3: LOOP DOWN detected (2 7 0 0).
Feb 14 12:56:10 h16 multipathd[5225]: sdbi: mark as failed
Feb 14 12:56:10 h16 multipathd[5225]: SAP_V11-PM: remaining active paths: 7
Feb 14 12:56:10 h16 kernel: sd 3:0:6:3: rejecting I/O to offline device
Feb 14 12:56:10 h16 kernel: sd 3:0:6:14: rejecting I/O to offline device
Feb 14 12:56:10 h16 kernel: sd 3:0:6:15: rejecting I/O to offline device

So 5 seconds later the paths are offlined.

Cable was re-plugged:
Feb 14 12:56:22 h16 kernel: qla2xxx [0000:41:00.0]-500a:3: LOOP UP detected (8 Gbps).
Feb 14 12:56:22 h16 kernel: qla2xxx [0000:41:00.0]-11a2:3: FEC=enabled (data rate).
Feb 14 12:56:26 h16 multipathd[5225]: SAP_CJ1-PM: sdbc - tur checker reports path is up
Feb 14 12:56:26 h16 multipathd[5225]: 67:96: reinstated
Feb 14 12:56:26 h16 multipathd[5225]: SAP_CJ1-PM: remaining active paths: 5
Feb 14 12:56:26 h16 kernel: device-mapper: multipath: 254:4: Reinstating path 67:96.
Feb 14 12:56:26 h16 kernel: device-mapper: multipath: 254:6: Reinstating path 67:112.

So 4 seconds later new paths are discovered.


Regards,
Ulrich



Donald Williams

Feb 16, 2022, 8:31:15 AM
to open-...@googlegroups.com
Hello, 
 
Thanks. On the app side, with iSCSI SANs I extend the disk timeout value in
the OS to better handle any transitory network events and controller
failovers. In Linux that's important to prevent filesystems like EXT4 from
remounting read-only on an error.

  I would like to know which vendor they are using for iSCSI storage.  

Regards,
Don
  

Mike Christie

Feb 26, 2022, 6:00:14 PM
to Zhengyuan Liu, linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com
Hey, I found one more bug when max_active > 1. While fixing it I decided to
just fix this properly so we can do the session recoveries in parallel and the
user doesn't have to worry about setting max_active.
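
Roughly the direction (an illustrative sketch only; the actual patchset may
be structured differently, and the per-session 'workq' field and queue name
here are assumptions):

/*
 * Give each session its own workqueue so block/unblock/recovery work for
 * different sessions no longer serializes on one global single-threaded
 * workqueue.
 */
session->workq = alloc_workqueue("iscsi_ctrl_%u",
                                 WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND, 0,
                                 session->sid);
if (!session->workq)
        return -ENOMEM;

/* Per-session work is then queued here instead of the global workqueue: */
queue_work(session->workq, &session->block_work);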

I'll send a patchset and cc you.

Zhengyuan Liu

May 24, 2022, 2:29:37 AM
to Mike Christie, linux...@vger.kernel.org, open-iscsi, dm-d...@redhat.com, ldu...@suse.com, le...@redhat.com
Hi, Mike,

Sorry for the delayed reply; I had no environment to test your patchset
below until recently:

https://lore.kernel.org/all/20220226230435.3873...@oracle.com/

After applying that series, the total time dropped from 80s to nearly 10s;
it's a great improvement.

Thanks again.
