max_sectors_kb so small, throughput so small


3kbo...@gmail.com

Aug 27, 2018, 7:18:50 PM
to open-iscsi
Hi folks,

I am a newbie to open-iscsi.
My use case is exporting a ceph rbd image via open-iscsi.

I found that max_sectors_kb is 64. The value is very small, and 4M sequential write is only about 10MB/s.
I can not increase max_sectors_kb; if I try, it returns "bash: echo: write error: Invalid argument". (But I can change the value to a smaller one < 64; max_hw_sectors_kb is 32767.)

Before I export the ceph rbd device via open-iscsi, the max_sectors_kb of the bare rbd device is 4096, and 4M sequential write is about 600MB/s.
If I change max_sectors_kb of the bare rbd device from 4096 to 64, 4M sequential write drops from 600MB/s to 20MB/s.

So I am quite curious about max_sectors_kb; it seems that open-iscsi disk throughput is related to max_sectors_kb.
Why is max_sectors_kb of the iscsi disk so small? Can someone explain this?
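For reference, the sysfs attributes in question can be checked like this (sdX is a placeholder for the actual iscsi disk name):

```shell
# Read the current and hardware I/O size limits for the disk
cat /sys/block/sdX/queue/max_sectors_kb      # e.g. 64
cat /sys/block/sdX/queue/max_hw_sectors_kb   # e.g. 32767

# Trying to raise max_sectors_kb beyond what the device reports
# fails with EINVAL, which bash surfaces as a write error:
echo 4096 > /sys/block/sdX/queue/max_sectors_kb
# bash: echo: write error: Invalid argument
```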

Thanks

Mike Christie

Aug 27, 2018, 8:49:46 PM
to open-...@googlegroups.com
On 08/21/2018 08:52 PM, 3kbo...@gmail.com wrote:
> Hi folks,
>
> I am newbie to open-iscsi.
> My case is I export ceph rbd by open-iscsi.
>
> I found the max_sectors_kb is 64, the value is so small, and 4M sequence
> write is only about 10MB/s.
> I can not increase max_sectors_kb, if I do it will return "bash: echo:
> write error: Invalid argument"(But I can change the value to a small one
> < 64, max_hw_sectors_kb is 32767)
>

In newer versions of the Linux kernel, the initiator uses the optimal
value reported by the target and uses the reported maximum as the limit
that the user can set. It sounds like you are using rbd/ceph with
tcmu-runner, which has a default limit of 64K.

If you are using targetcli/lio directly, then you can set hw_max_sectors
through targetcli when you create the device, or in the saveconfig.json file.

If you are using the ceph-iscsi tools then I am actually working on a
gwcli command to configure this right now.


3kbo...@gmail.com

Sep 2, 2018, 10:31:48 PM
to open-iscsi
Hello Mike,

Thank you for your informative response.


On Tuesday, August 28, 2018 at 8:49:46 AM UTC+8, Mike Christie wrote:
On 08/21/2018 08:52 PM, 3kbo...@gmail.com wrote:
> Hi folks,
>
> I am newbie to open-iscsi.
> My case is I export ceph rbd by open-iscsi.
>
> I found the max_sectors_kb is 64, the value is so small, and 4M sequence
> write is only about 10MB/s.
> I can not increase max_sectors_kb, if I do it will return "bash: echo:
> write error: Invalid argument"(But I can change the value to a small one
> < 64, max_hw_sectors_kb is 32767)
>

In new version of the linux kernel the initiator will use the optimal
value reported by the target and then uses the max reported as the limit
that the user can set. It sounds like you are using rbd/ceph with
tcmu-runner which has a default limits of 64K.
Yes, I am using  tcmu-runner and gwcli.

If you are using targetcli/lio directly then you can set hw_max_sectors
through targcli when you create the device or in the saveconfig.json file.

If you are using the ceph-iscsi tools then I am actually working on a
gwcli command to configure this right now.
Yes, I am using the ceph-iscsi tools. How can I change the limit with the ceph-iscsi tools?

Thanks.

3kbo...@gmail.com

Sep 3, 2018, 3:27:36 AM
to open-iscsi
Hello Mike,

I really appreciate your instruction.
Now I can set hw_max_sectors through targetcli when I create the device. I set it to 8192, the same as the raw rbd device.
The performance improved a little: 4M seq write increased from 24MB/s to 40MB/s. (hw_max_sectors 64->8192; it is a /backstore/user:rbd/disk_xxx device.)
But it is far from the block device: if I use /backstore/block, 4M seq write is 500MB/s, so performance is still a big problem. (First I map an rbd device, then export it by targetcli.)

The performance difference between /backstore/user:rbd/ and /backstore/block is very big; is that normal?
Could you give me some suggestions for improving the performance of the /backstore/user:rbd/ device?

Thanks very much!

On Monday, September 3, 2018 at 10:31:48 AM UTC+8, 3kbo...@gmail.com wrote:

Mike Christie

Sep 4, 2018, 5:16:21 PM
to open-...@googlegroups.com
On 09/03/2018 02:31 AM, 3kbo...@gmail.com wrote:
> Hello Mike,
>
> I am very appreciate for your instruction.
> Now I can set hw_max_sectors through targcli when i create the device.
> I set it to 8192 same as raw rbd device.
> The performance improve a little, 4M seq write increase from 24MB/s to
> 40MB/s.(hw_max_sectors 64->8192, it is a /backstore/user:rbd/disk_xxx
> device)
> But it is far away from block device, if I usr /backstore/block 4M seq
> write it will be 500MB/s, performance is still a big problem.(first I
> should map a rbd device , then export it by targetcli)
>
> The performance difference between /backstore/user:rbd/ and
> /backstore/block is so big, is it normal?

backstore/user will be slower than block, but I do not think it would be a
difference like you are seeing.

What lio fabric driver are you using? iSCSI? What kernel version and
what version of tcmu-runner?
> --
> You received this message because you are subscribed to the Google
> Groups "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to open-iscsi+...@googlegroups.com
> <mailto:open-iscsi+...@googlegroups.com>.
> To post to this group, send email to open-...@googlegroups.com
> <mailto:open-...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.

3kbo...@gmail.com

Sep 5, 2018, 10:50:59 PM
to open-iscsi

Thank you very much, Mike!


On Wednesday, September 5, 2018 at 5:16:21 AM UTC+8, Mike Christie wrote:
On 09/03/2018 02:31 AM, 3kbo...@gmail.com wrote:
> Hello Mike,
>
> I am very appreciate for your instruction.
> Now I can  set hw_max_sectors through targcli when i create the device.
> I set it to 8192 same as raw rbd device.
> The performance improve a little, 4M seq write increase from 24MB/s to
> 40MB/s.(hw_max_sectors 64->8192, it is a /backstore/user:rbd/disk_xxx
> device)
> But it is far away from block device, if I usr /backstore/block 4M seq
> write it will be 500MB/s, performance is still a big problem.(first I
> should map a rbd device , then export  it by targetcli)
>
> The performance difference between /backstore/user:rbd/ and
> /backstore/block is so big, is it normal?

backstore/user will be lower than block, but I do not think it would a
difference like you are seeing.

OK, that is good news for me.

It means it is still possible for me to improve my iscsi disk performance.

What lio fabric driver are you using? iSCSI? What kernel version and
what version of tcmu-runner?

lio fabric driver:          iscsi

iscsid version:              2.0-873

OS version:                  CentOS Linux release 7.5.1804 (Core)

kernel version:              3.10.0-862.el7.x86_64 

tcmu-runner version:    1.4.0-rc1


Target Build:

targetcli /iscsi create iqn.2018-09.com.test:target1

targetcli  /backstores/user:rbd create name=my_replicated_test size=1000G cfgstring=rbd_pool/replicated_image1 hw_max_sectors=8192

targetcli  /iscsi/iqn.2018-09.com.test:target1/tpg1/luns create /backstores/user:rbd/my_replicated_test

targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1/portals create 10.0.1.111

targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1 set auth userid=****** password=******

targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1 set attribute authentication=1 demo_mode_write_protect=0 generate_node_acls=1 


Target Setting:

 
(screenshot of targetcli output did not paste; re-sent as text in the next message)

3kbo...@gmail.com

Sep 5, 2018, 11:36:29 PM
to open-iscsi
Sorry, the capture of the targetcli output did not display correctly. Pasting it again.

o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 1]
  | | o- rbd_iblock1 ..................................................... [/dev/rbd/rbd_pool/image2 (50.0TiB) write-thru activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  | o- user:glfs .............................................................................................. [Storage Objects: 0]
  | o- user:rbd ............................................................................................... [Storage Objects: 1]
  | | o- my_replicated_test ..................................................... [rbd_pool/replicated_image1 (1000.0GiB) activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- user:zbc ............................................................................................... [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2018-09.com.test:target1 ...................................................................................... [TPGs: 1]
  |   o- tpg1 ..................................................................................... [gen-acls, tpg-auth, 1-way auth]
  |     o- acls .......................................................................................................... [ACLs: 0]
  |     o- luns .......................................................................................................... [LUNs: 2]
  |     | o- lun0 ..................................................................... [user/my_replicated_test (default_tg_pt_gp)]
  |     | o- lun1 ................................................ [block/rbd_iblock1 (/dev/rbd/rbd_pool/image2) (default_tg_pt_gp)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 10.0.1.111:3260 .................................................................................................. [OK]
  o- loopback ......................................................................................................... [Targets: 0]
  o- vhost ............................................................................................................ [Targets: 0]
  o- xen-pvscsi ....................................................................................................... [Targets: 0]


On Thursday, September 6, 2018 at 10:50:59 AM UTC+8, 3kbo...@gmail.com wrote:

Mike Christie

Sep 11, 2018, 12:30:42 PM
to open-...@googlegroups.com
Hey,

Cc mchr...@redhat.com, or I will not see these messages until I check
the list maybe once a week.

On 09/05/2018 10:36 PM, 3kbo...@gmail.com wrote:
> What lio fabric driver are you using? iSCSI? What kernel version
> and
> what version of tcmu-runner?
>
> io fabric driver : iscsi
>
> iscsid version: 2.0-873
>
> OS version: CentOS Linux release 7.5.1804 (Core)
>
> kernel version: 3.10.0-862.el7.x86_64
>
> tcmu-runner version: 1.4.0-rc1
>
>
> Target Build:
>
> targetcli /iscsi create iqn.2018-09.com.test:target1
>
> targetcli /backstores/user:rbd create name=my_replicated_test
> size=1000G cfgstring=rbd_pool/replicated_image1 hw_max_sectors=8192

That kernel has a default data area (the kernel buffer used to pass scsi
command data) of 8MB, so with a command size of 8192 sectors you could
only have 2 commands in flight at once.
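To spell out the arithmetic (assuming the 512-byte sectors these limits are counted in): 8192 sectors is 4MiB per command, so an 8MiB data area only fits two commands in flight:

```python
SECTOR_SIZE = 512              # bytes; the unit hw_max_sectors is counted in
hw_max_sectors = 8192          # sectors per SCSI command, as set above
data_area_mb = 8               # default tcmu data area in this kernel

bytes_per_cmd = hw_max_sectors * SECTOR_SIZE       # 4 MiB per command
data_area_bytes = data_area_mb * 1024 * 1024       # 8 MiB total buffer
cmds_in_flight = data_area_bytes // bytes_per_cmd
print(cmds_in_flight)  # 2
```

Raising max_data_area_mb to 32 raises that ceiling to 8 concurrent 4MiB commands.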

When you create the backstore pass it a new max_data_area_mb value like
this (make sure you have the newest rtslib-fb, configshell-fb and
targetcli-fb from the upstream github repos):

targetcli /backstores/user:rbd create name=my_replicated_test
size=1000G cfgstring=rbd_pool/replicated_image1 hw_max_sectors=8192
control=max_data_area_mb=32

This would increase the buffer to 32 MB.

Or, for an existing setup, add the control line in saveconfig.json
between the config and hw_max_sectors lines like this (note the commas):

"config": "rbd/rbd_pool/replicated_image",
"control": "max_data_area_mb=32",
"hw_max_sectors": 8192,

Note that this will preallocate 32 MB of memory for the device.

>
> targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1/luns create
> /backstores/user:rbd/my_replicated_test
>
> targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1/portals create
> 10.0.1.111
>
> targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1 set auth
> userid=****** password=******
>
> targetcli /iscsi/iqn.2018-09.com.test:target1/tpg1 set attribute
> authentication=1 demo_mode_write_protect=0 generate_node_acls=1
>
>
> Target Setting:
>
>
>

Mike Christie

Sep 11, 2018, 12:39:16 PM
to open-...@googlegroups.com
On 09/11/2018 11:30 AM, Mike Christie wrote:
> Hey,
>
> Cc mchr...@redhat.com, or I will not see these messages until I check
> the list maybe once a week.
>
> On 09/05/2018 10:36 PM, 3kbo...@gmail.com wrote:
>> What lio fabric driver are you using? iSCSI? What kernel version
>> and
>> what version of tcmu-runner?
>>
>> io fabric driver : iscsi
>>
>> iscsid version: 2.0-873
>>
>> OS version: CentOS Linux release 7.5.1804 (Core)
>>
>> kernel version: 3.10.0-862.el7.x86_64
>>
>> tcmu-runner version: 1.4.0-rc1
>>
>>

There is also a perf bug in that initiator if the node.session.cmds_max
is greater than the LIO default_cmdsn_depth and your IO test tries to
send cmds > node.session.cmds_max.

So set the node.session.cmds_max and default_cmdsn_depth to the same
value. You can set the default_cmdsn_depth in saveconfig.json, and set
cmds_max in the iscsiadm node db (after you set it make sure you logout
and login the session again).
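On the initiator side, updating cmds_max in the node db and re-logging in looks roughly like this (the target IQN and portal are the ones from this thread; substitute your own):

```shell
# Update cmds_max in the iscsiadm node db
iscsiadm -m node -T iqn.2018-09.com.test:target1 -p 10.0.1.111 \
         -o update -n node.session.cmds_max -v 64

# The new value only takes effect on the next login
iscsiadm -m node -T iqn.2018-09.com.test:target1 -p 10.0.1.111 --logout
iscsiadm -m node -T iqn.2018-09.com.test:target1 -p 10.0.1.111 --login
```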


3kbo...@gmail.com

Sep 12, 2018, 10:26:36 PM
to open-iscsi
Thank you for your reply, Mike.
Now my iscsi disk performance is around 300MB/s for 4M sequential writes (TCMU+LIO).
It increased from 20MB/s to 300MB/s after I changed max_data_area_mb from 8 to 256 and hw_max_sectors from 128 to 8192.
For my cluster, after a lot of tests I found that I should keep "max_data_area_mb > 128 && hw_max_sectors >= 4096" in order to get good performance.
Can my settings cause any side effects?
Are there any other parameters that can improve performance noticeably?
Why are the default values of max_data_area_mb and hw_max_sectors so small, with such bad performance?
Could you say something about this?
At least with max_data_area_mb > 128 && hw_max_sectors >= 4096, the performance I get seems acceptable.
If my settings can help other users, I will be happy.

On Wednesday, September 12, 2018 at 12:39:16 AM UTC+8, Mike Christie wrote:
On 09/11/2018 11:30 AM, Mike Christie wrote:
> Hey,
>
> Cc mchr...@redhat.com, or I will not see these messages until I check
> the list maybe once a week.
>
> On 09/05/2018 10:36 PM, 3kbo...@gmail.com wrote:
>>         What lio fabric driver are you using? iSCSI? What kernel version
>>         and
>>         what version of tcmu-runner?
>>
>>     io fabric driver :            iscsi
>>
>>     iscsid version:              2.0-873
>>
>>     OS version:                  CentOS Linux release 7.5.1804 (Core)
>>
>>     kernel version:              3.10.0-862.el7.x86_64
>>
>>     tcmu-runner version:    1.4.0-rc1
>>
>>

There is also a perf bug in that initiator if the node.session.cmds_max
is greater than the LIO default_cmdsn_depth and your IO test tries to
send cmds > node.session.cmds_max.
I knew about the bug before, because I had googled a lot.
Throughput increased from 320MB/s to 340MB/s (4M seq write), which seems like a stable improvement.

Settings Before: 320MB/s
 node.session.cmds_max = 2048
 default_cmdsn_depth = 64

Settings After: 340MB/s
 node.session.cmds_max = 64
 default_cmdsn_depth = 64

So set the node.session.cmds_max and default_cmdsn_depth to the same
value. You can set the default_cmdsn_depth in saveconfig.json, and set
cmds_max in the iscsiadm node db (after you set it make sure you logout
and login the session again).
Now I have set node.session.cmds_max and default_cmdsn_depth both to 64.

Thank you very much!

Ulrich Windl

Sep 13, 2018, 2:25:59 AM
to open-iscsi
Hi!

I'm somewhat surprised: maybe it's all about latency, because with FC SAN we
typically see a performance _decrease_ when large sequential requests are
transmitted. So we actually limited the default max_sectors_kb.
Most "intelligent" SAN systems break large requests down into small internal
requests (like 128kB) that are handled separately, and the initiator gets an
acknowledgement once all such internal requests have finished.

Regards,
Ulrich

>>> Mike Christie <mchr...@redhat.com> wrote on 11.09.2018 at 18:30 in
message
<5B97EDA...@redhat.com>:

Mike Christie

Sep 20, 2018, 1:43:14 PM
to open-...@googlegroups.com
On 09/12/2018 09:26 PM, 3kbo...@gmail.com wrote:
> Thank you for your reply, Mike.
> Now my iscsi disk performance can be around 300MB/s in 4M sequence
> write(TCMU+LIO)
> It increase from 20MB/s to 300MB/s, after I can change max_data_area_mb
> from 8 to 256 && hw_max_sectors from 128 to 8192.
> To my cluster, after a lot of tests I found that I should keep
> "max_data_area_mb>128 && hw_max_sectors>=4096" in order to get a good
> performance.
> Does my setting can cause some side effects?

It depends on the kernel. For the RHEL/Centos kernel you are using the
kernel will preallocate max_data_area_mb of memory for each device. For
upstream, we no longer preallocate, but once it is allocated we do not
free the memory unless global_max_data_area_mb is hit or the device is
removed.

With a high hw_max_sectors latency will increase due to sending really
large commands, so it depends on your workload and what you need.

We used to set hw_max_sectors to the rbd object size (4MB by default),
but in our testing we would see throughput go down around 512k - 1MB.

> Are there any other parameters can improve the performance quite obvious?

The normal networking ones like using jumbo frames,
net.core.*/net.ipv4.*, etc. Check your nic's documentation for the best
settings.

There are some iscsi ones, like the cmdsn/cmds_max settings I mentioned, and
also the segment-related ones like MaxRecvDataSegmentLength,
MaxXmitDataSegmentLength, MaxBurstLength, and FirstBurstLength, plus having
ImmediateData on.
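On the initiator these live in /etc/iscsi/iscsid.conf; a sketch of the relevant lines is below. The values shown are illustrative starting points, not tuned recommendations, and the negotiated result also depends on what the target supports:

```shell
# /etc/iscsi/iscsid.conf (initiator side) -- illustrative values only
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144
```

Changes here require a session logout/login to take effect.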


> Why the default value of max_data_area_mb and hw_max_sectors is so
> small, and bad performance?

I do not know. It was just what the original author had used initially.

> Could you talk something about this?
> At least, max_data_area_mb>128 && hw_max_sectors>=4096, I can get a
> better performance seems to be acceptable.
> If my settings can give other users some help, I will be happy.
>
> On Wednesday, September 12, 2018 at 12:39:16 AM UTC+8, Mike Christie wrote:
>
> On 09/11/2018 11:30 AM, Mike Christie wrote:
> > Hey,
> >
> > Cc mchr...@redhat.com, or I will not see these

3kbo...@gmail.com

Sep 21, 2018, 10:16:58 AM
to open-iscsi
Thank you for your thoughts on max_sectors_kb, Ulrich.
They gave me some inspiration for understanding the whole thing.

Thank you for your advice, Mike.
I will tune the parameters you mentioned below.

I will share my results if I make progress in performance tuning.


On Friday, September 21, 2018 at 1:43:14 AM UTC+8, Mike Christie wrote:

Chan

Feb 15, 2019, 11:44:24 AM
to open-iscsi
Hello

I am having a very similar issue.

Scenario 1) When I mount an rbd volume directly using the rbd map command on a Linux host, I get good performance:
when I do dd on the above volume, I get 380 - 430 MB/s. I am good with this.


Scenario 2) I am using the iscsi-gateway, with iscsi1 and iscsi2 as targets, presenting LUNs to all my initiators.


When I do dd on the volumes presented via the iscsi gateway, my performance is 10 to 25 MB/s, which is very poor.

I am on the gwcli command line and cannot use the below command:

"create images image=test1 size=100g max_data_area_mb=128"

Unexpected keyword parameter 'max_data_area_mb'

I also cannot use targetcli on this server. How can I increase the max_data_area_mb && hw_max_sectors sizes in my cluster?

Thank you in advance