Reconnects and I/O errors with Synology iSCSI targets


Ancoron Luciferis

May 15, 2015, 12:31:44 PM5/15/15
to open-...@googlegroups.com
Hi,

I currently have some trouble with LUNs exposed by Synology boxes.

As of Linux kernel 3.19, the max_sectors_kb value is always 32767 for
any LUN, which results in errors on the Synology target side and
reconnects on the Linux initiator side:

On the Synology target:

iSCSI: Unable to allocate memory for iscsi_cmd_t->iov_data.
iSCSI: iSCSI: Close -
I[iqn.1993-08.org.debian:01:99bdf6552818][192.168.xxx.78],
T[iqn.2000-01.com.synology:yyy.mylun.7e92dd8012]
iSCSI: iSCSI: Single CHAP security negotiation completed sucessfully.
iSCSI: iSCSI: Login -
I[iqn.1993-08.org.debian:01:99bdf6552818][192.168.xxx.78],
T[iqn.2000-01.com.synology:yyy.mylun.7e92dd8012][192.168.xxx.91:3260]

On the Linux initiator side:

Kernel reported iSCSI connection 4:0 error (1020 -
ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection4:0: detected conn error (1020)
iscsid: connection4:0 is operational after recovery (1 attempts)


In some cases, e.g. when stacking LUKS encryption on top of a LUN, I
also see a lot of I/O errors on the client side (in this example, I
created an ext3 file system on it):

sd 11:0:0:1: [sdj] UNKNOWN(0x2003) Result: hostbyte=0x0e driverbyte=0x00
sd 11:0:0:1: [sdj] CDB: opcode=0x2a 2a 00 00 00 1a 00 00 1e 08 00
blk_update_request: I/O error, dev sdj, sector 6656
Buffer I/O error on dev dm-0, logical block 64, lost async page write
Buffer I/O error on dev dm-0, logical block 65, lost async page write
Buffer I/O error on dev dm-0, logical block 66, lost async page write
Buffer I/O error on dev dm-0, logical block 67, lost async page write
Buffer I/O error on dev dm-0, logical block 68, lost async page write
Buffer I/O error on dev dm-0, logical block 69, lost async page write
Buffer I/O error on dev dm-0, logical block 70, lost async page write
Buffer I/O error on dev dm-0, logical block 71, lost async page write
Buffer I/O error on dev dm-0, logical block 72, lost async page write
Buffer I/O error on dev dm-0, logical block 73, lost async page write
sd 11:0:0:1: [sdj] UNKNOWN(0x2003) Result: hostbyte=0x0e driverbyte=0x00
sd 11:0:0:1: [sdj] CDB: opcode=0x2a 2a 00 3e 78 c5 50 00 40 00 00
blk_update_request: I/O error, dev sdj, sector 1048102224
...
buffer_io_error: 6835 callbacks suppressed


I suspect this is because the Synology target does not report any
meaningful block limits, e.g.:

# sg_vpd -p bl /dev/sdj
Block limits VPD page (SBC):
Write same no zero (WSNZ): 0
Maximum compare and write length: 0 blocks
Optimal transfer length granularity: 0 blocks
Maximum transfer length: 0 blocks
Optimal transfer length: 0 blocks
Maximum prefetch length: 0 blocks
Maximum unmap LBA count: 0
Maximum unmap block descriptor count: 0
Optimal unmap granularity: 0
Unmap granularity alignment valid: 0
Unmap granularity alignment: 0
Maximum write same length: 0x0 blocks


What bothers me most is that "Maximum transfer length" is set to zero,
which as I understand the spec means that no limit applies; hence the
Linux initiator is correct in not applying any limit here.

However, on the target side, I can see the following limits:

For block-level LUNs:
# cat /sys/kernel/config/target/core/.../attrib/hw_max_sectors
128
# cat /sys/kernel/config/target/core/.../attrib/max_sectors
128

...and for file-level LUNs:
# cat /sys/kernel/config/target/core/.../attrib/hw_max_sectors
1024
# cat /sys/kernel/config/target/core/.../attrib/max_sectors
1024

Given a sector size of 512 bytes, I should be safe with max_sectors_kb
values of 64 and 512, respectively. I have tested extensively under
heavy load and in different scenarios (e.g. formatting, single- and
multi-threaded iozone runs), and lowering that value on the initiator
side seems to resolve the problem.
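The workaround described above can be sketched as follows (a sketch only: /dev/sdj is the example device from the logs on this system, and the sysfs write is shown commented out since it requires root and the correct device name):

```shell
#!/bin/sh
# Derive a conservative max_sectors_kb from the target-side limits:
# max_sectors is counted in 512-byte blocks, so KB = max_sectors * 512 / 1024.
max_sectors=1024        # target-side value seen for file-level LUNs
block_size=512
max_sectors_kb=$(( max_sectors * block_size / 1024 ))
echo "safe max_sectors_kb: $max_sectors_kb"   # prints 512

# Apply it on the initiator (requires root; /dev/sdj is just the example here):
# echo "$max_sectors_kb" > /sys/block/sdj/queue/max_sectors_kb
```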

However, while testing I also came up with the following limits for
max_sectors_kb:

fileio: 4096
iblock: 512

...which does not make much sense to me, unless the sector size were
4096, which it isn't (at least not as reported):
$ cat /sys/block/sdj/queue/hw_sector_size
512

So, I am puzzled. Can someone help me make sense of this?

By the way, the Device Identification VPD page of the LUNs looks like this:

# sg_vpd -p di_lu /dev/sdj
Device Identification VPD page:
Addressed logical unit:
designator type: NAA, code set: Binary
0x600140520cc460dd155ed32b5db631d3
designator type: T10 vendor identification, code set: ASCII
vendor id: SYNOLOGY
vendor specific: iSCSI Storage:20cc460d-155e-32b5-b631-3e5d006efbd2
designator type: Logical unit group, code set: Binary
Logical unit group: 0x0

Is there a way to _configure_ attributes for specific SCSI devices?


Thanx for any help! :)


Regards,

Ancoron

Michael Christie

May 15, 2015, 12:54:29 PM5/15/15
to open-...@googlegroups.com

On May 7, 2015, at 7:06 AM, Ancoron Luciferis <ancoron....@googlemail.com> wrote:

> Hi,
>
> I currently have some trouble with LUNs exposed by Synology boxes.
>
> As of Linux kernel 3.19, the max_sectors_kb value is always 32767 for
> any LUN, which results in errors on the Synology target side and
> reconnects on the Linux initiator side:
>


As you saw, in newer kernels the block/SCSI layer will send requests up to the size that the target or initiator says it can handle. In your case, because the target is reporting zeros, the SCSI layer is only using the initiator limit, which is that 32767 value.

It seems there is a bug with LIO-based targets. In this patch for QNAP targets, which also used LIO, we just added a new blacklist flag so that the old default value is used; we could not contact the vendor to get more info, and it is not always as simple as matching the max_hw_sectors/max_sectors on the target side:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi?id=35e9a9f93994d7f7d12afa41169c7ba05513721b

You can normally override the max sectors setting manually, but that is broken under some conditions. It is fixed with this patch:

http://marc.info/?l=linux-scsi&m=142864974021809&w=3

Could you send me the vendor/product info, so we can add your target to the blacklist?

Ancoron Luciferis

May 16, 2015, 2:17:05 AM5/16/15
to open-...@googlegroups.com
Hi,

thanx for the info. I currently have the following devices at my
disposal for testing:

* Synology DS409+
* Synology DS1813+
* Synology DS1815+

All LUNs created on these devices are identified by the following strings:

Vendor: SYNOLOGY
Product: iSCSI Storage
Revision: 3.1

As there is no reliable way to differentiate block-level from
file-level LUNs on the initiator side (at least I can't see any reliable
difference in the device identification data), a max_sectors_kb of 512
should be a safe value for Synology LUNs (1024 is too much for their
block-level LUNs).
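For reference, pinning such a value persistently could be done with a udev rule along these lines (a sketch: the filename is arbitrary and the ENV{ID_VENDOR} match is an assumption that should be verified with `udevadm info --query=property` on the actual device):

```
# /etc/udev/rules.d/99-synology-iscsi.rules (hypothetical filename)
# Clamp max_sectors_kb to 512 for Synology iSCSI LUNs.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="SYNOLOGY", \
  ATTR{queue/max_sectors_kb}="512"
```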

If you'd like me to collect some more data, please say so.


Cheers,

Ancoron

Mike Christie

May 18, 2015, 3:53:01 PM5/18/15
to open-...@googlegroups.com
1024 is actually what we have been using for as long as LIO-based
targets have existed, so it should work fine. Did it fail before?

It looks like LIO-based targets would break up a command that is larger
than the max sectors value you were looking at. The issue we are seeing
now with really large I/Os is that allocating enough resources sometimes
fails.

Ancoron Luciferis

May 18, 2015, 7:16:45 PM5/18/15
to open-...@googlegroups.com
Well, I thought 1024 blocks of 512 bytes each would result in a
max_sectors_kb of 512, no? That is exactly what I arrived at empirically
as the maximum for the Synology block-level LUNs. :)

>
> It looks like LIO-based targets would break up a command that is larger
> than the max sectors value you were looking at. The issue we are seeing
> now with really large I/Os is that allocating enough resources sometimes
> fails.
>

Yes, that seems to be in line with what I see on the newer target side:
"Unable to allocate memory for iscsi_cmd_t->iov_data"

However, what I find quite strange are the numbers I see for
hw_max_sectors on the target side, which don't fit my math: they are
1024 for file-level LUNs but 128 for block-level ones, meaning the truly
safe max_sectors_kb would have been 64. Nevertheless, I've stress-tested
multiple LUNs for some days now without any issue at 512 KB/1024 sectors.

If there is anything I can run to further pinpoint the issue or provide
other useful information, let me know.


Cheers,

Ancoron

Mike Christie

May 19, 2015, 5:26:42 PM5/19/15
to open-...@googlegroups.com
You're right. I misread your mail.



>
>>
>> It looks like LIO-based targets would break up a command that is larger
>> than the max sectors value you were looking at. The issue we are seeing
>> now with really large I/Os is that allocating enough resources sometimes
>> fails.
>>
>
> Yes, that seems to be in line with what I see on the newer target side:
> "Unable to allocate memory for iscsi_cmd_t->iov_data"
>
> However, what I find quite strange are the numbers I see for
> hw_max_sectors on the target side, which don't fit my math: they are
> 1024 for file-level LUNs but 128 for block-level ones, meaning the truly
> safe max_sectors_kb would have been 64. Nevertheless, I've stress-tested
> multiple LUNs for some days now without any issue at 512 KB/1024 sectors.

Ok.

>
> If there is anything I can run to further pin-point the issue or provide
> other useful information, let me know.
>

Attached is the patch I am going to submit upstream if I do not hear
back from Synology about whether they already have a fix in newer
firmware. I do not have a good contact there, so I will just wait a day
or so and then send it upstream if I do not get a reply.

blist-synology-iscsi.patch

Ancoron Luciferis

unread,
May 20, 2015, 2:28:35 AM5/20/15
to open-...@googlegroups.com
Thank you! The patch works just fine.


tom.st...@one-sightsolutions.com

Aug 12, 2016, 5:45:25 PM8/12/16
to open-iscsi
Hi there,

I am experiencing the same problem with HP P2000 G3 10Gbit iSCSI targets. Can this target be included in the patch, and how would one apply said patch?