libiscsi and SCST


klaus.ho...@gmail.com

Sep 25, 2013, 12:27:51 PM
to libi...@googlegroups.com
Hi,

I'm currently testing qemu+libiscsi with SCST.
I ran the test suite and had some problems with the newest git branch:

When turning on HeaderDigest it hung at this test forever:
> Test: ColdReset
And on the target I saw this message:
> iscsi-scst: ***ERROR***: RX header digest failed
> iscsi-scst: ***ERROR***: rx header digest for initiator iqn.2006-10.corp.test:testsrv01 failed (-5)


Then I turned off HeaderDigest and the previously hanging test completed, but I got this message on the target side:
> scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

I also installed an Ubuntu 12.04 client using libiscsi and ran bonnie++

Here's a collection of target side messages I got (all with HeaderDigest turned on):
> iscsi-scst: ***ERROR***: Timeout 30 sec sending data/waiting for reply to/from initiator iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test (SID 15390000a97dd380), closing connection
> scst: ***ERROR***: Refusing unknown opcode 89
> scst: ***ERROR***: Refusing unknown opcode 8b

Also, the guest froze when running bonnie++ after the first iteration.

I'm attaching the test results:
one for version 1.9.0 and one for the latest branch.

Thanks,
Klaus
libiscsi.1.9.0.test-cu.txt
libiscsi.branch.test-cu.txt

ronnie sahlberg

Sep 25, 2013, 10:51:18 PM
to libi...@googlegroups.com
It looks like SCST needs some work here.
It is probably best if we look at just a couple of failing tests at a
time so that we can tackle this in bite-sized chunks.

The first bunch of failures you get are for the OrWrite tests.
OrWrite is an optional opcode in SBC and often only implemented in
higher end arrays but seldom in user grade disks or non-enterprise
gear.

You also have this in your email :
> scst: ***ERROR***: Refusing unknown opcode 8b

Opcode 0x8b IS OrWrite so I assume what happens here is that SCST just
does not implement this opcode but it fails to respond correctly back
to the test tool/initiator.


I patched STGT so that it no longer supports OrWrite and when running
one of the tests without OrWrite support, this is what is supposed to
happen :
$ ./bin/iscsi-test-cu iscsi://127.0.0.1/iqn.ronnie.test/1 --dataloss --test SCSI.OrWrite.Simple -V



CUnit - A Unit testing framework for C - Version 2.1-0
http://cunit.sourceforge.net/


Suite: OrWrite
  Test: Simple ...
    Test ORWRITE of 1-256 blocks at the start of the LUN
    Send ORWRITE LBA:0 blocks:1 wrprotect:0 dpo:0 fua:0 fua_nv:0 group:0
    [SKIPPED] ORWRITE is not implemented.
    [SKIPPED] ORWRITE is not implemented.
passed

--Run Summary: Type      Total     Ran  Passed  Failed
               suites        1       1     n/a       0
               tests         1       1       1       0
               asserts       1       1       1       0
Tests completed with return value: 0


I.e. the test still passes even if the target does not support this
opcode; it is reported as skipped.
But this depends on the target responding correctly, and it does not
look like SCST does.

If SCST cannot handle an opcode, or if the opcode is missing from SCST, it
should respond with
status:    CHECK_CONDITION
sense key: ILLEGAL_REQUEST
ASC:       INVALID_OPERATION_CODE

I can't tell what SCST returns here, but you could just run this test
and capture a network trace in wireshark to see what happens.
So this is something you should fix in SCST.
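
To make the expected response concrete, here is a minimal sketch of how a captured status and sense buffer could be checked. The numeric values are the standard SCSI ones (CHECK CONDITION status 0x02, sense key 0x05, ASC/ASCQ 0x20/0x00); the helper names are hypothetical and are not part of libiscsi or SCST.

```python
# Standard SCSI values; helper names are illustrative only.
SCSI_STATUS_CHECK_CONDITION = 0x02
SENSE_KEY_ILLEGAL_REQUEST = 0x05
ASC_INVALID_OPERATION_CODE = 0x20  # paired with ASCQ 0x00


def decode_sense(sense: bytes):
    """Return (sense_key, asc, ascq) from fixed or descriptor format sense data."""
    response_code = sense[0] & 0x7F
    if response_code in (0x70, 0x71):   # fixed format
        return sense[2] & 0x0F, sense[12], sense[13]
    if response_code in (0x72, 0x73):   # descriptor format
        return sense[1] & 0x0F, sense[2], sense[3]
    raise ValueError(f"unknown sense response code 0x{response_code:02x}")


def is_opcode_not_implemented(status: int, sense: bytes) -> bool:
    """True if the target reported 'invalid operation code' the expected way."""
    if status != SCSI_STATUS_CHECK_CONDITION:
        return False
    return decode_sense(sense) == (
        SENSE_KEY_ILLEGAL_REQUEST, ASC_INVALID_OPERATION_CODE, 0x00)


# Example: an 18-byte fixed-format sense buffer built by hand.
sense = bytearray(18)
sense[0] = 0x70    # current error, fixed format
sense[2] = 0x05    # ILLEGAL REQUEST
sense[7] = 10      # additional sense length
sense[12] = 0x20   # INVALID COMMAND OPERATION CODE
print(is_opcode_not_implemented(0x02, bytes(sense)))
```

Comparing the bytes from the wireshark trace against this triple should show quickly whether the target's answer is one the test tool can interpret as "not implemented".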



The prefetch10/16 flags tests check that a target can handle the IMMED
bit and the GROUP field in the CDB.
Here it looks like SCST fails the command if any of these are set.
Again, you probably want to look at a network trace to see what SCST
responds with, and perhaps check what SCST does with the IMMED and GROUP
flags.
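
For reference when reading the trace, this is a small sketch of where those two fields sit in a PRE-FETCH(10) CDB as I read SBC: opcode 0x34, IMMED in bit 1 of byte 1, and the group number in the low five bits of byte 6. The function name is made up for illustration and is not the libiscsi API.

```python
import struct

PREFETCH10_OPCODE = 0x34


def build_prefetch10(lba: int, blocks: int, immed: int = 0, group: int = 0) -> bytes:
    """Build a 10-byte PRE-FETCH(10) CDB (illustrative helper, not libiscsi)."""
    if not 0 <= group <= 0x1F:
        raise ValueError("group number is a 5-bit field")
    byte1 = (immed & 1) << 1          # IMMED is bit 1 of byte 1
    # opcode, flags, 32-bit LBA, group number, 16-bit length, control
    return struct.pack(">BBIBHB", PREFETCH10_OPCODE, byte1, lba, group, blocks, 0)


cdb = build_prefetch10(lba=0, blocks=1, immed=1, group=3)
print(cdb.hex())
```

A target that does not implement the IMMED or GROUP semantics still has to parse these bits without rejecting the whole command, which is what the flags tests exercise.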


Test: testRead10ReadProtect ... FAILED
This one sends rdprotect != 0 in the CDB. If the medium is not
formatted with protection information then the target must fail these
commands. I bet SCST does not check the rdprotect field.
There are similar tests for all other READ* commands, as well as
WRITE* VERIFY* etc.
If it is wrong in Read10 it is probably wrong in all of them.
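
A rough sketch of the check a target would need, under two assumptions of mine: RDPROTECT occupies bits 7..5 of byte 1 of the READ(10) CDB, and the expected failure is ILLEGAL REQUEST with INVALID FIELD IN CDB (ASC 0x24). The function names are hypothetical and this is not SCST code.

```python
SENSE_KEY_ILLEGAL_REQUEST = 0x05
ASC_INVALID_FIELD_IN_CDB = 0x24


def protect_field(cdb: bytes) -> int:
    """Extract the 3-bit RDPROTECT/WRPROTECT/VRPROTECT field (byte 1, bits 7..5)."""
    return (cdb[1] >> 5) & 0x07


def check_protect(cdb: bytes, formatted_with_pi: bool):
    """Return None if the command may proceed, else (sense_key, asc, ascq)."""
    if protect_field(cdb) != 0 and not formatted_with_pi:
        return (SENSE_KEY_ILLEGAL_REQUEST, ASC_INVALID_FIELD_IN_CDB, 0x00)
    return None


# READ(10) of one block at LBA 0 with RDPROTECT = 1.
read10 = bytes([0x28, 1 << 5, 0, 0, 0, 0, 0, 0, 1, 0])
print(check_protect(read10, formatted_with_pi=False))
```

Because the field sits in the same position in the other READ*, WRITE* and VERIFY* CDBs, one shared check like this would cover all of the *Protect tests.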


Test: testRead10Invalid ... FAILED
This probably means you don't return residuals correctly for
overflow/underflow. This is common.
But you need a network trace to find out what is wrong.
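
As a reminder of what "correct residuals" means here, a minimal sketch of the iSCSI rule as I understand it: compare the Expected Data Transfer Length (EDTL) from the iSCSI header with what the CDB actually asks for, and report the difference with either the overflow or the underflow flag. The function is hypothetical, for illustration only.

```python
def read_residual(edtl: int, cdb_blocks: int, block_size: int):
    """Return (kind, residual_count) for a read, given EDTL and the CDB length."""
    wanted = cdb_blocks * block_size
    if wanted > edtl:
        # Initiator's buffer is too small: overflow, residual = bytes not sent.
        return ("overflow", wanted - edtl)
    if wanted < edtl:
        # Initiator expected more than the command transfers: underflow.
        return ("underflow", edtl - wanted)
    return ("none", 0)


# CDB asks for 2 blocks of 512 bytes but the iSCSI header only expects 512.
print(read_residual(edtl=512, cdb_blocks=2, block_size=512))
```

In the trace, look at the O/U bits and the Residual Count field of the SCSI Response PDU and compare them with the values computed this way.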


Start with these ones and see if they can be fixed.



Since there are so many failures you should probably just run a single
test at a time.
Add the -V flag when you run the test. It will make it print a lot
more verbose information
and will often give hints on what is wrong with the target.


regards
ronnie sahlberg

ronnie sahlberg

Sep 25, 2013, 11:33:25 PM
to libi...@googlegroups.com
Klaus,

Can you run the prefetch10 flags tests, take a network trace of the run,
and send it to me?
SCST might not be at fault for the IMMED==1 failure; this might be a
test tool bug.

The standard allows a target to respond with either GOOD or CONDITIONS_MET :

===
If the IMMED bit is set to one and the cache has sufficient capacity
to accept all of the specified logical blocks,
then the device server shall complete the command with CONDITION MET status.
If the IMMED bit is set to one and the cache does not have sufficient
capacity to accept all of the specified
logical blocks, then the device server shall complete the command with
GOOD status.
===

So far I have not seen devices return CONDITIONS_MET, and if a target
did respond with it the current test would fail.
I will change the test to allow CONDITIONS_MET, but I would really like
to see whether SCST responds with this, since that network trace would go
into my collection.
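
The relaxed check could look roughly like this. It is only a sketch with made-up names, not the actual test code; the status values are the standard SCSI ones (GOOD 0x00, CONDITION MET 0x04).

```python
SCSI_STATUS_GOOD = 0x00
SCSI_STATUS_CONDITION_MET = 0x04

STATUS_NAMES = {
    SCSI_STATUS_GOOD: "GOOD",
    SCSI_STATUS_CONDITION_MET: "CONDITION MET",
}


def prefetch_immed_status_ok(status: int) -> bool:
    """Accept either status that the quoted text allows for IMMED==1."""
    return status in (SCSI_STATUS_GOOD, SCSI_STATUS_CONDITION_MET)


for status in (0x00, 0x04, 0x02):
    name = STATUS_NAMES.get(status, f"0x{status:02x}")
    print(name, prefetch_immed_status_ok(status))
```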

regards
ronnie sahlberg

Klaus Hochlehnert

Sep 26, 2013, 1:50:43 PM
to libi...@googlegroups.com
Hi,

I ran the OrWrite test again (with HeaderDigest turned off):
Suite: OrWrite
  Test: Simple ... 
    Test ORWRITE of 1-256 blocks at the start of the LUN
    Send ORWRITE LBA:0 blocks:1 wrprotect:0 dpo:0 fua:0 fua_nv:0 group:0
    [SKIPPED] ORWRITE is not implemented.
    [SKIPPED] ORWRITE is not implemented.
passed

--Run Summary: Type      Total     Ran  Passed  Failed
               suites        1       1     n/a       0
               tests         1       1       1       0
               asserts       1       1       1       0
Tests completed with return value: 0


On the target it prints this:
scst: ***ERROR***: Refusing unknown opcode 8b

Which looks ok to me.


The prefetch tests are like this:
Suite: Prefetch10
  Test: Simple ... 
    Test PREFETCH10 of 1-256 blocks at the start of the LUN
    Send PREFETCH10 LBA:0 blocks:1 immed:0 group:0
    [SKIPPED] PREFETCH10 is not implemented on target
    [SKIPPED] PREFETCH10 is not implemented.
passed
  Test: BeyondEol ... 
    Test PREFETCH10 1-256 blocks one block beyond the end
    Send PREFETCH10 (Expecting LBA_OUT_OF_RANGE) LBA:10485760 blocks:1 immed:0 group:0
    [SKIPPED] PREFETCH10 is not implemented on target
    [SKIPPED] PREFETCH10 is not implemented.
passed
  Test: ZeroBlocks ... 
    Test PREFETCH10 0-blocks at LBA==0
    Send PREFETCH10 LBA:0 blocks:0 immed:0 group:0
    [SKIPPED] PREFETCH10 is not implemented on target
    [SKIPPED] PREFETCH10 is not implemented.
passed
  Test: Flags ... 
    Test PREFETCH10 flags
    Test PREFETCH10 with IMMED==1
    Send PREFETCH10 LBA:0 blocks:1 immed:1 group:0
    [SKIPPED] PREFETCH10 is not implemented on target
    [SKIPPED] PREFETCH10 is not implemented.
passed

--Run Summary: Type      Total     Ran  Passed  Failed
               suites        1       1     n/a       0
               tests         4       4       4       0
               asserts       4       4       4       0
Tests completed with return value: 0



Regards, Klaus

Klaus Hochlehnert

Sep 26, 2013, 3:36:48 PM
to libi...@googlegroups.com
Hi,

I have now narrowed down all the failing tests, ran them again, and checked what SCST logged in the syslog.
I only list entries that are not duplicates or the same entry with different parameters.
A "-" indicates no log entry.

I'll also send this to the SCST mailing list...


Suite: iSCSIResiduals
  Test: Read10Invalid
    scst: ***ERROR***: Expected data direction 1 for opcode 0x28 (handler vdisk_fileio, target iscsi) doesn't match decoded value 2

  Test: Write10Residuals
    -
  Test: Write12Residuals
    -
  Test: Write16Residuals
    -
  Test: WriteVerify10Residuals
    -
  Test: WriteVerify12Residuals
    -
  Test: WriteVerify16Residuals
    -


Suite: Read10
  Test: ReadProtect
    -


Suite: Read12
  Test: ReadProtect
    -


Suite: Read16
  Test: BeyondEol
    dev_vdisk: Access beyond the end of the device (5368709120 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (5368708608 of 5368709120, len 1024)

  Test: ZeroBlocks
    dev_vdisk: Access beyond the end of the device (5368709632 of 5368709120, len 0)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 0)

  Test: ReadProtect
    -


Suite: Reserve6
  Test: 2Initiators
    -

Suite: StartStopUnit
  Test: Simple
    -


Suite: Verify10
  Test: Simple
    scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: BeyondEol
    scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: VerifyProtect
    scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: Flags
    scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: Mismatch
    scst: ***ERROR***: Expected data direction 1 for opcode 0x2f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4


Suite: Verify12
  Test: Simple
    scst: ***ERROR***: Expected data direction 1 for opcode 0xaf (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: BeyondEol
    scst: ***ERROR***: Expected data direction 1 for opcode 0xaf (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: VerifyProtect
    scst: ***ERROR***: Expected data direction 1 for opcode 0xaf (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: Flags
    scst: ***ERROR***: Expected data direction 1 for opcode 0xaf (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: Mismatch
    scst: ***ERROR***: Expected data direction 1 for opcode 0xaf (handler vdisk_fileio, target iscsi) doesn't match decoded value 4


Suite: Verify16
  Test: Simple
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4

  Test: BeyondEol
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: ZeroBlocks
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: VerifyProtect
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: Flags
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4
  
  Test: Mismatch
    scst: ***ERROR***: Expected data direction 1 for opcode 0x8f (handler vdisk_fileio, target iscsi) doesn't match decoded value 4


Suite: Write10
  Test: WriteProtect
    -


Suite: Write12
  Test: WriteProtect
    -


Suite: Write16
  Test: BeyondEol
    dev_vdisk: Access beyond the end of the device (5368709120 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (5368708608 of 5368709120, len 1024)
  
  Test: ZeroBlocks
    dev_vdisk: Access beyond the end of the device (5368709632 of 5368709120, len 0)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 0)

  Test: WriteProtect
    -


Suite: WriteSame10
  Test: ZeroBlocks
    scst: ***ERROR***: Expected data direction 4 for opcode 0x41 (handler vdisk_fileio, target iscsi) doesn't match decoded value 1


Suite: WriteSame16
  Test: ZeroBlocks
    scst: ***ERROR***: Expected data direction 4 for opcode 0x93 (handler vdisk_fileio, target iscsi) doesn't match decoded value 1


Suite: WriteVerify10
  Test: WriteProtect
    -


Suite: WriteVerify12
  Test: WriteProtect
    -


Suite: WriteVerify16
  Test: BeyondEol
    dev_vdisk: Access beyond the end of the device (5368709120 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 512)
    dev_vdisk: Access beyond the end of the device (5368708608 of 5368709120, len 1024)

  Test: ZeroBlocks
    dev_vdisk: Access beyond the end of the device (5368709632 of 5368709120, len 0)
    dev_vdisk: Access beyond the end of the device (-512 of 5368709120, len 0)

  Test: WriteProtect
    -


Thanks and regards,
Klaus



ronnie sahlberg

Sep 26, 2013, 9:37:28 PM
to libi...@googlegroups.com
I just pushed a fix for the header digest errors.

Basically, when there is a session failure libiscsi will reconnect the
session and re-send all commands that were in flight.
This all happens in lib/connect.c:iscsi_reconnect():348

When we re-queue the PDU, we also have to modify the InitiatorTaskTag,
the CmdSN and the StatSN so that they match the values expected on the
new, reconnected session. This changes the header, but if header digest
was in use we never updated the digest.

So if header digest was used, the PDU would now be invalid since we
still used the original header digest.


So the fix is simply to change which function we use to re-queue the
PDU to the one that will re-calculate the digest.
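
To illustrate why the stale digest breaks the PDU, here is a self-contained sketch, not the libiscsi code. It assumes a 48-byte basic header segment with the InitiatorTaskTag at bytes 16-19, the CmdSN at bytes 24-27 and the ExpStatSN at bytes 28-31, and computes the CRC32C that iSCSI uses for header digests. The function names and the example field values are invented for the demonstration.

```python
import struct


def _make_table():
    # CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def requeue_header(bhs: bytes, itt: int, cmdsn: int, expstatsn: int):
    """Rewrite the per-session fields and return (new_header, new_digest)."""
    hdr = bytearray(bhs)
    struct.pack_into(">I", hdr, 16, itt)        # InitiatorTaskTag
    struct.pack_into(">I", hdr, 24, cmdsn)      # CmdSN
    struct.pack_into(">I", hdr, 28, expstatsn)  # ExpStatSN
    return bytes(hdr), crc32c(hdr)


# A SCSI Command PDU header (opcode 0x01) as queued on the old session.
old_hdr, old_digest = requeue_header(bytes([0x01]) + bytes(47), 1, 5, 9)
# After reconnect only the tag changes in this example; the digest must follow.
new_hdr, new_digest = requeue_header(old_hdr, 2, 5, 9)
print(old_digest != new_digest)
```

Sending `new_hdr` together with `old_digest` is exactly the mismatch the target reported as "RX header digest failed". A real implementation would also have to cover any additional header segments in the digest.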


Thanks for finding this!
Since this is a pretty bad bug for those that use header digest I will
likely make a new release over the coming weekend.


regards
ronnie sahlberg

