ESX 3.5 errors

Chris

Feb 8, 2008, 10:41:29 AM
to Linux-iSCSI.org Target Development
My setup:
I am using Ubuntu Gutsy with my own compile of the 2.6.23 kernel (I'm also
testing iscsi-scst, which works better with a vanilla kernel). I have
installed the Gutsy deb for target-utils and compiled the driver source
deb with m-a.
I have created an install.target based on
http://linux-iscsi.org/builds/conf/install.target but with my
initiator names and only one HBA and LUN defined.

I created a 20GB sparse file with:
dd of=testfile count=0 bs=4K seek=5M
and added it with createvirtdev (I'm not sure if creating the
sparse file is necessary; is it?)
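For reference, the dd command above only sets the file length: with count=0 nothing is copied, and dd truncates the output at seek × bs = 5 × 1024 × 1024 × 4096 = 21474836480 bytes (20 GiB). A minimal C sketch of the same operation using ftruncate(), assuming a filesystem with hole support such as XFS; the helper names are hypothetical and not part of target-utils:

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t on 32-bit builds, needed for 20 GiB */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the file left behind by `dd of=FILE count=0 bs=BS seek=N`:
 * dd copies nothing and truncates the output at seek * bs bytes. */
static long long dd_sparse_size(long long bs, long long seek_blocks)
{
    return bs * seek_blocks;
}

/* Hypothetical helper: create (or resize) a file of `size` bytes without
 * writing any data. ftruncate() only sets the length; on filesystems with
 * hole support no data blocks are allocated until something is written. */
static int make_sparse_file(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For example, make_sparse_file("testfile", dd_sparse_size(4096, 5LL * 1024 * 1024)) should produce the same 20 GiB file as the dd invocation.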

I first tested with Windows 2k3 x64 and MS's initiator and it worked
well.

I'm getting some errors with VMware ESX 3.5. The first error on the
target is:
Feb 8 09:33:21 file1 kernel: [ 898.900540] iscsi_handle_scsi_cmd:1479: ***ERROR*** R_BIT or W_BIT set when Expected Data Transfer Length is 0. Bad iSCSI Initiator.

After about 50 automatic retries it did finally connect without this
error. I also get these errors on every connect, but it seems to
work anyway:
Feb 8 09:33:23 file1 kernel: [ 900.901910] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.PingTimeout".
Feb 8 09:33:23 file1 kernel: [ 900.901916] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.sendAsyncText".
Feb 8 09:33:23 file1 kernel: [ 900.901921] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.protocol".

I was also getting regular errors in my ESX vmkernel log, but only after
creating a VMFS on the LUN. There is no corresponding error on the
target, and it works despite these errors:
Feb 8 08:51:26 esx1 vmkernel: 5:09:22:35.154 cpu2:1122)iSCSI: bus 0 target 3 trying to establish session 0x9e180a0 to portal 0, address 127.0.0.1 port 3261 group 1
Feb 8 08:51:26 esx1 vmkernel: 5:09:22:35.154 cpu2:1122)iSCSI: session 0x9e180a0 to iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1 failed to connect, rc -5, I/O error
Feb 8 08:51:26 esx1 vmkernel: 5:09:22:35.154 cpu2:1122)iSCSI: session 0x9e180a0 connect failed at 46575518
Feb 8 08:51:26 esx1 vmkernel: 5:09:22:35.154 cpu2:1122)<5>iSCSI: session 0x9e180a0 iSCSI: session 0x9e180a0 retrying all the portals again, since the portal list got exhausted
Feb 8 08:51:26 esx1 vmkernel: 5:09:22:35.154 cpu2:1122)iSCSI: session 0x9e180a0 to iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1 waiting 60 seconds before next login attempt


I then noticed in the ESX config that it thought the path had failed
for the LUN, so I did a rescan. Here's the log from ESX:
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.393 cpu1:1037)ScsiScan: 395: Path 'vmhba32:C0:T2:L0': Vendor: 'iSCSI ' Model: 'DISK ' Rev: '0 '
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.393 cpu1:1037)ScsiScan: 396: Type: 0x0, ANSI rev: 4
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.394 cpu1:1037)ScsiScan: 395: Path 'vmhba32:C0:T4:L0': Vendor: 'SBEi-INC' Model: 'FILEIO' Rev: 'v2.8'
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.394 cpu1:1037)ScsiScan: 396: Type: 0x0, ANSI rev: 2
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.395 cpu1:1037)WARNING: ScsiUid: 550: Path 'vmhba32:C0:T4:L0' : supports ANSI version 'SCSI-2' (0x2). In order to be used with ESX a device must support the SCSI 3 protocol.
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.395 cpu1:1037)ScsiScan: 516: Path 'vmhba32:C0:T4:L0': No standard UID: Failure
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.403 cpu0:1035)SCSI: 861: GetInfo for adapter vmhba32, [0x3f8bdd80], max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.404 cpu0:1035)iSCSI: session 0x9e180a0 replacement timed out, failing to queue 0x3d20bd80 cdb 0xa0 and any following commands to (0 0 3 0), iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1
Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.419 cpu1:1036)SCSI: 861: GetInfo for adapter vmhba32, [0x3f8bdd80], max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T2:L0 : 0x0 0x80 0x83
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T2:L0: 0x1 0x1 0x0 0x18 0x69 0x53 0x43 0x53 0x49 0x0 0x0 0x0 0x34 0x71 0x6a 0x43 0x48 0x39 0x4e 0x4d 0x33 0x76 0x41 0x65 0x72 0x72 0x69 0x49
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T2:L0 0x20 0x20 0x20 0x20 0x44 0x49 0x53 0x4b 0x20 0x20
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T4:L0 : 0x0 0x80 0x83
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T4:L0: 0x2 0x1 0x0 0x22 0x53 0x42 0x45 0x69 0x2d 0x49 0x4e 0x43 0x46 0x49 0x4c 0x45 0x49 0x4f 0x3a 0x73 0x6e 0x2e 0x61 0x62 0x39 0x30 0x63 0x64 0x31 0x36 0x32 0x65 0x66 0x31 0x3a 0x30 0x5f 0x30
Feb 8 09:00:50 esx1 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T4:L0 0x73 0x6e 0x2e 0x61 0x62 0x39 0x30 0x63 0x64 0x31 0x36 0x32 0x65 0x66 0x31 0x3a 0x30 0x5f 0x30 0x46 0x49 0x4c 0x45 0x49 0x4f
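The "Device id info" lines above look like raw VPD page 0x83 designation descriptors. Per SPC, the low nibble of byte 0 is the code set (1 = binary, 2 = ASCII), the low nibble of byte 1 is the designator type (1 = T10 vendor ID), and byte 3 is the length of the identifier that follows. A hedged decoding sketch (struct and function names are illustrative) suggests that T4 reports an ASCII T10 vendor ID of `SBEi-INCFILEIO:sn.ab90cd162ef1:0_0`, while T2 reports a T10 vendor ID with code set 1 (binary) made of `iSCSI`, three NUL bytes and `4qjCH9NM3vAerriI`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields of one SPC designation descriptor from VPD page 0x83, as printed
 * byte by byte in the vmkernel "Device id info" lines. */
struct designator {
    unsigned code_set;        /* byte 0, low nibble: 1 = binary, 2 = ASCII */
    unsigned designator_type; /* byte 1, low nibble: 1 = T10 vendor ID */
    size_t len;               /* byte 3: length of the identifier */
    const unsigned char *id;  /* points into the caller's buffer */
};

/* Decode the descriptor at the start of buf. Returns 0 on success, or -1
 * if the buffer is shorter than the header or the declared length. */
static int decode_designator(const unsigned char *buf, size_t buflen,
                             struct designator *d)
{
    if (buflen < 4)
        return -1;
    d->code_set = buf[0] & 0x0f;
    d->designator_type = buf[1] & 0x0f;
    d->len = buf[3];
    if (buflen < 4 + d->len)
        return -1;
    d->id = buf + 4;
    return 0;
}
```

For a T10 vendor ID designator, the first 8 identifier bytes are the vendor (here `SBEi-INC`), and the rest is vendor specific.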

And now, again for no apparent reason, the path has failed, and the
target server seems to have crashed, though the console is still logging
reserves and releases from the ESX server.

I was running IOMeter from a Win2k3 guest on ESX 3.5 with a 2nd
virtual disk added that's on the LIO virtdisk LUN, which is on an XFS
mounted LVM volume on a 3ware 7504 RAID5 of 4 older 120GB IDE disks.
The IOMeter test is 4 workers of max 64 sectors, 16 outstanding I/Os per
target, 32K block size, 75% read sequential, and it was holding very
steady at 113MB/s total and 3600 IOps. This is actually a bit better
than I would have expected from these disks; I'm guessing some target-side
caching is in play?
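As a sanity check, the two IOMeter figures agree with each other: throughput is roughly IOps × block size, and 3600 × 32 KiB is 112.5 MiB/s (about 118 decimal MB/s), in line with the reported ~113MB/s if IOMeter's "MB" means 2^20 bytes (an assumption). A trivial C helper for the arithmetic; the names are illustrative:

```c
#include <assert.h>

/* MiB/s implied by an I/O rate at a fixed transfer size
 * (1 MiB = 1048576 bytes). */
static double mib_per_sec(double iops, double block_bytes)
{
    return iops * block_bytes / (1024.0 * 1024.0);
}

/* Decimal MB/s (1 MB = 1000000 bytes), for comparison. */
static double mb_per_sec(double iops, double block_bytes)
{
    return iops * block_bytes / 1e6;
}
```

mib_per_sec(3600, 32768) evaluates to exactly 112.5.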

maza...@bellsouth.net

Feb 8, 2008, 1:53:20 PM
to Linux-iSCSI.org Target Development
Does this look like the same thing to you (only you are using
ESX 3.5)? At first glance, it does to me.

http://groups.google.com/group/linux-iscsi-target-dev/browse_thread/thread/139339d30d71e836?hl=en

-Mike

Chris Weiss

Feb 8, 2008, 2:24:04 PM
to linux-iscsi...@googlegroups.com

Maybe, maybe not. ESX uses a VMware-modified Cisco initiator, and the
bulk of my issues happen before guests are even powered on. I'm also
not sure what Workstation has to do with iSCSI; it has no iSCSI
support, so the issue there is either with guest disks on a mounted
LUN or with a guest using a LUN as a raw device, and "bringing in
storage LUNs into the guest" isn't very clear. With ESX, I'm
basically doing the former.

With my tests on SCST, it has been said many times that VMware is
doing "bad" things, but if that's the case, how do they work
with other target products? Do these products just allow the bad
practices, or is VMware doing these things because their commercial
targets are bad as well? And since VMware doesn't seem willing to
work with the community on this, where do we go? iSCSI HBAs?

I'm not fond of the idea of telling my boss I need to spend $800 on an
HBA in order to prove, or disprove, that VMware has a broken initiator.
And what if I do prove it? Then what? VMware won't fix it
because my targets aren't on the HCL, so I have to buy more HBAs?

Also, I haven't reset my locked-up server yet because the disks are
still active, even though the guest that was using them is now powered
off, and even the console is unresponsive. Whatever went wrong went
horribly wrong.

Chris Weiss

Feb 8, 2008, 3:01:18 PM
to Linux-iSCSI.org Target Development
The disks settled, I reset the target, and I'm back to this error when trying to connect from ESX:

Feb 8 13:59:00 file1 kernel: [ 1540.573829] Received iSCSI login request from 10.254.0.3 on TCP Network Portal 10.254.0.99:3260
Feb 8 13:59:00 file1 kernel: [ 1540.573899] Located Storage Object: iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1
Feb 8 13:59:00 file1 kernel: [ 1540.573904] Located Portal Group Object: 1
Feb 8 13:59:00 file1 kernel: [ 1540.573906] Set np->np_login_tpg to c1a58100
Feb 8 13:59:00 file1 kernel: [ 1540.574013] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.PingTimeout".
Feb 8 13:59:00 file1 kernel: [ 1540.574019] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.sendAsyncText".
Feb 8 13:59:00 file1 kernel: [ 1540.574024] iscsi_find_param_from_key:789: ***ERROR*** Unable to locate key "X-com.cisco.protocol".
Feb 8 13:59:00 file1 kernel: [ 1540.574071] ------------------------------------------------------------------
Feb 8 13:59:00 file1 kernel: [ 1540.574074] HeaderDigest: None
Feb 8 13:59:00 file1 kernel: [ 1540.574076] DataDigest: None
Feb 8 13:59:00 file1 kernel: [ 1540.574079] MaxRecvDataSegmentLength: 131072
Feb 8 13:59:00 file1 kernel: [ 1540.574082] IFMarker: No
Feb 8 13:59:00 file1 kernel: [ 1540.574084] OFMarker: No
Feb 8 13:59:00 file1 kernel: [ 1540.574090] ------------------------------------------------------------------
Feb 8 13:59:00 file1 kernel: [ 1540.574096] ------------------------------------------------------------------
Feb 8 13:59:00 file1 kernel: [ 1540.574101] MaxConnections: 1
Feb 8 13:59:00 file1 kernel: [ 1540.574104] TargetName: iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1
Feb 8 13:59:00 file1 kernel: [ 1540.574107] InitiatorName: iqn.1998-01.com.vmware:esx1-0dc0d7c5
Feb 8 13:59:00 file1 kernel: [ 1540.574110] TargetAlias: file1
Feb 8 13:59:00 file1 kernel: [ 1540.574112] InitiatorAlias: esx1.wilsonmfg.com
Feb 8 13:59:00 file1 kernel: [ 1540.574114] TargetPortalGroupTag: 1
Feb 8 13:59:00 file1 kernel: [ 1540.574116] InitialR2T: No
Feb 8 13:59:00 file1 kernel: [ 1540.574118] ImmediateData: Yes
Feb 8 13:59:00 file1 kernel: [ 1540.574120] MaxBurstLength: 262144
Feb 8 13:59:00 file1 kernel: [ 1540.574122] FirstBurstLength: 65536
Feb 8 13:59:00 file1 kernel: [ 1540.574124] DefaultTime2Wait: 2
Feb 8 13:59:00 file1 kernel: [ 1540.574126] DefaultTime2Retain: 0
Feb 8 13:59:00 file1 kernel: [ 1540.574128] MaxOutstandingR2T: 1
Feb 8 13:59:00 file1 kernel: [ 1540.574130] DataPDUInOrder: Yes
Feb 8 13:59:00 file1 kernel: [ 1540.574132] DataSequenceInOrder: Yes
Feb 8 13:59:00 file1 kernel: [ 1540.574134] ErrorRecoveryLevel: 0
Feb 8 13:59:00 file1 kernel: [ 1540.574136] SessionType: Normal
Feb 8 13:59:00 file1 kernel: [ 1540.574138] ------------------------------------------------------------------
Feb 8 13:59:00 file1 kernel: [ 1540.574150] iSCSI Login successful on CID: 0 from 10.254.0.3 to 10.254.0.99:3260,1
Feb 8 13:59:00 file1 kernel: [ 1540.574153] Incremented iSCSI Connection count to 1 from node: iqn.1998-01.com.vmware:esx1-0dc0d7c5
Feb 8 13:59:00 file1 kernel: [ 1540.574156] Established iSCSI session from node: iqn.1998-01.com.vmware:esx1-0dc0d7c5
Feb 8 13:59:00 file1 kernel: [ 1540.574159] Incremented number of active iSCSI sessions to 2 on iSCSI Target Portal Group: 1
Feb 8 13:59:00 file1 kernel: [ 1540.574181] Cleared np->np_login_tpg
Feb 8 13:59:00 file1 kernel: [ 1540.576554] iscsi_handle_scsi_cmd:1479: ***ERROR*** R_BIT or W_BIT set when Expected Data Transfer Length is 0. Bad iSCSI Initiator.
Feb 8 13:59:00 file1 kernel: [ 1540.576613] Decremented iSCSI connection count to 0 from node: iqn.1998-01.com.vmware:esx1-0dc0d7c5
Feb 8 13:59:00 file1 kernel: [ 1540.576620] Released iSCSI session from node: iqn.1998-01.com.vmware:esx1-0dc0d7c5
Feb 8 13:59:00 file1 kernel: [ 1540.576623] Decremented number of active iSCSI Sessions on iSCSI TPG: 1 to 1

This repeats every 5 seconds.

ESX-side logs:
Feb 8 14:00:31 esx1 vmkernel: 5:14:31:35.730 cpu3:1122)iSCSI: bus 0 target 3 trying to establish session 0x9e180a0 to portal 0, address 127.0.0.1 port 3261 group 1
Feb 8 14:00:31 esx1 vmkernel: 5:14:31:35.730 cpu3:1122)iSCSI: session 0x9e180a0 to iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1 failed to connect, rc -5, I/O error
Feb 8 14:00:31 esx1 vmkernel: 5:14:31:35.730 cpu3:1122)iSCSI: session 0x9e180a0 connect failed at 48429576
Feb 8 14:00:31 esx1 vmkernel: 5:14:31:35.730 cpu3:1122)<5>iSCSI: session 0x9e180a0 iSCSI: session 0x9e180a0 retrying all the portals again, since the portal list got exhausted
Feb 8 14:00:31 esx1 vmkernel: 5:14:31:35.730 cpu3:1122)iSCSI: session 0x9e180a0 to iqn.2003-01.org.linux-iscsi.file1.i686:sn.ab90cd162ef1 waiting 60 seconds before next login attempt
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.892 cpu3:1126)iSCSI: bus 0 target 4 trying to establish session 0x9e2c1b0 to portal 0, address 10.254.0.99 port 3260 group 1
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.893 cpu3:1126)iSCSI: bus 0 target 4 established session 0x9e2c1b0 #150 to portal 0, address 10.254.0.99 port 3260 group 1, alias file1
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.901 cpu3:1126)iSCSI: dropping session 0x9e2c1b0 because target rejected a PDU, reason 0x9, dlength 48, rejected itt 2840930816, opcode 0x1
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.901 cpu3:1126)iSCSI: session 0x9e2c1b0 has ended quickly 47 times, login delay 5 seconds
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.901 cpu3:1126)iSCSI: session 0x9e2c1b0 to file1 dropped
Feb 8 14:00:35 esx1 vmkernel: 5:14:31:39.901 cpu3:1126)iSCSI: session 0x9e2c1b0 to file1 waiting 5 seconds before next login attempt
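The ESX-side "target rejected a PDU, reason 0x9, dlength 48 ... opcode 0x1" line appears to be the initiator's view of the same event as the target's R_BIT/W_BIT error: in RFC 3720 (section 10.17.1) reason code 0x09 is "Invalid PDU field", which is what the target's REASON_INVALID_PDU_FIELD reject reports; a Reject PDU's data segment carries the 48-byte header of the rejected PDU; and opcode 0x01 is a SCSI Command. A small lookup table for decoding such log lines (the function name is illustrative):

```c
#include <assert.h>
#include <string.h>

/* Reject PDU reason codes from RFC 3720, section 10.17.1. */
static const char *reject_reason_name(unsigned reason)
{
    switch (reason) {
    case 0x02: return "Data (payload) Digest Error";
    case 0x03: return "SNACK Reject";
    case 0x04: return "Protocol Error";
    case 0x05: return "Command not supported";
    case 0x06: return "Immediate Command Reject - too many immediate commands";
    case 0x07: return "Task in progress";
    case 0x08: return "Invalid Data ACK";
    case 0x09: return "Invalid PDU field";
    case 0x0a: return "Long Operation Reject - out of resources";
    case 0x0b: return "Negotiation Reset";
    case 0x0c: return "Waiting for Logout";
    default:   return "Reserved/unknown";
    }
}
```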

maza...@bellsouth.net

Feb 8, 2008, 4:00:33 PM
to linux-iscsi...@googlegroups.com, Chris Weiss

Believe me, I'm not an expert at this, but I did see these two pieces of information:

From the original post I had cited:

This is where a problem might arise.  My experience with Vmware (at
least with Workstation 6) with bringing in storage LUNs into the guest
is that VMware still emulates the control CDBs.  For example, the SCSI
Inquiry EVPD pages (that tell you things like the unit serial number of
the storage hardware) returns emulated, and not proper values for the
control CDBs.  This has been the case with Workstation 6 on Linux and
Win32, but I seem to recall Fusion on MacOSX passing CDBs, at least USB
storage devices, without the control emulation bit.

> Will this work?  Or do I need to run the Linux iSCSI target on real hardware with a real HBA?

This depends if ESX server does the same emulation of control CDBs for
real SCSI devices passed into the guests as Workstation 6 does.  If it
does, then you will most likely hit an exception.  
...

======

From your first error messages:

Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.393 cpu1:1037)ScsiScan: 395: Path 'vmhba32:C0:T2:L0': Vendor: 'iSCSI   '  Model: 'DISK  '  Rev: '0 '
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.393 cpu1:1037)ScsiScan: 396: Type: 0x0, ANSI rev: 4
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.394 cpu1:1037)ScsiScan: 395: Path 'vmhba32:C0:T4:L0': Vendor: 'SBEi-INC'  Model: 'FILEIO'  Rev: 'v2.8'
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.394 cpu1:1037)ScsiScan: 396: Type: 0x0, ANSI rev: 2
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.395 cpu1:1037)WARNING: ScsiUid: 550: Path 'vmhba32:C0:T4:L0' : supports ANSI version 'SCSI-2' (0x2). In order to be used with ESX a device must support the SCSI 3 protocol.
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.395 cpu1:1037)ScsiScan: 516: Path 'vmhba32:C0:T4:L0': No standard UID: Failure
Feb  8 09:00:50 esx1 vmkernel: 5:09:31:59.403 cpu0:1035)SCSI: 861:
...

If you're stuck, I'd recommend mailing Nicholas off-list and telling him you have either a problem (which will probably be fixed in the next release) or an emergency. One thing about emergencies (getting someone else to drop what they are doing to help you through your problem): there is usually some type of financial incentive and deal going on, or there is no real reason to help you on an expedited basis. I wouldn't expect any iSCSI expert to get excited about doing 2 hours' worth of work modifying a 'special branch just for you' of the code to interoperate with VMware at any price. You need a relationship. Your boss should understand this if you are going to live on the 'cutting edge'. This is just my 2 cents' worth...

-Mike

Chris Weiss

Feb 8, 2008, 4:11:54 PM
to linux-iscsi...@googlegroups.com
On Feb 8, 2008 3:00 PM, <maza...@bellsouth.net> wrote:
> From the original post I had cited:

I get where you are coming from, but my issue is that I don't get to
the point where a guest is powered on (except once); so far this is
basically just using a derivative of the Cisco initiator. And even when
I do get there, I'm not using raw LUNs in the guest, but storing the
guest vmdk on a VMFS filesystem on a LUN the host has mounted. So in
this sense, it is not the same as the post you referred to at all.


>If you're stuck, I'd recommend mailing Nicholas...

Well, it's no emergency, but I would like it to work, and it's my
impression that ESX compatibility is a goal of the project. The thing is,
I'm not seeing much in the logs that seems helpful to report, so I'm
reporting what I have in the hope that some instructions, or links to
documentation beyond man target-ctl, will come and we
can work to get this going.

Of course, if putting up a bounty is what it takes to get there, I'm
sure I can manage something, but for my part it will certainly have to
be less than buying HBAs, assuming that would even work better. Does
anyone know?

Ming Zhang

Feb 8, 2008, 4:42:06 PM
to linux-iscsi...@googlegroups.com

On Fri, 2008-02-08 at 07:41 -0800, Chris wrote:
> getting some errors with vmware ESX 3.5. the first error on the
> target is:
> Feb 8 09:33:21 file1 kernel: [ 898.900540] iscsi_handle_scsi_cmd:
> 1479: ***ERROR*** R_BIT or W_BIT set when Expected Data Transfer
> Length is 0. Bad iSCSI Initiator.


Could you capture a tcpdump log of this and send it here?

Pure guess: it looks like ESX is sending a read/write request with a request
size of 0. This is perfectly valid. Maybe the linux-iscsi target does not handle
it right?
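For reading a tcpdump capture of the failing command, the fields involved sit at fixed offsets in the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU (RFC 3720, section 10.3): opcode 0x01 in the low six bits of byte 0, the F/R/W flags (0x80/0x40/0x20) in byte 1, the big-endian Expected Data Transfer Length in bytes 20-23, and the CDB starting at byte 32. For what it's worth, RFC 3720 section 10.3.1 appears to allow R and/or W to be 1 when the Expected Data Transfer Length is 0, and only forbids both being 0 when a transfer is expected, which would support Ming's reading. A hedged decoding sketch; the struct and function names are illustrative, not LIO's:

```c
#include <assert.h>
#include <stdint.h>

#define ISCSI_BHS_LEN     48
#define ISCSI_OP_SCSI_CMD 0x01
#define FLAG_F 0x80 /* Final */
#define FLAG_R 0x40 /* Read: command expects input data */
#define FLAG_W 0x20 /* Write: command expects output data */

struct scsi_cmd_fields {
    uint8_t opcode;        /* byte 0, low 6 bits (bit 6 is Immediate) */
    uint8_t flags;         /* byte 1: F/R/W and task attributes */
    uint32_t exp_xfer_len; /* bytes 20-23, big endian */
    uint8_t cdb0;          /* byte 32: first CDB byte (SCSI opcode) */
};

static void decode_scsi_cmd(const uint8_t bhs[ISCSI_BHS_LEN],
                            struct scsi_cmd_fields *f)
{
    f->opcode = bhs[0] & 0x3f;
    f->flags = bhs[1];
    f->exp_xfer_len = ((uint32_t)bhs[20] << 24) | ((uint32_t)bhs[21] << 16) |
                      ((uint32_t)bhs[22] << 8) | (uint32_t)bhs[23];
    f->cdb0 = bhs[32];
}

/* 1 if the header matches the condition behind the target's
 * "R_BIT or W_BIT set when Expected Data Transfer Length is 0" error. */
static int rw_bit_with_zero_length(const struct scsi_cmd_fields *f)
{
    return (f->flags & (FLAG_R | FLAG_W)) != 0 && f->exp_xfer_len == 0;
}
```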

--
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------

Nicholas A. Bellinger

Feb 8, 2008, 8:20:46 PM
to linux-iscsi...@googlegroups.com

Hi Ming, :-)

I was not aware that this is legal in the READ_* or WRITE_* (i.e.
ICF_SCSI_DATA_SG_IO_CDB) cases. When the original ESX v3.0 started
doing this, I ended up just checking for the RESERVE/RELEASE case they
added to Cisco's initiator to detect this, and letting it pass:

The current logic from iscsi_target.c:iscsi_handle_scsi_cmd() looks
like:

    if (((hdr->flags & R_BIT) || (hdr->flags & W_BIT)) &&
         !hdr->exp_xfer_len) {
        /*
         * VMware ESX v3.0 uses a modified Cisco Initiator (v3.4.2)
         * that adds support for RESERVE/RELEASE. There is a bug
         * in this new functionality that sets the R/W bits even
         * though neither CDB carries a READ or WRITE data payload.
         */
        if ((hdr->cdb[0] == 0x16) || (hdr->cdb[0] == 0x17)) { /* RESERVE(6), RELEASE(6) */
            hdr->flags &= ~R_BIT;
            hdr->flags &= ~W_BIT;
            goto done;
        }

        TRACE_ERROR("R_BIT or W_BIT set when Expected Data Transfer"
            " Length is 0. Bad iSCSI Initiator.\n");
        return(iscsi_add_reject(REASON_INVALID_PDU_FIELD, 1, buf, conn));
    }
done:

I am assuming that this would still not be legal for READ_* and WRITE_*,
so perhaps it should check for these CDBs, fail only in that
case, and let everything else pass. Btw, is there anywhere in the RFC
that says this check is bogus for all CDB cases?
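The narrowing proposed above (reject only READ_*/WRITE_* CDBs that carry R/W with a zero ExpXferLen, and let everything else through with the stray bits cleared, as the existing RESERVE/RELEASE special case does) could look roughly like this. It is a hypothetical sketch, not a patch against iscsi_target.c; the opcode values are the standard SBC ones:

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_R 0x40
#define FLAG_W 0x20

/* READ/WRITE CDB opcodes in their 6, 10, 12 and 16 byte forms. */
static int is_read_write_cdb(uint8_t op)
{
    switch (op) {
    case 0x08: case 0x28: case 0xa8: case 0x88: /* READ_6/10/12/16 */
    case 0x0a: case 0x2a: case 0xaa: case 0x8a: /* WRITE_6/10/12/16 */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical relaxed policy. Returns 1 if the command should be
 * rejected, 0 if it may proceed; for accepted zero-length commands the
 * stray R/W bits are cleared so later code sees a no-data command. */
static int check_rw_zero_length(uint8_t *flags, uint32_t exp_xfer_len,
                                uint8_t cdb0)
{
    if (!(*flags & (FLAG_R | FLAG_W)) || exp_xfer_len != 0)
        return 0;
    if (is_read_write_cdb(cdb0))
        return 1;
    *flags &= (uint8_t)~(FLAG_R | FLAG_W);
    return 0;
}
```

One design caveat: in SBC a READ(10)/WRITE(10) whose CDB transfer length is 0 is a legal no-data command, so a stricter version might compare ExpXferLen against the CDB's own transfer length instead of rejecting on opcode alone.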

--nab

Nicholas A. Bellinger

Feb 8, 2008, 8:34:30 PM
to linux-iscsi...@googlegroups.com
On Fri, 2008-02-08 at 15:11 -0600, Chris Weiss wrote:
> On Feb 8, 2008 3:00 PM, <maza...@bellsouth.net> wrote:
> > From the original post I had cited:
>
> I get where you are coming from, but my issue is that i don't get to
> the point where a guest is powered on (except once), this is basically
> just using a derivative of the cisco initiator so far. And even when
> i do get there, I'm not using raw luns to the guest, but storing the
> guest vmdk on a vmfs filesystem on a LUN the host has mounted. So in
> this sense, it is not the same as the post you refered to at all.
>

Hi Chris,

I am not sure whether using a virtual LUN (FILEIO) rather than a physical LUN
(say, exporting the 3ware storage directly via PSCSI) would be causing an
issue here.

I noticed the following from your original post:

Feb 8 09:00:50 esx1 vmkernel: 5:09:31:59.395 cpu1:1037)WARNING: ScsiUid: 550: Path 'vmhba32:C0:T4:L0' : supports ANSI version 'SCSI-2' (0x2). In order to be used with ESX a device must support the SCSI 3 protocol.

Perhaps the pre-SCSI-3 revision is causing problems with ESX?

This is the default SCSI revision being returned for the virtual
LIO-SE objects (i.e. IBLOCK, FILEIO and RAMDISK). I know that the 3ware
HBA returns the following for its own storage:

Host: scsi6 Channel: 00 Id: 00 Lun: 00
Vendor: 3ware Model: Logical Disk 00 Rev: 1.00
Type: Direct-Access ANSI SCSI revision: ffffffff
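The "ANSI rev" in the ESX ScsiScan lines appears to be the VERSION byte (byte 2) of the standard INQUIRY data: 0x02 claims SCSI-2, 0x03 SPC (SCSI-3), 0x04 SPC-2 and 0x05 SPC-3. If so, that would explain why ESX, which wants SCSI-3, warns about the FILEIO LUN (rev 2) but not about the other target (rev 4). A minimal decoding sketch with illustrative names, fed with the vendor/product/revision strings from the ESX log:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Selected fields of standard INQUIRY data (SPC). */
struct inquiry_info {
    uint8_t device_type; /* byte 0, bits 4-0: 0x00 = direct-access block */
    uint8_t version;     /* byte 2: VERSION */
    char vendor[9];      /* bytes 8-15, space padded */
    char product[17];    /* bytes 16-31, space padded */
    char revision[5];    /* bytes 32-35 */
};

/* Map the VERSION byte to the standard it claims conformance to. */
static const char *inquiry_version_name(uint8_t v)
{
    switch (v) {
    case 0x00: return "no standard claimed";
    case 0x02: return "SCSI-2";
    case 0x03: return "SPC (SCSI-3)";
    case 0x04: return "SPC-2";
    case 0x05: return "SPC-3";
    default:   return "other";
    }
}

/* Decode the first 36 bytes of standard INQUIRY data; -1 if too short. */
static int decode_inquiry(const uint8_t *buf, size_t len,
                          struct inquiry_info *out)
{
    if (len < 36)
        return -1;
    out->device_type = buf[0] & 0x1f;
    out->version = buf[2];
    memcpy(out->vendor, buf + 8, 8);
    out->vendor[8] = '\0';
    memcpy(out->product, buf + 16, 16);
    out->product[16] = '\0';
    memcpy(out->revision, buf + 32, 4);
    out->revision[4] = '\0';
    return 0;
}
```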

You can export the 3ware LUNs directly (they cannot be mounted locally, of
course) using:

target-ctl addhbatotarget hba_id=$NEXT_HBA_ID hba_type=1 scsi_host_id=$3WARE_SCSI_HOST_ID

This will bring all of the LUNs that are not currently in use (i.e. not
mounted, not part of an MD or LVM volume, etc.) into the SE. You can see which are
registered with the SE with: target-ctl listhbadevinfo hba_id=$HBA_ID.

From there the 3ware LUN can be exported on an iSCSI portal group with:

target-ctl addluntodev tpgt=$TPGT iscsi_lun=$ISCSI_LUN hba_id=$HBA_ID scsi_channel_id=$CHAN_ID scsi_target_id=$TARG_ID scsi_lun_id=$LUN_ID

The last three values are located via /proc/scsi/scsi.
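Each device in /proc/scsi/scsi is introduced by a line like "Host: scsi6 Channel: 00 Id: 00 Lun: 00". A sketch that pulls the numbers out with sscanf(); the mapping of Host/Channel/Id/Lun onto the scsi_host_id, scsi_channel_id, scsi_target_id and scsi_lun_id parameters is a reading of the commands above, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* SCSI address of one device as listed in /proc/scsi/scsi. */
struct scsi_addr {
    int host;    /* N from "scsiN": scsi_host_id for addhbatotarget */
    int channel; /* scsi_channel_id for addluntodev */
    int id;      /* scsi_target_id */
    int lun;     /* scsi_lun_id */
};

/* Parse a "Host: scsi6 Channel: 00 Id: 00 Lun: 00" line.
 * Returns 0 on success, -1 if the line has a different format. */
static int parse_proc_scsi_line(const char *line, struct scsi_addr *a)
{
    if (sscanf(line, " Host: scsi%d Channel: %d Id: %d Lun: %d",
               &a->host, &a->channel, &a->id, &a->lun) != 4)
        return -1;
    return 0;
}

/* Print target-ctl style parameters for every device in a stream such as
 * fopen("/proc/scsi/scsi", "r"); Vendor:/Type: lines are skipped. */
static void print_target_ctl_params(FILE *fp)
{
    char line[256];
    struct scsi_addr a;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_proc_scsi_line(line, &a) == 0)
            printf("scsi_host_id=%d scsi_channel_id=%d "
                   "scsi_target_id=%d scsi_lun_id=%d\n",
                   a.host, a.channel, a.id, a.lun);
    }
}
```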

Note that I don't know if this is what is causing the issue with ESX,
but let's try this method for now to reduce the possible issues.

>
> >If you're stuck, I'd recommend mailing Nicholas...
>
> well, it's no emergency, but I would like it to work, and it's my
> impression that ESX compatibility is a goal of the project. thing is,
> I'm not seeing much in the logs that seems helpful to report, so i'm
> reporting what i have in hopes that some instructions or links to
> documentation to something more than man target-ctl will come and we
> can work to get this going.
>
> Of course, if putting up a bounty is what it takes to get there, I'm
> sure I can mange something, but for my part it will certainly have to
> be less than buying HBA's, assuming that would even work better, but
> does anyone know?
>

Here is the patch to disable this check for testing purposes. Depending
on what Ming says about this for the generic case (i.e. all CDBs), I will
probably add checks to ensure that CDBs that really need an
expected transfer length have one set, so that we can handle this generically for
all versions of ESX that do this. Btw, I have never seen any other
initiator (non-Linux systems included) do this.

Index: iscsi_target.c
===================================================================
--- iscsi_target.c (revision 209)
+++ iscsi_target.c (working copy)
@@ -1460,7 +1460,7 @@
         TRACE_ERROR("W_BIT & F_BIT not set. Bad iSCSI Initiator.\n");
         return(iscsi_add_reject(REASON_INVALID_PDU_FIELD, 1, buf, conn));
     }
-
+#if 0
     if (((hdr->flags & R_BIT) || (hdr->flags & W_BIT)) &&
          !hdr->exp_xfer_len) {
         /*
@@ -1480,7 +1480,7 @@
         return(iscsi_add_reject(REASON_INVALID_PDU_FIELD, 1, buf, conn));
     }
 done:
-
+#endif
     if (!(hdr->flags & R_BIT) && !(hdr->flags & W_BIT) &&
          (hdr->exp_xfer_len != 0)) {
         TRACE_ERROR("R_BIT and/or W_BIT MUST be set if Expected"

Nicholas A. Bellinger

Feb 8, 2008, 8:38:38 PM
to linux-iscsi...@googlegroups.com
On Fri, 2008-02-08 at 07:41 -0800, Chris wrote:
> I was running iometer from a win2k3 guest on esx 3.5 with a 2nd
> virtual disk added that's on the LIO virtdisk lun which is on an XFS
> mounted LVM volume on a 3ware7504 RAID5 of 4 older 120GB IDE disks.
> IOMeter test is 4 workers of max 64 sectos, 16 outstanding io per
> target, 32K block size, 75% read sequential and it was holding very
> steady at 113MB/s total and 3600 IOps. this is actually a bit better
> than I would have expect from these disks, guessing some target side
> caching is in play?
>

Yes, the LIO-SE FILEIO plugin has to use buffered I/O because of
current limitations with I/O to struct file in the kernel and O_DIRECT.
This is why I do not recommend that folks use FILEIO without O_DIRECT in
production: there are potential data integrity issues if the target
machine were to fail due to power loss, etc. Making O_DIRECT work with
kernel-level struct file I/O is on the long-term list.

Personally, I use and recommend the PSCSI (for physical hardware) and IBLOCK
(for MD RAID and LVM) LIO-SE plugins for production use for this
reason.
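The 113MB/s result and the data-integrity warning are two sides of the same thing: a buffered write completes once the data is in the page cache, not once it is on disk. In userspace the usual way to close that window is to follow the write with fdatasync(). The sketch below (helper name hypothetical) only illustrates that trade-off; it is not how the LIO-SE FILEIO plugin is implemented, and a controller write cache below the kernel can still hold data after fdatasync() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes at off, then wait for them to be flushed. pwrite()
 * alone returns once the data is in the page cache (buffered I/O);
 * fdatasync() blocks until the kernel has flushed the data, plus any
 * metadata needed to read it back, to the storage device. */
static int write_durable(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return fdatasync(fd);
}
```

Calling fdatasync() after every write trades throughput for durability, which is roughly the cost O_DIRECT-style I/O would make visible in a benchmark like the one above.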

--nab

Nicholas A. Bellinger

Feb 8, 2008, 8:43:27 PM
to linux-iscsi...@googlegroups.com
Hi again Chris,

Here is a slightly different patch that dumps the received CDB
triggering the R_BIT || W_BIT && !hdr->exp_xfer_len check.

Index: iscsi_target.c
===================================================================
--- iscsi_target.c (revision 209)
+++ iscsi_target.c (working copy)
@@ -1476,8 +1476,8 @@
         }
 
         TRACE_ERROR("R_BIT or W_BIT set when Expected Data Transfer"
-            " Length is 0. Bad iSCSI Initiator.\n");
-        return(iscsi_add_reject(REASON_INVALID_PDU_FIELD, 1, buf, conn));
+            " Length is 0 for CDB: 0x%02x.\n", hdr->cdb[0]);
+//      return(iscsi_add_reject(REASON_INVALID_PDU_FIELD, 1, buf, conn));
     }
 done:

I am still nervous about potentially passing READ_[6,10,12,16] or
WRITE_[6,10,12,16] with a zero ExpXferLen into the storage engine, but
let's see which other CDBs ESX v3.5 has added that do this.

Thanks,

--nab

Chris Weiss

Feb 10, 2008, 7:23:01 PM
to linux-iscsi...@googlegroups.com
I am also on the list...

On Feb 8, 2008 7:34 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>
> On Fri, 2008-02-08 at 15:11 -0600, Chris Weiss wrote:
> > On Feb 8, 2008 3:00 PM, <maza...@bellsouth.net> wrote:
> > > From the original post I had cited:
> >

> You can export the 3ware LUNs directly (it cannot be mounted local of
> course) using:

I was hoping to use this partition for other things, and FILEIO just
for some iSCSI testing. This is not the hardware I will be using
permanently since it's really just too slow, and with my final
hardware I was planning to use LVM volumes primarily, while also hoping
to have the option of using FILEIO if I felt like it. If I read what
you said correctly, using FILEIO reports SCSI-2 and using direct export
reports whatever the HBA reports? Does this mean that if I were to
grab an old UW-SCSI card and export that, it would also fail? (Not
that I would; it would be even slower!)

> Here is the patch to disable this check for testing purposes. Depending

I guess I'll need to get that svn checkout figured out again and hope
m-a compiles it this time. I tried a couple of months ago, just after you
released the appliance images, and had major failures compiling. I
think joining the mailing list was broken at the time too, so maybe
there will be no problem this time.

Nicholas A. Bellinger

Feb 11, 2008, 2:38:10 AM
to linux-iscsi...@googlegroups.com
On Sun, 2008-02-10 at 18:23 -0600, Chris Weiss wrote:
> I am also on the list...
>
> On Feb 8, 2008 7:34 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> >
> > On Fri, 2008-02-08 at 15:11 -0600, Chris Weiss wrote:
> > > On Feb 8, 2008 3:00 PM, <maza...@bellsouth.net> wrote:
> > > > From the original post I had cited:
> > >
>
> > You can export the 3ware LUNs directly (it cannot be mounted local of
> > course) using:
>
> I was hoping to use this partition for other things and fileio just
> for some iscsi testing. this is not the hardware I will be using
> permanently since it's really just too slow, and with my final
> hardware I was planning to use LVM volumes primarily and also hoping
> to have the option of using fileio if I felt like it. if I read what
> you said correctly, using fileio reports scsi-2 and using direct
> reports whatever the hba reports?

This is correct. SPC-2 is the current case for the virtual (i.e. block
storage layer and above) v2.9 SE objects. You are correct that
FILEIO is very useful and transparent, but because of the current limitations
of kernel I/O in this case (it lacks O_DIRECT), I do not recommend it for
production. SE IBLOCK (exporting a block device by major/minor) is the
current v2.9 stable production setup; it uses DMA scatterlist
chains that are queued to struct block_device.

The kernel-level O_DIRECT case looked pretty straightforward when I
last checked it out, and this has been a topic of interest at the LIO
project for quite a while. For those interested, have a look at
fs/direct-io.c and the use of get_user_pages() to get an idea of what
changes to the VFS would be required within the Linux kernel to get this up
and running.

I will put ESX v3.5 testing on my list (if someone wants to hook me up
with a free license, I would also be happy to have a look :-). Getting
Linux to support these CDBs properly and generically for LIO-Target, and
eventually supporting SPC-3 multipath with kernel.org initiator-side
software stacks, will certainly be of interest for the interop
cases.

Also, another short-term solution might be running LVM/IBLOCK
loopback on kernel-level filesystem mountpoints; that would be an interesting
project. Does anyone have LVM loopback working with FILEIO on Linux
v2.6?

> does this mean that if I were to
> grab an old uw-scsi card and export that it would also fail? (not
> that I would, it would be even slower!)
>

Well, the plan has always been to emulate SPC-3 generically for all
TRANSPORT_PLUGIN_VHBA_PDEV (iBlock to virtual or pscsi) and
TRANSPORT_PLUGIN_VHBA_VDEV (fileio, vtape, medium-changer, etc.) objects. If the
underlying physical HBA is a REAL SCSI bus transport but the
cluster-related CDB opcodes for persistent reservations are not
supported, I have historically had to emulate this case within the SE
in iscsi_target_pscsi.c (for SATA HBAs that do not use libata). I just
checked drivers/scsi/3w-*, and it does NOT support SPC-3 persistent
reservations, so we are definitely going to want to make this
transparent in the SE in the future to get this working for all SE object
export cases.

A similar problem existed when exporting non-PSCSI SPC-3 LIO-Target
storage to both Linux and non-Linux SCSI subsystems for OS multipath,
when the SCSI LLD does not return EVPD (T10 WWN) information in INQUIRY
responses. This is the same situation as with the virtual SE plugins, so
working OS-dependent multipath depends on having this emulated for BOTH
Linux and non-Linux initiators with just about all SATA RAID HBAs. Note
that I currently use two logic cases generically in LIO-SE v2.9, from
which plugins call common code for emulating EVPD WWN and SPC-2
RESERVE/RELEASE functionality if their port endpoint does not support
this internally. This can be because the endpoint is not actually a SCSI
HBA, or, as you mention, because the hardware is old enough that it does
not support this.
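As an illustration of EVPD emulation at the byte level (not LIO-SE's code; the function name is made up), the sketch below builds an INQUIRY Unit Serial Number VPD page (page code 0x80) as laid out in SPC-3: byte 0 holds the peripheral qualifier and device type, byte 1 the page code, byte 2 is reserved, byte 3 is the page length, and the ASCII serial number starts at byte 4. How the SE would choose a serial string for a virtual device is left to the caller and is an assumption here.

```python
def build_vpd_unit_serial(serial, device_type=0x00, qualifier=0x0):
    """Build an emulated INQUIRY VPD page 0x80 (Unit Serial Number).

    device_type 0x00 is a direct-access block device, 0x01 a tape device.
    """
    sn = serial.encode("ascii")
    if len(sn) > 255:
        # SPC-3 uses a single byte (byte 3) for the page length.
        raise ValueError("serial number does not fit in a one-byte length")
    header = bytes([
        ((qualifier & 0x7) << 5) | (device_type & 0x1F),  # byte 0
        0x80,                                             # byte 1: page code
        0x00,                                             # byte 2: reserved
        len(sn),                                          # byte 3: page length
    ])
    return header + sn
```

A device server would return this buffer (truncated to the initiator's allocation length) for an INQUIRY with EVPD=1 and PAGE CODE=0x80.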

In any event, post your ethereal trace and I will have a look this week.

--nab


> > Here is the patch to disable this check for testing purposes. Depending
>
> guess I'll need to get that svn checkout figured out again and hope
> m-a compiles it this time. I tried a couple months ago just after you
> released the appliance images and had great failures compiling. I
> think joining the mailing list was broke at the time also, so maybe
> there will be no problem this time.
>

If you would be so kind as to post an Ethereal trace for your setup, I
will have a look and make a patch for the short-term ESX v3.5 case to the
stable repos, as I think I have a pretty good handle on this one.

Also, if you find that there are remaining issues building target-source
with m-a, I am more than happy to field these questions. This is the
identical setup I am currently using for the stable debian and ubuntu
builds.

--nab


Chris Weiss

Feb 11, 2008, 9:31:37 AM
to linux-iscsi...@googlegroups.com
On Feb 11, 2008 1:38 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> I will put ESX v3.5 testing on my list (if someone wants to hook me up
> with a free license, I would also be happy to have a look :-) Getting
> Linux to support these CDBs generic properly for LIO-Target and
> eventually supporting SPC-3 multipath with kernel.org initiator side
> software stacks. This will certainly be of interest for the interopt
> cases.

They have a 60-day demo you can download, and as far as I can tell it
doesn't require a key, so resetting the trial period is just a matter of
reinstalling.

> just check drivers/scsi/3w-*, and it does NOT support SPC-3 persistent
> reservations, so we are definately going to want to make this SE

Interesting. Maybe this is the real reason 3ware doesn't work with
VMware to make a driver, and it's not just a SATA thing...

> In any event, post your ethereal trace and I will have a look this week.

I should be able to try it today, and I should also be able to
compare with a host that has an Adaptec 31205 card.

Chris Weiss

Feb 11, 2008, 4:20:13 PM
to linux-iscsi...@googlegroups.com
On Feb 11, 2008 1:38 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> Also, if you find that there are remaining issues building target-source
> with m-a, I am more than happy to field these questions. This is the
> identical setup I am currently using for the stable debian and ubuntu
> builds.

well that didn't take long:

./autoconfig --write-to-file ; cat .make_autoconfig
Unknown architecture: could not continue --- at ostype.pm line 47.
root@iscsi:~/dl/lio/trunk/target# file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for
GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped

ostype.pm is looking for 64.*(AMD|Intel), and for whatever reason mine
doesn't have it. This is on x64 Ubuntu Gutsy.

Changing it to look for x86-64 lets it continue.

Some notes that maybe should be added to the README for Debian builds:
install libtool and libsnmp-dev.

So I have it built, and I'm at a bit of a loss.

I have 2 RAID arrays on my 31205 controller, but it only seems to find one.

/proc/scsi/scsi includes:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: Adaptec Model: 31205 RAID6 Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi7 Channel: 00 Id: 01 Lun: 00
Vendor: Adaptec Model: raptors Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02

commands I'm running:
root@iscsi:/etc/iscsi# target-ctl addhbatotarget hba_id=0 hba_type=1 scsi_host_id=7
Parallel SCSI
STATUS: OK
root@iscsi:/etc/iscsi# target-ctl addluntodev tpgt=1 iscsi_lun=0 hba_id=0 scsi_channel_id=0 scsi_target_id=0 scsi_lun_id=0

Adding iSCSI Logical Unit Number to Storage Device on iSCSI Target Portal Group:
iSCSI Target Portal Group Tag: 1
iSCSI Logical Unit Number: 0
SCSI Channel ID: 0
SCSI Target ID: 0
SCSI LUN ID: 0
STATUS: ERROR
iSCSI_TPG[1]_LUN[0] - Unable to locate device
root@iscsi:/etc/iscsi# target-ctl addluntodev tpgt=1 iscsi_lun=1 hba_id=0 scsi_channel_id=0 scsi_target_id=1 scsi_lun_id=0

Adding iSCSI Logical Unit Number to Storage Device on iSCSI Target Portal Group:
iSCSI Target Portal Group Tag: 1
iSCSI Logical Unit Number: 1
SCSI Channel ID: 0
SCSI Target ID: 1
SCSI LUN ID: 0
STATUS: OK

It's also kernel panicking at seemingly random times.

I also have another issue: for whatever reason my controllers (I have
4 altogether, including IDE, SATA, SAS, and U160 for a tape drive) get
a random host ID when I boot up, so this H:C:I:L addressing scheme is
completely unusable. I'm sure it's Ubuntu's fault somehow; I've tried
adding numbered scsi_hostadapter entries to /etc/modprobe.d/aliases,
and it has no effect.
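Until host numbering is stable, one possible workaround is to look up the H:C:I:L tuple by Vendor/Model at run time instead of hard-coding it. The sketch below parses text in the /proc/scsi/scsi format shown earlier in this message. The function name is hypothetical, and it assumes each "Host:" line is followed by a "Vendor: ... Model: ... Rev:" line, as in that output.

```python
import re

# Matches e.g. "Host: scsi7 Channel: 00 Id: 01 Lun: 00"
HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
# Matches e.g. "Vendor: Adaptec Model: raptors Rev: V1.0"; the lazy groups
# drop the space padding /proc/scsi/scsi puts after each field.
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:")


def find_hctl(proc_scsi_text, vendor, model):
    """Return (host, channel, id, lun) of the first device whose Vendor and
    Model match exactly, or None if no such device is listed."""
    current = None
    for line in proc_scsi_text.splitlines():
        m = HOST_RE.search(line)
        if m:
            current = tuple(int(x) for x in m.groups())
            continue
        m = VENDOR_RE.search(line)
        if m and current is not None:
            if m.group(1) == vendor and m.group(2) == model:
                return current
    return None
```

Reading /proc/scsi/scsi at boot and passing the returned host number to scsi_host_id= would avoid depending on a fixed probe order, although addressing by WWN is the more robust approach.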

n...@linux-iscsi.org

Feb 12, 2008, 8:09:50 AM
to Linux-iSCSI.org Target Development


On Feb 11, 1:20 pm, "Chris Weiss" <cwe...@gmail.com> wrote:
> On Feb 11, 2008 1:38 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>
> > Also, if you find that there are remaining issues building target-source
> > with m-a, I am more than happy to field these questions. This is the
> > identical setup I am currently using for the stable debian and ubuntu
> > builds.
>
> well that didn't take long:
>
> ./autoconfig --write-to-file ; cat .make_autoconfig
> Unknown architecture: could not continue --- at ostype.pm line 47.
> root@iscsi:~/dl/lio/trunk/target# file /bin/ls
> /bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for
> GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped
>
> ostype.pm is looking for 64.*(AMD|Intel) and for whatever reason mine
> does have it. this is on x64 ubuntu gutsy.
>

Please post a patch for ostype.pm and I will make this commit.

> changing to look for x84-64 continues.
>
> some notes that maybe should be added to readme for debian builds,
> install libtool and libsmnp-dev.
>

Good idea, I will do this.
Please post your dmesg and I will see why your first device is not
being detected.

Note that the SE plugins will skip devices that are currently in use, by
way of bd_claim(). This means that if a device is part of a RAID or LVM
volume, or is currently mounted, the SE will skip it. There are some
cases where still allowing RO export (and even RW when you are using a
cluster filesystem) would be useful. This is something that other folks
are interested in, so I will put it on my list.
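bd_claim() is an exclusive claim taken inside the kernel, so userspace can only approximate it. As a hedged sketch, the hypothetical helper below reports the two most common reasons a disk would be busy: the disk or one of its partitions appears in /proc/mounts, or sysfs lists holders (MD, LVM/device-mapper) under /sys/block/<dev>/holders. It takes the file contents as arguments so it can be tested offline, and it will not catch every claimant (active swap, or an exclusive open by another driver, for example).

```python
def device_in_use(dev, mounts_text, holders):
    """Approximate whether block device `dev` (e.g. "sdb") is busy.

    mounts_text: contents of /proc/mounts
    holders:     entries of /sys/block/<dev>/holders (e.g. ["dm-0"])
    Returns a list of reasons; an empty list means "looks free".
    """
    reasons = []
    if holders:
        reasons.append("held by " + ", ".join(sorted(holders)))
    prefix = "/dev/" + dev
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mountpoint = fields[0], fields[1]
        suffix = source[len(prefix):]
        # Whole disk (/dev/sdb) or a numbered partition (/dev/sdb1).
        if source.startswith(prefix) and (suffix == "" or suffix.isdigit()):
            reasons.append("mounted: %s on %s" % (source, mountpoint))
    return reasons
```

On a live system the holders list would come from os.listdir("/sys/block/sdb/holders") and the mounts text from reading /proc/mounts.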

Anyway, I am guessing this is why CTL == 0:0:0 is returning an
error. You can list which devices are attached to the HBA with
'target-ctl listhbadevinfo hba_id=0', as well as with dmesg.

> it's also kernel panicing at seemingly random times.
>

Hmm, both you and Bart are using x86_64 Ubuntu and are seeing these
random kernel panics. When using his config with 2.6.24, I am seeing
the same general protection fault during normal usage. x86_64 on
CentOS 5u1 with 2.6.18-53.el5* is still fine, so I am guessing we are
hitting x86 changes between that revision and 2.6.24.

I have had LIO-Target running on 2.6.22-mactel (i686) for a number of
months. I am also going to put 2.6.24 on my ppc32 and make sure
LIO-Target works as expected. It looks like this issue (if it is all the
same issue) involves x86_64 with >= 2.6.23 and Bart's (and, I am
guessing, yours as well) .config on x86_64 Ubuntu. Again, 2.6.23-rc5
and 2.6.24 have been running stable on ppc64 on ps3-linux.

Please post your kernel config for 2.6.23, and I will let you know
what I find.

> I also have another issue, for whatever reason my controllers (I have
> 4 altogether including ide, sata, sas, and u160 for a tape drive) get
> a random host ID when I boot up, so this H:C:I:L addressing scheme is
> completely unusable. I'm sure it's ubuntu's fault somehow and I've
> tried adding numbered scsi_hostadapter entries to
> /etc/modprobe.d/aliases and it has no effect.

Hmm, yeah this is the normal operation. Please feel free to ask the
Ubuntu folks and CC this list.

Also, for the "real" SCSI hardware that is used with PSCSI (as you are
doing with hba_type=1), you can reference T10 WWN information instead
of SE plugin-specific parameters (ie: hba_id= + scsi_channel_id=,
scsi_target_id=, scsi_lun_id=) for target-ctl ops (addluntodev, for
example).

These parameters are called evpd_serial= and evpd_devident=. Any
target-ctl operation that accepts the SE plugin-specific parameters
will also accept these. The actual hardware values can be determined
with scsi_id from udev:

scsi_id -g -p 0x83 -s /block/sda   (or -p 0x80)

These values (if they are not supported by the hardware, or we are
using a virtual plugin) will be emulated by the SE, and the values
will be available from 'target-ctl listhbadevinfo'. They are only
really useful for real PSCSI hardware currently. I have been
considering adding the ability to add MD or LVM UUIDs as an option in
this context as well, instead of the major/minor numbers, which is what
I am currently using in production.

This is a good point for real-world cases, so I will be sure to update
the manual page for this. Also, adding some examples to the LIO-Target
wiki entry would probably benefit everyone. I will look at this in the
upcoming days.

--nab


Chris Weiss

Feb 12, 2008, 12:30:26 PM
to linux-iscsi...@googlegroups.com, n...@linux-iscsi.org
On Feb 12, 2008 7:09 AM, n...@linux-iscsi.org <n...@linux-iscsi.org> wrote:
> > ostype.pm is looking for 64.*(AMD|Intel) and for whatever reason mine
> > does have it. this is on x64 ubuntu gutsy.
> >
>
> Please post a patch for ostype.pm and I will make this commit.

I'm not sure what the patch should contain; I hacked it to work for me,
but this may not work for everyone. Maybe it should just check uname -m
instead?

> Please post your dmesg and I will see why your first device is not
> being detected.

dmesg is filled with other stuff now, but attached is /var/log/dmesg

>
> Note that the SE plugins will skip devices that are current in use, by
> wait of bd_claim(). This means that if a device is part of an RAID,
> LVM, or is current mounted, the SE will skip this. There are some
> cases where still allowing RO export (and even RW when you are using a
> cluster filesystem). This is something that other folks are
> interested in, so I will put it on my list.

It is most certainly not in use; neither even has a partition, but one
is detected and the other is not.

>
> > it's also kernel panicing at seemingly random times.
> >

> Please post your kernel config for 2.6.23, and I will let you know
> what I find.

Also attached, though it's just Gutsy's config with a make oldconfig on
a 2.6.23 kernel, plus SCST's patches.

> Also, the "real" SCSI hardware that is used with PSCSI (as you are
> doing with hba_type=1), you can reference t10 WWN information instead
> of SE plugin specific parameters (ie: hba_id= + scsi_channel_id=,
> scsi_target_id=, scsi_lun_id=) for target-ctl ops (with addluntodev
> for example).

Interesting, I'll have a look. It makes a lot more sense to use WWNs;
it's been a source of confusion how to export a tape drive when the
host ID is random and there's no other way to address it.

>
> These parameters are called evpd_serial= and evpd_devident=. Any
> target-ctl operation that can accept these SE plugin specific
> parameters, will also be able to accept this type. The actual
> hardware values can be determined with scsi_id from udev with:
>
> scsi_id -g -p 0x8[3,0] -s /block/sda

So if I get:
SAdaptec 31205 RAID6 8464A378

then "8464A378" is the serial and the rest is the devident? And how
would I find the tape? I tried /char/st0 and /block/st0, with and
without the -p option (the man page says that without it, it will check
0x80 then 0x83, so it shouldn't matter).

chrisw-x64-config.tar.gz

n...@linux-iscsi.org

Feb 12, 2008, 9:50:10 PM
to Linux-iSCSI.org Target Development


On Feb 12, 9:30 am, "Chris Weiss" <cwe...@gmail.com> wrote:
> On Feb 12, 2008 7:09 AM, n...@linux-iscsi.org <n...@linux-iscsi.org> wrote:
>
> > > ostype.pm is looking for 64.*(AMD|Intel) and for whatever reason mine
> > > does have it. this is on x64 ubuntu gutsy.
>
> > Please post a patch for ostype.pm and I will make this commit.
>
> i'm not sure what the patch would have, I hacked it to work for me but
> this may not work for everyone. maybe it should just check uname -m
> instead?
>
> > Please post your dmesg and I will see why your first device is not
> > being detected.
>
> dmesg is filled with other stuff now, but attached is /var/log/dmesg
>
>
>
> > Note that the SE plugins will skip devices that are current in use, by
> > wait of bd_claim(). This means that if a device is part of an RAID,
> > LVM, or is current mounted, the SE will skip this. There are some
> > cases where still allowing RO export (and even RW when you are using a
> > cluster filesystem). This is something that other folks are
> > interested in, so I will put it on my list.
>
> it is most certainly not in use, neither even has a partition, but one
> is detected and the other not.
>
>
>
> > > it's also kernel panicing at seemingly random times.
>
> > Please post your kernel config for 2.6.23, and I will let you know
> > what I find.
>
> also attached, though it's just Gutsy's with a make oldconifg on a
> 2.6.23 kernel plus scst's patches
>

I have been able to track down your random crashes to

-CONFIG_SLUB=y
+CONFIG_SLAB=y

I also changed the following first while debugging the issue:

-CONFIG_CRASH_DUMP=y
+# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x200000
-CONFIG_RELOCATABLE=y

Go ahead and leave this in and you should be fine.
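To check a kernel .config for this before building, a small helper along these lines reports which slab allocator is selected. It is only a convenience sketch: it reads the standard `CONFIG_FOO=y` and `# CONFIG_FOO is not set` lines of a kernel .config and knows nothing about LIO itself.

```python
def slab_allocator(config_text):
    """Return "SLAB", "SLUB", "SLOB", or None, based on kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        for name in ("SLAB", "SLUB", "SLOB"):
            # Disabled options appear as "# CONFIG_X is not set" and are skipped.
            if line == "CONFIG_%s=y" % name:
                return name
    return None
```

For example, pass it the contents of the .config in the kernel source tree; a result of "SLUB" flags the setting implicated above. After editing .config by hand, re-run make oldconfig before rebuilding.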

--nab


n...@linux-iscsi.org

Feb 13, 2008, 12:36:43 AM
to Linux-iSCSI.org Target Development
Hi Chris,

Here are my comments aside from the CONFIG_SLUB problem I mentioned
earlier..

On Feb 12, 9:30 am, "Chris Weiss" <cwe...@gmail.com> wrote:
> On Feb 12, 2008 7:09 AM, n...@linux-iscsi.org <n...@linux-iscsi.org> wrote:
>
> > > ostype.pm is looking for 64.*(AMD|Intel) and for whatever reason mine
> > > does have it. this is on x64 ubuntu gutsy.
>
> > Please post a patch for ostype.pm and I will make this commit.
>
> i'm not sure what the patch would have, I hacked it to work for me but
> this may not work for everyone. maybe it should just check uname -m
> instead?
>

ostype.pm:ostype() currently looks like:

    if ($rval->{ARCH} eq "")
    {
        my $uname = `file $root/bin/ls`;
        $uname =~ s/\n+//g;

        $rval->{ARCH}="x86_64" if ($uname =~ /64.*(AMD|Intel)/);
        $rval->{ARCH}="i386"   if ($uname =~ /32.*(AMD|Intel)/);
        $rval->{ARCH}="ppc"    if ($uname =~ /32.*(PowerPC)/);
        $rval->{ARCH}="ppc64"  if ($uname =~ /64.*(PowerPC)/);
        $rval->{ARCH}="alpha"  if ($uname =~ /64.*(Alpha)/);
    }
    if ($rval->{ARCH} eq "")
    {
        die "Unknown architecture: could not continue --- $uname";
    }

This script is used to detect both kernel and userspace package types
(for cases where the kernel is 64-bit and userspace is 32-bit).

Using 'uname -m' would give us back the kernel architecture. I think
the kernel architecture probably needs to be determined from
KERNEL_SOURCE, leaving this check in place for userspace packages.
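A minimal Python sketch of the split being discussed: classify the userspace architecture from `file /bin/ls` output, also accepting the bare "x86-64" wording that the `file` output reported on Ubuntu Gutsy uses (which the 64.*(AMD|Intel) pattern misses), and take the kernel architecture from `uname -m` via platform.machine(). The pattern table mirrors the ostype.pm checks plus that one addition; it is an assumption, not a tested replacement for ostype.pm.

```python
import platform
import re

# (pattern, arch) pairs applied to `file /bin/ls` output. Only "x86-64" in
# the first entry is new; the rest mirror the existing ostype.pm checks.
USERSPACE_PATTERNS = [
    (r"64.*(AMD|Intel|x86-64)", "x86_64"),
    (r"32.*(AMD|Intel)", "i386"),
    (r"32.*PowerPC", "ppc"),
    (r"64.*PowerPC", "ppc64"),
    (r"64.*Alpha", "alpha"),
]


def userspace_arch(file_output):
    """Map `file /bin/ls` output to an architecture name, or None."""
    arch = None
    for pattern, name in USERSPACE_PATTERNS:
        if re.search(pattern, file_output):
            arch = name  # later matches override, like the Perl original
    return arch


def kernel_arch():
    """Kernel architecture, as reported by `uname -m`."""
    return platform.machine()
```

Keeping the two lookups separate is what lets a 64-bit kernel with a 32-bit userland build the right package for each side.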

Btw, what does 'file /bin/ls' return on your Ubuntu x86_64 box?

> > Please post your dmesg and I will see why your first device is not
> > being detected.
>
> dmesg is filled with other stuff now, but attached is /var/log/dmesg
>
>
>
> > Note that the SE plugins will skip devices that are current in use, by
> > wait of bd_claim(). This means that if a device is part of an RAID,
> > LVM, or is current mounted, the SE will skip this. There are some
> > cases where still allowing RO export (and even RW when you are using a
> > cluster filesystem). This is something that other folks are
> > interested in, so I will put it on my list.
>
> it is most certainly not in use, neither even has a partition, but one
> is detected and the other not.
>
>

Can you locally partition, format and mount the first device..?

Are they both TYPE_TAPE btw..?

>
> > > it's also kernel panicing at seemingly random times.
>

See my previous post about changing from SLUB -> SLAB.

> > Please post your kernel config for 2.6.23, and I will let you know
> > what I find.
>
> also attached, though it's just Gutsy's with a make oldconifg on a
> 2.6.23 kernel plus scst's patches
>
> > Also, the "real" SCSI hardware that is used with PSCSI (as you are
> > doing with hba_type=1), you can reference t10 WWN information instead
> > of SE plugin specific parameters (ie: hba_id= + scsi_channel_id=,
> > scsi_target_id=, scsi_lun_id=) for target-ctl ops (with addluntodev
> > for example).
>
> interesting, I'll have a look. makes a lot more sense to use WWN's,
> it's been a source of confusion on how to export a tape drive when the
> host ID is random and there's no other way to address it.
>
>
>
> > These parameters are called evpd_serial= and evpd_devident=. Any
> > target-ctl operation that can accept these SE plugin specific
> > parameters, will also be able to accept this type. The actual
> > hardware values can be determined with scsi_id from udev with:
>
> > scsi_id -g -p 0x8[3,0] -s /block/sda
>
> so if I get:
> SAdaptec 31205 RAID6 8464A378
>
> then "8464A378" is the serial and the rest is the devident?

'target-ctl listhbadevinfo' will tell you what it compares against
when evpd_*= parameters are passed.

I think that scsi_id may append the Vendor/Product fields in its
output in some cases.
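For the scsi_id string quoted above, one hedged way to pull it apart: with page 0x80, older scsi_id versions appear to prefix the ID with "S" and, as noted, may include the Vendor/Product fields ahead of the serial number. The sketch below assumes exactly that layout and that the serial is the last whitespace-separated token; both are assumptions (for instance, output where whitespace was replaced by underscores will not split), so compare the result with 'target-ctl listhbadevinfo' before relying on it. The function name is hypothetical.

```python
def split_scsi_id_page80(id_string):
    """Split an "S"-prefixed scsi_id page 0x80 string into
    (vendor_and_product, serial), or return None if the string
    does not fit the assumed layout."""
    if not id_string.startswith("S"):
        return None
    body = id_string[1:].strip()
    parts = body.rsplit(None, 1)  # split off the last whitespace-separated token
    if len(parts) != 2:
        return None
    vendor_product, serial = parts
    return vendor_product.strip(), serial
```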

> and how would I find the tape?

Just using addluntodev with either CTL or evpd_*= will get your SE
object exported to a particular TPG.

> tried /char/st0 and /block/st0 with and
> without the -p option (man page says without will check 0x80 then 0x83
> so it shouldn't matter)

I don't recall which types of tapes or medium changers have EVPD
information.

Have a look at 'target-ctl listhbadevinfo' and it will tell you exactly
what the hardware is returning for EVPD.

--nab

Chris Weiss

Feb 13, 2008, 9:13:12 AM
to linux-iscsi...@googlegroups.com
On Feb 12, 2008 11:36 PM, n...@linux-iscsi.org <n...@linux-iscsi.org> wrote:
>
> This script is used to detect both kernel and userspace packages
> types. (In cases where the kernel is 64-bit and userspace is 32-
> bit).

That sounds like a very odd fringe case to me; does anyone actually
run like this outside of a chroot?

> Btw, what is your file /bin/ls returning on ubuntu x86_64..?

/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for
GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped

I had this in my original report; if you missed it, then I now
understand the confusion.

>
> Can you locally partition, format and mount the first device..?

of course

>
> Are they both TYPE_TAPE btw..?

They are both RAID arrays on disks; I don't think either is a tape. The
tape is on a separate controller as well, and I haven't gotten to the
point of trying it yet.

Nicholas A. Bellinger

Feb 13, 2008, 9:32:56 AM
to linux-iscsi...@googlegroups.com
Hi Chris,


On Wed, 2008-02-13 at 08:13 -0600, Chris Weiss wrote:
> On Feb 12, 2008 11:36 PM, n...@linux-iscsi.org <n...@linux-iscsi.org> wrote:
> >
> > This script is used to detect both kernel and userspace packages
> > types. (In cases where the kernel is 64-bit and userspace is 32-
> > bit).
>
> that sounds like a very odd fringe case to me, does anyone actually
> run like this outside of a chroot?

For an iSCSI Target module..?

> > Btw, what is your file /bin/ls returning on ubuntu x86_64..?
>
> /bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for
> GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped
>
> I had this in my original report, if you missed it then I now
> understand the confusion.
>

Thanks for reposting..

> >
> > Can you locally partition, format and mount the first device..?
>
> of course
>

OK, I will double-check that this works in my x86_64 VM setup, but I am
guessing that SLUB is also why only one of your LUNs is getting
registered. Btw, Bart sent me a trace using IPoIB, and I am currently
looking at it so that I can get him up and running.

> >
> > Are they both TYPE_TAPE btw..?
>
> they are both raid arrays on disks, I don't think either are tape.
> tape is on a separate controller as well, haven't gotten to the point
> of trying it yet.

OK, in that case, I will add the direct scsi_id output (with the appended
INQUIRY information) as a parameter to pass into target-ctl. Using the
evpd_* options in production is nice when you are exporting the ENTIRE
storage object, instead of a Virtual HBA on an LIO SE target port.

I am still looking at the issue with CONFIG_SLUB and will let you know
when I update the Ubuntu repository with target-source and
target-module. If you are interested in testing LIO-SE Stable, please
have a look at Debian or CentOS from the LIO Build Cluster.

--nab



Chris Weiss

Feb 13, 2008, 12:01:11 PM
to linux-iscsi...@googlegroups.com
On Feb 13, 2008 8:32 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> > This script is used to detect both kernel and userspace packages
> > > types. (In cases where the kernel is 64-bit and userspace is 32-
> > > bit).
> >
> > that sounds like a very odd fringe case to me, does anyone actually
> > run like this outside of a chroot?
>
> For an iSCSI Target module..?

For anything. I'm not aware of any distros that set this up, so
someone would have to install a 32-bit distro and manually toss in
compiler flags to build their own 64-bit kernel on it. Having run a
32-bit chroot before, I know there are some things that just don't work
correctly when 32-bit userspace talks to 64-bit drivers, so I'm
wondering why anyone would even want such a setup.
