While trying to configure the iSCSI target so that it exports a RAM disk, I twice experienced a kernel crash (2.6.22.9 kernel). For the call stack (typed over from the console), see below. Can I do anything more to help find the cause of this, like posting the kernel config or enabling kernel debug options?
On Fri, 2008-02-01 at 15:45 +0100, Bart Van Assche wrote:
> While trying to configure the iSCSI target such that it exports a RAM
> disk, I twice experienced a kernel crash (2.6.22.9 kernel). For the
> call stack (typed over from the console), see below. Can I do anything
> more to help finding the cause of this, like posting the kernel config
> or enabling kernel debug options ?
Hmm, smells like a possible mismatch between the running kernel and the compiled module. Is there anything particular about your configuration or kernel..? Does the gcc that is building said modules match the version string from /proc/version..? Also, what does your trunk/target/.make_autoconfig look like after generating your local version with trunk/target/autoconfig --write-to-file..?
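For anyone following along, the gcc check asked about above could be sketched roughly as follows. This assumes the 2.6-era /proc/version layout, which contains the text "gcc version X.Y.Z"; newer kernels format that string differently. The function names are made up for illustration and are not part of LIO.

```shell
# Sketch only: assumes /proc/version contains "gcc version X.Y.Z"
# (the 2.6-era layout). Function names are hypothetical.

# Print the gcc version number found in a /proc/version-style string.
gcc_version_from() {
    printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Compare the compiler that built the running kernel with the gcc on PATH.
# Returns 0 when the two version strings are identical.
check_gcc_match() {
    kernel_gcc=$(gcc_version_from "$(cat /proc/version)")
    current_gcc=$(gcc -dumpversion)
    echo "kernel built with gcc $kernel_gcc, modules would be built with gcc $current_gcc"
    [ "$kernel_gcc" = "$current_gcc" ]
}
```

Running check_gcc_match on the target host before building the module would show both versions side by side. A mismatch does not prove anything by itself, but it is a cheap thing to rule out.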
On Feb 1, 2008 4:01 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> Hmm, smells a possible mismatch between the running kernel and compiled
> module. Is there anything particular about your configuration or
> kernel..? Does the gcc that is building said modules match the version
> string from /proc/version..? Also, what do your
> trunk/target/.make_autoconfig look like after generating your local
> version with trunk/target/autoconfig --write-to-file..?
I built the kernel myself from the Ubuntu-server kernel source tree, and I had removed the directory /lib/modules/2.6.22.9 before starting the build. There were no complaints about version mismatches in the kernel log.
Note: I had to modify buildtools/ostype.pm before the software built.
On Fri, 2008-02-01 at 16:17 +0100, Bart Van Assche wrote:
> On Feb 1, 2008 4:01 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> > Hmm, smells a possible mismatch between the running kernel and compiled
> > module. Is there anything particular about your configuration or
> > kernel..? Does the gcc that is building said modules match the version
> > string from /proc/version..? Also, what do your
> > trunk/target/.make_autoconfig look like after generating your local
> > version with trunk/target/autoconfig --write-to-file..?

> I built the kernel myself from the Ubuntu-server kernel source tree,
> and I had removed the directory /lib/modules/2.6.22.9 before starting
> the build. There were no complaints about version mismatches in the
> kernel log.

> Note: I had to modify buildtools/ostype.pm before the software built.
I seem to recall having to install a specific package to get the proper kernel sources for the default kernel, but I don't recall which one off the top of my head. These packages are available as 'target-modules' and 'target-source', with the former buildable using module-assistant, from the LIO ubuntu repository. FYI, there is a known issue with the PV-Ops enabled kernel and the mpt-fusion driver with 2.6.22-14-virtual.
Anyways, you should be fine as long as the running kernel's config matches the kernel source config from the KERNEL_*_DIR values in .make_autoconfig, and the gcc versions match as mentioned above. Out of curiosity, are you able to successfully build other out of tree modules that use KBuild..?
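A rough way to compare two kernel config files, for those who want to check the first condition above. This is only a sketch: the function name is made up, and the paths in the usage note are assumptions about a typical setup, not something taken from .make_autoconfig.

```shell
# Sketch only: config_differences is a hypothetical helper.
# Prints the CONFIG_ lines that appear in only one of the two files.
# Identical option sets print nothing.
config_differences() {
    cfg_a=$(mktemp) && cfg_b=$(mktemp) || return 2
    # Keep only enabled options and sort them so comm can compare them.
    grep '^CONFIG_' "$1" | sort > "$cfg_a"
    grep '^CONFIG_' "$2" | sort > "$cfg_b"
    # comm -3 suppresses lines common to both files.
    comm -3 "$cfg_a" "$cfg_b"
    rm -f "$cfg_a" "$cfg_b"
}
```

A possible invocation, assuming the running kernel's config was saved as /boot/config-$(uname -r) and the build tree is the usual /lib/modules/$(uname -r)/build, would be config_differences /boot/config-$(uname -r) /lib/modules/$(uname -r)/build/.config. Any output at all means the two configs disagree on at least one option.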
On Fri, 2008-02-01 at 07:36 -0800, Nicholas A. Bellinger wrote:
> I seem to recall having to install a specific package to get the proper
> kernel sources for the default kernel, and I don't recall off the top of
> my head. These packages are available as 'target-modules' and
> 'target-source', with the former buildable using module-assistant, from
Sorry, this should have been "with the latter buildable ..". The target-source package is intended to be built using module-assistant in debian compatible build environments. The target-modules package contains prebuilt kernel modules for the shipping kernels that were built using module-assistant.
On Feb 1, 2008 4:36 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> I seem to recall having to install a specific package to get the proper
> kernel sources for the default kernel, and I don't recall off the top of
> my head. These packages are available as 'target-modules' and
> 'target-source', with the former buildable using module-assistant, from
> the LIO ubuntu repository. There is an known issue with the PV-Ops
> enabled kernel and mpt-fusion driver with 2.6.22-14-virtual FYI.

> Anyways, you should be fine as long as the running kernel's config
> matches the kernel source config from the KERNEL_*_DIR values
> in .make_autoconfig, and the gcc versions match as mentioned above.
> Out of curiousity, are you able to successfully build other out of tree
> modules that use KBuild..?
The kernel I was using was built from the linux-source package. During the kernel build process (make modules_install) the /lib/modules/2.6.22.9/build directory was created correctly.
By now I have built and installed kernel 2.6.23.14 (kernel.org) and rebuilt the LIO kernel module (iscsi_target_mod). The crash reappeared a few minutes after I had configured LIO. This rules out Ubuntu-specific kernel modifications as a possible cause of the crash.
These are the commands I used to configure LIO (not sure that these are correct):
On Fri, 2008-02-01 at 17:14 +0100, Bart Van Assche wrote:
> These are the commands I used to configure LIO (not sure that these
> are correct):

> rmmod iscsi_target_mod
> modprobe iscsi_target_mod
> target-ctl settargetname targetname=iqn.2007-05.com.example
> target-ctl settpgattrib tpgt=1 authentication=0
> target-ctl addtpg tpgt=1
> target-ctl addnptotpg tpgt=1 dev=eth0 ip=$(ip -family inet addr show dev eth0 | sed -n 's:.*inet \([0-9.]*\).*:\1:p') port=3260
> target-ctl addnptotpg tpgt=1 dev=ib0 ip=$(ip -family inet addr show dev ib0 | sed -n 's:.*inet \([0-9.]*\).*:\1:p') port=3260
I have never personally tested this with IPoIB, so I am guessing this may have something to do with it. For the sake of debugging, can you do a run across your ethernet interface first? (I am guessing this will work).
Note that dev= for the target-ctl ops coreaddnp and addnptotpg is used to obtain the struct net_device, which is used for checking network fabric dependent PHY status registers via netif_carrier_ok(). This is done in order to detect link layer failures and notify the initiator via existing communication paths when available in iSCSI/iSER ERL=2. I had to do this originally because there was (I believe this is still the case) no method to obtain struct net_device from struct sock.
Anyways, these timeout values are configurable with target-ctl settpgattrib tpgt=$tpgt netif_timeout=$SECONDS for those interested. For the sake of debugging this issue, let's set these to dev=NULL and disable the check for now.
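To make the dev=NULL suggestion concrete, here is a small sketch built from the sed expression in the quoted commands. The helper names are made up, and portal_cmd only prints the command line instead of running it, so it can be inspected first. The dev=NULL spelling is taken from the paragraph above; nothing else about target-ctl is assumed.

```shell
# Sketch only: first_inet_addr and portal_cmd are hypothetical helpers.

# Read `ip -family inet addr show dev <if>` output on stdin and print
# the first IPv4 address found (same sed expression as in the thread).
first_inet_addr() {
    sed -n 's:.*inet \([0-9.]*\).*:\1:p' | head -n 1
}

# Print (do not run) an addnptotpg command with the carrier check disabled.
# $1 = tpgt, $2 = portal IP address.
portal_cmd() {
    echo "target-ctl addnptotpg tpgt=$1 dev=NULL ip=$2 port=3260"
}
```

For example, portal_cmd 1 "$(ip -family inet addr show dev eth0 | first_inet_addr)" would print the command for the ethernet portal, which can then be pasted or piped to sh once it looks right.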
You can remove the extra scsi_host_id= here, as it is only required with hba_type=1, when the PSCSI plugin is used. Also for future reference, hba_type=5 is ramdisk_dr and hba_type=6 is ramdisk_mcp. The former simply sets pointers from frontend iovecs (as with traditional iSCSI) to struct scatterlist mapped via the SE algorithm. The latter will do the data copy from SE plugin allocated memory to the frontend memory. ramdisk_mcp is required to maintain data integrity with shared storage filesystems from multiple nexuses with ramdisk tests.
Btw, the list of hba_type values can be displayed via target-ctl listgpluginfo.
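As a quick reference, the hba_type values mentioned in this message can be written down as a lookup. This is only a sketch: the function name is made up, the mapping of 1 to pscsi is my reading of the scsi_host_id= remark, and any value not named in this thread is reported as unknown rather than guessed. The authoritative list is whatever target-ctl listgpluginfo prints.

```shell
# Sketch only: hba_type_name is a hypothetical helper covering just the
# three values mentioned in this thread.
hba_type_name() {
    case "$1" in
        1) echo "pscsi" ;;        # assumption: inferred from the scsi_host_id= remark
        5) echo "ramdisk_dr" ;;   # maps frontend iovecs directly, no copy
        6) echo "ramdisk_mcp" ;;  # copies data between SE and frontend memory
        *) echo "unknown" ;;
    esac
}
```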
Note that this should be rd_device_id=0, but target-ctl is assuming rd_device_id=0 for the SE HBA and the LUN is being attached to the SE object. This would not cause the problem btw.
> target-ctl enabletpg tpgt=1
One other thing for future reference; any target-ctl op that accepts tpgt= as a parameter, can also accept targetname=. If no targetname= is passed, the default that is registered via settargetname is used. Note that the default IQN cannot be changed once it is set with the legacy settargetname, or the new coreaddtiqn.
Also, can you tell me a bit more about your OFED and IB setup..?
> By this time I have build and installed kernel 2.6.23.14 (kernel.org)
> and rebuilt the LIO kernel module (iscsi_target_mod). The crash
> reappeared a few minutes after I had configured LIO. This excludes
> Ubuntu-specific kernel modifications as a possible cause of the crash.
Did the backtrace look similar to the first one..?
On Feb 2, 2008 5:42 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> Did the backtrace look similar as the first one..?
By this time I have modified the LIO configuration as you proposed in your previous mail. I also found out that just configuring LIO is not enough to trigger the crash -- the crash is triggered by iSCSI discovery. This is how the crash can be reproduced on the test system:
* Download and compile Linux kernel 2.6.23.14 from kernel.org.
* Download and compile the Linux iSCSI target (IP: 10.100.100.10).
* Configure LIO (see below).
* Run the following command on another host on which open-iscsi is installed (10.100.100.12):
  for ((i=0;i<100;i++)); do iscsiadm -m discovery -t sendtargets -p 10.100.100.10; done
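A variation on the loop above that stops at the first failing discovery and reports which iteration it was. This is only a sketch: repeat_until_failure is a made-up helper, and it assumes that iscsiadm exits non-zero once the target stops answering, which I have not verified for every failure mode.

```shell
# Sketch only: repeat_until_failure is a hypothetical helper.
# Usage: repeat_until_failure <max-iterations> <command> [args...]
# Runs the command up to <max-iterations> times and stops at the first
# non-zero exit status, printing the iteration number.
repeat_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max iterations succeeded"
}
```

With the addresses from the steps above, the call would be repeat_until_failure 100 iscsiadm -m discovery -t sendtargets -p 10.100.100.10. Knowing the iteration number makes it easier to say whether the crash needs many discovery sessions or can happen on the first one.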
The following message appears in the kernel log (dmesg) of the target system. A few seconds later the target system freezes (blinking caps lock and scroll lock LEDs):
On Mon, 2008-02-04 at 08:39 +0100, Bart Van Assche wrote:
> On Feb 2, 2008 5:42 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:

> > Did the backtrace look similar as the first one..?

> By this time I have modified the LIO configuration as you proposed in
> your previous mail. I also found out that just configuring LIO is not
> enough to trigger the crash -- the crash is triggered by iSCSI
> discovery. This is how the crash can be reproduced on the test system:
> * Download and compile Linux kernel 2.6.23.14 from kernel.org.
> * Download and compile the Linux iSCSI target (IP: 10.100.100.10).
> * Configure LIO (see below).
> * Run the following command on another host on which open-iscsi is
> installed (10.100.100.12): for ((i=0;i<100;i++)); do iscsiadm -m
> discovery -t sendtargets -p 10.100.100.10; done

> The following message appears in the kernel log (dmesg) of the target
> system. A few seconds later the target system freezes (blinking caps
> lock and scroll lock LEDs):
iscsiadm is restarting a new discovery session each time, yes?
There were some changes for N->N mapping between network portals and portal groups in the target frontend. I did have to make changes between v2.8 -> v2.9 for the iSCSI target codebase, so it could be breakage in the Discovery SC/S iSCSI Login -> Discovery Session PDU/RSP -> Logout -> Repeat statemachine order. I will have a look and see what I can find.
The same many/constant Login -> PDU/RSP -> Logout statemachine case has been tested recently against v2.9 with normal (non discovery) traditional iSCSI sessions and connections and was fine, so I am guessing this is related to the changes. Thanks for reporting this.
--nab
I updated trunk/target to 2.6.24 btw, and will be posting some more details on the changes to LIO SE, as I think it may be interesting for discussion purposes of the target mode storage engine. Also, have you been able to move stable SessionType=Normal traffic on your setup with IPoIB..?
On Feb 4, 2008 1:28 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> I updated trunk/target to 2.6.24 btw, and will posting some more details
> on the changes to LIO SE, as I think it may be interesting for
> discussion purposes of the target mode storage engine. Also have you
> been able to move stable SessionType=Normal traffic on your setup with
> IPoIB..?
I have not yet been able to get the open-iscsi initiator to talk to the LIO iSCSI target. Were the LIO configuration commands that I previously posted correct?
I'm afraid that it will take some time before I can perform definitive IPoIB measurements: my impression from the tests I have done with iperf is that there is still a bug present in the Linux IPoIB implementation that negatively impacts performance. See also http://bugzilla.kernel.org/show_bug.cgi?id=9883.
On Mon, 2008-02-04 at 14:37 +0100, Bart Van Assche wrote:
> On Feb 4, 2008 1:28 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:

> > I updated trunk/target to 2.6.24 btw, and will posting some more details
> > on the changes to LIO SE, as I think it may be interesting for
> > discussion purposes of the target mode storage engine. Also have you
> > been able to move stable SessionType=Normal traffic on your setup with
> > IPoIB..?

> I have not yet been able to let the open-iscsi initiator talk to the
> LIO iSCSI target. Were the LIO configuration commands that I
> previously posted correct ?

>> snip
> * Run the following command on another host on which open-iscsi is
> installed (10.100.100.12): for ((i=0;i<100;i++)); do iscsiadm -m
> discovery -t sendtargets -p 10.100.100.10; done
Can you get up and running with a single discovery session ok..?
> I'm afraid that it will take some time before I can perform definitive
> IPoIB measurements: my impression from the tests I have done with
> iperf is that there is still a bug present in the Linux IPoIB
> implementation that negatively impacts performance. See also
> http://bugzilla.kernel.org/show_bug.cgi?id=9883.
I recall someone getting iWARP + SDP working on a hardware RNIC OpenIB-General recently. I have a set of the AMSO110 boards that I have been meaning to give a shot with the latest OFA code.
On Feb 4, 2008 2:47 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> > * Run the following command on another host on which open-iscsi is
> > installed (10.100.100.12): for ((i=0;i<100;i++)); do iscsiadm -m
> > discovery -t sendtargets -p 10.100.100.10; done
> Can you get up and running with a single discovery session ok..?
iscsiadm did not report any targets, so I have not yet been able to log in. Furthermore, the kernel crash reports seem to indicate memory corruption, so I'd like to see this fixed before performing further tests.
On Mon, 2008-02-04 at 14:59 +0100, Bart Van Assche wrote:
> On Feb 4, 2008 2:47 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:

> > > * Run the following command on another host on which open-iscsi is
> > > installed (10.100.100.12): for ((i=0;i<100;i++)); do iscsiadm -m
> > > discovery -t sendtargets -p 10.100.100.10; done

> > Can you get up and running with a single discovery session ok..?

> iscsiadm did not report any targets, so I have not yet been able to
> log in. Furthermore, the kernel crash reports seem to indicate memory
> corruption, so I'd like to see this fixed before performing further
> tests.
So this is throwing the general protection fault on the very first discovery session on your non IPoIB portal..? Also, what does your open-iscsi version info look like btw..?
I will have a look at this in a VM later today.
On Feb 4, 2008 3:30 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> So this is throwing the general protection fault on the very first
> discovery session on your non IPoIB portal..? Also, what does your
> open-iscsi version info look like btw..?
> I will have a look at this on in VM later today.
Sometimes the GPF happens on the first discovery session, sometimes later on. According to dpkg-query the open-iscsi version is 2.0.865-1:
$ dpkg-query -s open-iscsi
Package: open-iscsi
Status: install ok installed
Priority: optional
Section: net
Installed-Size: 592
Maintainer: Ubuntu MOTU Developers <ubuntu-m...@lists.ubuntu.com>
Architecture: amd64
Version: 2.0.865-1
Depends: libc6 (>= 2.5-5)
Conffiles:
 /etc/init.d/open-iscsi 321465094b3226ea4cfce3614da220b5
 /etc/iscsi/iscsid.conf 305a0f218fe196cdad62f6ac3a1e7e45
 /etc/iscsi/initiatorname.iscsi 300e739ab922027433765db3a88921c1
Description: High performance, transport independent iSCSI implementation
 iSCSI is a network protocol standard that allows the use of the SCSI protocol
 over TCP/IP networks. This implementation follows RFC3720.
 .
Homepage: http://www.open-iscsi.org/
Original-Maintainer: Philipp Hug <deb...@hug.cx>