I have retested the issue I encountered a few days ago with kernel 2.6.24 and LIO 2.9.0.209, but unfortunately, this still triggers a kernel crash ... This happens systematically after the first attempt to perform iSCSI discovery with the open-iscsi command line tools.
Received iSCSI login request from 192.168.102.12 on TCP Network Portal 192.168.102.10:3260
Set np->np_login_tpg to ffff81015f37ba00
------------------------------------------------------------------
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 32768
IFMarker: No
OFMarker: No
------------------------------------------------------------------
------------------------------------------------------------------
InitiatorName: iqn.1993-08.org.debian:01:b5698b924985
TargetAlias: iSBE Target
InitiatorAlias: INF012
TargetPortalGroupTag: 0
DefaultTime2Wait: 2
DefaultTime2Retain: 0
ErrorRecoveryLevel: 0
SessionType: Discovery
------------------------------------------------------------------
iSCSI Login successful on CID: 0 from 192.168.102.12 to 192.168.102.10:3260,0
Incremented iSCSI Connection count to 1 from node: iqn.1993-08.org.debian:01:b5698b924985
Established iSCSI session from node: iqn.1993-08.org.debian:01:b5698b924985
Incremented number of active iSCSI sessions to 1 on iSCSI Target Portal Group: 0
Cleared np->np_login_tpg
Decremented iSCSI connection count to 0 from node: iqn.1993-08.org.debian:01:b5698b924985
Released iSCSI session from node: iqn.1993-08.org.debian:01:b5698b924985
Decremented number of active iSCSI Sessions on iSCSI TPG: 0 to 0
On Fri, 2008-02-08 at 15:40 +0100, Bart Van Assche wrote:
> I have retested the issue I encountered a few days ago with kernel
> 2.6.24 and LIO 2.9.0.209, but unfortunately, this still triggers a
> kernel crash ... This happens systematically after the first attempt
> to perform iSCSI discovery with the open-iscsi command line tools.

I just ran:
for ((i=0;i<1000;i++)); do iscsiadm -m discovery -t sendtargets -p 172.16.201.129; done
on a stock Ubuntu 7.10 i386 running 2.6.22-14-generic with the default Open/iSCSI implementation. I ran the 1000-iteration loop a dozen times, and everything looks fine against the Debian LIO-VM, which is running 2.9.0.188. As I mentioned previously, nothing related to SendTargets has changed from .180 to CURRENT, nor anything else that I can think of that would cause these kinds of reproducible general protection faults in the discovery path with any initiator.
This leads me to believe that there is still something wrong with your LIO builds, if running over traditional iSCSI with a known working configuration still does not work (i.e. without IPoIB for the moment, although as we both agree, this should not make any difference). Can you triple-check that your running kernel and the source that the module is being built against have matching kernel .config and compiler versions?
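A minimal sketch of one part of that check, assuming only that the first field of a module's vermagic string is the kernel release it was built for. The helper names `vermagic_release` and `release_matches` are made up for illustration, and the module path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: compare the running kernel release with the release a module
# was built for. Helper names are illustrative, not part of any tool.

# Print the first field of a vermagic string,
# e.g. "2.6.24 SMP mod_unload" -> "2.6.24".
vermagic_release() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Succeed if the vermagic release equals the given kernel release.
# $1: kernel release (e.g. output of `uname -r`), $2: vermagic string.
release_matches() {
    [ "$1" = "$(vermagic_release "$2")" ]
}

# Example usage (module path is a placeholder, adjust to the local build):
#   vm=$(modinfo -F vermagic ./iscsi_target_mod.ko)
#   release_matches "$(uname -r)" "$vm" || echo "MISMATCH: $vm"
#   cat /proc/version   # also names the compiler that built the running kernel
```

This only covers the release string; the compiler and .config still have to be compared separately.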
Also, please send me your kernel configuration for 2.6.24 (are you copying your .config between versions?), as your issue seems to be independent of kernel version. I will go ahead and do a 2.6.24 build using your .config and see if I can reproduce the issue inside a LIO-VM.
> for ((i=0;i<1000;i++)); do
> iscsiadm -m discovery -t sendtargets -p 172.16.201.129;
> done
> on a stock ubuntu 7.10 i386 running 2.6.22-14-generic with the default
> Open/iSCSI implementation. I just ran the 1000 loop iteration a dozen
> times, and everything looks fine against with the Debian LIO-VM which is
> running 2.9.0.188. As I mentioned previously, nothing has changed in
> that related to SendTargets from .180 to CURRENT, or anything else that
> I can think of that would cause these types of reproduceable general
> protection faults in the discovery path with any initiator..
> This leads me to believe that there is still something messed up about
> your LIO builds if running over traditional iSCSI with a known working
> configuration is still not working. (ie: non IPoIB, for the moment, but
> as we can both agree, this should not make any difference). Can you
> triple check that your running kernel and the source that the module is
> being built against match kernel .config and compiler versions..?
I have tested LIO-SE on an x86_64 system, not on an i386 system. I saw several compiler warnings during compilation of the LIO-SE kernel module (iscsi_target_mod). Should I send these compiler warnings to you ?
Regarding potential mismatches between the running kernel and the source that the module is being built against: we can safely exclude this as a potential cause. I have seen the kernel crashes triggered by iscsi_target_mod with three different kernel versions (2.6.22.9, 2.6.23.14 and 2.6.24). Each time I recompiled both the kernel and iscsi_target_mod from scratch, removing the whole source tree before starting any compilation steps. Note: before starting compilation, I first applied the following patches (obtained from https://scst.svn.sourceforge.net/svnroot/scst/trunk):
* scst_exec_req_fifo-2.6.24.patch
* iscsi-scst/kernel/patches/put_page_callback-2.6.24.patch
> Also, please send me your kernel configuration for 2.6.24 (are you
> copying your .config between versions..?), as your issue seems to be
> irrelevant of kernel version. I will go ahead and do a 2.6.24 build
> using your .config and see if I can see an issue inside of a LIO-VM.
You can find the kernel config I used below. This config was obtained by updating an older kernel config via "make oldconfig".
Bart.
$ wc .config
3495 5941 76131 .config
$ cat .config
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24
# Thu Feb 7 15:15:01 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_QUICKLIST is not set
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_SUPPORTS_OPROFILE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_HT=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y
#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_PREEMPT_NOTIFIERS=y
#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_X86_VSMP is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
CONFIG_SWIOTLB=y
CONFIG_NR_CPUS=4
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
# CONFIG_NUMA is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MTRR=y
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
#
# Power management options
#
# CONFIG_PM is not set
CONFIG_SUSPEND_SMP_POSSIBLE=y
CONFIG_HIBERNATION_SMP_POSSIBLE=y
#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
CONFIG_HT_IRQ=y
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=m
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=m
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_PCMCIA_IOCTL=y
CONFIG_CARDBUS=y
#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
#
# Networking
#
CONFIG_NET=y
#
# Networking options
#
CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
...
On Mon, 2008-02-11 at 08:55 +0100, Bart Van Assche wrote:
> On Feb 9, 2008 2:04 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> > I just ran:
> > for ((i=0;i<1000;i++)); do
> > iscsiadm -m discovery -t sendtargets -p 172.16.201.129;
> > done
> > on a stock ubuntu 7.10 i386 running 2.6.22-14-generic with the default
> > Open/iSCSI implementation. I just ran the 1000 loop iteration a dozen
> > times, and everything looks fine against with the Debian LIO-VM which is
> > running 2.9.0.188. As I mentioned previously, nothing has changed in
> > that related to SendTargets from .180 to CURRENT, or anything else that
> > I can think of that would cause these types of reproduceable general
> > protection faults in the discovery path with any initiator..
> > This leads me to believe that there is still something messed up about
> > your LIO builds if running over traditional iSCSI with a known working
> > configuration is still not working. (ie: non IPoIB, for the moment, but
> > as we can both agree, this should not make any difference). Can you
> > triple check that your running kernel and the source that the module is
> > being built against match kernel .config and compiler versions..?
> I have tested LIO-SE on an x86_64 system, not on an i386 system. I saw
> several compiler warnings during compilation of the LIO-SE kernel
> module (iscsi_target_mod). Should I send these compiler warnings to
> you ?
Sure, you can send me these offline if you like..
> Regarding potential mismatches between running kernel and the source
> that the module is being built against: we can safely exclude this as
> a potential cause. I have seen the kernel crashes triggered by
> iscsi_target_mod with three different kernel versions (2.6.22.9,
> 2.6.23.14 and 2.6.24). Each time I had recompiled kernel and
> iscsi_target_mod by removing the whole source tree before starting any
> compilation steps. Note: before starting compilation, I first applied
> the following patches (obtained from
> https://scst.svn.sourceforge.net/svnroot/scst/trunk):
> * scst_exec_req_fifo-2.6.24.patch
> * iscsi-scst/kernel/patches/put_page_callback-2.6.24.patch
Hmm, the first patch does touch code that LIO-SE uses for the PSCSI plugin (the SCST patch makes scsi_execute_async() inlined), but I can't see why this would be an issue. I will be using a fresh 2.6.24 from kernel.org for my first test with your config, without these two SCST patches.
> > Also, please send me your kernel configuration for 2.6.24 (are you
> > copying your .config between versions..?), as your issue seems to be
> > irrelevant of kernel version. I will go ahead and do a 2.6.24 build
> > using your .config and see if I can see an issue inside of a LIO-VM.
> You can find the kernel config I used below. This config was obtained
> by updating an older kernel config via "make oldconfig".
I can't think of why copying the config would be an issue, but we are basically down to either the separate patches causing the issue, or something with your config. I will have a look at your config and give it a try on an x86_64 VM with the same Open/iSCSI test.
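To narrow down "something with your config", one option is to list the option lines that differ between a known-good config and the failing one. A minimal sketch follows; the function names `config_options` and `config_diff` are made up for illustration, and the file paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: list option lines that differ between two kernel .config files.
# Function names are illustrative only.

# Print the sorted option lines of a config:
# "CONFIG_X=..." as well as "# CONFIG_X is not set".
config_options() {
    grep -E '^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)' "$1" | sort
}

# Print lines present in only one of the two configs.
# Lines only in $1 are prefixed "< ", lines only in $2 are prefixed "> ".
config_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    config_options "$1" > "$a"
    config_options "$2" > "$b"
    comm -23 "$a" "$b" | sed 's/^/< /'
    comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}

# Example usage (paths are placeholders):
#   config_diff /path/to/working/.config /path/to/failing/.config
```

Since the posted config sets CONFIG_IKCONFIG_PROC=y, the configuration of the running kernel can also be taken from /proc/config.gz (for example `zcat /proc/config.gz > running.config`) and compared against the source tree's .config in the same way.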
--nat

PS: I am still running fine on 2.6.24 ppc64 from ps3-linux from last week with typical usage btw..
> > Regarding potential mismatches between running kernel and the source
> > that the module is being built against: we can safely exclude this as
> > a potential cause. I have seen the kernel crashes triggered by
> > iscsi_target_mod with three different kernel versions (2.6.22.9,
> > 2.6.23.14 and 2.6.24). Each time I had recompiled kernel and
> > iscsi_target_mod by removing the whole source tree before starting any
> > compilation steps. Note: before starting compilation, I first applied
> > the following patches (obtained from
> > https://scst.svn.sourceforge.net/svnroot/scst/trunk):
> > * scst_exec_req_fifo-2.6.24.patch
> > * iscsi-scst/kernel/patches/put_page_callback-2.6.24.patch
> Hmm, the first patch does touch code that LIO-SE uses for the PSCSI
> plugin (the SCST patch makes scsi_execute_async() inlined), but I can't
> see why this would be an issue. I am will be using a fresh 2.6.24 from
> kernel.org for my first test with your config, without these SCST two
> patches.
In the meantime I have been able to reproduce the crash with an unmodified 2.6.24 kernel, so the crash is not related to the SCST patches.
On Feb 11, 2008 10:02 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> PS: I am still running fine on 2.6.24 ppc64 from ps3-linux from last
> week with typical usage btw..
Hello Nicholas,
Have you tested the LIO-SE kernel module with kernel debugging enabled on the 2.6.24 kernel ? This is what I get while configuring the LIO-SE kernel module on a 2.6.24.2 kernel with kernel debugging enabled (no iSCSI discovery has yet been attempted):
Bart Van Assche wrote:
> On Feb 11, 2008 10:02 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> > PS: I am still running fine on 2.6.24 ppc64 from ps3-linux from last
> > week with typical usage btw..
> Hello Nicholas,
> Have you tested the LIO-SE kernel module with kernel debugging enabled
> on the 2.6.24 kernel ? This is what I get while configuring the LIO-SE
> kernel module on a 2.6.24.2 kernel with kernel debugging enabled (no
> iSCSI discovery has yet been attempted):
I added proper usage of sg_init_table() and sg_mark_end() within the
LIO code; the lack of it is what is causing the BUG() with
createvirtdev. I am now recompiling with CONFIG_DEBUG_SG and will let
you know what I find.
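Before rerunning such a test it can help to confirm that the debug option really made it into the configuration being built or booted. A minimal sketch, where the helper name `config_enabled` and the file name in the usage comment are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: check whether an option is enabled (=y or =m) in a .config file.
# The helper name is illustrative only.

# $1: path to a .config file, $2: option name such as CONFIG_DEBUG_SG.
config_enabled() {
    grep -q "^$2=[ym]\$" "$1"
}

# Example usage (file name is a placeholder):
#   config_enabled .config CONFIG_DEBUG_SG && echo "CONFIG_DEBUG_SG is set"
```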
I believe this BUG() only exists with the changes in scatterlist.h in
2.6.24 btw, so I still can't explain the general protection faults..
Also, I was able to reproduce the GPFs with discovery and your
config. I will let you know if my changes resolve the other issue or
if I need to enable more debug code and put kdb into the VM.
Thanks again for spending the extra time to help track this down..