Strange mptbase / mptscsih kernel messages

Bart Van Assche

May 8, 2008, 2:36:30 AM

to LKML

Hello,

I have made a setup with four servers, where each server is configured
as follows:
* Four Intel Xeon E5130 CPU cores.
* 8 GB RAM.
* SCSI storage controller: LSI Logic / Symbios Logic SAS1068E
PCI-Express Fusion-MPT SAS (rev 04),
* 16 disks in a RAID6 setup (md).
* Linux 2.6.24.6 kernel.

There is a lot of data being written to the RAID6 array: about 50 MB/s
on each server. There are two kinds of messages that appear:

(1)
[74887.117650] mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)

(2)
[74917.081454] mptscsih: ioc0: attempting task abort! (sc=ffff8100a18a7180)
[74917.081461] sd 0:0:15:0: [sdp] CDB: Write(10): 2a 00 1c fe 61 93 00 00 20 00
[74918.409801] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[74918.639801] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100a18a7180)

These messages appear a few times per day. Does anyone know what these
messages mean and what causes them?
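
For reference, the CDB bytes printed in message (2) can be unpacked by hand. The sketch below assumes the standard 10-byte WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control); the function name decode_write10 is made up for illustration and is not part of any driver.

```python
import struct


def decode_write10(cdb_hex: str) -> dict:
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Big-endian, no padding: opcode, flags, LBA, group, length, control.
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": blocks}


if __name__ == "__main__":
    info = decode_write10("2a 00 1c fe 61 93 00 00 20 00")
    print(hex(info["lba"]), info["blocks"])
```

Under that assumption the aborted command is a 32-block write starting at LBA 0x1cfe6193, which would be 16 KiB if the disk uses 512-byte logical blocks.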

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andrew Morton

May 8, 2008, 4:27:43 AM

to Bart Van Assche, LKML, Moore, Eric Dean, linux...@vger.kernel.org

On Thu, 8 May 2008 08:36:03 +0200 "Bart Van Assche" <bart.va...@gmail.com> wrote:

> Hello,
>
> I have made a setup with four servers, where each server is configured
> as follows:
> * Four Intel Xeon E5130 CPU cores.
> * 8 GB RAM.
> * SCSI storage controller: LSI Logic / Symbios Logic SAS1068E
> PCI-Express Fusion-MPT SAS (rev 04),
> * 16 disks in a RAID6 setup (md).
> * Linux 2.6.24.6 kernel.
>
> There is a lot of data being written to the RAID6 array: about 50 MB/s
> on each server. There are two kinds of messages that appear:
>
> (1)
> [74887.117650] mptbase: ioc0: LogInfo(0x31120403): Originator={PL},
> Code={Abort}, SubCode(0x0403)
>
> (2)
> [74917.081454] mptscsih: ioc0: attempting task abort! (sc=ffff8100a18a7180)
> [74917.081461] sd 0:0:15:0: [sdp] CDB: Write(10): 2a 00 1c fe 61 93 00 00 20 00
> [74918.409801] mptbase: ioc0: LogInfo(0x31140000): Originator={PL},
> Code={IO Executed}, SubCode(0x0000)
> [74918.639801] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100a18a7180)
>
> These messages appear a few times per day. Does anyone know what these
> messages mean and what causes them?
>

(suitable cc's added)

Prakash, Sathya

May 8, 2008, 4:35:34 AM

to Andrew Morton, Bart Van Assche, LKML, Moore, Eric, linux...@vger.kernel.org

Hi,
The first message means that the hardware encountered a frame transmit
error and the firmware aborted the I/O request because of that error.
The second message indicates that an I/O timed out and the SML (SCSI
mid-layer) tried to abort the request, but the firmware completed the
I/O before the abort took effect. Hence the firmware returns the "IO
Executed" message and the driver reports the abort as a success.
I suspect some bad hardware in the topology (cables?).
Thanks,
sathya
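
As an illustration of how the LogInfo words map to the decoded fields in the log lines, here is a minimal sketch. The bit layout (bus type in bits 31-28, originator in bits 27-24, code in bits 23-16, subcode in bits 15-0) is an assumption that is consistent with the two messages quoted in this thread; the originator name table is likewise an assumption, and only the PL entry is confirmed by these logs. The name decode_loginfo is hypothetical.

```python
# Assumed originator names; only value 1 ("PL") is confirmed by the logs here.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value: int) -> dict:
    """Split a 32-bit LogInfo word into its assumed fields."""
    return {
        "bus_type": (value >> 28) & 0xF,   # 3 in both logged values
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,      # 0x12 logged as Abort, 0x14 as IO Executed
        "subcode": value & 0xFFFF,
    }


if __name__ == "__main__":
    for word in (0x31120403, 0x31140000):
        fields = decode_loginfo(word)
        print(hex(word), fields["originator"],
              hex(fields["code"]), hex(fields["subcode"]))
```

With this split, 0x31120403 yields originator PL, code 0x12 and subcode 0x0403, and 0x31140000 yields originator PL, code 0x14 and subcode 0x0000, which matches what mptbase printed.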


Bart Van Assche

Jun 5, 2008, 9:58:23 AM

to Prakash, Sathya, Andrew Morton, LKML, Moore, Eric, linux...@vger.kernel.org

On Thu, May 8, 2008 at 10:33 AM, Prakash, Sathya <Sathya....@lsi.com> wrote:
> The first message means that the hardware encountered a frame transmit
> error and the firmware aborted the I/O request because of that error.
> The second message indicates that an I/O timed out and the SML (SCSI
> mid-layer) tried to abort the request, but the firmware completed the
> I/O before the abort took effect. Hence the firmware returns the "IO
> Executed" message and the driver reports the abort as a success.
> I suspect some bad hardware in the topology (cables?).

Hello Sathya,

It took some time before I could have a closer look at the system on
which I observed the strange kernel messages. Apparently the RAID
controller (LSISAS3081E ?) is not connected directly to the 16 disks
but via a SAS expander (Super Micro SC836 SAS Backplane with two LSI
SASX28 Expander Chips --
http://www.supermicro.com/products/chassis/3U/836/SC836E2-R800.cfm).
It will be a challenge to find out which component triggered the
kernel messages and how to make the storage subsystem work perfectly.
Any hint is welcome.
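
One possible first step, sketched here under assumptions, is to tally the task aborts per disk from the kernel log. The sketch assumes the log format shown earlier in this thread, where the "attempting task abort!" line is immediately followed by the "sd H:C:T:L: [sdX] CDB:" line; the function name count_aborts_per_disk is made up for illustration.

```python
import re
from collections import Counter

ABORT_RE = re.compile(r"mptscsih: ioc\d+: attempting task abort!")
DEVICE_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\] CDB:")


def count_aborts_per_disk(lines):
    """Count task aborts per (SCSI address, disk name) pair."""
    counts = Counter()
    pending = False  # True when the previous line announced a task abort
    for line in lines:
        if ABORT_RE.search(line):
            pending = True
            continue
        match = DEVICE_RE.search(line)
        if pending and match:
            counts[(match.group(1), match.group(2))] += 1
        pending = False
    return counts


if __name__ == "__main__":
    sample = [
        "[74917.081454] mptscsih: ioc0: attempting task abort! (sc=ffff8100a18a7180)",
        "[74917.081461] sd 0:0:15:0: [sdp] CDB: Write(10): 2a 00 1c fe 61 93 00 00 20 00",
        "[74918.639801] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100a18a7180)",
    ]
    for (address, disk), n in count_aborts_per_disk(sample).items():
        print(address, disk, n)
```

If the aborts cluster on one disk, that disk or its slot would be the first suspect; if they spread over many disks, a shared component such as an expander or a cable seems more likely. This is only a heuristic.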

Bart.