exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

li...@dusted.dk

Oct 10, 2007, 4:30:13 AM

I get this on brand new hardware, 2xHitachi Deathstar 320gb SATA2
(sata_via driver)

I get this a lot, the disk makes some sound after heavy IO and then the
system hangs for a few seconds, then this comes up:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:00:3f:76:30/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
raid1: Disk failure on sdb1, disabling device.

This is on kernel 2.6.23

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
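Every timeout report in this thread prints the failing command as a libata taskfile after "cmd". The sketch below decodes that field, assuming the layout command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (our reading of libata's EH output, not something stated in the thread). The function and table names are hypothetical, only a handful of opcodes are named, and a zero count is treated as 65536 only for the 48-bit opcodes the table knows.

```python
# Hypothetical helper (not part of libata or any tool): decode the taskfile
# that libata EH prints after "cmd", assumed to be laid out as
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xB0: "SMART",
}
LBA48_OPCODES = {0x25, 0x35, 0x60, 0x61}
NCQ_OPCODES = {0x60, 0x61}
SECTOR_SIZE = 512  # the logs report 512-byte hardware sectors


def decode_taskfile(field):
    """Return (opcode name, LBA, sector count, byte count)."""
    command, low, high, _device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    opcode = int(command, 16)

    # 48-bit LBA: the "hob" (high order byte) registers carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)

    if opcode in NCQ_OPCODES:
        # For NCQ (FPDMA) commands the sector count lives in the feature
        # registers; nsect carries the queue tag instead.
        count = hob_feature << 8 | feature
    else:
        count = hob_nsect << 8 | nsect
    if count == 0 and opcode in LBA48_OPCODES:
        count = 65536  # a 16-bit count of zero means 65536 sectors

    name = ATA_OPCODES.get(opcode, "unknown 0x%02x" % opcode)
    return name, lba, count, count * SECTOR_SIZE


if __name__ == "__main__":
    for tf in ("25/00:00:3f:76:30/00:04:00:00:00/e0",   # this report, ata1
               "61/08:00:bf:4b:38/00:00:3a:00:00/40"):  # next report, ata3
        print(tf, "->", decode_taskfile(tf))
```

Decoded this way, the ata1 command is a READ DMA EXT of 1024 sectors at LBA 3175999 (524288 bytes, matching "data 524288 in"), and the ata3 command from the next report is an NCQ WRITE FPDMA QUEUED of 8 sectors at LBA 976767935 (4096 bytes, matching "data 4096 out"), a few thousand sectors from the end of the 976773168-sector sdb.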

Greg Cormier

Oct 10, 2007, 3:20:16 PM

I'd like to hop in on this, and add my similar problem. This is my
first post so please excuse me if I'm doing something wrong.

I've been having issues recently (couple of weeks?) with my server. I
have three WD5000YS (500gb) drives in RAID5, on an Asus A8N
motherboard which is nForce 4. I've even RMA'd one of the drives, but
now I'm thinking the drives are fine.

The drive seems to have issues under moderate to heavy IO. I unmounted
my RAID and forced an e2fsck. Before e2fsck had printed anything, I got
this:

Oct 10 14:50:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Oct 10 14:50:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:50:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x1c00000 action 0x2 frozen
Oct 10 14:50:40 zeus kernel: ata3.00: cmd 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:50:40 zeus kernel: res 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:50:40 zeus kernel: ata3: soft resetting port
Oct 10 14:50:40 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 10 14:50:40 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:50:40 zeus kernel: ata3: EH complete
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Oct 10 14:51:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Oct 10 14:51:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:51:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x400000 action 0x2 frozen
Oct 10 14:51:40 zeus kernel: ata3.00: cmd 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:51:40 zeus kernel: res 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:51:41 zeus kernel: ata3: soft resetting port
Oct 10 14:51:41 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 10 14:51:41 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:51:41 zeus kernel: ata3: EH complete
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Oct 10 14:52:19 zeus kernel: device eth0 left promiscuous mode
Oct 10 14:52:41 zeus kernel: ata3: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Oct 10 14:52:41 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:52:41 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x400000 action 0x2 frozen
Oct 10 14:52:41 zeus kernel: ata3.00: cmd 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:52:41 zeus kernel: res 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:52:41 zeus kernel: ata3: soft resetting port
Oct 10 14:52:42 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 10 14:52:42 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:52:42 zeus kernel: ata3: EH complete
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

These errors have been happening on various .22 kernels, and this
message is from the hot-off-the-press .23 kernel. This message is
followed by a hard freeze.
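The ata3 exception lines report SErr 0x1c00000 on the first burst and 0x400000 on the later ones, and 2.6.23 prints SError only as a raw hex mask. The sketch below splits such a mask into bits using the SATA SError register layout (ERR field in bits 0..11, DIAG field in bits 16..26). The short labels are an assumption modeled on what later libata versions print, and `decode_serror` is a hypothetical name; the decode only says which error bits were set, not why.

```python
# SError bit positions per the SATA SError register layout. The short labels
# mimic those printed by later libata versions (an assumption); 2.6.23 prints
# only the raw hex value.
SERROR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communications error
    8: "UnrecovData",  # transient data integrity error
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # internal error
    16: "PHYRdyChg",   # PhyRdy change
    17: "PHYInt",      # Phy internal error
    18: "CommWake",    # COMWAKE detected
    19: "10B8B",       # 10b to 8b decode error
    20: "Dispar",      # disparity error
    21: "BadCRC",      # CRC error
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unknown FIS type
    26: "DevExch",     # device exchanged
}


def decode_serror(value):
    """List the labels of the bits set in an SError value, lowest bit first."""
    labels = []
    for bit in range(32):
        if value & (1 << bit):
            # Reserved bits have no label; report them by position.
            labels.append(SERROR_BITS.get(bit, "bit%d" % bit))
    return labels


if __name__ == "__main__":
    for serr in (0x1C00000, 0x400000):  # values from the ata3 log above
        print(hex(serr), decode_serror(serr))
```

Under this layout 0x1c00000 is bits 22, 23 and 24 (handshake, link sequence and transport state transition errors) and 0x400000 is bit 22 alone, all of them link/transport-layer DIAG bits.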

I'm still working out why netconsole isn't quite working, so hopefully
I can provide more information soon. The server is currently frozen;
when I get home I can provide more information (lspci output, perhaps?).

Looks like another rebuild of the array when I get home.

Thanks,
Greg

Andrew Morton

Oct 12, 2007, 8:50:09 PM

On Wed, 10 Oct 2007 10:28:45 +0200 (CEST)
li...@dusted.dk wrote:

> I get this on brand new hardware, 2xHitachi Deathstar 320gb SATA2
> (sata_via driver)
>
> I get this a lot, the disk makes some sound after heavy IO and then the
> system hangs for a few seconds, then this comes up:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:00:3f:76:30/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 in
> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: port is slow to respond, please be patient (Status 0xd0)
> ata1: soft resetting port
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> raid1: Disk failure on sdb1, disabling device.
>
> This is on kernel 2.6.23
>

(added linux-ide)

Andrew Morton

Oct 12, 2007, 9:00:07 PM

On Wed, 10 Oct 2007 15:17:19 -0400
"Greg Cormier" <gcor...@gmail.com> wrote:

> I'd like to hop in on this, and add my similar problem. This is my
> first post so please excuse me if I'm doing something wrong.

Please cc linu...@vger.kernel.org on ide, sata and pata reports.

A "hard freeze" is fairly serious.

Tejun Heo

Oct 23, 2007, 6:00:20 AM

Hello,

Steen Eugen Poulsen wrote:
> Sep 28 04:32:40 locker ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Sep 28 04:32:40 locker ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
> Sep 28 04:32:40 locker res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
[--snip--]
> Another machine:
>
> Sep 28 03:47:55 dragonslair ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Sep 28 03:47:55 dragonslair ata1.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 126976 in
> Sep 28 03:47:55 dragonslair res 50/00:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
[--snip--]
> Sep 28 04:33:52 liferaft kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Sep 28 04:33:55 liferaft kernel: ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
> Sep 28 04:33:55 liferaft kernel: res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
> Sep 28 04:33:55 liferaft kernel: ata1: soft resetting port
> Sep 28 04:33:55 liferaft kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Sep 28 04:33:55 liferaft kernel: ata1.00: configured for UDMA/133
> Sep 28 04:33:55 liferaft kernel: ata1: EH complete

All these are caused by smartd. Updating should fix the problem.
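All three quoted reports fail on command byte b0, which is the ATA SMART opcode, so the offending commands come from a SMART client rather than from filesystem IO. The hypothetical checker below pulls the SMART subcommand (feature register) out of a libata "cmd" line, checks the 0x4F/0xC2 SMART signature in lbam/lbah, and compares the logged transfer with the sector-count field. In these logs the subcommands d2 and db are non-data commands in the ATA spec, yet libata recorded a data-in transfer of exactly nsect × 512 bytes (0xf1 × 512 = 123392, 0xf8 × 512 = 126976). That mismatch is one plausible way to provoke an HSM violation; it is an inference from the logs, while the thread itself only says smartd is the cause. The subcommand table and field layout are assumptions.

```python
import re

# Hypothetical checker: ATA opcode 0xB0 is SMART, the subcommand sits in the
# feature register, and lbam/lbah should hold the SMART signature 0x4F/0xC2.
# Protocols are per the ATA spec; table coverage is partial by design.
SMART_SUBCOMMANDS = {
    0xD0: ("READ DATA", "pio-in"),
    0xD2: ("ENABLE/DISABLE ATTRIBUTE AUTOSAVE", "non-data"),
    0xD4: ("EXECUTE OFF-LINE IMMEDIATE", "non-data"),
    0xD5: ("READ LOG", "pio-in"),
    0xD8: ("ENABLE OPERATIONS", "non-data"),
    0xD9: ("DISABLE OPERATIONS", "non-data"),
    0xDA: ("RETURN STATUS", "non-data"),
    0xDB: ("ENABLE/DISABLE AUTOMATIC OFF-LINE", "non-data"),
}

# command/feature:nsect:lbal:lbam:lbah/... then "data <bytes> <in|out>"
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):"
    r"([0-9a-f]{2}):([0-9a-f]{2})/.*?data (\d+) (in|out)",
    re.IGNORECASE,
)


def check_smart_cmd(line):
    """Return a summary dict for a SMART command line, or None otherwise."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command, feature, nsect, _lbal, lbam, lbah = (int(g, 16) for g in m.groups()[:6])
    if command != 0xB0:
        return None
    data_len, direction = int(m.group(7)), m.group(8)
    name, protocol = SMART_SUBCOMMANDS.get(feature, ("unknown", "unknown"))
    return {
        "subcommand": name,
        "protocol": protocol,
        "signature_ok": (lbam, lbah) == (0x4F, 0xC2),
        "transfer": (data_len, direction),
        # True when the byte count equals the sector-count field times 512.
        "len_is_nsect_x512": data_len == nsect * 512,
        # A non-data subcommand should not move any data at all.
        "unexpected_data": protocol == "non-data" and data_len > 0,
    }


if __name__ == "__main__":
    print(check_smart_cmd(
        "cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in"))
```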

> Note 2: The hardware didn't freeze for me, and I believe the freeze is due
> to swap breaking because of the errors.

The HSM violations above should be harmless apart from the messages
themselves. libata resets the device and should just carry on.

> Note 3: dragonslair's hard disk actually crashed, but the kernel didn't
> die; it just remounted read-only. After a reboot the disk was missing;
> after more reboots the machine started with all disks running again, and
> it has been stable since 28 Sep. (knock on wood)

libata EH can't really recover from actual hardware failures, but some
drives come back if you hot-unplug and then replug them.

--
tejun

Tejun Heo

Oct 25, 2007, 9:50:06 PM

[please don't drop cc. restored]

Steen Eugen Poulsen wrote:
> Tejun Heo skrev:

>> All these are caused by smartd. Updating should fix the problem.
>

> Okay, but there is no newer smartd than what I'm using. (5.37)

Bruce? Original thread can be read from...

http://thread.gmane.org/gmane.linux.kernel/588972

Jim Paris

Oct 26, 2007, 12:10:08 AM

Tejun Heo wrote:
> [please don't drop cc. restored]
>
> Steen Eugen Poulsen wrote:
> >Tejun Heo skrev:
> >>All these are caused by smartd. Updating should fix the problem.
> >
> >Okay, but there is no newer smartd than what I'm using. (5.37)
>
> Bruce? Original thread can be read from...
>
> http://thread.gmane.org/gmane.linux.kernel/588972

The fixes were added in smartmontools CVS, but there hasn't been a
release since then.

-jim

Bruce Allen

Nov 6, 2007, 5:10:03 AM

>>>> All these are caused by smartd. Updating should fix the problem.
>>>
>>> Okay, but there is no newer smartd than what I'm using. (5.37)
>>
>> Bruce? Original thread can be read from...
>>
>> http://thread.gmane.org/gmane.linux.kernel/588972
>
> The fixes were added in smartmontools CVS, but there hasn't been a
> release since then.

I think we'll do a new smartmontools release fairly soon.

Cheers,
Bruce