disk error / reboot / 6.3

jerome

Dec 21, 2008, 5:04:02 PM
to freebsd-...@freebsd.org


Hi,

We are running 6.3 on a fileserver with a couple of data disks.

Whenever the server encounters an error on a data disk (the OS disk is separate), it resets itself without warning.

We can usually identify the problem disk with smartctl: the disk will show 'Offline uncorrectable' errors.

Is it normal for the server to reboot itself? Can we prevent this from happening?
The disks are attached to the on-board SATA ports of the mainboard itself, so there are no (RAID) controllers whatsoever.
We also do not use software RAID.

Best regards

Jerome
_______________________________________________
freebsd-...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questi...@freebsd.org"

Paul B. Mahol

Dec 21, 2008, 6:35:04 PM
to jerome, freebsd-...@freebsd.org
On 12/21/08, jerome <jer...@code-monkey.nl> wrote:
> Hi,
>
> We are running 6.3 on a fileserver with a couple of data disks.
>
> Once the server encounters an error on a data disk (os disk is separate) the
> server will reset itself without warning.

Did it just reset, or did it panic? There is a known panic on bad blocks in
some FreeBSD versions, but I don't think that regression hit 6.x.


--
Paul

jerome

Dec 21, 2008, 7:44:44 PM
to Paul B. Mahol, freebsd-...@freebsd.org
Hi Paul,

The server resets while running, like pressing the reset button...

-Jerome

Paul B. Mahol

Dec 22, 2008, 7:15:12 AM
to jerome, freebsd-...@freebsd.org
On 12/22/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
>
> The server resets while running, like pressing the reset button...

Try this patch:

--- src/sys/dev/ata/ata-queue.c 2008/10/27 09:26:24 1.74
+++ src/sys/dev/ata/ata-queue.c 2008/11/27 03:37:46 1.75
@@ -357,7 +357,7 @@ ata_completed(void *context, int dummy)
                       "\6MEDIA_CHANGED\5NID_NOT_FOUND"
                       "\4MEDIA_CHANGE_REQEST"
                       "\3ABORTED\2NO_MEDIA\1ILLEGAL_LENGTH");
-                if ((request->flags & ATA_R_DMA) &&
+                if ((request->flags & ATA_R_DMA) && request->dma &&
                     (request->dma->status & ATA_BMSTAT_ERROR))
                     printf(" dma=0x%02x", request->dma->status);
                 if (!(request->flags & (ATA_R_ATAPI | ATA_R_CONTROL)))

jerome

Dec 22, 2008, 5:07:17 PM
to Paul B. Mahol, freebsd-...@freebsd.org
Hi Paul,

Ok, thanks.
Will let you know the outcome.

-Jerome
_____

From: Paul B. Mahol [mailto:one...@gmail.com]
To: jerome [mailto:jer...@code-monkey.nl]
Cc: freebsd-...@freebsd.org
Sent: Mon, 22 Dec 2008 13:15:12 +0100
Subject: Re: disk error / reboot / 6.3

jerome

Dec 28, 2008, 1:55:11 PM
to Paul B. Mahol, freebsd-...@freebsd.org
Hi Paul,

The patch worked (almost).

At first, a program accessing a disk that reported an uncorrectable error just segfaulted.

Another instance led to a situation where I was only able to ping the server.
Neither ssh nor console access was possible anymore.

-Pat

Paul B. Mahol

Dec 28, 2008, 2:37:02 PM
to jerome, freebsd-...@freebsd.org
On 12/28/08, jerome <jer...@code-monkey.nl> wrote:
> Hi Paul,
>
> The patch worked (almost).
>
> At first a program accessing a disk that reported an uncorrectable error,
> the program just segfaulted.
>
> Another instance led to the situation that I was only able to ping the
> server.
> No ssh or console access was possible anymore.

That is somewhat to be expected. The point of the patch is to fix the panic,
not thrashing due to a faulty disk, drivers, or something else ...
