Unknown Issue.


Patrick

Dec 12, 2004, 16:21:35
to linux-...@vger.kernel.org
Hi,

I've got a computer running Gentoo, on a clean install, with an odd
problem:

After a while, the computer refuses to spawn any more processes:

-/bin/bash: /bin/ps: Input/output error
-/bin/bash: /usr/bin/w: Input/output error
-/bin/bash: /bin/df: Input/output error
-/bin/bash: /bin/mount: Input/output error
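When the shell prints "Input/output error" for a command, the attempt to execute the file failed with EIO. One way to narrow this down from an interpreter that is already running is to check whether the same binaries can still be read at all. This is a minimal sketch, not a definitive diagnostic: the helper name `probe_read` and the choice to read only the first 4096 bytes are assumptions made for illustration, and a read served from the page cache may succeed even when the disk or filesystem is unhappy.

```python
import errno
import os


def probe_read(paths, nbytes=4096):
    """Try to read the first nbytes of each path.

    Returns a dict mapping each path to "ok" or to the errno name
    (for example "EIO" or "ENOENT") of the failure.
    """
    results = {}
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.read(fd, nbytes)
            finally:
                os.close(fd)
            results[path] = "ok"
        except OSError as exc:
            # errno.errorcode maps error numbers to their symbolic names.
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


if __name__ == "__main__":
    # Paths taken from the error messages quoted above.
    binaries = ["/bin/ps", "/usr/bin/w", "/bin/df", "/bin/mount"]
    for path, status in probe_read(binaries).items():
        print(f"{path}: {status}")
```

An "EIO" result here would point towards the block layer or the filesystem rather than towards process or descriptor limits.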

It happens randomly. I've tried everything, from changing the computer
from running software RAID (SCSI) to running a hardware solution and
reinstalling. I've run the memory through memtest, I've remounted the
drives, and I've checked the RAM to make sure it was properly seated.

The only thing running on this box is MySQL, which runs perfectly at
7500 q/s (running super-smack). I'm not sure whether this is a Linux
kernel thing, a Gentoo thing, or a hardware thing.

I've checked and I'm not running out of file descriptors (by looking
in /proc/sys/fs/file-nr), and I've increased the limit in
/proc/sys/fs/file-max (if I remember correctly) by adding a 0 to the
end of the value, thus increasing it tenfold.
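The file-descriptor check described above can be sketched as follows. On Linux, /proc/sys/fs/file-nr holds three integers: allocated file handles, allocated-but-unused handles, and the system-wide maximum (the same value as file-max). The function names `parse_file_nr` and `handle_usage` are hypothetical, chosen only for this example.

```python
def parse_file_nr(text):
    """Parse the three whitespace-separated integers of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def handle_usage(text):
    """Return the fraction of the system-wide handle limit currently in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


if __name__ == "__main__":
    with open("/proc/sys/fs/file-nr") as fh:
        print(f"{handle_usage(fh.read()):.1%} of file-max in use")
```

A value far below 100% while the errors are occurring would support the conclusion that descriptor exhaustion is not the cause.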

It's running XFS on the root partition with a single partition, dual
Xeon 2.66 with Hyper-Threading enabled, dual Intel GbE and an Adaptec
2120S AACRAID card. Dual 36 GB 10k rpm SCSI drives in RAID 1.

Does anyone have any ideas on what I can do or what I can test? Is it
hardware? Software?

Guys?

P

--
</N>

------
In the beginning, there was nothing. And God said, 'Let there be
Light.' And there was still nothing, but you could see a bit better.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Piszcz, Justin Michael

Dec 13, 2004, 09:02:48
to Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
Patrick,

I had the same problem on two machines with XFS, both running
slackware-current. The kernel on the Dell GX1 was built with GCC 3.4.2
and the one on my main box with GCC 3.4.3.

There seems to be a bug in XFS with some configurations of 2.6.9 and
2.6.10-rc series.

After re-installing Slackware-10.0 and upgrading to -current, I have
installed 2.6.10-rc3 and so far, I have not been able to reproduce the
problem.

Some questions for you:

1] What kernel are you running?
2] What did you last change before you started getting these errors?

As far as severity goes, I ran XFS' fsck from a KNOPPIX CD and, as a
result, I had about 500-600 MB of files in my /lost+found directory when
it was finished. Files were missing from all parts of the file system.
I had to restore from backup. I would say stick with your previous
2.6.9 configuration (if you were running it) or go back to 2.6.8.1. Some
2.6.9 configurations and 2.6.10-rc1 and/or 2.6.10-rc2 definitely cause
file corruption with XFS. So far, however, I have not been able to
reproduce the error with 2.6.10-rc3.

Justin.

Eric Sandeen

Dec 13, 2004, 12:07:09
to Piszcz, Justin Michael, Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
My first thought is that perhaps the filesystem has shut down due to
some error (memory corruption, bad disk, xfs bug...); did you check your
log messages?

Justin, when you mentioned that you used xfs' fsck, I guess you used
xfs_repair. Was the log clean when you ran it, or did you force repair
to zero out the log? That could explain the large lost+found/ when you
were done...
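For readers retracing this step: xfs_repair has a no-modify mode (`-n`) that only reports what it would change, and a separate option (`-L`) that forces the log to be zeroed, discarding any metadata updates still recorded in it. The sketch below only builds the command line and does not run anything. The helper name `xfs_repair_argv` is hypothetical, and refusing to combine the two options is a design choice of this example, not documented tool behaviour. The device path is the one mentioned elsewhere in this thread.

```python
def xfs_repair_argv(device, dry_run=True, zero_log=False):
    """Build an xfs_repair argument list for an unmounted XFS device.

    dry_run=True adds -n (report only, modify nothing).
    zero_log=True adds -L (force log zeroing; pending log updates are lost).
    """
    if dry_run and zero_log:
        # Choice made for this sketch: a report-only run should not
        # also ask for the log to be destroyed.
        raise ValueError("dry_run and zero_log are mutually exclusive here")
    argv = ["xfs_repair"]
    if dry_run:
        argv.append("-n")
    if zero_log:
        argv.append("-L")
    argv.append(device)
    return argv


if __name__ == "__main__":
    # Print the report-only command for the device named in this thread.
    print(" ".join(xfs_repair_argv("/dev/hde4")))
```

Defaulting to the report-only form keeps the destructive option an explicit decision, which matters when a dirty log might still be replayable by mounting the filesystem.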

Patrick, can you reproduce on a non-gentoo kernel? That'd be the first
step for this audience.

-Eric


Piszcz, Justin Michael

Dec 13, 2004, 12:24:31
to Eric Sandeen, Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
> My first thought is that perhaps the filesystem has shut down due to
> some error (memory corruption, bad disk, xfs bug...); did you check your
> log messages?

Yes, there was nothing relevant on either machine.

> Justin, when you mentioned that you used xfs' fsck, I guess you used
> xfs_repair. Was the log clean when you ran it, or did you force repair
> to zero out the log? That could explain the large lost+found/ when you
> were done...

Ah, good question. Yes, I used xfs_repair; at this point I knew I had to
restore from backup and answered "y" to all questions. I am not sure,
but I do not recall the log being dirty.

In the logs on my main machine, it showed the following when it
attempted to mount the two filesystems (root and boot, /dev/hde4 and
/dev/hde1 respectively).

As far as bad disk/memory goes, I have tested both systems with memtest86
and the result was 0 errors. As for the disks, I have not experienced
any problems with either of them until I moved to 2.6.9/2.6.10-rc{1,2}.


Justin.

Dec 5 08:23:53 jpiszcz kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c. Caller 0xc021de57
Dec 5 08:23:53 jpiszcz kernel: [xfs_free_ag_extent+1237/2065] xfs_free_ag_extent+0x4d5/0x811
Dec 5 08:23:53 jpiszcz kernel: [xfs_free_extent+207/242] xfs_free_extent+0xcf/0xf2
Dec 5 08:23:53 jpiszcz kernel: [xlog_grant_push_ail+279/400] xlog_grant_push_ail+0x117/0x190
Dec 5 08:23:53 jpiszcz kernel: [xfs_free_extent+207/242] xfs_free_extent+0xcf/0xf2
Dec 5 08:23:53 jpiszcz kernel: [xfs_trans_get_efd+56/70] xfs_trans_get_efd+0x38/0x46
Dec 5 08:23:53 jpiszcz kernel: [xlog_recover_process_efi+402/508] xlog_recover_process_efi+0x192/0x1fc
Dec 5 08:23:53 jpiszcz kernel: [xlog_recover_process_efis+77/129] xlog_recover_process_efis+0x4d/0x81
Dec 5 08:23:53 jpiszcz kernel: [xlog_recover_finish+26/194] xlog_recover_finish+0x1a/0xc2
Dec 5 08:23:53 jpiszcz kernel: [xfs_rtmount_inodes+193/230] xfs_rtmount_inodes+0xc1/0xe6
Dec 5 08:23:53 jpiszcz kernel: [xfs_log_mount_finish+44/48] xfs_log_mount_finish+0x2c/0x30
Dec 5 08:23:53 jpiszcz kernel: [xfs_mountfs+2459/3995] xfs_mountfs+0x99b/0xf9b
Dec 5 08:23:53 jpiszcz kernel: [pagebuf_iostart+143/159] pagebuf_iostart+0x8f/0x9f
Dec 5 08:23:53 jpiszcz kernel: [atomic_dec_and_lock+39/68] atomic_dec_and_lock+0x27/0x44
Dec 5 08:23:53 jpiszcz kernel: [xfs_readsb+417/559] xfs_readsb+0x1a1/0x22f
Dec 5 08:23:53 jpiszcz kernel: [xfs_ioinit+27/46] xfs_ioinit+0x1b/0x2e
Dec 5 08:23:53 jpiszcz kernel: [xfs_mount+934/1646] xfs_mount+0x3a6/0x66e
Dec 5 08:23:53 jpiszcz kernel: [linvfs_fill_super+155/486] linvfs_fill_super+0x9b/0x1e6
Dec 5 08:23:53 jpiszcz kernel: [snprintf+39/43] snprintf+0x27/0x2b
Dec 5 08:23:53 jpiszcz kernel: [disk_name+98/191] disk_name+0x62/0xbf
Dec 5 08:23:53 jpiszcz kernel: [sb_set_blocksize+46/94] sb_set_blocksize+0x2e/0x5e
Dec 5 08:23:53 jpiszcz kernel: [get_sb_bdev+258/342] get_sb_bdev+0x102/0x156
Dec 5 08:23:53 jpiszcz kernel: [alloc_vfsmnt+156/215] alloc_vfsmnt+0x9c/0xd7
Dec 5 08:23:53 jpiszcz kernel: [linvfs_get_sb+47/51] linvfs_get_sb+0x2f/0x33
Dec 5 08:23:53 jpiszcz kernel: [linvfs_fill_super+0/486] linvfs_fill_super+0x0/0x1e6
Dec 5 08:23:53 jpiszcz kernel: [do_kern_mount+99/235] do_kern_mount+0x63/0xeb
Dec 5 08:23:53 jpiszcz kernel: [do_new_mount+158/247] do_new_mount+0x9e/0xf7
Dec 5 08:23:53 jpiszcz kernel: [do_mount+413/443] do_mount+0x19d/0x1bb
Dec 5 08:23:53 jpiszcz kernel: [copy_mount_options+96/183] copy_mount_options+0x60/0xb7
Dec 5 08:23:53 jpiszcz kernel: [sys_mount+191/291] sys_mount+0xbf/0x123
Dec 5 08:23:53 jpiszcz kernel: [do_mount_root+47/158] do_mount_root+0x2f/0x9e
Dec 5 08:23:53 jpiszcz kernel: [mount_block_root+96/305] mount_block_root+0x60/0x131
Dec 5 08:23:53 jpiszcz kernel: [mount_root+101/135] mount_root+0x65/0x87
Dec 5 08:23:53 jpiszcz kernel: [prepare_namespace+25/178] prepare_namespace+0x19/0xb2
Dec 5 08:23:53 jpiszcz kernel: [flush_workqueue+136/180] flush_workqueue+0x88/0xb4
Dec 5 08:23:53 jpiszcz kernel: [init+427/475] init+0x1ab/0x1db
Dec 5 08:23:53 jpiszcz kernel: [init+0/475] init+0x0/0x1db
Dec 5 08:23:53 jpiszcz kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Dec 5 08:23:53 jpiszcz kernel: VFS: Mounted root (xfs filesystem) readonly.
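A trace like the one above can be reduced to its essentials (the failing check, its source location, and the call chain) with a small parser. This is a rough sketch that assumes the message layout seen in this particular log; the regular expressions and the name `summarize_xfs_error` are illustrative, not a general syslog parser. Matching on runs of whitespace lets it cope with entries that a mail client has wrapped across lines.

```python
import re

# Header of the XFS error report, as it appears in the log above.
ERROR_RE = re.compile(
    r"XFS internal error\s+(?P<tag>\S+)\s+at line\s+(?P<line>\d+)\s+of file\s+"
    r"(?P<file>\S+?)\.\s+Caller\s+(?P<caller>0x[0-9a-fA-F]+)"
)
# Stack frames of the form [symbol+offset/size] with decimal numbers.
FRAME_RE = re.compile(r"\[(?P<sym>\w+)\+(?P<off>\d+)/(?P<size>\d+)\]")


def summarize_xfs_error(log_text):
    """Return (error_fields_or_None, [symbol names in call-trace order])."""
    match = ERROR_RE.search(log_text)
    error = match.groupdict() if match else None
    frames = [frame.group("sym") for frame in FRAME_RE.finditer(log_text)]
    return error, frames


if __name__ == "__main__":
    import sys

    error, frames = summarize_xfs_error(sys.stdin.read())
    print(error)
    print(" <- ".join(frames))
```

In this trace the frames run from xfs_free_ag_extent up through xlog_recover_process_efi and xfs_mountfs, meaning the check failed while the log was being replayed during mount.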

Patrick

Dec 13, 2004, 12:25:37
to Eric Sandeen, Piszcz, Justin Michael, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
Hi,

> Patrick, can you reproduce on a non-gentoo kernel? That'd be the first
> step for this audience.

I've not tried to reproduce it on a non-Gentoo kernel, as the original
one that I had the problem with was a vanilla kernel ;) (as I know your
fondness for Gentoo's patch-o-lotic).

I've been abusing the box the entire day with FreeBSD, with the same
MySQL config and version of mysqld as well as the same operations (and
some more serious ones, e.g. forkbomb, iozone, etc.), and no
problems.

There were no messages in the log, and nothing in kmesg. Anything else
I could try? Also, as far as I know I was running kernel 2.6.10_rc3,
and I'd reinstalled the box twice with new XFS filesystems both times.

P

Piszcz, Justin Michael

Dec 13, 2004, 12:29:45
to Patrick, Eric Sandeen, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
So your problem was only temporary?

Or?

After I began having the problem, I was trying to edit some files and
then I got the same errors as you, i.e. /usr/bin/vi: Input/output error,
and then I tried to run or edit different programs and files and nothing
was working.

Were you also forced to re-install, or does this only happen sometimes?


Patrick

Dec 13, 2004, 12:29:46
to Piszcz, Justin Michael, Eric Sandeen, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
Hi,

> Yes, there was nothing relevant on either machine.

Same here.

OK, well, I couldn't pin it on the FS and it looked like hardware to me.
I suppose I could redo the box with 2.6.10 and XFS again to see if I
can reproduce the problem, although I'm partially leaning towards
hardware, but that's the easiest thing to blame :)

I figure I'm going to try out another FS, maybe reiser. That should
either do the same or not; if not, then we know where to start?

P

Patrick

Dec 13, 2004, 12:35:09
to Piszcz, Justin Michael, Eric Sandeen, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
Hi,

> So your problem was only temporary?

No, it happened randomly though, and all the time. Generally within an hour.

> After I began having the problem, I was trying to edit some files and
> then I got the same errors as you, i.e. /usr/bin/vi: Input/output error,
> and then I tried to run or edit different programs and files and nothing
> was working.
>
> Were you also forced to re-install, or does this only happen sometimes?

I moved to FreeBSD as I require the box to actually work, which it
seems to be doing at the moment, even after a bit-o-nailing, but that
still doesn't solve the problem.

Eric Sandeen

Dec 13, 2004, 12:52:45
to Piszcz, Justin Michael, Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
Piszcz, Justin Michael wrote:

> Ah, good question. Yes, I used xfs_repair; at this point I knew I had to
> restore from backup and answered "y" to all questions. I am not sure,
> but I do not recall the log being dirty.

Hm, xfs_repair does not ask any questions.

> In the logs on my main machine, it showed the following when it
> attempted to mount the two filesystems (root and boot, /dev/hde4 and
> /dev/hde1 respectively).

> Dec 5 08:23:53 jpiszcz kernel: XFS internal error
> XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c. Caller
> 0xc021de57

(having trouble replaying the log here)

Ok, so XVM has found something wrong at this point. Any chance the box
had a power failure? Write caches on ide drives can wreak havoc with
journaling filesystems... i.e. what happened between "the filesystem
was working" and "i remounted the filesystem and got this"

>
> As far as bad disk/memory goes, I have tested both systems with memtest86
> and the result was 0 errors. As for the disks, I have not experienced
> any problems with either of them until I moved to 2.6.9/2.6.10-rc{1,2}.

ok

-Eric

Eric Sandeen

Dec 13, 2004, 12:59:14
to Eric Sandeen, Piszcz, Justin Michael, Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Kristofer T. Karas
Eric Sandeen wrote:

>> Dec 5 08:23:53 jpiszcz kernel: XFS internal error
>> XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c. Caller
>> 0xc021de57
>
> (having trouble replaying the log here)
>
> Ok, so XVM has found something wrong at this point.

urk, make that "XFS has found..." of course :)

Piszcz, Justin Michael

Dec 13, 2004, 15:54:20
to Eric Sandeen, Patrick, linux-...@vger.kernel.org, linu...@oss.sgi.com, Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds
> Ok, so XVM has found something wrong at this point. Any chance the box
> had a power failure? Write caches on ide drives can wreak havoc with
> journaling filesystems... i.e. what happened between "the filesystem
> was working" and "i remounted the filesystem and got this"

For the main system: to make a long story short, I was attempting to
hook up a CD burner and DVD reader to SATA via SATA<->PATA adapters and
enable SATA in the kernel for the Intel ICH5 chipset, and I was trying
different drivers/options in an attempt to get them to work. However,
please note that during the entire time, the disk that suffered FS
corruption was always hooked to an Ultra ATA/133 Promise controller. I
believe I had a kernel panic once, and at another time, while loading
either SATA or IDE drivers, I had a lockup and rebooted improperly.

For the Dell GX1 system: no, all I did was upgrade the kernel [2.6.9 ->
2.6.10-rc2] and reboot; there were no power outages or crashes at all.
After about an hour or so, I began to experience these problems.


