[problem captured] Re: cerberus on 2.4.17-rc2 UP

Alan Cox

8 Jan 2002, 11:13:51
> end_request: buffer-list destroyed
> hda1: bad access: block=12440, count=-8
> end_request: I/O error, dev 03:01 (hda), sector 12440
> hda1: bad access: block=12448, count=-16

That looks like a race in the IDE/block layer (or somewhere above it maybe)
Someone trashed a request in progress.

> Is this a bug or could it be the hardware's fault? The hardware is new lspci

Other people have reported it too. It's clearly a kernel race.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

marc. h.

8 Jan 2002, 10:48:17
ok,

I managed to get it to do it again. Captured the problem on serial console:

--------------------------------------------

end_request: buffer-list destroyed
hda1: bad access: block=12440, count=-8
end_request: I/O error, dev 03:01 (hda), sector 12440
hda1: bad access: block=12448, count=-16

end_request: I/O error, dev 03:01 (hda), sector 12448
hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt <-- they never stop appearing
[..more of the same..]

------------------------------------------
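
For readers unfamiliar with the "status error" line in the capture above: the names in braces are the bits set in the drive's ATA status register. The following is a minimal sketch of that decoding, assuming the standard ATA status bit layout. The function name `decode_ata_status` is made up for illustration and is not the kernel's own dump routine.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the names of the bits set in an ATA status
 * byte into buf, separated by single spaces. buf must hold at least one
 * byte; output is truncated if it does not fit.
 */
void decode_ata_status(unsigned char status, char *buf, size_t len)
{
	/* Standard ATA status register bits, most significant first. */
	static const struct {
		unsigned char bit;
		const char *name;
	} bits[] = {
		{ 0x80, "Busy" },
		{ 0x40, "DriveReady" },
		{ 0x20, "DeviceFault" },
		{ 0x10, "SeekComplete" },
		{ 0x08, "DataRequest" },
		{ 0x04, "CorrectedError" },
		{ 0x02, "Index" },
		{ 0x01, "Error" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(status & bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
}
```

Calling it with 0x58 (0x40 + 0x10 + 0x08) yields the same three names as the log line: the drive reports itself ready and still wants to transfer data, which fits a DMA transfer that never completed.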

Is this a bug or could it be the hardware's fault? The hardware is new.

lspci and hdparm -iv output follows (I also have the output of sysrq T+M+P if that
would be useful to anyone; just ask. I'd rather save the bandwidth and not send
it over the list):


/dev/hda:
multcount = 16 (on)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 2501/255/63, sectors = 40188960, start = 0

Model=IC35L020AVER07-0, FwRev=ER2OA45A, SerialNo=SVPTVFQ8610
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=40188960
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255)
Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4 ATA-5

ontroller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82815 CGC [Chipset Graphics Controller] (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801BAM PCI (rev 11)
00:1f.0 ISA bridge: Intel Corporation 82801BA ISA Bridge (ICH2) (rev 11)
00:1f.1 IDE interface: Intel Corporation 82801BA IDE U100 (rev 11)
00:1f.2 USB Controller: Intel Corporation 82801BA(M) USB (Hub A) (rev 11)
00:1f.3 SMBus: Intel Corporation 82801BA(M) SMBus (rev 11)
00:1f.4 USB Controller: Intel Corporation 82801BA(M) USB (Hub B) (rev 11)
01:00.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
01:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
01:08.0 Ethernet controller: Intel Corporation 82801BA(M) Ethernet (rev 03)

-m


On Mon, Jan 07, 2002 at 12:14:24PM +0100, marc. h. wrote:
> On Fri, Dec 21, 2001 at 02:56:34PM -0200, Marcelo Tosatti wrote:
> >
> > Can you please run Cerberus again and give me more information ?
>
> ok, I *finally* got it to deadlock again.. trick is to run 2 simultaneous
> cerberus runs.. same symptoms, pings, can change VC's, hard drive light
> constantly on but silent and no blinks. I had sysrq turned on this time (tested
> before the run), but once deadlocked, doing Alt+SysRQ+8, Alt+SysRQ+T, etc would
> print nothing at all..
>
> -m
>
> >
> > I want Alt+SysRQ+T, Alt+SysRQ+M and Alt+SysRQ+P output.
> >
> > If those keys simply print the sysrq header, please try Alt+SysRQ+8 then
> > the above again.
> >
> > Thanks
> >
> > On Thu, 20 Dec 2001, marc. h. wrote:
> >
> > > I tried out the latest cerberus from
> > > http://people.redhat.com/bmatthews/cerberus/ on a UP redhat-7.2 box. I ran the
> > > standard non-destructive RedHat tests.
> > >
> > > It ran for about 14 hours and then became unresponsive.. machine still ping'ed
> > > , I could switch VC's scroll up on console, but that's it. Could not log in,
> > > etc.. Another point is that the hard drive light remained on but it was not
> > > seeking, it seemed dead silent.
> >
>
> --
> C3C5 9226 3C03 CDF7 2EF1 029F 4CAD FBA4 F5ED 68EB
> key: http://people.hbesoftware.com/~heckmann/

--
C3C5 9226 3C03 CDF7 2EF1 029F 4CAD FBA4 F5ED 68EB
key: http://people.hbesoftware.com/~heckmann/

Andrew Morton

8 Jan 2002, 15:33:33
Alan Cox wrote:
>
> > end_request: buffer-list destroyed
> > hda1: bad access: block=12440, count=-8
> > end_request: I/O error, dev 03:01 (hda), sector 12440
> > hda1: bad access: block=12448, count=-16
>
> That looks like a race in the IDE/block layer (or somewhere above it maybe)
> Someone trashed a request in progress.
>
> > Is this a bug or could it be the hardware's fault? The hardware is new lspci
>
> Other people have reported it too. Its clearly a kernel race

Yes, I can generate it at will on two quite different IDE machines
with the run-bash-shared-mapping script from
http://www.zip.com.au/~akpm/ext3-tools.tar.gz

It's on my list of things-to-do, filed under "hard". It even happens
on uniprocessor, with unmask_irq=0.

Interestingly, I _think_ it only ever occurs against the
swap device. But I need to confirm this. Marc, do you
have swap on /dev/hda1?


Alex Scheele

8 Jan 2002, 16:05:24
Andrew Morton wrote:
>
>
> Alan Cox wrote:
> >
> > > end_request: buffer-list destroyed
> > > hda1: bad access: block=12440, count=-8
> > > end_request: I/O error, dev 03:01 (hda), sector 12440
> > > hda1: bad access: block=12448, count=-16
> >
> > That looks like a race in the IDE/block layer (or somewhere above it maybe)
> > Someone trashed a request in progress.
> >
> > > Is this a bug or could it be the hardware's fault? The hardware is new lspci
> >
> > Other people have reported it too. Its clearly a kernel race
>
> Yes, I can generate it at will on two quite different IDE machines
> with the run-bash-shared-mapping script from
> http://www.zip.com.au/~akpm/ext3-tools.tar.gz
>
> It's on my list of things-to-do, filed under "hard". It even happens
> on uniprocessor, with unmask_irq=0.
>
> Interestingly, I _think_ it only ever occurs against the
> swap device. But I need to confirm this. Marc, do you
> have swap on /dev/hda1?

I have had this problem on several machines too, but not only
against the swap device. I have one machine with a SCSI disk as
root disk (/dev/sda1); the swap device is /dev/sda2.
There is also a 4-disk IDE raid0 (software RAID) mounted
on /mnt, and if I run it there I see the same problem.
This machine is an SMP machine, though it has also happened
on UP machines.

Hope it helps.

--
Alex (al...@packetstorm.nu)

Manfred Spraul

8 Jan 2002, 17:15:01

Content-Type: multipart/mixed;
boundary="------------E7C7C7293155B3E004E01A0C"

This is a multi-part message in MIME format.
--------------E7C7C7293155B3E004E01A0C
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

> Yes, I can generate it at will on two quite different IDE machines
> with the run-bash-shared-mapping script from
> http://www.zip.com.au/~akpm/ext3-tools.tar.gz

Could you apply the attached patch and try to reproduce it?
Enable CONFIG_DEBUG_SLAB.

The patch poisons all objects I could find that might have something
to do with the bug (all slab caches, struct request, struct page,
struct file, and partially struct buffer_head).

My test box survives the run-bash-shared-mapping script (~30 min, 128
MB memory).

Thanks,
--
Manfred
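
The poisoning idea described above can be sketched in userspace as follows. This is a simplified illustration, not the patch itself: `struct demo_request` and the `demo_*` functions are hypothetical stand-ins, and only the 0xe3 fill byte and the "poison from a given field to the end of the struct" approach are taken from the patch. An object is filled with a known pattern when it is freed, and the pattern is verified when the object is handed out again; any mismatch means something wrote to the object while it was supposed to be unused.

```c
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xe3	/* fill value used by the patch */

/* Hypothetical stand-in for a kernel object kept on a free list. */
struct demo_request {
	int rq_status;		/* left untouched, like the list linkage */
	int elevator_sequence;	/* poisoning starts at this field */
	int cmd;
};

/* Offset and length of the region that gets poisoned. */
#define TAIL_OFF offsetof(struct demo_request, elevator_sequence)
#define TAIL_LEN (sizeof(struct demo_request) - TAIL_OFF)

/* Called when the object is released: fill the tail with the pattern. */
void demo_release(struct demo_request *rq)
{
	memset((unsigned char *)rq + TAIL_OFF, POISON_BYTE, TAIL_LEN);
}

/*
 * Called when the object is reused. Returns -1 if the pattern is intact,
 * otherwise the offset (within the poisoned tail) of the first byte that
 * was overwritten while the object was free.
 */
long demo_check(const struct demo_request *rq)
{
	const unsigned char *p = (const unsigned char *)rq + TAIL_OFF;
	size_t i;

	for (i = 0; i < TAIL_LEN; i++)
		if (p[i] != POISON_BYTE)
			return (long)i;
	return -1;
}
```

A write to `rq->cmd` between `demo_release()` and `demo_check()` shows up as a non-negative return value, which is the userspace analogue of the "poison error" printk in the patch.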
--------------E7C7C7293155B3E004E01A0C
Content-Type: text/plain; charset=us-ascii;
name="patch-poison"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="patch-poison"

// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 18
// EXTRAVERSION = pre2
--- 2.4/drivers/block/ll_rw_blk.c Tue Jan 8 17:53:56 2002
+++ build-2.4/drivers/block/ll_rw_blk.c Tue Jan 8 22:02:44 2002
@@ -348,6 +348,8 @@
}
memset(rq, 0, sizeof(struct request));
rq->rq_status = RQ_INACTIVE;
+ poison_obj(&rq->elevator_sequence, sizeof(struct request)
+ -offsetof(struct request,elevator_sequence));
list_add(&rq->queue, &q->rq[i&1].free);
q->rq[i&1].count++;
}
@@ -428,8 +430,12 @@
if (!list_empty(&rl->free)) {
rq = blkdev_free_rq(&rl->free);
list_del(&rq->queue);
+ /* FIXME: clear or not clear? */
+ check_poison(&rq->elevator_sequence, sizeof(struct request)
+ - offsetof(struct request,elevator_sequence));
rl->count--;
rq->rq_status = RQ_ACTIVE;
+ rq->cmd = rw;
rq->special = NULL;
rq->q = q;
}
@@ -560,6 +566,8 @@
*/
if (q) {
list_add(&req->queue, &q->rq[rw].free);
+ poison_obj(&req->elevator_sequence, sizeof(struct request)
+ -offsetof(struct request,elevator_sequence));
if (++q->rq[rw].count >= batch_requests && waitqueue_active(&q->wait_for_request))
wake_up(&q->wait_for_request);
}
--- 2.4/fs/file_table.c Sun Sep 30 16:25:45 2001
+++ build-2.4/fs/file_table.c Tue Jan 8 22:02:44 2002
@@ -39,6 +39,8 @@
used_one:
f = list_entry(free_list.next, struct file, f_list);
list_del(&f->f_list);
+ check_poison (&f->f_dentry, sizeof(struct file) -
+ offsetof(struct file, f_dentry));
files_stat.nr_free_files--;
new_one:
memset(f, 0, sizeof(*f));
@@ -118,6 +120,8 @@
file->f_dentry = NULL;
file->f_vfsmnt = NULL;
list_del(&file->f_list);
+ poison_obj(&file->f_dentry, sizeof(struct file) -
+ offsetof(struct file, f_dentry));
list_add(&file->f_list, &free_list);
files_stat.nr_free_files++;
file_list_unlock();
@@ -147,6 +151,8 @@
file_list_lock();
list_del(&file->f_list);
list_add(&file->f_list, &free_list);
+ poison_obj(&file->f_dentry, sizeof(struct file) -
+ offsetof(struct file, f_dentry));
files_stat.nr_free_files++;
file_list_unlock();
}
--- 2.4/include/linux/slab.h Tue Dec 25 17:12:07 2001
+++ build-2.4/include/linux/slab.h Tue Jan 8 22:02:44 2002
@@ -78,6 +78,40 @@
extern kmem_cache_t *fs_cachep;
extern kmem_cache_t *sigact_cachep;

+
+#ifdef CONFIG_DEBUG_SLAB
+extern void __check_poison(void *obj, int size, char *file, int line);
+
+#define check_and_clear_poison(obj, size) \
+ do { \
+ __check_poison(obj, size, __FILE__, __LINE__); \
+ memset(obj, 0, size); \
+ } while(0)
+
+#define check_poison(obj, size) \
+ do { \
+ __check_poison(obj, size, __FILE__, __LINE__); \
+ memset(obj, 0x2C, size); \
+ } while(0)
+
+#define poison_obj(obj, size) \
+ memset(obj, 0xe3, size)
+
+#else
+static inline void check_and_clear_poison(void *obj, int size)
+{
+ memset(obj, 0, size);
+}
+static inline void check_poison(void *obj, int size)
+{
+ /* nop */
+}
+static inline void poison_obj(void *obj, int size)
+{
+ /* nop */
+}
+
+#endif
#endif /* __KERNEL__ */

#endif /* _LINUX_SLAB_H */
--- 2.4/mm/slab.c Tue Dec 25 17:12:07 2001
+++ build-2.4/mm/slab.c Tue Jan 8 22:02:44 2002
@@ -1196,8 +1196,11 @@

if (objnr >= cachep->num)
BUG();
- if (objp != slabp->s_mem + objnr*cachep->objsize)
+ if (objp != slabp->s_mem + objnr*cachep->objsize) {
+ printk("cache %s: got objp %p, objnr %d, s_mem %ph.\n",
+ cachep->name, objp, objnr, slabp->s_mem);
BUG();
+ }

/* Check slab's freelist to see if this obj is there. */
for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
@@ -1210,6 +1213,9 @@

static inline void kmem_cache_alloc_head(kmem_cache_t *cachep, int flags)
{
+#if DEBUG
+ if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
+ BUG();
if (flags & SLAB_DMA) {
if (!(cachep->gfpflags & GFP_DMA))
BUG();
@@ -1217,6 +1223,7 @@
if (cachep->gfpflags & GFP_DMA)
BUG();
}
+#endif
}

static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep,
@@ -1347,6 +1354,15 @@
objp = kmem_cache_alloc_one(cachep);
#endif
local_irq_restore(save_flags);
+#if DEBUG
+ if (cachep->flags & SLAB_RED_ZONE) {
+ kmem_extra_free_checks(cachep, GET_PAGE_SLAB(virt_to_page(objp)),
+ objp-BYTES_PER_WORD);
+ } else {
+ kmem_extra_free_checks(cachep, GET_PAGE_SLAB(virt_to_page(objp)),
+ objp);
+ }
+#endif
return objp;
alloc_new_slab:
#ifdef CONFIG_SMP
@@ -1475,6 +1491,15 @@
#ifdef CONFIG_SMP
cpucache_t *cc = cc_data(cachep);

+#if DEBUG
+ if (cachep->flags & SLAB_RED_ZONE) {
+ kmem_extra_free_checks(cachep, GET_PAGE_SLAB(virt_to_page(objp)),
+ objp-BYTES_PER_WORD);
+ } else {
+ kmem_extra_free_checks(cachep, GET_PAGE_SLAB(virt_to_page(objp)),
+ objp);
+ }
+#endif
CHECK_PAGE(virt_to_page(objp));
if (cc) {
int batchcount;
@@ -2039,3 +2064,17 @@
#endif
}
#endif
+
+#ifdef CONFIG_DEBUG_SLAB
+void __check_poison(void *obj, int size, char *file, int line)
+{
+ int i;
+ for (i=0;i<size;i++) {
+ if (((unsigned char*)obj)[i] != 0xe3) {
+ printk(KERN_INFO "poison error in obj %p, len %d, file %s, line %d, offset %d is: 0x%x\n",
+ obj, size, file, line, i, ((unsigned char*)obj)[i]);
+ }
+ }
+}
+#endif
+
--- 2.4/mm/page_alloc.c Fri Nov 23 20:35:40 2001
+++ build-2.4/mm/page_alloc.c Tue Jan 8 22:02:44 2002
@@ -89,7 +89,16 @@
if (current->flags & PF_FREE_PAGES)
goto local_freelist;
back_local_freelist:
-
+#ifdef CONFIG_DEBUG_SLAB
+ page->mapping = (void*)0xdeadbeef;
+ page->index = 0xbaadc0de;
+ page->next_hash = (void*)0xbeefdead;
+ atomic_set(&page->count,1);
+ page->lru.next = (void*)0xbaadf00d;
+ page->lru.prev = (void*)0xf00dbaad;
+ page->pprev_hash = (void*)0xdeadbeef;
+ page->buffers = (void*)0xdeadbeef;
+#endif
zone = page->zone;

mask = (~0UL) << order;
@@ -200,6 +209,26 @@
page = expand(zone, page, index, order, curr_order, area);
spin_unlock_irqrestore(&zone->lock, flags);

+#ifdef CONFIG_DEBUG_SLAB
+ if (page->mapping != (void*)0xdeadbeef)
+ printk(KERN_ERR"got mapping %p.\n", page->mapping);
+ if (page->index != 0xbaadc0de)
+ printk(KERN_ERR "got index %lxh.\n", page->index);
+ if (page->next_hash != (void*)0xbeefdead)
+ printk(KERN_ERR "got next_hash %p.\n", page->next_hash);
+ if (atomic_read(&page->count) != 1)
+ printk(KERN_ERR "bad page count %d.\n", atomic_read(&page->count));
+ if (page->lru.next != (void*)0xbaadf00d)
+ printk(KERN_ERR" bad lru_next %p.\n", page->lru.next);
+ if (page->lru.prev != (void*)0xf00dbaad)
+ printk(KERN_ERR" bad lru_prev %p.\n", page->lru.prev);
+ if (page->pprev_hash != (void*)0xdeadbeef)
+ printk(KERN_ERR" bad pprev_hash %p.\n", page->pprev_hash);
+ if (page->buffers != (void*)0xdeadbeef)
+ printk(KERN_ERR" bad page->buffers %p.\n", page->buffers);
+ page->mapping = NULL;
+ page->buffers = NULL;
+#endif
set_page_count(page, 1);
if (BAD_RANGE(zone,page))
BUG();
--- 2.4/fs/buffer.c Tue Dec 25 17:12:03 2001
+++ build-2.4/fs/buffer.c Tue Jan 8 22:43:35 2002
@@ -1197,6 +1197,7 @@
bh->b_dev = B_FREE;
bh->b_blocknr = -1;
bh->b_this_page = NULL;
+ bh->b_size = 0xbaad;

nr_unused_buffer_heads++;
bh->b_next_free = unused_list;
@@ -1227,6 +1228,8 @@
unused_list = bh->b_next_free;
nr_unused_buffer_heads--;
spin_unlock(&unused_list_lock);
+ if (bh->b_size != 0xbaad)
+ printk(KERN_ERR "wrong size %lxh.\n", bh->b_size);
return bh;
}
spin_unlock(&unused_list_lock);
@@ -1251,6 +1254,8 @@
unused_list = bh->b_next_free;
nr_unused_buffer_heads--;
spin_unlock(&unused_list_lock);
+ if (bh->b_size != 0xbaad)
+ printk(KERN_ERR "wrong size %lxh.\n", bh->b_size);
return bh;
}
spin_unlock(&unused_list_lock);

--------------E7C7C7293155B3E004E01A0C--

marc. h.

9 Jan 2002, 04:37:06
On Tue, Jan 08, 2002 at 12:33:33PM -0800, Andrew Morton wrote:
> Alan Cox wrote:
> >
> > > end_request: buffer-list destroyed
> > > hda1: bad access: block=12440, count=-8
> > > end_request: I/O error, dev 03:01 (hda), sector 12440
> > > hda1: bad access: block=12448, count=-16
> >
> > That looks like a race in the IDE/block layer (or somewhere above it maybe)
> > Someone trashed a request in progress.
> >
> > Other people have reported it too. Its clearly a kernel race
>
> Yes, I can generate it at will on two quite different IDE machines
> with the run-bash-shared-mapping script from
> http://www.zip.com.au/~akpm/ext3-tools.tar.gz

does this mean it's an ext3 bug? (haven't tried to reproduce it using ext2)

> It's on my list of things-to-do, filed under "hard". It even happens
> on uniprocessor, with unmask_irq=0.

yes. my machine is UP and unmask_irq is 0.

> Interestingly, I _think_ it only ever occurs against the
> swap device. But I need to confirm this. Marc, do you
> have swap on /dev/hda1?

nope. hda1 is /.

-m

--
C3C5 9226 3C03 CDF7 2EF1 029F 4CAD FBA4 F5ED 68EB
key: http://people.hbesoftware.com/~heckmann/

Andrew Morton

9 Jan 2002, 04:04:01
Manfred Spraul wrote:
>
> > Yes, I can generate it at will on two quite different IDE machines
> > with the run-bash-shared-mapping script from
> > http://www.zip.com.au/~akpm/ext3-tools.tar.gz
>
> Could you apply the attached patch and try to reproduce it?

Nice patch.

> Enable CONFIG_DEBUG_SLAB.
>
> The patch poisons all objects I could find that might have something
> to do with the bug. (all slab caches, struct request, struct page,
> struct filp, partially struct buffer_head).
>
> My test box survives the run-bash_shared-mapping script (~30 min, 128
> MB memory).
>

Mine survives only a few minutes. Once it only lasted a second.
That's with mem=64m. It lasts much, much longer with more memory.

The patch, alas, sheds no light. I'll delve into it fairly soon,
I expect.

EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 212k freed
end_request: buffer-list destroyed
hda6: bad access: block=86256, count=-8
end_request: I/O error, dev 03:06 (hda), sector 86256


hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: lost interrupt

and:

end_request: buffer-list destroyed
hda6: bad access: block=93608, count=-8
end_request: I/O error, dev 03:06 (hda), sector 93608
hda6: bad access: block=93616, count=-16
end_request: I/O error, dev 03:06 (hda), sector 93616
hda6: bad access: block=93624, count=-24
end_request: I/O error, dev 03:06 (hda), sector 93624
hda6: bad access: block=93632, count=-32
end_request: I/O error, dev 03:06 (hda), sector 93632
hda6: bad access: block=93640, count=-40
end_request: I/O error, dev 03:06 (hda), sector 93640
hda6: bad access: block=93648, count=-48
end_request: I/O error, dev 03:06 (hda), sector 93648
hda6: bad access: block=93656, count=-56
end_request: I/O error, dev 03:06 (hda), sector 93656
hda6: bad access: block=93664, count=-64
end_request: I/O error, dev 03:06 (hda), sector 93664
hda6: bad access: block=93672, count=-72
end_request: I/O error, dev 03:06 (hda), sector 93672
hda6: bad access: block=93680, count=-80
end_request: I/O error, dev 03:06 (hda), sector 93680
hda6: bad access: block=93688, count=-88
end_request: I/O error, dev 03:06 (hda), sector 93688
hda6: bad access: block=93696, count=-96
end_request: I/O error, dev 03:06 (hda), sector 93696
hda6: bad access: block=93704, count=-104
end_request: I/O error, dev 03:06 (hda), sector 93704


hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt

hmm.. hda6 is the root filesystem. The test was hitting hda8
and hda5(swap). The only activity happening on hda6 would be
a bit of pagein, maybe syslog. hmm.
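
The "bad access" runs above follow a regular pattern: each line advances the block number by 8 while the count falls by a further 8 (8 sectors is 4 KB, assuming 512-byte sectors). A small sketch for checking that pattern in a captured console log is below. It assumes the exact line format shown in this thread; `parse_bad_access` and `stride_matches` are illustrative names, not existing tools.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one console line of the form
 *   "hda6: bad access: block=93608, count=-8"
 * Returns 1 and fills *block and *count on success, 0 otherwise.
 */
int parse_bad_access(const char *line, long *block, long *count)
{
	char dev[16];	/* device name before the first colon, e.g. "hda6" */

	return sscanf(line, "%15[^:]: bad access: block=%ld, count=%ld",
		      dev, block, count) == 3;
}

/*
 * Return 1 if every consecutive pair of lines advances block by `step`
 * while count falls by `step`. Return 0 if any line fails to parse or
 * breaks the pattern.
 */
int stride_matches(const char *const lines[], size_t n, long step)
{
	long prev_block, prev_count, block, count;
	size_t i;

	if (n == 0 || !parse_bad_access(lines[0], &prev_block, &prev_count))
		return 0;
	for (i = 1; i < n; i++) {
		if (!parse_bad_access(lines[i], &block, &count))
			return 0;
		if (block - prev_block != step || prev_count - count != step)
			return 0;
		prev_block = block;
		prev_count = count;
	}
	return 1;
}
```

Feeding it only the "bad access" lines from one run with a step of 8 should return 1. Whatever the root cause turns out to be, the steady stride suggests the same request keeps being completed in fixed-size pieces after its sector count has already gone past zero.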

Always hda6:

end_request: buffer-list destroyed
hda6: bad access: block=90704, count=-8
end_request: I/O error, dev 03:06 (hda), sector 90704
hda6: bad access: block=90712, count=-16
end_request: I/O error, dev 03:06 (hda), sector 90712
hda6: bad access: block=90720, count=-24
end_request: I/O error, dev 03:06 (hda), sector 90720
hda6: bad access: block=90728, count=-32

Interestingly, 2.4.13-ac8 doesn't fail. Well, it eventually takes
oopses in do_IRQ()'s get_current() - %cr2 has value 0x4017a000.

That kernel has the new IDE drivers, but I've seen the problem with
Andre's latest patches on PIIX, on VIA, and there are reports of it
on SCSI. And "buffer-list destroyed" is always the first message.
It doesn't feel like a driver problem. I'll go do a binary search
through some kernel revs.
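
The binary search over kernel revisions mentioned above can be sketched like this. It is only an outline under two assumptions: the revisions are numbered in chronological order, and each one can be classified reliably as good or bad, which is not a given for a race that takes anywhere from seconds to hours to trigger. `first_bad_rev` and `demo_is_bad` are hypothetical names; in practice the predicate means "build and boot that kernel, then run the test script".

```c
/* Predicate: nonzero if the revision with this index shows the bug. */
typedef int (*is_bad_fn)(int rev);

/*
 * Given a known-good revision index and a later known-bad one, return the
 * index of the first bad revision. Each loop iteration halves the range,
 * so roughly log2(bad - good) kernels have to be built and tested.
 */
int first_bad_rev(int good, int bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* bug already present at mid */
		else
			good = mid;	/* bug introduced after mid */
	}
	return bad;
}

/* Stand-in predicate for demonstration: pretend revision 11 broke it. */
int demo_is_bad(int rev)
{
	return rev >= 11;
}
```

With revisions 0 through 17 and the stand-in predicate, `first_bad_rev(0, 17, demo_is_bad)` tests revisions 8, 12, 10 and 11 before settling on 11.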

