
Aiee, killing interrupt handle!


genkai wa doko da

Oct 5, 2000, 3:00:00 AM
ok in recent weeks my main linux box has been panicking every day or two
for no reason. The msg is usually:

--ton of stack dump crap--
Aiee, killing interrupt handle!
panic: Attempted to kill the idle task!

(the above statements are in kernel/exit.c, which doesn't really tell me
anything at all.)

Nothing ever appears in /var/log/messages (I grepped every file under
/var, no mention of a panic anywhere). I have syslogd logging *.err, which
as far as I know should catch panics. Is there a way to get more verbose
logging?

This is a Slackware 7.1 box btw.

--
RCS/RI, Retro Computing Society: http://www.osfn.org/rcs/
RIFUG, RI Free Unix Group: http://www.rifug.org/
Dropdead, my band: http://www.dropdead.org/
my videogame stuff: http://www.gloom.org/~gauze/


Sent via Deja.com http://www.deja.com/
Before you buy.

Eric Y. Chang

Oct 5, 2000, 3:00:00 AM
This message is caused by a buggy device or driver. These kinds of things
are very hard to debug. Here is a log of some of my experiences:

Hi. I am the person who posted the message complaining about the 3c505
and the "could not send first PCB" error message. There is a file
called 3c505.txt in the /usr/src/linux/Documentation directory (which
I did read, so don't tell me to RTFM :-) ). It did not say much. I
went over the PCB with a fine-toothed comb, but did not find anything
wrong. I checked for slops and holidays and flux spray, but did not
see any problems. Actually, this is a through hole PCB,
so slops and holidays should not be too much of a problem.

Since I did not receive a reply from the newsgroup for a while, I
decided to hack a little bit. PCB actually does not mean "printed
circuit board". It has something to do with a block of data used to
control the interface. The file in question is 3c505.c in
/usr/src/linux/drivers/net. The error message is generated when the
board does not correctly acknowledge a test block of data that is sent
to it when booting. Looking at the code, it seemed from the comments
that the outer loop probably has a timing problem when there is a lot of
boot/DMA activity: the board may not get its response back in time.
There is a known problem (known at least to 3com) with busmastering DMA
SCSI cards on the bus. So, I decided to change the outer loop retry
number from 3 to 8 and recompile as a module so it will not try to send
the PCB right after the Buslogic SCSI card is detected and initialized.
It now comes up just fine!
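
For illustration only (this is NOT the actual 3c505.c code), the kind of
change described above, raising the limit on an outer retry loop, can be
sketched in plain C. The names fake_board, fake_board_send and
send_with_retries are made up for this example, and the "board" is
simulated so the sketch builds in user space:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the adapter: it fails to acknowledge the
 * first `busy_attempts` sends, as if the bus were tied up by DMA. */
struct fake_board {
    int busy_attempts;  /* number of sends that go unacknowledged */
    int sends_seen;     /* number of sends attempted so far */
};

/* Simulated "send a test block and wait for the acknowledgement".
 * Returns true once the board has stopped being busy. */
bool fake_board_send(struct fake_board *b)
{
    b->sends_seen++;
    return b->sends_seen > b->busy_attempts;
}

/* Outer retry loop: try up to max_retries times before giving up.
 * Returns the attempt number that succeeded, or -1 if none did. */
int send_with_retries(struct fake_board *b, int max_retries)
{
    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (fake_board_send(b))
            return attempt;
    }
    return -1;
}
```

With a simulated board that ignores its first five sends, a limit of 3
gives up (returns -1), while a limit of 8 succeeds on the sixth attempt.
A bigger retry count only papers over the timing problem, of course.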


Now just try that with Windows 95! Parenthetically, I should point out
that this kind of problem should be noted in a FAQ, or even better, in
that 3c505.txt file, or even better, there should be a snippet of code
in the driver that senses for and corrects this condition (or tells the
user to either dump the busmastering SCSI or load 3c505.o as a module).

##### This did not work for long. Eventually it started crashing again.
##### Here is the final resolution.

Ref: 0083CEF1
Title: Bus Mastering SCSI Controller Problem with 3C507 and 3C507TP Adapters
Date: 05/05/92

Copyright 3Com Corporation, 1992. All rights reserved.

An incompatibility has been discovered using the EtherLink 16 (3C507) or
EtherLink16 TP (3C507TP) adapters and bus mastering SCSI hard disk
controllers in an Intel EISA-based PC. A few examples of these controllers
are the Adaptec 1540 or 1542 and the BusTek SCSI controllers.

Note: Any SCSI device that performs bus mastering will have these
problems, not just devices from Adaptec or BusTek.

Symptoms reported to 3Com Tech Support include timeouts or lockups between
clients and servers during the LOGIN process, and data corruption on
large file transfers to and from the server.

The reason for the incompatibility is that the 3C507 adapter relies heavily
on the PC bus for timing information and when a bus mastering device takes
over the PC bus, it holds onto it for an indefinite period of time, causing
the 3C507 to "go to sleep" waiting for a timing response.

3Com has created a hardware fix for this problem, which requires replacing
the PAL on the 3C507 adapter to lengthen the amount of time the adapter
will wait for a signal from the bus. If you experience the above symptoms,
call 1-800-876-3COM and select option 2 for the RMA department to have the
adapter retro-fitted with the fix.


######## Note that this took a good part of a year to resolve.
######## You may have better luck with a logic analyzer.

genkai wa doko da (ga...@my-deja.com) wrote:
: ok in recent weeks my main linux box has been panicking every day or two
