I have a stack of these lying around and I have tried obsd installs on 4
of them. On all of them there is weirdness. obsd freezes up after a short
time on all of them; sometimes you can still ping the machine but can't
use the console or log in remotely or even use a simple network service
such as finger. The freezes happen when I do certain things like
this:
start setiathome
pkg_add 'name of large package'
The machine is still pingable 12 hours later but control never comes back.
The only thing in common between all these machines is the hard drive,
which might be the trouble, I suppose. It's a 406M HD and I don't have any
other HDs big enough to install on. :-( (I could try a minimal install on
a 120M drive I suppose)
The only other mention of this line of computers on this list that I can
find was someone mentioning they have one of these and obsd only sees
16M of 24M of RAM (same thing happens to me on one of the boxes)
If anyone can tip me off on how to get more debug info, short of compiling
a new kernel (I doubt I could compile anything on there), so that I could
submit more info on this, it'd be appreciated. (In my mind there should be
a kernel with verbose debugging facilities available for every port.)
thanks,
brian
I have experienced a similar problem with an old 486MB using a Hypertec
AMD-586. Not that this hardware is a problem, as I can run Solaris 7 and
various flavours of Linux on it without any trouble. However, OpenBSD 2.7
is another story. I haven't filed a bug report on it yet as I have other
matters to attend to, so it isn't a priority, but I do get an error message
"aha_scsi_cmd cannot map" which comes from the Adaptec AHA-1542 driver.
The symptom I observe is identical to yours and I can reproduce it by
running two concurrent compiles on this computer. I can also make it occur
by compiling a kernel while Samba, Squid & Postfix are running but doing
absolutely nothing (i.e. they are idle processes). It took me ages to work
out why I was having this problem when compiling a kernel. I was able to
compile successfully without a freeze when the above services were all
shut down.
Now, down to what the cause of my problem is. Fortunately I was running
performance meter on my Sun, monitoring the OpenBSD system. During a
compilation of the kernel I saw "swap" activity and at this precise
instant almost all the lines dropped to zero. Of course the network was
still running, as I was continuing to receive updates from rstatd, but the
console appeared frozen. I could CTRL-C out of the make process and land
back at the prompt; however, doing an ls or running any other command that
wants to access the disc makes it freeze. Remember that error message I
get! Basically disc I/O operations have come to a grinding halt, so nothing
functions any more other than processes that are running from physical memory.
I did leave the computer overnight once after it had frozen, to find that
in the morning it had panicked, which is to be expected if a process was to
be swapped back in and the disk wasn't accessible.
To summarise my observations: there is very little swap activity in my
system, but after 1, sometimes 2 or 3, swap occurrences the system will
freeze.
If you have a Solaris system on your network you could use the performance
meter to monitor your OpenBSD system. It would be interesting to see if
you observe the same as I have when the freeze occurs.
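
For anyone without a Solaris box to hand, a rough alternative is a
heartbeat that forces a small write to disk every second, run from a
remote login or the console. This is only a sketch under assumptions: it
needs Python on the affected machine, the function name and log path are
made up for illustration, and it is not something anyone in this thread
has tried. If disc I/O really stalls, the fsync call should block, so the
last line printed gives a rough time for when the disc stopped responding.

```python
import os
import time


def heartbeat(path, beats, interval=1.0):
    """Append one timestamped line per beat and force each one to disk.

    Returns the number of beats that were written and synced.
    """
    written = 0
    with open(path, "a") as log:
        for n in range(beats):
            started = time.time()
            log.write("%d %.3f\n" % (n, started))
            log.flush()
            # fsync does not return until the OS has pushed the data to
            # the device, so a stalled disc shows up as a hang right here.
            os.fsync(log.fileno())
            took = time.time() - started
            print("beat %d synced in %.3fs" % (n, took))
            written += 1
            time.sleep(interval)
    return written
```

Something like heartbeat("/var/tmp/hb.log", 3600) started just before the
compile or pkg_add would cover an hour; the path is a placeholder.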
Do you have any SCSI controllers in your Compaqs or are you using the IDE
interface?
Cheers,
Larry.
Right, I get exactly the same problem on my -current box (P-III, 256mb
ram, TritonII, 40gb EIDE RAID-1 via Duplidisk).
I've pinned it down to be reproducible whenever I SCP around five large files
across to it. Completely freezes down to the console, but is still
pingable.
Now that it's a reproducible fault, I'll try to pin it down to exactly
where the problem is. It's quite hard though, given that the darn
console freezes up too. Can anyone give me tips on any verbose kernel output
I can enable to help track the problem down?
--
Anil Madhavapeddy / "I told you not to flush that!"
an...@recoil.org / Stern Lecture Plumbing
I suspect this isn't a SCSI-only problem, as the same thing
happens to me with an EIDE RAID-1 setup.
I don't get any console error messages at all :-/
(btw, softupdates are off)
On Thu, Oct 05, 2000 at 04:20:48PM +0100, Anil Madhavapeddy wrote:
> gaw zay wrote:
> >
> > I have a stack of these laying around and I have tried obsd installs on 4
> > of them. On all of them there is weirdness. obsd freezes up after a short
> > time on all of them, sometimes you can still ping the machine but can't
> > use the console or log in remotely or even use a simple network service
> > such as finger. This happens [the freezes] when I do certain things like
> > this:
> >
>
> Right, I get exactly the same problem on my -current box (P-III, 256mb
> ram, TritonII, 40gb EIDE RAID-1 via Duplidisk).
>
> I've pinned it down to be reproducible whenever I SCP around five large files
> across to it. Completely freezes down to the console, but is still
> pingable.
>
> Now that it's a reproducible fault, I'll try to pin it down to exactly
> where the problem is. It's quite hard though, given that the darn
> console freezes up too. Anyone give me tips on any verbose kernel output
> I can enable to help track the problem down?
>
I noticed the same thing on my PII-350 with 128MB and AHA-2940UW.
After about 10 minutes (sometimes more, sometimes less) the
keyboard becomes unresponsive. The system is still working
(I started a kernel compile and the system still compiled fine
after the keyboard died). I didn't really investigate the cause of
the problem, but found out that starting X avoids it. The
keyboard doesn't lock up in X.
Regards
Martin
--
Remember, God could only create the world in 6 days because he didn't
have an established user base.
> > I believe it was caused by my disk partitions not being aligned on
> > cylinder boundaries.
> >
>
> Hmmm, that's a thought. I had considered the possibility that perhaps /
> and/or /usr overlapped the swap partition hence causing corruption so I
> repartitioned swap to be one cylinder further away from its adjacent
> partitions, alas this didn't help.
>
> Running "newfs -N" on my / & swap partitions doesn't reveal any problems
> however /usr does say that "288 sector(s) in the last cylinder
> unallocated"
>
> The OS shouldn't spit the dummy because of this during normal operation,
> unless I have misunderstood your point. Apart from which it is the file
> system that is losing out on some space.
Just to mention it, I also had a "XXX sectors in the last cylinder
unallocated" message and the same freezing problems. Perhaps there is
something to this.
brian
On Thu, 5 Oct 2000, Neil Darlow wrote:
>
> I had a problem with userland compiles failing in this manner.
>
> I believe it was caused by my disk partitions not being aligned on
> cylinder boundaries.
>
Hmmm, that's a thought. I had considered the possibility that perhaps /
and/or /usr overlapped the swap partition, hence causing corruption, so I
repartitioned swap to be one cylinder further away from its adjacent
partitions. Alas, this didn't help.
Running "newfs -N" on my / & swap partitions doesn't reveal any problems;
however, /usr does say that "288 sector(s) in the last cylinder
unallocated".
The OS shouldn't spit the dummy because of this during normal operation,
unless I have misunderstood your point. Apart from which, it is the file
system that is losing out on some space.
Cheers,
Larry.
Symptom: hard lock, no console response, no network response, no serial port response
no 'DDB>' kernel panic, nothing in /var/log/* to suggest anything wrong. No core.
Not even an offer for a game of hangman to reward my crash.
Things I've tried:
Replaced hardware (power supply, MLB, SCSI controller, ethernet board)
Dropping back to GENERIC.
Rebuilding GENERIC.
Stripping GENERIC of everything not obviously useful (ahc, xl, and vt)
Help.. eek..
> I suspect this isn't a SCSI-only problem, as the same thing
> happens to me with an EIDE RAID-1 setup.
> I don't get any console error messages at all :-/
I had a problem with userland compiles failing in this manner.
I believe it was caused by my disk partitions not being aligned on
cylinder boundaries.
Regards,
Neil Darlow.
Nope. Coincidence. With modern disk drives the whole concept of
`cylinders' is somewhat bogus. Your disk has n blocks. The OS picks
a somewhat arbitrary blocks/cylinder value. If the number of blocks is not
an exact multiple of that value you'll get a remainder... the XXX sectors
in the last cylinder message just means that it's easier to ignore the
`short' cylinder than add special case code for it.
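
To put numbers on that, here is a minimal sketch in Python. The geometry
(16 heads, 63 sectors/track) and the partition size are made-up values for
illustration, not taken from any disk in this thread. It is also an open
question here whether newfs reports the sectors left over in the short
cylinder or the sectors that cylinder is missing, so the sketch returns
both.

```python
def last_cylinder_split(total_sectors, heads, sectors_per_track):
    """Split a partition into whole cylinders plus a short last cylinder.

    Returns (full_cylinders, leftover, missing) where leftover is the
    number of sectors in the short cylinder and missing is how many more
    sectors it would need to be a whole one.
    """
    sectors_per_cylinder = heads * sectors_per_track
    full_cylinders, leftover = divmod(total_sectors, sectors_per_cylinder)
    # The outer modulo keeps missing at 0 when the size divides evenly.
    missing = (sectors_per_cylinder - leftover) % sectors_per_cylinder
    return full_cylinders, leftover, missing


# Hypothetical 800000-sector partition with 16 * 63 = 1008 sectors/cylinder.
print(last_cylinder_split(800000, 16, 63))  # (793, 656, 352)
```

Either way the number is just the by-product of an integer division and
says nothing about the health of the drive.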
// marc
oh well.
How modern is modern anyway? This drive I have is 406M, which is a
far cry from 'modern' but it ain't ST-506 either. Still my 3rd biggest HD
out of 7 computers (w/9 HDs).
brian