catman results in kernel panic and crashes system

Fritz Wuehler

Jan 26, 2012, 12:23:06 AM
Didn't have windex built and after googling a bit I saw you can issue
"catman" to build them. I did that and next thing I know my system
rebooted. Looking in the messages I see

reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe80003f57c0
addr=400d8 occurred in module "unix" due to an illegal access to a user
address

Is catman so dangerous it can knock down the mighty Solaris 10? On my system
the answer is yes. Anyone got a better (non destructive) idea how to build
windex files so I can apropos? Thanks.

Casper H.S. Dik

Jan 26, 2012, 5:39:20 AM
catman creates a lot of I/O and that may stress the system to the
point that dormant hardware issues pop up.

If it happens all the time but with different stack traces, then there
is a hardware issue. If the stack trace is the same, it is most likely a
driver or kernel issue.

Casper

Nomen Nescio

Jan 26, 2012, 9:23:39 AM
and...@cucumber.demon.co.uk (Andrew Gabriel) wrote:

> In article <d8ed5473b983332b...@msgid.frell.theremailer.net>,
> Fritz Wuehler <fr...@spamexpire-201201.rodent.frell.theremailer.net> writes:
> > Didn't have windex built and after googling a bit I saw you can issue
> > "catman" to build them. I did that and next thing I know my system
> > rebooted. Looking in the messages I see
> >
> > reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe80003f57c0
> > addr=400d8 occurred in module "unix" due to an illegal access to a user
> > address
> >
> > Is catman so dangerous it can knock down the mighty Solaris 10? On my system
> > the answer is yes. Anyone got a better (non destructive) idea how to build
> > windex files so I can apropos? Thanks.
>
> You haven't posted enough of the crash message for anyone to know
> what the problem is.

Ok..it was not intentional, just didn't know what to include.

> Also, you haven't said what release and patch level.

Release is 10U8 installed from the DVD. Didn't patch anything.

> Also, does it happen every time, or just a once-off?

Dunno, once I poke myself in the eye with a sharp stick I'm usually smart
enough not to do it again without a good reason. That's why I asked here.

> i.e. nowhere near enough info if you were looking for some help.

Depends...if somebody else tried catman and it crashed his system and
figured it out already then it might be enough info :-) Smart people learn
from their mistakes. Brilliant people learn from other people's mistakes.


Here's all I could find from /var/adm/messages

Jan 21 08:23:35 localhost unix: [ID 836849 kern.notice]
Jan 21 08:23:35 localhost ^Mpanic[cpu0]/thread=ffffffff845a5c20:
Jan 21 08:23:35 localhost genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe80003f57c0 addr=400d8 occurred in module "unix" due to an illegal access to a user address
Jan 21 08:23:35 localhost unix: [ID 100000 kern.notice]
Jan 21 08:23:35 localhost unix: [ID 839527 kern.notice] init:
Jan 21 08:23:35 localhost unix: [ID 753105 kern.notice] #pf Page fault
Jan 21 08:23:35 localhost unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x400d8
Jan 21 08:23:35 localhost unix: [ID 243837 kern.notice] pid=1, pc=0xfffffffffb83d5fd, sp=0xfffffe80003f58b8, eflags=0x10206
Jan 21 08:23:35 localhost unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Jan 21 08:23:35 localhost unix: [ID 354241 kern.notice] cr2: 400d8 cr3: 13ffcd000 cr8: c
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] rdi: fffffffff4ef4ca0 rsi: 1 rdx: 80
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] rcx: 2 r8: 1 r9: fffffe80003f5810
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] rax: ffffffff845a5c20 rbx: fffffffffbc29cc0 rbp: fffffe80003f5910
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] r10: 7c r11: 40000 r12: 40000
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] r13: 4 r14: fffffffff4ef4ca0 r15: 1
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] fsb: ffffffff80000000 gsb: fffffffffbc29cc0 ds: 43
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffffb83d5fd
Jan 21 08:23:35 localhost unix: [ID 592667 kern.notice] cs: 28 rfl: 10206 rsp: fffffe80003f58b8
Jan 21 08:23:35 localhost unix: [ID 266532 kern.notice] ss: 0
Jan 21 08:23:35 localhost unix: [ID 100000 kern.notice]
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f56d0 unix:die+da ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f57b0 unix:trap+5e6 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f57c0 unix:_cmntrap+140 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5910 unix:mutex_owner_running+d ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5970 genunix:get_free_vpmap+122 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f59b0 genunix:get_vpmap+30 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5a10 genunix:vpm_pagecreate+a8 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5bb0 genunix:vpm_map_pages+308 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5c80 genunix:vpm_data_copy+72 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5d60 tmpfs:wrtmp+115 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5db0 tmpfs:tmp_write+6e ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5e00 genunix:fop_write+31 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5eb0 genunix:write+287 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5ec0 genunix:write32+e ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5f10 unix:brand_sys_syscall32+1a3 ()
Jan 21 08:23:35 localhost unix: [ID 100000 kern.notice]
Jan 21 08:23:35 localhost genunix: [ID 672855 kern.notice] syncing file systems...
Jan 21 08:23:35 localhost genunix: [ID 904073 kern.notice] done
Jan 21 08:23:36 localhost genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel


Thank you.

John D Groenveld

Jan 26, 2012, 9:53:58 AM
In article <4f212d58$0$6909$e4fe...@news2.news.xs4all.nl>,
Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:
>If it happens all the time but with different stack traces, then there
>is a hardware issue. If the stack trace is the same, it is most likely a
>driver or kernel issue.

Al Hopper's article might be a good starting point:
<URL:http://solaris-x86.org/documents/tutorials/crashdiag.mhtml>

I have used memtest86 on the Ultimate Boot CD to hunt
down bad memory modules:
<URL:http://ubcd.sourceforge.net/>

John
groe...@acm.org

Nomen Nescio

Jan 26, 2012, 10:24:38 AM
Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:

> catman creates a lot of I/O and that may stress the system to the
> point that dormant hardware issues pop up.

This is a home file server and development system under Solaris 10/U8, no
patches, serving NFS and webpages and a few ssh users. Could catman stress
the system in a way it hasn't already been stressed?

>
> If it happens all the time but with different stack traces, then there
> is a hardware issue. If the stack trace is the same, it is most likely a
> driver or kernel issue.

That is helpful, thank you. If nobody else posts anything specific to catman
I'll try it a few times and compare the logs. The system hasn't shown any
problems before this and the zpool status shows 0 errors and all devices
ONLINE.

John D Groenveld

Jan 26, 2012, 10:40:43 AM
In article <7968ac595f543077...@dizum.com>,
Nomen Nescio <nob...@dizum.com> wrote:
>I'll try it a few times and compare the logs. The system hasn't shown any
>problems before this and the zpool status shows 0 errors and all devices
>ONLINE.

After you test memory and your processor, you can test the
storage system via a filesystem benchmark:
<URL:http://www.coker.com.au/bonnie++/>
<URL:http://www.iozone.org/>

John
groe...@acm.org

cindy

Jan 26, 2012, 10:49:13 AM
On Jan 26, 7:23 am, Nomen Nescio <nob...@dizum.com> wrote:
> and...@cucumber.demon.co.uk (Andrew Gabriel) wrote:
> > In article <d8ed5473b983332bc69b3abe9c457...@msgid.frell.theremailer.net>,
What does fmdump say? If fmdump is empty, try fmdump -eV.

Thanks,

Cindy

Andrew Gabriel

Jan 26, 2012, 4:03:28 AM
In article <d8ed5473b983332b...@msgid.frell.theremailer.net>,
You haven't posted enough of the crash message for anyone to know
what the problem is. Also, you haven't said what release and patch
level. Also, does it happen every time, or just a once-off?

i.e. nowhere near enough info if you were looking for some help.

--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]

Fritz Wuehler

Jan 26, 2012, 4:04:13 PM
Sounds interesting even without crashing the system, thanks.

Andrew Gabriel

Jan 26, 2012, 5:39:00 PM
In article <2f2f3d06ce92debc...@dizum.com>,
Nomen Nescio <nob...@dizum.com> writes:
> and...@cucumber.demon.co.uk (Andrew Gabriel) wrote:
>
>> In article <d8ed5473b983332b...@msgid.frell.theremailer.net>,
>> Fritz Wuehler <fr...@spamexpire-201201.rodent.frell.theremailer.net> writes:
>> > Didn't have windex built and after googling a bit I saw you can issue
>> > "catman" to build them. I did that and next thing I know my system
>> > rebooted. Looking in the messages I see
>> >
>> > reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe80003f57c0
>> > addr=400d8 occurred in module "unix" due to an illegal access to a user
>> > address
>> >
>> > Is catman so dangerous it can knock down the mighty Solaris 10? On my system
>> > the answer is yes. Anyone got a better (non destructive) idea how to build
>> > windex files so I can apropos? Thanks.
>>
>> You haven't posted enough of the crash message for anyone to know
>> what the problem is.
>
> Ok..it was not intentional, just didn't know what to include.
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f56d0 unix:die+da ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f57b0 unix:trap+5e6 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f57c0 unix:_cmntrap+140 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5910 unix:mutex_owner_running+d ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5970 genunix:get_free_vpmap+122 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f59b0 genunix:get_vpmap+30 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5a10 genunix:vpm_pagecreate+a8 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5bb0 genunix:vpm_map_pages+308 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5c80 genunix:vpm_data_copy+72 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5d60 tmpfs:wrtmp+115 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5db0 tmpfs:tmp_write+6e ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5e00 genunix:fop_write+31 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5eb0 genunix:write+287 ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5ec0 genunix:write32+e ()
> Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5f10 unix:brand_sys_syscall32+1a3 ()

So the kernel has crashed whilst performing a write to tmpfs (probably a
file in /tmp) by catman.
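
For anyone reading a trace like this for the first time, here is a minimal
Python sketch that pulls the module:function+offset frames out of
/var/adm/messages-style panic lines, so the call path is easier to see. The
regular expression and the extract_frames name are assumptions made for
illustration, not part of any Solaris tool; the sample is four frames copied
from the log quoted above.

```python
import re

# Matches "<16 hex digit frame pointer> module:function+offset ()" in a panic line.
FRAME_RE = re.compile(r"\b([0-9a-f]{16})\s+(\w+):(\w+)\+([0-9a-f]+)\s+\(\)")


def extract_frames(log_text):
    """Return (module, function, offset) tuples in the order they were logged."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            _frame_pointer, module, function, offset = match.groups()
            frames.append((module, function, int(offset, 16)))
    return frames


# Four frames copied from the panic in this thread.
SAMPLE = """\
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5c80 genunix:vpm_data_copy+72 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5d60 tmpfs:wrtmp+115 ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5db0 tmpfs:tmp_write+6e ()
Jan 21 08:23:35 localhost genunix: [ID 655072 kern.notice] fffffe80003f5e00 genunix:fop_write+31 ()
"""

if __name__ == "__main__":
    for module, function, offset in extract_frames(SAMPLE):
        print(f"{module:8} {function} (+0x{offset:x})")
```

The frames are logged innermost first, so reading the output from the bottom
up gives the order of the calls: fop_write, then tmpfs tmp_write and wrtmp,
then vpm_data_copy.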

As Cindy said, check fmdump to see if Solaris detected anything starting to
go wrong before the eventual failure.

If you have a hardware problem, you will likely see more crashes or hangs
when you put the system under load, but they are usually all different
backtraces, because it's fairly random what gets hit.

If it's a kernel bug, then the backtraces will all be the same or
a small number of similar variations.
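
One rough way to make "the same or similar" concrete is to compare the
function names in two backtraces and ignore the offsets. The sketch below
does that with Python's difflib. The helper names are made up for
illustration, and the SAME_PATH and UNRELATED traces are hypothetical; only
FIRST_PANIC is taken from the panic quoted above. Treat the ratio as a hint,
not as a diagnosis.

```python
from difflib import SequenceMatcher


def strip_offset(frame):
    """Turn 'genunix:write+287' into 'genunix:write'."""
    return frame.split("+", 1)[0]


def backtrace_similarity(trace_a, trace_b):
    """Return a ratio from 0.0 to 1.0 over the offset-free frame sequences."""
    names_a = [strip_offset(frame) for frame in trace_a]
    names_b = [strip_offset(frame) for frame in trace_b]
    return SequenceMatcher(None, names_a, names_b).ratio()


# Five consecutive frames taken from the panic quoted above.
FIRST_PANIC = [
    "unix:mutex_owner_running+d",
    "genunix:get_free_vpmap+122",
    "genunix:get_vpmap+30",
    "genunix:vpm_pagecreate+a8",
    "genunix:vpm_map_pages+308",
]

# Hypothetical: the same call path with different offsets.
SAME_PATH = [
    "unix:mutex_owner_running+1a",
    "genunix:get_free_vpmap+130",
    "genunix:get_vpmap+30",
    "genunix:vpm_pagecreate+b0",
    "genunix:vpm_map_pages+2f0",
]

# Hypothetical: made-up frames from an unrelated code path.
UNRELATED = ["modx:func_a+10", "modx:func_b+20", "mody:func_c+30"]

if __name__ == "__main__":
    print(backtrace_similarity(FIRST_PANIC, SAME_PATH))
    print(backtrace_similarity(FIRST_PANIC, UNRELATED))
```

A ratio near 1.0 across several panics would fit the kernel or driver bug
pattern; ratios scattered near 0.0 would fit the random hardware pattern.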

It's not a bug in catman itself - the kernel should protect itself from
anything nasty which any application does.

Nomen Nescio

Feb 1, 2012, 2:27:35 PM
Hi Andrew & Cindy:

> As Cindy said, check fmdump to see if Solaris detected anything starting
> to go wrong before the eventual failure.

fmdump doesn't show anything

fmdump -eV shows i/o errors on a drive that started going bad a few weeks
ago and has been replaced already with no new errors showing. No fmdump
-eV output more recent than about two weeks ago.

> If you have a hardware problem, you will likely see more crashes or hangs
> when you put the system under load, but they are usually all different
> backtraces, because it's fairly random what gets hit.

Good!

>
> If it's a kernel bug, then the backtraces will all be the same or
> a small number of similar variations.

Had a spontaneous panic just now after the box had been up for a day or
so, nothing special running (didn't do catman again). I had prstat windows
open on another box via ssh and saw nothing out of the ordinary. Load was
about 5% CPU and 8% memory.

The backtrace is virtually identical to the one when catman failed. Does
this smell like a kernel bug? Can non-customers check against some bug
tracking system? This box is running Update 8 (last Sun release) and doesn't
seem to be able to install anything newer. Thanks guys.

John D Groenveld

Feb 1, 2012, 2:45:42 PM
In article <1d81f117cc02fe64...@dizum.com>,
Nomen Nescio <nob...@dizum.com> wrote:
>Had a spontaneous panic just now after the box had been up for a day or
>so, nothing special running (didn't do catman again). I had prstat windows
>open on another box via ssh and saw nothing out of the ordinary. Load was
>about 5% CPU and 8% memory.

Did you run memtest86 from Ultimate Boot CD?

>The backtrace is virtually identical to the one when catman failed. Does

What is the stack trace?

John
groe...@acm.org

Chris Ridd

Feb 1, 2012, 3:48:20 PM
On 2012-02-01 19:27:35 +0000, Nomen Nescio said:

> Had a spontaneous panic just now after the box had been up for a day or
> so, nothing special running (didn't do catman again). I had prstat windows
> open on another box via ssh and saw nothing out of the ordinary. Load was
> about 5% CPU and 8% memory.
>
> The backtrace is virtually identical to the one when catman failed. Does
> this smell like a kernel bug? Can non-customers check against some bug
> tracking system? This box is running Update 8 (last Sun release) and doesn't
> seem to be able to install anything newer. Thanks guys.

No, it smells like a hardware (most likely memory) problem. Boot
memtest and run it overnight, and longer if possible.
--
Chris

GreyCloud

Feb 1, 2012, 5:42:13 PM
In the kernel panic cases I was involved with, about 95% of the time
it was memory... the most vulnerable piece of hardware.

Andrew Gabriel

Feb 1, 2012, 6:06:44 PM
In article <pO2dnQk4YY1bIrTS...@bresnan.com>,
GreyCloud <mi...@cumulus.com> writes:
> In the kernel panic cases I was involved with, about 95% of the time
> it was memory... the most vulnerable piece of hardware.

...followed by cooling failure of some part of the system,
e.g. seized fan, clogged vents, etc. Particularly white
box systems.

Nomen Nescio

Feb 2, 2012, 10:20:05 AM
> Did you run memtest86 from Ultimate Boot CD?

Not yet.

>
> >The backtrace is virtually identical to the one when catman failed. Does
>
> What is the stack trace?

Not sure how to get it yet, I will look.

John D Groenveld

Feb 2, 2012, 10:50:55 AM
In article <e49ac468d0835e6a...@dizum.com>,
Nomen Nescio <nob...@dizum.com> wrote:
>Not sure how to get it yet, I will look.

What is the output of dumpadm(1M)?

John
groe...@acm.org

Fritz Wuehler

Feb 2, 2012, 4:08:36 PM
and...@cucumber.demon.co.uk (Andrew Gabriel) wrote:

> In article <pO2dnQk4YY1bIrTS...@bresnan.com>,
> GreyCloud <mi...@cumulus.com> writes:
> > In the kernel panic cases I was involved with, about 95% of the time
> > it was memory... the most vulnerable piece of hardware.
>
> ...followed by cooling failure of some part of the system,
> e.g. seized fan, clogged vents, etc. Particularly white
> box systems.

This is a custom build and has extra cooling and everything is clean and
tidy.

Nomen Nescio

Feb 2, 2012, 7:10:58 PM
GreyCloud <mi...@cumulus.com> wrote:

> In the kernel panic cases I was involved with, about 95% of the time
> it was memory... the most vulnerable piece of hardware.

Thanks, I hope it is not a hardware problem, but right now RAM is cheaper
than disk drives....

cindy swearingen

Feb 3, 2012, 10:43:28 AM
On Feb 2, 5:10 pm, Nomen Nescio <nob...@dizum.com> wrote:
Nomen,

I agree with the other comments that you need to run diagnostics on
this system and memory. In addition, if you have the crash dump
available, you can get the stack trace, something like this:

# cd /var/crash/system-name
# mdb vmcore.0
> ::status
> ::stack

Thanks,

Cindy

Nomen Nescio

Feb 5, 2012, 3:25:13 AM
Update: catman is working ok. I tried it a few times, once building the main
indexes and then again later when it missed something installed in a
nonstandard path. No panics. From Andrew's comments, some kind of kernel bug
seems more likely, or maybe RAM. Couldn't get memtest86+ to boot from floppy,
so I'll make a USB boot stick and try again.

David Combs

Mar 10, 2012, 6:50:50 PM
In article <jgcgi4$bc1$1...@dont-email.me>,
Andrew Gabriel <and...@cucumber.demon.co.uk> wrote:
>...
>...followed by cooling failure of some part of the system,
>e.g. seized fan, clogged vents, etc. Particularly white
>box systems.
>
>--
>Andrew Gabriel
>[email address is not usable -- followup in the newsgroup]

Stupid question: what is a "white box" system?

Thanks,

David

David Combs

Mar 10, 2012, 6:53:57 PM
In article <dfe26809d6e734d7...@dizum.com>,
RAM for Sparc? (eg sunblade 100, 2500, ...) If so, where do you buy it?

David

Ian Collins

Mar 10, 2012, 7:08:01 PM
On 03/11/12 12:50 PM, David Combs wrote:
> In article<jgcgi4$bc1$1...@dont-email.me>,
> Andrew Gabriel<and...@cucumber.demon.co.uk> wrote:
>> ...
>> ...followed by cooling failure of some part of the system,
>> e.g. seized fan, clogged vents, etc. Particularly white
>> box systems.
>
> Stupid question: what is a "white box" system?

GIYF

--
Ian Collins

Fritz Wuehler

Mar 11, 2012, 1:42:40 AM
No, David. Scroll up. SPARC never fails. It's shitty INTEL that never works!

John D Groenveld

Mar 11, 2012, 10:38:38 AM
[follow-ups to comp.sys.sun.wanted]
In article <jjgpil$fo$2...@reader1.panix.com>,
David Combs <dkc...@panix.com> wrote:
>RAM for Sparc? (eg sunblade 100, 2500, ...) If so, where do you buy it?

Micron's Crucial website is a quick reference for
memory specifications for Sun systems:
<URL:http://www.crucial.com/>

John
groe...@acm.org

Thad Floryan

Mar 11, 2012, 8:16:31 PM
On 3/10/2012 3:53 PM, David Combs wrote:
> [...]
> RAM for Sparc? (eg sunblade 100, 2500, ...) If so, where do you buy it?

Here:

<http://www.memoryx.net/>

<http://www.memoryx.net/sunblade.html>

<http://www.memoryx.net/location.html> location addresses

I've been buying Sun-compatible RAM from them for what seems like decades
for my Sun IPXs, SS 10s, SS20s, etc etc and have been very pleased.

They also were the suppliers of RAM upgrades for my HP LaserJet printers
such as LaserJet 4050n, LaserJet P2015dn, etc.

Greg Andrews

Mar 12, 2012, 1:03:29 PM
David Combs <dkc...@panix.com> wrote:
>Andrew Gabriel <and...@cucumber.demon.co.uk> wrote:
>>...
>>...followed by cooling failure of some part of the system,
>>e.g. seized fan, clogged vents, etc. Particularly white
>>box systems.
>>
>
>Stupid question: what is a "white box" system?
>

A computer that a person has assembled/integrated from various
components. The most common example is a commodity PC motherboard
and parts mounted in a generic case colored white: a system in
a white box.

-Greg
--
Do NOT reply via e-mail.
Reply in the newsgroup.

David Combs

Mar 17, 2012, 10:48:10 PM
In article <jjla91$4ug$1...@reader1.panix.com>,
Thank you!

David

David Combs

Mar 17, 2012, 10:57:30 PM
In article <4F5D405F...@thadlabs.com>,
Thanks to you both!

David
