/>ifconfig -a
0821-515 ifconfig: error loading
/usr/lib/drivers/if_/=.q/=.|/=.Æ/=.Ñ/usr/lib/drivers/if_: A file or
directory in the path name does not exist.
If I go into smit and reconfigure my network interfaces everything seems
okay, but my system still locks up. I also get the following messages in my
errpt from restarting the network (I have to stop and start after a lock
up/hard boot because of entries in /etc/sm.bak)
Can someone please tell me what I have to do to get my box back to normal?
Thanks in advance!
David Cook
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:34
Sequence Number: 2060
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
134
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
gated
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: C60BB505
Date/Time: Tue Jun 25 08:25:34
Sequence Number: 2059
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
24774
FILE SYSTEM SERIAL NUMBER
5
INODE NUMBER
2
PROGRAM NAME
gated
ADDITIONAL INFORMATION
raise 4C
??
abort B8
task_quit 284
rip_init 42C
task_prot C8
main 74C
__start 8C
Symptom Data
REPORTABLE
1
INTERNAL ERROR
1
SYMPTOM CODE
PIDS/5765c3403 LVLS/430 PCSS/SPI2 FLDS/gated SIG/6 FLDS/task_quit VALU/284
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:33
Sequence Number: 2058
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
-256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
mrouted
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:33
Sequence Number: 2057
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
dhcpsd
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:33
Sequence Number: 2056
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
dhcpcd
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:33
Sequence Number: 2055
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
dhcprd
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:33
Sequence Number: 2054
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
xntpd
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
Date/Time: Tue Jun 25 08:25:32
Sequence Number: 2053
Machine Id: 000706054C00
Node Id: platinum
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
Detail Data
SYMPTOM CODE
256
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'342'
FAILING MODULE
iptrace
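With this many near-identical entries it can help to tally them by label and program. Below is a minimal Python sketch, not an official tool. It assumes the text is detailed errpt output (presumably from `errpt -a`), that each entry starts with a `LABEL:` line, and that the program is named on the line after `FAILING MODULE` or `PROGRAM NAME`, as in the entries above. The function name `summarize_errpt` and the abbreviated sample are made up for illustration.

```python
from collections import Counter

def summarize_errpt(text):
    """Tally (label, program) pairs from detailed errpt-style text.

    Assumption: each entry starts with a 'LABEL:' line and names the
    program on the line following 'FAILING MODULE' or 'PROGRAM NAME'.
    """
    counts = Counter()
    label = None
    want_program = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("LABEL:"):
            label = line.split(":", 1)[1].strip()
            want_program = False
        elif line in ("FAILING MODULE", "PROGRAM NAME"):
            # The next non-empty line holds the program name.
            want_program = True
        elif want_program and line:
            counts[(label, line)] += 1
            want_program = False
    return counts

# Abbreviated sample in the same layout as the entries above.
sample = """\
LABEL: SRC
IDENTIFIER: E18E984F
FAILING MODULE
gated
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: C60BB505
PROGRAM NAME
gated
ADDITIONAL INFORMATION
raise 4C
---------------------------------------------------------------------------
LABEL: SRC
IDENTIFIER: E18E984F
FAILING MODULE
mrouted
"""

for (label, program), n in sorted(summarize_errpt(sample).items()):
    print(f"{label:10} {program:10} {n}")
```

Run against the full report, a tally like this makes it easier to see which daemons only logged an SRC error and which one actually dumped core.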
>If I go into smit and reconfigure my network interfaces everything seems
>okay, but my system still locks up. I also get the following messages in my
>errpt from restarting the network (I have to stop and start after a lock
>up/hard boot because of entries in /etc/sm.bak)
>
>Can someone please tell me what I have to do to get my box back to normal?
I don't know what the problem is, but I would suggest running fsck on all
filesystems in single-user mode, preferably booted off a CD-ROM. You may also
want to run full system diagnostics and certify the drives with
a non-invasive, read-only scan.
Then, when you say 'locking up', does it lock up only for your network
sessions while serial port and console sessions stay OK?
Anything in errpt about duplicate IP addresses?
If it locks up only for network sessions, check the ARP table on the
host *and* the switch, and make sure it matches what your MAC address
is ('entstat en0' assuming it's en0, to see the MAC address on your box
and probably 'show port <portnum>' if on a Cisco switch).
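Comparing MAC addresses by eye is error-prone because different tools may print the same address in different notations. Here is a minimal Python sketch that reduces a few common notations to one form before comparing. It assumes three input styles: colon-separated octets, colon-separated octets with leading zeros dropped, and Cisco-style dotted groups of four hex digits. The function name `normalize_mac` and the sample addresses are made up for illustration.

```python
import re

def normalize_mac(text):
    """Return a MAC address as 12 lowercase hex digits, or raise ValueError."""
    text = text.strip().lower()
    if re.search(r"[:\-]", text):
        # Octet-separated form; pad octets that lost a leading zero.
        parts = re.split(r"[:\-]", text)
        if len(parts) != 6 or not all(
            re.fullmatch(r"[0-9a-f]{1,2}", p) for p in parts
        ):
            raise ValueError(f"not a MAC address: {text!r}")
        return "".join(p.zfill(2) for p in parts)
    # Dotted form (groups of four hex digits) or bare hex digits.
    digits = text.replace(".", "")
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    return digits

# Hypothetical values as they might appear on the host and on the switch.
host_mac = normalize_mac("00:06:29:AC:0B:1F")
arp_mac = normalize_mac("0:6:29:ac:b:1f")
switch_mac = normalize_mac("0006.29ac.0b1f")
print(host_mac == arp_mac == switch_mac)
```

If the normalized values differ, something else on the network may be answering for the host's IP address.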
Any errpt complaints about buffer thresholds too low?
What does your paging space use look like? (network buffers may use some
of this, thanks to the mbufs heritage which was one of the AIX V3 / BSD
design holdovers) 'lsps -a' should help.
How much physical memory does your box have? Then tell us the 'thewall'
setting -- i.e., 'no -o thewall' will spit that out.
When it locks up... do you have tons of connections (like 1000+) in
netstat?
# netstat -n
and maybe...
# netstat -n|grep "^tcp4"|wc -l
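The `grep | wc -l` pipeline gives a single total. Breaking the count down by connection state can show whether the connections are live or just lingering. A minimal Python sketch follows, assuming `netstat -n` prints TCP-over-IPv4 lines starting with `tcp4` and puts the state in the last column. The function name `tcp4_states` and the sample lines are made up for illustration.

```python
from collections import Counter

def tcp4_states(netstat_output):
    """Count tcp4 connections per state in netstat -n style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        # Same selection as grep "^tcp4": the line must begin with tcp4.
        if line.startswith("tcp4"):
            counts[line.split()[-1]] += 1
    return counts

# Made-up sample; real output would be captured from netstat -n.
sample = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.0.2.23         192.168.0.10.1045      ESTABLISHED
tcp4       0      0  192.168.0.2.80         192.168.0.11.2301      TIME_WAIT
tcp4       0      0  192.168.0.2.80         192.168.0.12.2302      TIME_WAIT
udp4       0      0  192.168.0.2.123        *.*
"""

counts = tcp4_states(sample)
print(sum(counts.values()), dict(counts))
```

The sum of the per-state counts should match what the `wc -l` pipeline reports.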
Finally, does the output of '/etc/ifconfig -a' look sane?
Some food for thought. Post here with details if you're still stuck.
-Dan
(mailed to original poster as a courtesy and posted to comp.unix.aix)
>My box keeps locking up. RS/6000 43P AIX 4.3.2 with the latest levels of
>everything. I get corrupted looking output when I run ifconfig -a
>
>/>ifconfig -a
>0821-515 ifconfig: error loading
>/usr/lib/drivers/if_/=.q/=.|/=.Æ/=.Ñ/usr/lib/drivers/if_: A file or
>directory in the path name does not exist.
>
>
AFAIK 4.3.2 does not have a '-a' option for ifconfig - has it ever
worked on that system? Try "netstat -i" and/or "netstat -in" instead.
It locks up everything including serial port connections.
> Anything in errpt about duplicate IP addresses?
Nothing.
> Any errpt complaints about buffer thresholds too low?
>
> What does your paging space use look like? (network buffers may use some
> of this, thanks to the mbufs heritage which was one of the AIX V3 / BSD
> design holdovers) 'lsps -a' should help.
>
> How much physical memory does your box have? Then tell us the 'thewall'
> setting -- ie, 'no -o thewall' will spit that out.
I've got 512MB RAM and thewall is set to 256MB. I set it to 384MB as well
as setting sockthresh to 95 and the box still locked up.
> When it locks up... do you have tons of connections (like 1000+) in
> netstat?
I would not be able to look.
> # netstat -n
>
> and maybe...
>
> # netstat -n|grep "^tcp4"|wc -l
>
> Finally, does the output of '/etc/ifconfig -a' look sane?
It does now. I turned off ppp and haven't had any problems for a few days.
platinum@/etc>ifconfig -a
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
en0: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255
et0: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255
netstat -n|grep "^tcp4"|wc -l = 25 at the moment.
I will run fsck tonight and see what comes up. Thank you for your response!
- David Cook
Well... normally that kind of lock-up (*entire* system freezes, on network,
serial, and console connections) has two basic causes:
A bug in the kernel or a kernel module somewhere
or
Bad hardware
I'm more inclined to suspect bad hardware -- possibly a bad memory DIMM --
because a buggy kernel would usually leave at least *something* behind: an
errpt entry, a crash dump, 888/0c?, something.
A bad memory DIMM does produce this kind of symptom when you access the
affected memory. Someone had similar problems with a 'tar' job some time ago
and discovered it to be a bad DIMM in his case. It may be worth swapping the
memory for equivalent DIMMs that you can temporarily borrow or cannibalize.
Possible reasons for the increased stability lately are either that it just
hasn't warmed up (i.e., the DIMM hasn't flexed, which might interrupt the
contacts) or that the memory space served by it just hasn't been touched yet.
I also find it interesting that you correlate the increased system stability
to not using the PPP stuff (which is traditionally kernel mode-resident).
-Dan