Connection refused during opmnctl startall

Nick

Feb 24, 2006, 2:29:31 PM

I am trying to restart AS10g on SPARC Solaris 9. The opmnctl command hangs
and starts only 1 of 4 processes (HTTP_Server only). The output of
the command is below.

$ ./opmnctl startall
opmnctl: starting opmn and all managed processes...
================================================================================
opmn id=XXXX:6200
0 of 3 processes started.

ias-instance id=XXXXXXX
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ias-component/process-type/process-set:
OC4J/oca/default_island

Error
--> Process (pid=1620)
time out while waiting for a managed process to start
Log:
$ORACLE_HOME/opmn/logs/OC4J~oca~default_island~1

--------------------------------------------------------------------------------
ias-component/process-type/process-set:
OC4J/OC4J_SECURITY/default_island

Error
--> Process (pid=1621)
time out while waiting for a managed process to start
Log:
$ORACLE_HOME/opmn/logs/OC4J~OC4J_SECURITY~default_island~1

--------------------------------------------------------------------------------
ias-component/process-type/process-set:
OID/OID/OID

Error
--> Process (pid=0)
database dependency failed
SID
failed to start a managed process because a dependency check failed
Log:
none

The sqlnet.log file (below) shows the following entry over and over
again, as the connection is being refused. If I log in to the metadata
repository I receive an error indicating the listener failed to start a
dedicated process. I have tweaked my kernel and oracle settings
(semaphores, rlim_max, etc.) and they should be more than adequate. I
initially thought it was the oidmon process crashing, but I do not
think that's it.

Can anyone offer some insight here?

TNS for Solaris: Version 10.1.0.4.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 10.1.0.4.0 - Production
Time: 24-FEB-2006 14:16:12
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused

TIA...

Frank van Bortel

Feb 24, 2006, 3:00:10 PM

Nick wrote:
> I am trying to restart AS10g on SPARC Solaris 9. opmnctl command hangs
> and only starts 1 of 4 processes (HTTP_Server only)... the output of
> the command is below.
>
> $ ./opmnctl startall
> opmnctl: starting opmn and all managed processes...
> ================================================================================
> opmn id=XXXX:6200
> 0 of 3 processes started.
>
> ias-instance id=XXXXXXX
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ias-component/process-type/process-set:
> OC4J/oca/default_island
>
> Error
> --> Process (pid=1620)
> time out while waiting for a managed process to start
> Log:
> $ORACLE_HOME/opmn/logs/OC4J~oca~default_island~1
>
>

What if you just retry? Those Java processes are not known
for their speed...

--
Regards,
Frank van Bortel

Top-posting is one way to shut me up...

Frank van Bortel

Feb 24, 2006, 3:02:04 PM

Nick wrote:
> TNS for Solaris: Version 10.1.0.4.0 - Production
> TCP/IP NT Protocol Adapter for Solaris: Version 10.1.0.4.0 -
> Production
> Time: 24-FEB-2006 14:16:12
> Tracing not turned on.
> Tns error struct:
> ns main err code: 12564
> TNS-12564: TNS:connection refused
>

Oh yeah, I'll type the magic words for you:

[oracle10@csdb01 oracle10]$ oerr tns 12564
12564, 00000, "TNS:connection refused"
// *Cause: The connect request was denied by the remote user (or TNS
// software).
// *Action: Not normally visible to the user. For further details, turn on
// tracing and reexecute the operation.

More details, please

Nick

Feb 24, 2006, 3:01:30 PM

Thanks for the response, Frank. I have retried this a few times now
and rebooted the machine. Still no luck. I have turned on tracing, and
here is the error I see...

nsglbgetRSPidx: returning ecode=0
sntpcall: only 0 bytes read
sntpcall: Can't read from pipe; err[1] = 32
nserror: nsres: id=6, op=72, ns=12547, ns2=12560; nt[0]=517, nt[1]=32,
nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0

Does that shed some more light?

Frank van Bortel

Feb 24, 2006, 3:20:37 PM

12547, 00000, "TNS:lost contact"
// *Cause: Partner has unexpectedly gone away, usually during process
// startup.
// *Action: Investigate partner application for abnormal termination. On an
// Interchange, this can happen if the machine is overloaded.

The 12560 is the generic error reported back. You can do this, too, btw,
oerr tns 12547
or
oerr ora 1401

As you state all system (and kernel?) parameters are correct, I don't
quite know where to take it from here.
Did you try starting the services manually, one by one?
IIRC, opmnctl verbose status (or getstate in place of status) will show
you what state each process is in.

See if one of the others fails with a more meaningful error message.
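
A rough sketch of the one-by-one approach, assuming a POSIX shell and that opmnctl lives under $ORACLE_HOME/opmn/bin. The component names are the ones shown in the startall output above; the OPMNCTL variable and the start_one_by_one helper are made up for illustration.

```shell
# Start each OPMN-managed component separately so a failure is easy to attribute.
# OPMNCTL is an assumed path; point it at the opmnctl of your own install.
OPMNCTL=${OPMNCTL:-${ORACLE_HOME:-}/opmn/bin/opmnctl}

# Hypothetical helper: takes component names, reports which ones fail to start.
start_one_by_one() {
    for comp in "$@"; do
        echo "starting $comp"
        if "$OPMNCTL" startproc ias-component="$comp"; then
            echo "$comp: started"
        else
            echo "$comp: FAILED, check the logs under \$ORACLE_HOME/opmn/logs"
        fi
    done
}

# Usage (OID first, since it is the one reporting a database dependency failure):
#   start_one_by_one OID OC4J HTTP_Server
```

Starting OID on its own first should show whether the database dependency check is the real blocker before the OC4J timeouts muddy the picture.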

Nick

Feb 24, 2006, 4:23:22 PM

I have found some processes that appear to be hung... the following is
output from ps -fu oracle:

   UID   PID  PPID  C    STIME TTY    TIME CMD
oracle  3845     1  0 15:29:18 ?      0:00 oraclempris10g (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle  3859  3855  0 15:33:08 pts/2  0:00 -ksh
oracle   418     1  0 14:02:12 ?      0:00 ora_pmon_XXXX
oracle   420     1  0 14:02:12 ?      0:00 ora_mman_XXXX
oracle   422     1  0 14:02:12 ?      0:01 ora_dbw0_XXXX
oracle   424     1  0 14:02:12 ?      0:01 ora_lgwr_XXXX
oracle   426     1  0 14:02:13 ?      0:04 ora_ckpt_XXXX
oracle   428     1  0 14:02:13 ?      0:05 ora_smon_XXXX
oracle   430     1  0 14:02:13 ?      0:00 ora_reco_XXXX

The ? leads me to believe these processes are hung. The main issue is
that I cannot kill them because their parent process is pid 1 (init).

Any advice on how to proceed?

Jim Smith

Feb 25, 2006, 1:51:00 AM

In message <1140816030....@p10g2000cwp.googlegroups.com>, Nick
<Nick...@gmail.com> writes

>I have found some processes that appear to be hung... the following is
>output from ps -fu oracle:
>
> UID PID PPID C STIME TTY TIME CMD
> oracle 3845 1 0 15:29:18 ? 0:00 oraclempris10g
>(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
> oracle 3859 3855 0 15:33:08 pts/2 0:00 -ksh

> oracle 418 1 0 14:02:12 ? 0:00 ora_pmon_mpris10g
> oracle 420 1 0 14:02:12 ? 0:00 ora_mman_mpris10g
> oracle 422 1 0 14:02:12 ? 0:01 ora_dbw0_mpris10g
> oracle 424 1 0 14:02:12 ? 0:01 ora_lgwr_mpris10g
> oracle 426 1 0 14:02:13 ? 0:04 ora_ckpt_mpris10g
> oracle 428 1 0 14:02:13 ? 0:05 ora_smon_mpris10g
> oracle 430 1 0 14:02:13 ? 0:00 ora_reco_mpris10g

>
>The ? lead me to believe these processes are hung, the main issue is
>that I cannot kill them b/c their parent process is pid 1 - init.
>
>Any advice on how to proceed?
>

These are mostly oracle database background processes and are almost
certainly not hung. The ? just means they are not attached to a
terminal. Under no circumstances should they be killed.

The first one (PID 3845) is an oracle client shadow process and its
parent ought to be a sqlplus session or something similar and might be
hung.

These are probably not related to your problem.

If you want to get rid of the hung process, kill -9 3845 as root ought
to get rid of it and you can then bounce the database if you want.
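
To tell the two kinds of process apart at a glance, something like the sketch below may help. It assumes ps -f output with the eight columns shown in the quoted listing (STIME as a single field) and a header line; classify_oracle_procs is a made-up helper name, and the labels are only a naming-convention guess based on the command name.

```shell
# Label each line of `ps -f` output by the command name in column 8:
#   ora_*    -> database background process (leave these alone)
#   oracle*  -> client shadow (server) process
# Assumes the 8-column layout from the listing above; the header line is skipped.
classify_oracle_procs() {
    awk 'NR > 1 {
        kind = "other"
        if ($8 ~ /^ora_/) kind = "background"
        else if ($8 ~ /^oracle/) kind = "shadow"
        print $2, $6, kind      # PID, TTY, guessed kind
    }'
}

# Usage:
#   ps -fu oracle | classify_oracle_procs
```

A ? in the TTY column shows up for both kinds, so it says nothing about whether a process is hung.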
--
Jim Smith
I'm afraid you've mistaken me for someone who gives a damn.

Nick

Feb 27, 2006, 9:46:09 AM

Thanks all for your replies. I've still found no resolution for this
issue. I've examined every potential resource shortfall I can think
of, and everything appears to be fine. I have noticed, however, that
an effective group ID has been assigned to my oracle user. I do not
recall seeing this in the past.

Below...

$ id
uid=101(oracle) gid=100(dba) egid=2(bin)

Could this be causing the problems I am having?

Frank van Bortel

Feb 27, 2006, 1:27:11 PM

Yes
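
For context: id prints an egid field only when the effective group differs from the real one, so seeing egid=2(bin) means the shell is running with a group it did not get from the login. A minimal sketch for checking this in a script, assuming a POSIX id that accepts -g and -r (on Solaris that may mean /usr/xpg4/bin/id rather than /usr/bin/id); gid_status is a made-up helper name.

```shell
# Compare a real and an effective group ID and say whether they match.
# Hypothetical helper: takes the two numeric gids as arguments.
gid_status() {
    if [ "$1" != "$2" ]; then
        echo "mismatch: real gid=$1 effective gid=$2"
    else
        echo "ok: gid=$1"
    fi
}

# id -g -r prints the real gid, id -g the effective gid (POSIX id assumed).
gid_status "$(id -g -r)" "$(id -g)"
```

With the values from the post above, gid_status 100 2 reports a mismatch, which is the case worth chasing.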

Nick

Feb 28, 2006, 10:26:37 AM

Thank you all for your help and insight. Turns out that the
permissions have gotten out of whack on this box. The oracle user was
using /bin/ksh, which somehow had a setgid bit in its permissions,
and the group owner was bin, hence my egid of 2 (bin). I switched the
oracle user over to /bin/sh and was able to bring AS10g up with no
issues.
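
For anyone hitting the same thing, a quick sketch for spotting a setgid bit on the login shell. It assumes a POSIX shell whose test supports -g; check_setgid is a made-up helper name, and /bin/ksh is simply the shell from this case.

```shell
# Report whether a file has its set-group-ID bit set.
# Note: test -g is also false when the file does not exist.
check_setgid() {
    if [ -g "$1" ]; then
        echo "$1: setgid bit is set"
    else
        echo "$1: setgid bit is not set"
    fi
}

check_setgid /bin/ksh
# If the bit is unwanted, root can clear it with: chmod g-s /bin/ksh
```

ls -l on the shell shows the same information: an s in the group execute position, plus the group owner that ends up as the egid.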

I was able to circumvent the issue - but still can't figure out why the
permissions went bad.

Thanks again....

//NC