Oninit from cron leaves engine in permanent fast recovery


gerry....@dsl.pipex.com

Oct 5, 2005, 1:44:04 PM

Hi

We have a requirement to automatically bounce our 9.40.FC6 instance on
AIX 5.2.

The idea is to use cron (informix) to run a script that sets the
correct environment, etc. and then issues a 'onmode -yuk', sleeps for
30 seconds then issues a 'oninit -v'.

This script ... and a cut-down wee version ... runs simply and
beautifully from the informix shell. But, when cron executes the
script the instance stops cleanly but fails to restart.

We have compared environments and have not found anything obvious that
could account for this strange behaviour.

Have any of you experienced anything like this before? I have included
the output from the script for both shell and cron methods of
execution. I have not included any configuration information as yet.
This would appear to be a deep-rooted problem, possibly to do with
process management under AIX ... or something like that ... ?! My
AIX/UNIX knowledge doesn't stretch that far!

Thanks in advance for any help anyone can offer.

Yours desperately,
Mr Zap

Script
-----------------------------------------
#!/bin/ksh
. /opt/informixdba/Profiles/dba_profile 940 shm
/usr/informix/bin/onmode -yuk
sleep 30
/usr/informix/bin/oninit -v
-----------------------------------------
Output from shell
-----------------------------------------
Checking group membership to determine server run modesucceeded
Reading configuration file
'/usr/informix/etc/onconfig.gimukldnp01_00'...succeeded
Creating /INFORMIXTMP/.infxdirs ... succeeded
Creating infos file "/usr/informix/etc/.infos.gimukldnp01_00" ...
"/usr/informix/etc/.conf.gimukldnp01_00" ... succeeded
Writing to infos file ... succeeded
Checking config parameters...succeeded
Allocating and attaching to shared memory...succeeded
Creating resident pool 43988 kbytes...succeeded
Creating buffer pool 800000 kbytes...succeeded
Creating buffer pool 8 kbytes...succeeded
Initializing rhead structure...succeeded
Initializing ASF ...succeeded
Initializing Dictionary Cache and SPL Routine Cache...succeeded
Bringing up ADM VP...succeeded
Creating VP classes...succeeded
Onlining 0 additional cpu vps...succeeded
Onlining 2 IO vps...succeeded
Initialization of Encryption...succeeded
Forking main_loop thread...succeeded
Initializing DR structures...succeeded
Forking 1 'sqlmux' listener threads...succeeded
Forking 1 'ipcshm' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Starting tracing...succeeded
Initializing 6 flushers...succeeded
Initializing log/checkpoint information...succeeded
Opening primary chunks...succeeded
Opening mirror chunks...succeeded
Initializing dbspaces...succeeded
Validating chunks...succeeded
Initialize Async Log Flusher...succeeded
Forking btree cleaner...succeeded
Initializing DBSPACETEMP list
Checking database partition index...succeeded
Checking location of physical log...succeeded
Initializing dataskip structure...succeeded
Checking for temporary tables to drop
Forking onmode_mon thread...succeeded
Verbose output complete: mode = 5
-----------------------------------------
onstat -
-----------------------------------------
IBM Informix Dynamic Server Version 9.40.FC6 -- On-Line -- Up
00:06:42 -- 1633776 Kbytes
-----------------------------------------
Output from cron
-----------------------------------------
Checking group membership to determine server run modesucceeded
Reading configuration file
'/usr/informix/etc/onconfig.gimukldnp01_00'...succeeded
Creating /INFORMIXTMP/.infxdirs ... succeeded
Creating infos file "/usr/informix/etc/.infos.gimukldnp01_00" ...
"/usr/informix/etc/.conf.gimukldnp01_00" ... succeeded
Writing to infos file ... succeeded
Checking config parameters...succeeded
Allocating and attaching to shared memory...succeeded
Creating resident pool 43988 kbytes...succeeded
Creating buffer pool 800000 kbytes...succeeded
Creating buffer pool 8 kbytes...succeeded
Initializing rhead structure...succeeded
Initializing ASF ...succeeded
Initializing Dictionary Cache and SPL Routine Cache...succeeded
Bringing up ADM VP...succeeded
Creating VP classes...succeeded
Onlining 0 additional cpu vps...succeeded
Onlining 2 IO vps...succeeded
Initialization of Encryption...succeeded
Forking main_loop thread...succeeded
Initializing DR structures...succeeded
Forking 1 'sqlmux' listener threads...succeeded
Forking 1 'ipcshm' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Forking 1 'soctcp' listener threads...succeeded
Starting tracing...succeeded
Initializing 6 flushers...succeeded
Initializing log/checkpoint information...succeeded
Opening primary chunks...succeeded
Opening mirror chunks...succeeded
Initializing dbspaces...succeeded
Validating chunks...succeeded
Initialize Async Log Flusher...succeeded
Forking btree cleaner...succeeded
Initializing DBSPACETEMP list
-----------------------------------------
onstat -
-----------------------------------------
IBM Informix Dynamic Server Version 9.40.FC6 -- Fast Recovery -- Up
00:00:06 -- 1633776 Kbytes
-----------------------------------------

It stays in fast recovery mode so long ... we have to kill the oninit
processes!

All of the dbspaces are valid ... including temp spaces.

da...@smooth1.co.uk

Oct 5, 2005, 10:10:51 PM

Change to onmode -yuck...

onmode returns BEFORE the instance has finished shutting down.

Once onmode has finished, run onstat - repeatedly until it shows
"shared memory not initialized".

Then wait 30 secs and run oninit -vy.
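
A minimal sketch of that wait loop, assuming onstat is on the PATH once the profile has been sourced, and that an offline instance reports the "not initialized" wording mentioned above (check the exact text on your version). The function name wait_for_offline and its defaults are hypothetical, chosen only for illustration.

```shell
# Poll "onstat -" until it reports that shared memory is gone, or give up.
# Usage: wait_for_offline [max_tries] [pause_seconds]
# (Hypothetical helper; the name and defaults are not from the thread.)
wait_for_offline() {
    max_tries=${1:-30}
    pause=${2:-2}
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        # The match string follows the wording used in the post above;
        # it is an assumption about the real onstat message.
        if onstat - 2>&1 | grep -i "not initialized" >/dev/null; then
            return 0
        fi
        try=$((try + 1))
        sleep "$pause"
    done
    return 1
}
```

In the restart script this could sit between the shutdown and the startup, for example: onmode -yuck; wait_for_offline 30 2 && sleep 30 && oninit -vy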

Jonathan Leffler

Oct 6, 2005, 1:50:59 AM

gerry....@dsl.pipex.com wrote:
> We have a requirement to automatically bounce our 9.40.FC6 instance on
> AIX 5.2.
>
> The idea is to use cron (informix) to run a script that sets the
> correct environment, etc. and then issues a 'onmode -yuk', sleeps for
> 30 seconds then issues a 'oninit -v'.
>
> This script ... and a cut-down wee version ... runs simply and
> beautifully from the informix shell. But, when cron executes the
> script the instance stops cleanly but fails to restart.
>
> We have compared environments and have not found anything obvious that
> could account for this strange behaviour.
>
> Have any of you experienced anything like this before? I have included
> the output from the script for both shell and cron methods of
> execution. I have not included any configuration information as yet.
> This would appear to be a deep-rooted problem, possibly to do with
> process management under AIX ... or something like that ... ?! My
> AIX/UNIX knowledge doesn't stretch that far!
>
> Thanks in advance for any help anyone can offer.


Couple of things to consider:
1. What is the current working directory when you run the command via
the shell? Where does cron run things? Where does your shell run them?
Does it matter? (The answer to the last is usually no, but you can
design your system so that it does matter if you choose to do so - using
relative pathnames instead of absolute ones.)

2. Which file descriptors are open in the shell? And which in cron?
Chances are that the input in cron is coming from /dev/null; I don't
think that should matter, but ...
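
One way to compare the two contexts is to have the same script record its working directory, the state of its standard file descriptors, its umask and its environment, once from the shell and once from cron, and then diff the two files. This is only a sketch; dump_context and the output file names are hypothetical.

```shell
# Record the execution context in the file named by $1.
# (Hypothetical helper; run it once interactively and once from cron.)
dump_context() {
    out=$1
    fd_report=""
    # Test the descriptors BEFORE redirecting output, otherwise fd 1
    # would always look like a plain file.
    for fd in 0 1 2; do
        if [ -t "$fd" ]; then
            state=terminal
        else
            state=not-a-terminal
        fi
        fd_report="$fd_report fd$fd=$state"
    done
    {
        echo "cwd=$(pwd)"
        echo "fds:$fd_report"
        echo "umask=$(umask)"
        echo "--- environment ---"
        env | sort
    } > "$out"
}
```

For example, call dump_context /tmp/ctx.shell from the command line and dump_context /tmp/ctx.cron at the top of the cron job, then run diff /tmp/ctx.shell /tmp/ctx.cron.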

Looking at the outputs, the shell version contains these lines:

Initializing DBSPACETEMP list
Checking database partition index...succeeded
Checking location of physical log...succeeded
Initializing dataskip structure...succeeded
Checking for temporary tables to drop
Forking onmode_mon thread...succeeded
Verbose output complete: mode = 5

Whereas the cron version stops at the first...

I'm not sure what inferences to draw from that, but it certainly should
give someone some ammunition.


--
Jonathan Leffler #include <disclaimer.h>
Email: jlef...@earthlink.net, jlef...@us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/

Simmons, Keith

Oct 6, 2005, 3:56:38 AM

Gerry

What does the log file say?
I've seen a problem starting a database where a program (4GL) is still
attached to shared memory and will not allow the system to release it
and so will not allow the engine to restart.
Stop all programs before stopping the DB, or remove the shared memory
segments between stopping and starting.

Keith


gerry....@dsl.pipex.com

Oct 6, 2005, 6:25:49 AM

Thanks for the responses.

Including a 'c' in the shutdown doesn't make any difference Dave ...
sadly. I have tried flushing, forcing, advancing, clearing shm ... all
to no avail.

There are no relative paths referenced in the script. It's all very
explicit. The script has been executed from different starting points.

I don't know about open file descriptors Jonathan, but I will find out.
There is no input and output is redirected to a log file in both
cases.

I'm not sure what to draw from the fact that it hangs on the temp space
initialisation either ... but it would seem like a clue :) There is,
however, nothing apparently wrong with the spaces.

Thanks for the suggestions.

Martin Fuerderer

Oct 6, 2005, 4:30:08 AM

Hi,

As a test, try executing "oninit" instead of "oninit -v"
in your script. I suspect that the problem may
somehow have to do with the fact that the "oninit"
processes write to stdout when IDS is started with
"oninit -v". (Handling of stdout and stderr for
processes started by cron is a bit different from
the handling done by a shell.)

If that fixes the problem, then please let me know.

To check in your script whether "oninit" started the
server correctly, you can use "onstat -" output,
maybe after a further delay (sleep) of 10 seconds ...
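
A minimal sketch of such a check, assuming onstat is on the PATH and that a healthy server shows "-- On-Line --" in its banner, as in the output posted at the top of the thread. The function name wait_for_online and its defaults are hypothetical.

```shell
# Poll "onstat -" until the banner shows On-Line, or give up.
# Usage: wait_for_online [max_tries] [pause_seconds]
# (Hypothetical helper; the name and defaults are not from the thread.)
wait_for_online() {
    max_tries=${1:-12}
    pause=${2:-10}
    try=0
    banner=""
    while [ "$try" -lt "$max_tries" ]; do
        # onstat may exit non-zero in some modes; only the text is examined.
        banner=$(onstat - 2>&1) || :
        case $banner in
            *"-- On-Line --"*) return 0 ;;
        esac
        try=$((try + 1))
        sleep "$pause"
    done
    echo "server not On-Line; last onstat output: $banner" >&2
    return 1
}
```

Called right after oninit, a non-zero return from wait_for_online could be used to mail the DBA or write a marker into the cron log.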

Other than that I've no good idea at the moment.
Maybe you want to check after "onmode -yuk" that
no clients, onbar processes, etc. are still hanging
around trying to connect ...

Regards,
Martin
--
Martin Fuerderer
IBM Informix Development Munich, Germany
Information Management


Colin Dawson

Oct 6, 2005, 9:31:18 AM

Perhaps you could try redirecting stdout & stderr to a file


Regards

Colin

There are 10 types of people in the world, those that understand binary and
those that don't


Bill Dare

Oct 6, 2005, 9:59:33 AM

You might want to check your script and make certain that the line:

#!/bin/ksh

is the first line in the script, no blank lines before it. That line
specifies that the script be run in the Korn shell, but only if it is
the first line. Otherwise it is ignored as a comment. Since jobs
executed from cron run in a Bourne shell, this may be causing your
problem. I don't see anything in your posting to support that, but it is
worth a mention.
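
A quick way to verify this is to look at the very first line of the file. The sketch below uses a hypothetical helper, first_line_is_shebang, and a hypothetical script path in the usage note.

```shell
# Succeed only if the very first line of the file starts with "#!".
# (Hypothetical helper for illustration.)
first_line_is_shebang() {
    first=$(head -n 1 "$1")
    case $first in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Usage, assuming the bounce script lives at /path/to/bounce.ksh: first_line_is_shebang /path/to/bounce.ksh || echo "no interpreter line on line 1"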

Regards,
Bill



Andreas Breitfeld

Oct 6, 2005, 10:01:10 AM

On Thursday 06 October 2005 12:25, you wrote:
> Thanks for the responses.
>
> Including a 'c' in the shutdown doesn't make any difference Dave ...
> sadly. I have tried flushing, forcing, advancing, clearing shm ... all
> to no avail.
>
> There are no relative paths referenced in the script. It's all very
> explicit. The script has been executed from different starting points.
>
> I don't know about open file descriptors Jonathan, but I will find out.
> There is no input and output is redirected to a log file in both
> cases.
Did you redirect stdout _and_ stderr of your script to a file?

your_script > /tmp/cron.log 2>&1

Otherwise, try redirecting the individual commands in your script as well, e.g.

...
/usr/informix/bin/onmode -yuk > /tmp/onmode.log 2>&1
...
/usr/informix/bin/oninit -v > /tmp/oninit.log 2>&1


Andreas

da...@smooth1.co.uk

Oct 6, 2005, 2:03:31 PM

When it hangs, what does the online log say?

What does onstat - say?

Darren...@carmax.com

Oct 6, 2005, 3:47:55 PM

Try creating a startup script, informix_start. At the beginning of the
script, source a file that sets all of your env variables. Below is the
script I use to start Informix. It works via cron as well as from the rc3
startup scripts (at least it works on HP):

informix_start:
# Source the informix environment variable
. /etc/informix_env
# Start the Informix Database engine.
oninit
# Start backing up logical logs
sleep 30
/usr/informix/scripts/log_backup.ksh
# Set priority of jobs
/usr/informix/scripts/set_inf_prio 120

Thanks



Nog

Oct 7, 2005, 2:16:46 AM

If your CONSOLE param in the ONCONFIG file is set to /dev/console, try
pointing it at a plain file instead, e.g.
/usr/informix/<instance>.console.

I had a similar problem earlier this year when my server console daemon
left town and Informix just wouldn't start: it was waiting to write to
the console and couldn't.
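
A small sketch for checking what CONSOLE currently points at, assuming the usual ONCONFIG layout of "NAME value" on each line. The helper names onconfig_value and check_console are hypothetical.

```shell
# Print the value of one ONCONFIG parameter (first field = name,
# second field = value). Comment lines do not match because their
# first field starts with "#".
onconfig_value() {
    awk -v name="$2" '$1 == name { print $2; exit }' "$1"
}

# Report what kind of target the CONSOLE parameter points at.
# (Hypothetical helper; $1 is the path to the onconfig file.)
check_console() {
    target=$(onconfig_value "$1" CONSOLE)
    if [ -z "$target" ]; then
        echo "CONSOLE not set"
    elif [ -c "$target" ]; then
        echo "CONSOLE=$target is a character device"
    elif [ -f "$target" ] && [ -w "$target" ]; then
        echo "CONSOLE=$target is a writable plain file"
    else
        echo "CONSOLE=$target is missing or not writable"
    fi
}
```

For the instance in this thread the call would be something like check_console /usr/informix/etc/onconfig.gimukldnp01_00, run as user informix so that the write test is meaningful.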

TBP

Oct 7, 2005, 3:35:23 AM

I think Nog has it :)

gerry....@dsl.pipex.com

Oct 19, 2005, 10:24:13 AM

Sorry ...

... Nog does not have it. In fact, although there have been numerous
suggestions in response to this posting, all had already been tried or
were in place and nothing works. If anyone is interested, this problem
has been passed to IBM and they don't know why it won't start either.

The only information I have not been able to supply IBM is an analysis
of a core dump of the instance process.

Does anyone know how to create a core dump of oninit? I have tried
sending all sorts of kill signals to do this. I can easily kill the
engine parent when it is stuck in FR (fast recovery), but no core file
is produced.

If I can get a core, I think I can use dbx to produce a stack/procedure
dump so that IBM can pinpoint the failure ... which, as people have
noticed, appears to have something to do with the initialisation of the
DBSPACETEMP list.

There is a slight problem with the latter. When cron shuts down the
instance, it will not restart with cron. But the very same offline
instance can be started by running oninit from the shell. It does not
have a problem with DBSPACETEMP or any of the dbspaces.

Please come forward if you REALLY know how to produce a core file from
an oninit.

Thanks for all your efforts.

Kind regards,
Gerry

da...@smooth1.co.uk

Oct 19, 2005, 2:02:20 PM

Also check the release notes....under $INFORMIXDIR/release.

On AIX, do you need to create a KAIO device (aiodev??) or something? And
run something like strload?

Make sure NOAGE = 0 in onconfig as it may not be supported.
Check RESIDENT = 0 in onconfig as I have seen it not supported on AIX.

Pity you are not on Solaris, as you could run truss to trace the system
calls made by the process. I'm not sure what the equivalent is on
AIX 5.2. On AIX 4.2.1 there was a truss-like tool, but it only came
with the 'Performance Toolbox', which the client was not willing to
spend money on.

What does the online log say when it hangs?

If it is in recovery mode then it is probably rolling back a
transaction.

Does the output from onstat -l change at all? I don't have a 9.40 here
(running IDS 10), but run onstat -l twice to a file, 1 minute apart.
The difference should be more than just the first line with the
uptime in it!

What does onstat -g ath show? Run it a few times ... which threads are
running?

Does the onstat -p output change between runs 1 minute apart?

Post the onstat -p and onstat -l difference as well as onstat -g ath
output.
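
The "run it twice and compare" step can be scripted. The sketch below is a generic helper that takes any command, so it works for onstat -l and onstat -p alike; the name snapshot_diff and the temporary file names are hypothetical.

```shell
# Run a command twice, <pause_seconds> apart, and show what changed.
# Usage: snapshot_diff <pause_seconds> <command> [args...]
# (Hypothetical helper for illustration.)
snapshot_diff() {
    pause=$1
    shift
    before="${TMPDIR:-/tmp}/snap_before.$$"
    after="${TMPDIR:-/tmp}/snap_after.$$"
    # Exit status is deliberately ignored; only the output is compared.
    "$@" > "$before" 2>&1 || :
    sleep "$pause"
    "$@" > "$after" 2>&1 || :
    if diff "$before" "$after"; then
        echo "no change in: $*"
    fi
    rm -f "$before" "$after"
}
```

For example: snapshot_diff 60 onstat -l, then snapshot_diff 60 onstat -p. Any diff output beyond the banner line suggests the engine is still doing work.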

TBP

Oct 19, 2005, 7:11:27 PM

gerry....@dsl.pipex.com wrote:
> Sorry ...
>
> ... Nog does not have it. In fact, although there have been numerous
> suggestions in response to this posting, all had already been tried or
> were in place and nothing works. If anyone is interested, this problem
> has been passed to IBM and they don't know why it won't start either.
>
> The only information I have not been able to supply IBM is an analysis
> of a core dump of the instance process.
>

What about the online.log?

That has been asked for before in this thread and yet ...

Also, what does a ps -ef output show, immediately after the "explicit"
oninit -v command is run (just do ps -ef | grep on if you want to limit
the output, would be interesting to see the oninit processes running,
along with anything else starting with "on").

gerry....@dsl.pipex.com

Oct 20, 2005, 6:47:14 AM

Hi

The online log shows;

05:38:51 On-Line Mode
05:41:02 Shutdown Mode
05:41:03 Quiescent Mode
05:41:05 IBM Informix Dynamic Server Stopped.

05:41:26 IBM Informix Dynamic Server Started.

Thu Oct 20 05:41:28 2005

05:41:28 Event alarms enabled. ALARMPROG = '/usr/informix/etc/alarmprogram.sh'
05:41:28 Booting Language <c> from module <>
05:41:28 Loading Module <CNULL>
05:41:28 Booting Language <builtin> from module <>
05:41:28 Loading Module <BUILTINNULL>
05:41:28 VP pid=811246 priority fixed at 60, former = 103
05:41:33 Requested shared memory segment size rounded from 2776KB to 2784KB
05:41:33 IBM Informix Dynamic Server Version 9.40.FC6 Software Serial Number AAA#B000000
05:41:34 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.

05:41:34 Started 1 btree scanners.
05:41:34 Low priority set for the btree scanners.
05:41:34 Btree scanner threshold set at 50000.
05:41:34 Btree scanner range scan size set at -1.
05:41:34 Physical Recovery Started at Page (2:28085).
05:41:34 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
05:41:34 Logical Recovery Started.
05:41:34 10 recovery worker threads will be started.

I obviously have to get a complete oninit process list each time I
have to forcibly kill the engine. It shows:

informix 557132 790670 0 05:41:28 - 0:00 oninit -v
informix 585960 790670 0 05:41:31 - 0:00 oninit -v
informix 635134 790670 0 05:41:33 - 0:00 oninit -v
informix 700464 790670 0 05:41:33 - 0:00 oninit -v
informix 708762 790670 0 05:41:29 - 0:00 oninit -v
informix 725048 790670 0 05:41:28 - 0:00 oninit -v
informix 762012 680042 0 05:41:26 - 0:00 oninit -v
informix 770124 790670 0 05:41:30 - 0:00 oninit -v
informix 774186 790670 0 05:41:28 - 0:00 oninit -v
informix 778312 790670 0 05:41:33 - 0:00 oninit -v
informix 782550 790670 0 05:41:33 - 0:00 oninit -v
informix 786480 790670 0 05:41:32 - 0:00 oninit -v
informix 790670 811246 0 05:41:28 - 0:00 oninit -v
informix 811246 762012 120 05:41:26 - 3:47 oninit -v

There are no backup processes running (onbar or ontape) ... in fact,
nothing else beginning with 'on'.

Back to the drawing board :(

gerry....@dsl.pipex.com

Oct 20, 2005, 8:10:02 AM

Hi Dave

Forced residency is off. We are not using NOAGE as we have moved to
VPCLASS. We have, however, switched it on in the cpu VPCLASS entry.
It is supported according to the release notes and will be set to the
default priority of 60, as the IFMX_CPUVP_PRIORITY environment variable
is not set.

I think I now have the wherewithal to get a stack dump from the oninit
core file. Trying now.

The engine is in Fast Recovery, but there were no open transactions.
There is no transaction logging on any of the databases!

There is NO difference between onstat -l outputs taken at one (or two,
or any) minute intervals. Not even the uptime, as this is not changing
while the instance is in FR mode.

onstat -g ath shows;

IBM Informix Dynamic Server Version 9.40.FC6 -- Fast Recovery -- Up 00:00:06 -- 1633776 Kbytes

Threads:
tid  tcb             rstcb           prty status               vp-class name
2    700000050d205f8 0               2    sleeping forever     5lio     lio vp 0
3    700000050d4e1f8 0               2    sleeping forever     6pio     pio vp 0
4    700000050d6e1f8 0               2    sleeping forever     7aio     aio vp 0
5    700000050d8e1f8 0               2    sleeping forever     8msc     msc vp 0
6    700000050dc61f8 0               2    sleeping forever     9aio     aio vp 1
7    700000050de6418 700000050371028 4    sleeping secs: 1     1cpu     main_loop()
8    700000050d6e500 0               2    sleeping forever     1cpu     sm_poll
9    700000050e21aa8 0               2    running              10soc    soctcppoll
10   700000050e32d88 0               2    running              11soc    soctcppoll
11   700000050e4bd88 0               2    running              12soc    soctcppoll
12   700000050e64d88 0               2    running              13soc    soctcppoll
13   700000050e7dd88 0               2    cond wait muxcon_ava 1cpu     sqlmuxlst
14   700000050e93b30 0               3    sleeping forever     1cpu     sm_listen
15   700000050ec74f0 0               2    sleeping secs: 1     1cpu     sm_discon
16   700000050ed42d0 0               3    sleeping forever     1cpu     soctcplst
17   700000050eea570 0               3    sleeping forever     1cpu     soctcplst
18   700000050f02780 0               3    sleeping forever     1cpu     soctcplst
19   700000050f19a20 0               3    sleeping forever     1cpu     soctcplst
20   700000050f312d0 700000050371850 2    sleeping forever     1cpu     flush_sub(0)
21   700000050f315d8 700000050372078 2    sleeping forever     1cpu     flush_sub(1)
22   700000050f318e0 7000000503728a0 2    sleeping forever     1cpu     flush_sub(2)
23   700000050f31be8 7000000503730c8 2    sleeping forever     1cpu     flush_sub(3)
24   700000050f61028 7000000503738f0 2    sleeping forever     1cpu     flush_sub(4)
25   700000050f61330 700000050374118 2    sleeping forever     1cpu     flush_sub(5)
26   700000050f61740 0               4    sleeping forever     1cpu     kaio
27   700000051107798 700000050374940 3    sleeping forever     1cpu     aslogflush
28   7000000511321c0 700000050375168 1    sleeping secs: 2     1cpu     btscanner 0
29   700000051132c38 700000050375990 2    ready                1cpu     fast_rec
30   7000000511969d8 7000000503761b8 2    sleeping secs: 1     1cpu     bld_logrecs
31   700000051196c78 7000000503769e0 2    cond wait packet_con 1cpu     logredo
32   7000000514678e8 700000050377208 2    cond wait packet_con 1cpu     xchg_1.0
33   700000051467bf0 700000050377a30 2    cond wait packet_con 1cpu     xchg_1.1
34   700000051491028 700000050378258 2    cond wait packet_con 1cpu     xchg_1.2
35   700000051491330 700000050378a80 2    cond wait packet_con 1cpu     xchg_1.3
36   700000051491638 7000000503792a8 2    cond wait packet_con 1cpu     xchg_1.4
37   700000051491940 700000050379ad0 2    cond wait packet_con 1cpu     xchg_1.5
38   700000051491c48 70000005037a2f8 2    cond wait packet_con 1cpu     xchg_1.6
39   7000000514f1028 70000005037ab20 2    cond wait packet_con 1cpu     xchg_1.7
40   7000000514f1330 70000005037b348 2    cond wait packet_con 1cpu     xchg_1.8
41   7000000514f1638 70000005037bb70 2    cond wait packet_con 1cpu     xchg_1.9
42   70000005152c550 70000005037c398 2    sleeping secs: 1     1cpu     xchg_2.0

There is no change after ... any amount of time (that is reasonable to
test).

The same applies to onstat -p, mate, apart from usercpu (going up) and
syscpu (fluctuating slightly +/-). The engine does not appear to be
doing anything!

Thanks for the suggestions.

Gerry

TBP

Oct 20, 2005, 2:45:59 PM
gerry....@dsl.pipex.com wrote:
> Hi
>
<snip>

> informix 557132 790670 0 05:41:28 - 0:00 oninit -v
> informix 585960 790670 0 05:41:31 - 0:00 oninit -v
> informix 635134 790670 0 05:41:33 - 0:00 oninit -v
> informix 700464 790670 0 05:41:33 - 0:00 oninit -v
> informix 708762 790670 0 05:41:29 - 0:00 oninit -v
> informix 725048 790670 0 05:41:28 - 0:00 oninit -v
> informix 762012 680042 0 05:41:26 - 0:00 oninit -v
> informix 770124 790670 0 05:41:30 - 0:00 oninit -v
> informix 774186 790670 0 05:41:28 - 0:00 oninit -v
> informix 778312 790670 0 05:41:33 - 0:00 oninit -v
> informix 782550 790670 0 05:41:33 - 0:00 oninit -v
> informix 786480 790670 0 05:41:32 - 0:00 oninit -v
> informix 790670 811246 0 05:41:28 - 0:00 oninit -v
> informix 811246 762012 120 05:41:26 - 3:47 oninit -v
>
> There are no backup processes running (onbar or ontape) ... in fact,
> nothing else beginning with 'on'.
>
> Back to the drawing board :(
>
>
> TBP wrote:
>
>>gerry....@dsl.pipex.com wrote:
>>
>>>Sorry ...
>>>
>>>... Nog does not have it. In fact, although there have been numerous
>>>suggestions in response to this posting, all had already been tried or
>>>were in place and nothing works. If anyone is interested, this problem
>>>has been passed to IBM and they don't know why it won't start either.
>>>

What about getting in touch with AIX support and asking what the
caveats are when cron starts a ksh script which runs a process which
forks children?

>>>The only information I have not been able to supply IBM is an analysis
>>>of a core dump of the instance process.
>>>
>>
>>What about the online.log?
>>
>>That has been asked for before in this thread and yet ...
>>
>>Also, what does a ps -ef output show, immediately after the "explicit"
>>oninit -v command is run (just do ps -ef | grep on if you want to limit
>>the output, would be interesting to see the oninit processes running,
>>along with anything else starting with "on").
>
>

Right, I think you need to read up on cron and queuedefs ...

I mucked around a bit today, and managed to get into a similar situation
by manipulating parent and child processes with a debugger.

The giveaway from the ps -ef output is that the main oninit process
still has a parent shell, and not a parent pid of 1.

I would suggest that this is not an informix product issue, but more
"what on earth is cron doing when it starts processes which have to fork
children". Reading up on cron on AIX, there appear to be quite a few
"things to bear in mind".

Having said that :

1. Why do you have to bounce the engine?

We have a requirement to automatically bounce our 9.40.FC6 instance on
AIX 5.2.

2. Can you provide the contents of dba_profile?

========================


-----------------------------------------
#!/bin/ksh
. /opt/informixdba/Profiles/dba_profile 940 shm
/usr/informix/bin/onmode -yuk
sleep 30
/usr/informix/bin/oninit -v

========================

Obnoxio The Clown

Oct 20, 2005, 4:38:35 PM

TBP said:
>
> What about suggesting getting in touch with AIX support regarding
> starting a ksh script which runs a process *which forks children* and
> what are the caveats.

Sounds more like you need to get hold of the police or social services. :o|

--
Bye now,
Obnoxio

"C'est pas parce qu'on n'a rien à dire qu'il faut fermer sa gueule"
- Coluche

"You are an index and a prologue to the history of lust and foul thoughts."
- William Shakespeare

gerry....@dsl.pipex.com

Oct 21, 2005, 5:54:25 AM
Thanks TBP, I think you may have hit on something there. The
difference between ps outputs taken from the cron startup and the
manual startup is shown below;

< informix 585744 770152 0 02:54:26 - 0:00 oninit -v
< informix 643312 585744 76 02:54:27 - 84:26 oninit -v
< informix 680146 803060 0 02:54:30 - 0:00 oninit -v
< informix 684190 803060 0 02:54:33 - 0:00 oninit -v
< informix 688160 803060 0 02:54:28 - 0:00 oninit -v
< informix 704580 803060 0 02:54:33 - 0:00 oninit -v
< informix 708738 803060 0 02:54:28 - 0:00 oninit -v
< informix 729210 803060 0 02:54:29 - 0:00 oninit -v
< informix 733304 803060 0 02:54:31 - 0:00 oninit -v
< informix 765966 803060 0 02:54:32 - 0:00 oninit -v
< informix 790632 803060 0 02:54:33 - 0:00 oninit -v
< informix 803060 643312 0 02:54:28 - 0:00 oninit -v
< informix 811086 803060 0 02:54:33 - 0:00 oninit -v
< informix 827566 803060 0 02:54:28 - 0:00 oninit -v
---
> informix 643318 1 0 04:20:49 - 0:03 oninit -v
> informix 680148 803066 0 04:20:55 - 0:00 oninit -v
> informix 684192 803066 0 04:20:51 - 0:00 oninit -v
> informix 688168 803066 0 04:20:51 - 0:00 oninit -v
> informix 704582 803066 0 04:20:50 - 0:00 oninit -v
> informix 708744 803066 0 04:20:52 - 0:00 oninit -v
> informix 729212 803066 0 04:20:54 - 0:00 oninit -v
> informix 733306 803066 0 04:20:56 - 0:00 oninit -v
> informix 765968 803066 0 04:20:56 - 0:00 oninit -v
> informix 790634 803066 0 04:20:50 - 0:00 oninit -v
> informix 803066 643318 0 04:20:50 - 0:00 oninit -v
> informix 827568 803066 0 04:20:53 - 0:00 oninit -v
> informix 831622 803066 0 04:20:56 - 0:00 oninit -v

I agree that the main oninit should have a parent PID of 1 ... don't
know how I missed it :( I think I will travel down this route for the
moment. We have very good AIX guys, but raising a call with AIX support
would certainly be the next step.

In answer to your questions;
1) The customer has a requirement to bounce, so we say ... how high
shall we bounce?
2) If I get time I will post dba_profile, but it's routine stuff. The
environments are not different between cron and ksh. Tested
thoroughly.

Thanks again. Seems like a big clue.

Gerry

da...@smooth1.co.uk

Oct 26, 2005, 6:32:19 PM

When IDS starts up, the first oninit process starts the others. When it
exits, the oninit command returns to the command line and the first VP
loses its parent process. The OS should then assign init (PID 1) as its
new parent. I suspect that the first oninit process, the one creating
the others, is not completing and hence never exits. This means the
next one will not get a parent PID of 1, and the oninit command appears
to hang.

The threads that exist in IDS include

fast_rec (fast recovery)
bld_logrecs (I assume it collects the log records to be applied)
logredo (I assume it applies the log records)

The "Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored."
means no active transactions
that I can see. It is in the Logical Recovery phase.

There may still be logical log records to be applied, since some events
are always logged even if no database has transaction logging (e.g.
allocating a new extent to a table, adding a chunk), so that IDS can
keep its space allocation information consistent. Changes to things
stored on the reserved pages (e.g. checkpoint records, level 0 archive
information, changed onconfig parameters) are also logged, so that
Informix can keep the reserved pages consistent.

What does

onstat -g stk all

give?

You could try running onstat -l and then find the logical log with a C
against it (current). Run onlog against it:

onlog -n <logical log id number> (I think), or just run
onlog and dump the output to a file, as that dumps everything.

See if this changes as IDS applies logical log records.
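
A rough sketch of picking out the current log, on the assumption that the logical-log rows of onstat -l have the usual columns (address, number, flags, uniqid, begin, size, used, %used) and that the flags field is seven characters with a C marking the current log. current_log_uniqid is a hypothetical helper name. Check it against real output before relying on it.

```shell
# Hypothetical helper: read `onstat -l` output on stdin and print the
# uniqid of the logical log whose flags field contains C (current).
# Assumed row layout: address number flags uniqid begin size used %used
current_log_uniqid() {
    awk '
        NF >= 8 && length($3) == 7 && $3 ~ /^[A-Z-]+$/ && index($3, "C") {
            print $4
        }
    '
}
```

It could then be used as `onlog -n "$(onstat -l | current_log_uniqid)" > /tmp/onlog.out` (the output path is only an example), repeated a few minutes apart to see whether anything changes.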

onstat -g stk all should be interesting.
