
Shared memory isn't cleaned up after onmode -ky


RG

Mar 6, 1999, 3:00:00 AM

AIX 4.2.1 ODS 7.24.uc5

While in online mode, "ipcs -m | grep informix" shows:
m 12289 0x52574801 --rw-rw---- root informix
m 12290 0x52574802 --rw-rw---- root informix
m 12291 0x52574803 --rw-rw-rw- root informix

Bring the engine down with onmode -ky.

"ipcs -m | grep informix" still shows:
m 12289 0x52574801 --rw-rw---- root informix
m 12290 0x52574802 --rw-rw---- root informix
m 12291 0x52574803 --rw-rw-rw- root informix

After running "oninit -y" I get:
"DS_TOTAL_MEMORY recalculated and changed from 10000kb to 128000kb oninit
fatal error in shared memory connection."

online message log shows:
***************************************************************************
*** WARNING: INFORMIX-OnLine server is no longer running. ***


INFORMIX-OnLine Version 7.24.UC5 -- Initialization -- Up 8 days 23:37:08 -- 228224 Kbytes

Message Log File: /usr/informix/etc/satol.log
16:20:04 Logical Log 298 Complete.
16:20:04 Process exited with return code 127: /bin/sh /bin/sh -c /u/informix/alarms.exe 2 23 "Logical Log 298 Complete." "Logical Log 298 Complete."
16:24:16 Checkpoint Completed: duration was 22 seconds.

Wed Mar 3 09:40:18 1999

09:40:18 Checkpoint Completed: duration was 3 seconds.

Thu Mar 4 11:30:51 1999

11:30:51 INFORMIX-OnLine Stopped.
11:31:29 shmget: [EEXIST][17]: key 52574801: shared memory already exists
11:31:29 mt_shm_init: can't create resident segment

11:31:44 shmget: [EEXIST][17]: key 52574801: shared memory already exists
11:31:44 mt_shm_init: can't create resident segment

11:32:01 shmget: [EEXIST][17]: key 52574801: shared memory already exists
11:32:01 mt_shm_init: can't create resident segment


*** WARNING: INFORMIX-OnLine server is no longer running. ***
***************************************************************************
I now have to either reboot the machine or clean up shared memory manually
with "ipcrm -m ###".


Here's my config file:
***************************************************************************
ROOTNAME root_dbs # Root dbspace name
ROOTPATH /saturn/rrsat_rdbs # Path for device containing root dbspace
ROOTOFFSET 0 #512 # Offset of root dbspace into device (Kbytes)
ROOTSIZE 1800000 # Size of root dbspace (Kbytes)
MIRROR 0 # Mirroring flag (Yes = 1, No = 0)
MIRRORPATH # Path for device containing mirrored root
MIRROROFFSET 0 # Offset into mirrored device (Kbytes)
PHYSDBS root_dbs # Location (dbspace) of physical log
PHYSFILE 24000 # Physical log file size (Kbytes)
LOGFILES 19 # Number of logical log files
LOGSIZE 5000 # Logical log size (Kbytes)
MSGPATH /usr/informix/etc/satol.log # System message log file path
CONSOLE /usr/informix/satol.msg # System console message path
ALARMPROGRAM /u/informix/alarms.exe # Alarm program path
TAPEDEV /dev/rmt1 # Tape device path
TAPEBLK 1024 # Tape block size (Kbytes)
TAPESIZE 4000000 # Maximum amount of data to put on tape (Kbytes)
LTAPEDEV /dev/null # Log tape device path
LTAPEBLK 1024 # Log tape block size (Kbytes)
LTAPESIZE 4000000 # Max amount of data to put on log tape (Kbytes)
STAGEBLOB # INFORMIX-OnLine/Optical staging area
SERVERNUM 1 # Unique id corresponding to a OnLine instance
DBSERVERNAME on_saturn1 # Name of default database server
DBSERVERALIASES saturn # List of alternate dbservernames
DEADLOCK_TIMEOUT 60 # Max time to wait of lock in distributed env.
RESIDENT 0 # Forced residency flag (Yes = 1, No = 0)
MULTIPROCESSOR 0 # 0 for single-processor, 1 for multi-processor
NUMCPUVPS 1 # Number of user (cpu) vps
SINGLE_CPU_VP 1 # If non-zero, limit number of cpu vps to one
NOAGE 1 # Process aging
AFF_SPROC 0 # Affinity start processor
AFF_NPROCS 0 # Affinity number of processors
LOCKS 50000 # Maximum number of locks
BUFFERS 32000 # Maximum number of shared buffers
NUMAIOVPS 2 # Number of IO vps
PHYSBUFF 500 # Physical log buffer size (Kbytes)
LOGBUFF 500 # Logical log buffer size (Kbytes)
LOGSMAX 20 # Maximum number of logical log files
CLEANERS 8 # Number of buffer cleaner processes
SHMBASE 0x30000000 #0x30000000 # Shared memory base address
SHMVIRTSIZE 92000 # initial virtual shared memory segment size
SHMADD 8192 # Size of new shared memory segments (Kbytes)
SHMTOTAL 0 # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 900 # Check point interval (in sec)
LRUS 8 # Number of LRU queues
LRU_MAX_DIRTY 2 # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 1 # LRU percent dirty end cleaning limit
LTXHWM 40 # Long transaction high water mark percentage
LTXEHWM 50 # Long transaction high water mark (exclusive)
TXTIMEOUT 0x12c # Transaction timeout (in sec)
STACKSIZE 32 # Stack size (Kbytes)
OFF_RECVRY_THREADS 10 # Default number of offline worker threads
ON_RECVRY_THREADS 1 # Default number of online worker threads
DRAUTO 0 # DR automatic switchover
DRINTERVAL 30 # DR max time between DR buffer flushes (in sec)
DRTIMEOUT 60 # DR network timeout (in sec)
DRLOSTFOUND /saturn/ifmxlostfound # DR lost+found file path
RA_PAGES 32 # Number of pages to attempt to read ahead
RA_THRESHOLD 30 # Number of pages left before next group
DBSPACETEMP root_dbs # Default temp dbspaces
DUMPDIR /tmp # Preserve diagnostics in this directory
DUMPSHMEM 1 # Dump a copy of shared memory
DUMPGCORE 0 # Dump a core image using 'gcore'
DUMPCORE 0 # Dump a core image (Warning:this aborts OnLine)
DUMPCNT 1 # Number of shared memory or gcore dumps for
FILLFACTOR 90 # Fill factor for building indexes
USEOSTIME 0 # 0: use internal time(fast), 1: get time from OS(slow)
MAX_PDQPRIORITY 100 # Maximum allowed pdqpriority
DS_MAX_QUERIES 1000 # Maximum number of decision support queries
DS_TOTAL_MEMORY 10000 # Decision support memory (Kbytes)
DS_MAX_SCANS 1048576 # Maximum number of decision support scans

DATASKIP off # List of dbspaces to skip
OPTCOMPIND 0 # To hint the optimizer
ONDBSPACEDOWN 0 # Dbspace down option: 0 = CONTINUE, 1 = ABORT, 2 = WAIT
LBU_PRESERVE 0 # Preserve last log for log backup
OPCACHEMAX 0 # Maximum optical cache size (Kbytes)
NETTYPE soctcp,1,30,NET # How clients connect, mead_wps
NETTYPE ipcshm,1,30,CPU # How clients connect, mead_wps
CDR_LOGBUFFERS 2048 # size of log reading buffer pool (Kbytes)
CDR_EVALTHREADS 1,1 # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT 5 # DS lockwait timeout (seconds)
CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)
BAR_ACT_LOG /tmp/bar_act.log
BAR_MAX_BACKUP 0
BAR_RETRY 1
BAR_NB_XPORT_COUNT 10
BAR_XFER_BUF_SIZE 31
HETERO_COMMIT 0
***************************************************************************

and here's my sqlhosts:
***************************************************************************
mead_wps onsoctcp rs6000d1 e20_serv
saturn onipcshm rs6000d1 dummy2
on_saturn1 onsoctcp rs6000d1 e20_serv

TIA,
Robert


--
Regards,
Robert

Gary Mitchell

Mar 6, 1999, 3:00:00 AM
The first thing I'd do is look at the message log to see if it shows an
error.

I'd then run oncheck with various parameters to see whether the db is
corrupted and the engine couldn't be shut down for that reason.
RG wrote in message <01be676c$3bb996e0$7b0501cf@rgriffin>...

Jeff Tyzzer

Mar 8, 1999, 3:00:00 AM
Hi, RG:

Does adding "-F" to the onmode command help (i.e., "onmode -kyF")?

-- Jeff Tyzzer

Art S. Kagel

Mar 8, 1999, 3:00:00 AM
Gary Mitchell wrote:
>
> The first thing I'd do is look at the log file to see if there is some error
> in the log.
>
> I'd then run oncheck with various parms to see if the db is corrupted and
> the engine
> couldn't be shut down for that reason.
> RG wrote in message <01be676c$3bb996e0$7b0501cf@rgriffin>...
> >
> >
> >
> >
> >AIX 4.2.1 ODS 7.24.uc5
> >
> >while in online mode "ipcs -m | grep informix" shows:
> >m 12289 0x52574801 --rw-rw---- root informix
> >m 12290 0x52574802 --rw-rw---- root informix
> >m 12291 0x52574803 --rw-rw-rw- root informix
> >
> >Bring Engine down with onmode -ky.
> >
> >"ipcs -m | grep informix" still shows
> >m 12289 0x52574801 --rw-rw---- root informix
> >m 12290 0x52574802 --rw-rw---- root informix
> >m 12291 0x52574803 --rw-rw-rw- root informix
> >
> >After running "oninit -y " i get :
> >"DS_TOTAL_MEMORY recalculated and changed from 10000kb to 128000kb oninit
> >fatal error in shared memory connection."

Ordinarily, ipcs output like this is meaningless: it only indicates
that some client program is still attached to the shared memory
segments, while the segments themselves have indeed been destroyed. If
you try to remove such segments by hand with ipcrm, you usually get an
error saying the segment does not exist, even though ipcs lists it. In
that case, though, you would be able to restart the engine with no
trouble, so that is not the problem here. These are the steps I go
through when this, rarely, occurs:

1) Make sure the engine is truly down: ps -fe | fgrep oninit
2) Either way, try onmode -ky again; it will usually remove shared
memory if it does not find a running engine.
3) If the engine was not down originally, check again. If it is still
up, try to force a checkpoint, then use 'kill -TERM' and 'kill -PIPE' on
the master oninit (the one which is parent to most of the others). This
will force an emergency shutdown by the admin VP or the misc VP when it
sees the master VP go down.
4) Use ipcrm to manually destroy the shared memory segments and, for
good measure, any semaphores owned by Informix.
5) Restart the engine.
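
As a rough illustration of steps 1 and 3, the sketch below picks the probable master oninit out of `ps -ef` output by counting how many oninit processes name each oninit PID as their parent. The function name `find_master_oninit` is made up for this example, and it assumes the usual `ps -ef` column order (UID, PID, PPID, ...); check it against your own ps output before pointing kill at anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of Informix. Reads `ps -ef` output on stdin
# and prints the PID of the oninit process that is parent to the most other
# oninit processes. Prints nothing if no oninit process is listed.
# Assumes columns: UID PID PPID ... with the command somewhere after them.
# The pattern is written "[o]ninit" so that this awk command's own entry in
# the ps listing does not match itself.
find_master_oninit() {
    awk '
    /[o]ninit/ && $2 ~ /^[0-9]+$/ {
        seen[$2] = 1           # every engine process PID
        parent[$2] = $3        # and its PPID
    }
    END {
        # Count, for each engine PID, how many engine processes it parents.
        for (pid in parent)
            if (parent[pid] in seen)
                kids[parent[pid]]++
        best = ""; max = -1
        for (pid in seen) {
            n = kids[pid] + 0
            # Most children wins; ties go to the lowest PID.
            if (n > max || (n == max && pid + 0 < best + 0)) {
                max = n; best = pid
            }
        }
        if (best != "") print best
    }'
}
```

Typical use would be something like `master=$(ps -ef | find_master_oninit)`, followed by a manual look at that PID before sending it any signal. An empty result corresponds to step 1 reporting that the engine is down.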

Art S. Kagel

Tim Schaefer

Mar 8, 1999, 3:00:00 AM
to RG
RG,

We've had this problem with 8.21.UC1, and it is a known problem on AIX when
using Informix. You need to contact IFMX and IBM tech support and ask them
what the PSSP patches are for your system to upgrade the OS to a point where
this problem is minimized. I don't have the PSSP patches off the top of my
head, but IFMX does know about this problem if you mention XPS. The tech
support person can then search against the more recent problems. Or probably
mentioning my name in the search will help. :-)

A program called "onclean" came supplied with 8.21.UC1 expressly for managing
this problem. It is essentially an ipcrm binary that requires the -k option,
and it occasionally works until you get the right PSSP patch for your OS. Once
you apply the patch, the shared-memory problem goes away for the most part,
and there is no need for onclean. A really nice catch-22.

In the meantime, until you get patched up, you can run the following script
as root to clean up shared memory segments. If you see a "D" in the ipcs
list, a program, probably an oninit virtual processor, is still running.

Best of success to you with AIX!

:-)

Tim

#!/bin/sh
################################################################################
#
# Program: gen_ipcrm.sh - Generates and runs ipcrm commands to remove informix
# shared memory segments and semaphores. This simplifies the task of
# cleaning them out by hand, by issuing the ipcrm commands in one
# fell swoop. If a shared memory segment has a "D" at the beginning
# of the permissions you will need to kill the oninit virtual
# processor by hand by running a ps -ef and grepping for informix.
# Kill the oninit with a kill -9 and this should clear out the pesky
# segment. You won't be able to ipcrm it any other way even if you
# are root.
#
# Comment out the system() call to run this program without doing
# the actual ipcrm commands.
#
# Author: Tim Schaefer
# Created: Mon Nov 23 18:55:52 EST 1998
# Usage: gen_ipcrm.sh
# Notes: MUST BE RUN AS ROOT LOGIN
#
################################################################################
ipcs | awk '
# Select shared memory (m) and semaphore (s) entries that mention informix.
/^[ms]/ && /informix/ {
    # Field 1 is the type letter, field 2 the ID, e.g. "ipcrm -m 12289".
    run_string = "ipcrm -" $1 " " $2
    print run_string
    system(run_string)    # comment out this line for a dry run
}
'
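
The "D" check mentioned in the script header can be done with a quick filter before running the removal. This is only a sketch: `list_doomed_ipc` is a name invented here, and it assumes the six-column ipcs layout quoted in this thread (type, ID, key, mode, owner, group), where a leading "D" in the mode column marks a segment that has been removed but is still attached by some process.

```shell
#!/bin/sh
# Hypothetical filter, not shipped with Informix or AIX. Reads ipcs output
# on stdin and prints the type and ID of every informix entry whose mode
# field starts with "D".
# Assumes columns: T ID KEY MODE OWNER GROUP.
list_doomed_ipc() {
    awk '/^[ms]/ && $4 ~ /^D/ && ($5 == "informix" || $6 == "informix") {
        print $1, $2
    }'
}
```

Running `ipcs | list_doomed_ipc` and getting any output at all suggests a process is still attached, so a look through `ps -ef` for leftover oninit processes comes before any ipcrm.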

--
-
--
--- Tim Schaefer
---- tsch...@mindspring.com
--- http://www.inxutil.com
--
-

Stephen F. Cawley

Mar 23, 1999, 3:00:00 AM
> "ipcs -m | grep informix" still shows
> m 12289 0x52574801 --rw-rw---- root informix
> m 12290 0x52574802 --rw-rw---- root informix
> m 12291 0x52574803 --rw-rw-rw- root informix

If you are _sure_ that the engine is down, then you can take the brute force
approach and remove the segments with ipcrm (ipcrm -m 12289). I think you have
to be root to do this; ipcrm will tell you if you do not have enough
horsepower.
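
One way to make the "sure the engine is down" part explicit is a small guard like the sketch below. The function name `ipcrm_cmds_if_down` and its two-file interface are invented for illustration. It only prints the ipcrm commands rather than running them, and it assumes the ipcs columns quoted above (type, ID, key, mode, owner, group).

```shell
#!/bin/sh
# Hypothetical guard, not an Informix utility.
#   $1 = file holding `ps -ef` output
#   $2 = file holding `ipcs -m` output
# Prints one "ipcrm -m ID" line per shared memory segment whose owner or
# group is informix, but only if no oninit process appears in the ps
# listing. Otherwise prints a warning to stderr and returns 1.
ipcrm_cmds_if_down() {
    if grep -q 'oninit' "$1"; then
        echo "oninit still running; not touching shared memory" >&2
        return 1
    fi
    awk '$1 == "m" && ($5 == "informix" || $6 == "informix") {
        print "ipcrm -m " $2
    }' "$2"
}
```

A possible workflow: capture `ps -ef > /tmp/ps.out` and `ipcs -m > /tmp/ipcs.out`, run `ipcrm_cmds_if_down /tmp/ps.out /tmp/ipcs.out` to review the list, and only then pipe the same output to `sh` as root. The /tmp file names are placeholders.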

I have seen this problem, usually when installing a new instance and the
network settings are not correct so the engine craters.

SC


Obnoxio The Clown

Mar 24, 1999, 3:00:00 AM

>> "ipcs -m | grep informix" still shows
>> m 12289 0x52574801 --rw-rw---- root informix
>> m 12290 0x52574802 --rw-rw---- root informix
>> m 12291 0x52574803 --rw-rw-rw- root informix
>
>If you are _sure_ that the engine is down, then you can take the brute force
>approach and remove the segments with ipcrm (ipcrm -m 12289). I think you have
>to be root to do this, ipcrm will tell you if you do not have enough
>horsepower.

_That's_ not brute force -- brute force is a reboot! :-)

PaulITID...@chase.com

Mar 24, 1999, 3:00:00 AM


Alternatively, try onmode -F. This should clear up the unused segments.

>> "ipcs -m | grep informix" still shows
>> m 12289 0x52574801 --rw-rw---- root informix
>> m 12290 0x52574802 --rw-rw---- root informix
>> m 12291 0x52574803 --rw-rw-rw- root informix
>
>If you are _sure_ that the engine is down, then you can take the brute force
>approach and remove the segments with ipcrm (ipcrm -m 12289). I think you have
>to be root to do this, ipcrm will tell you if you do not have enough
>horsepower.
>
