Once, when we tried to bring a database back online
after shutting it down cleanly as part of a weekly
task, the database hung for about an hour.
The online.log looked like this:
08:19:13 On-Line Mode
08:19:13 Affinitied VP 1 to phys proc 4
08:27:11 VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 12 more times
08:35:11 VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 11 more times
08:43:09 VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 10 more times
08:51:08 VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 9 more times
After that, the database crashed, and online.log showed:
09:54:56 notifyvp(): vp 3, pid 22717 of class 0 didn't rcv
09:54:56 notifyvp(): vp 4, pid 22718 of class 0 didn't rcv
09:54:56 notifyvp(): vp 5, pid 22719 of class 0 didn't rcv
09:54:56 notifyvp(): vp 6, pid 22720 of class 0 didn't rcv
09:54:56 notifyvp(): vp 7, pid 22721 of class 0 didn't rcv
09:54:56 notifyvp(): vp 8, pid 22722 of class 0 didn't rcv
09:54:56 notifyvp(): vp 9, pid 22723 of class 0 didn't rcv
09:54:56 Assert Failed: mt_notifyvp timed out
09:54:56 IBM Informix Dynamic Server Version 9.40.FC7
09:54:56 Who: Session(1, informix@orion, 0, 35930e028)
         Thread(7, main_loop(), 3592cc028, 1)
         File: mt.c Line: 11121
09:54:56 stack trace for pid 22637 written to /respaldo2/informix/tmp/siisa/af.3ef58cf
09:54:56 See Also: /respaldo2/informix/tmp/siisa/af.3ef58cf, shmem.3ef58cf.0
09:56:12 Error writing '/respaldo2/informix/tmp/siisa/shmem.3ef58cf.0' errno = 28
09:56:12 mt.c, line 11121, thread 7, proc id 22637, mt_notifyvp timed out.
09:56:14 The Master Daemon Died
09:56:15 PANIC: Attempting to bring system down
In the end we were only able to bring Informix online by
restarting the entire server (a Sun server running
Solaris 9). Even so, I'd like to ask what causes
this kind of error. This is an Informix 9.40.FC7
database.
Thanks in advance
Omar Munoz
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
No Norma, this does not have to be a network issue; you are wrong. All
it means is that the VPs timed out communicating.
Perhaps one of the threads was hung in an OS call, or there is an
Informix bug. DO NOT MAKE ASSUMPTIONS.
There is a known VP notify issue in 9.40.UC7.
Patch to 9.40.UC7W1X1 to fix it. PS: when will 9.40.xC8 be out?
Omar, there could be some other issue. Who knows: it could be a hardware
issue, an OS issue, or an Informix issue.
You have DUMPSHMEM enabled, but if you check /usr/include/sys/errno.h
on your system you will find that errno 28 means no space left on device.
Either there is not enough space under /respaldo2/informix/tmp/siisa/
to dump shared memory, or it is trying to write a file larger than 2 GB
on a file system that is not largefile enabled.
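
As a rough illustration of the two possibilities above, here is a minimal Python sketch that decodes an errno value and checks whether a dump of a given size would fit in a directory. The function names and the example dump size are made up for illustration; this is not an Informix tool, and the 2 GiB test is only the classic limit for a file system without large-file support, not a check of how your file system is actually mounted.

```python
import errno
import os
import shutil

# Classic ceiling for a single file when large-file support is missing.
TWO_GIB = 2**31


def describe_errno(code):
    """Return (symbolic name, message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def dump_problems(dump_dir, dump_bytes):
    """List reasons a dump of dump_bytes might fail to be written under dump_dir."""
    problems = []
    free = shutil.disk_usage(dump_dir).free
    if dump_bytes > free:
        problems.append("not enough free space")
    if dump_bytes >= TWO_GIB:
        problems.append("exceeds 2 GiB; needs large-file support")
    return problems


if __name__ == "__main__":
    print(describe_errno(28))
    # Hypothetical size: a 13 GiB shared memory dump into the current directory.
    print(dump_problems(".", 13 * 2**30))
```

Point `dump_problems` at the real dump directory and pass the total shared memory size to see which of the two explanations is plausible on your machine.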
What did onstat -g stk all show? Did you strace/truss the server
pids?
Were there any issues in the OS logs?
You need to do more investigation when the problem happens.
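
To make that investigation quicker next time, a small Python sketch along these lines could pull the VP numbers and pids out of the notifyvp() lines so you know which processes to truss. The regular expression is based only on the log lines quoted in this thread, not on any documented log format, and the function name is hypothetical.

```python
import re

# Pattern inferred from the notifyvp() lines quoted earlier in this thread.
NOTIFY_RE = re.compile(r"notifyvp\(\): vp (\d+), pid (\d+) of class (\d+)")


def stalled_vps(log_text):
    """Return (vp, pid, class) tuples for every notifyvp() complaint found."""
    # Mail clients wrap long log lines, so collapse whitespace before matching.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in NOTIFY_RE.finditer(flat)]


SAMPLE = """\
09:54:56 notifyvp(): vp 3, pid 22717 of class 0
didn't rcv
09:54:56 notifyvp(): vp 4, pid 22718 of class 0
didn't rcv
"""

if __name__ == "__main__":
    for vp, pid, vp_class in stalled_vps(SAMPLE):
        # Print a truss command for each process that did not respond.
        print(f"truss -p {pid}   # VP {vp}, class {vp_class}")
```

The pids are only useful while the processes are still alive, so this would have to be run during the hang, before the assert failure brings the server down.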
At first I thought something similar to Norma, and I
edited sqlhosts to use IP addresses instead of host
names so that no name resolution was needed, but that
didn't work, and anyway everything is on the same
machine.
You're right about the shared memory dump, David. My
dump partition didn't fill up, but since shared
memory is pretty big (13 GB in total) I don't think
large file support is enabled on it. I'm going to
discuss it with the OS administrator. In any case, the
assert failure happened before that.
Sadly, I didn't run any onstat monitoring at the
time, but I guess some information might be in the
af file. What should I look for?
I'm going to look into the patch you sent me, even
though according to the link the problem supposedly
affects Solaris 8 and we have Solaris 9.
Thanks a lot to everyone
Omar Munoz
cheers
j.
Sane ego te vocavi. Forsitan capedictum tuum desit.
"impression was that it could not affinitize." ??
If the system calls to pin a CPU VP onto a CPU fail, then I would
expect an error in the online.log.
"something was hanging onto the CPU that prevented IDS from grabbing
it". What? I've never heard such complete bollocks!
Nothing can "hang on" to a CPU. I expect either an OS bug or an
Informix bug (SPARC Solaris normally reports hardware issues
pretty reliably in the OS logs). Without more info we cannot tell
which.