It happens occasionally, and I can't quite put my finger on the cause. It
seems to happen when one of the machines connecting to the server has a
network problem.
I ran strace on one of the processes, and it's stuck in an infinite loop:
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
[root@production ~]# strace -c -p 25981
Process 25981 attached - interrupt to quit
^CProcess 25981 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.006223           0    125325           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.006223                125325           total
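For context, the `recvfrom(...) = 0` lines in the trace are what a stream socket reports once the other end has closed: a return of 0 means end-of-stream, and every later call returns 0 again straight away. Below is a minimal sketch that reproduces this, assuming a Unix socket pair as a stand-in for the real connection to the M process. `count_zero_reads` is a hypothetical helper for illustration, not part of m_apache.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close one end of a socket pair, then call recv()
 * `attempts` times on the other end and count how many calls return 0.
 * Returns -1 if the socket pair cannot be created. */
int count_zero_reads(int attempts)
{
    int fds[2];
    char buf[8];
    int zeros = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    close(fds[1]);                  /* simulate the peer going away */

    for (int i = 0; i < attempts; i++) {
        ssize_t n = recv(fds[0], buf, sizeof buf, 0);
        if (n == 0)
            zeros++;                /* end-of-stream, not "try again" */
    }

    close(fds[0]);
    return zeros;
}
```

A loop that treats 0 as "no data yet" and calls recv() again will spin without blocking, which would be consistent with the 125325 recvfrom calls in the summary above.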
Sam
I compiled m_apache myself, and killed off mumps processes to
reproduce the issue, attached gdb and ran backtrace. The output is
enlightening.
#0 0x00007fe90b3b1648 in recv () from /lib64/libc.so.6
#1 0x00007fe90234ce6a in recv (lp_request=0x7fffc01a5630,
data=0x7fe90e2d1b70 "ORCEDHALT, Image HALTed by MUPIP STOP\nU",
size=3926088, mode=<value optimized out>) at /usr/include/bits/socket2.h:45
#2 mgwsi_db_receive_ex (lp_request=0x7fffc01a5630,
data=0x7fe90e2d1b70 "ORCEDHALT, Image HALTed by MUPIP STOP\nU",
size=3926088, mode=<value optimized out>) at m_apache.c:2939
#3 0x00007fe90234d189 in mgwsi_db_receive (lp_request=0x7fffc01a5630,
lp_trans_buffer=0x7fffc01a35c0, size=32768,
mode=<value optimized out>) at m_apache.c:2903
#4 0x00007fe902351515 in mgwsi_handler (r=0x7fe90e2b9528) at m_apache.c:1517
#5 0x00007fe90cdaa8c0 in ap_run_handler ()
#6 0x00007fe90cdae148 in ap_invoke_handler ()
#7 0x00007fe90cdb97a0 in ap_process_request ()
#8 0x00007fe90cdb6668 in ?? ()
#9 0x00007fe90cdb2398 in ap_run_process_connection ()
#10 0x00007fe90cdbe097 in ?? ()
#11 0x00007fe90cdbe3aa in ?? ()
#12 0x00007fe90cdbe6db in ap_mpm_run ()
#13 0x00007fe90cd96840 in main ()
Sam
> --
> You received this message because you are subscribed to the Google Groups "Enterprise Web Developer Community" group.
> To post to this group, send an email to enterprise-web-de...@googlegroups.com.
> To unsubscribe from this group, send email to enterprise-web-develope...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/enterprise-web-developer-community?hl=en-GB.
>
>
There are no M processes running, so I think I'll skip these steps for now.
On Thu, Oct 6, 2011 at 4:06 PM, DL Wicksell <dlwic...@gmail.com> wrote:
Anyways:
Here's what I did to m_apache.c in mgwsi_db_receive_ex:
    n = MGWSI_NET_RECV(lp_request->lp_mgwsicon->sockfd, data + len, size - len, 0);
    // smh - Fix a bug in m_apache
    // http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Fapis%2Frecv.htm
    if (n = 0) { // connection is broken
        // errno has the error code; but not useful to check.
        len = n;
        break;
    }
    // smh - end
I don't have time to recompile and try again today.
Sam
I discovered that bug, and Rob told me about the fix on hardhats; the
fix has been applied on the server. Thank you for telling me about it.
The good news is: I fixed the m_apache gateway!!!!! Now it won't go
into an infinite loop when the M process dies.
I showed the above code to Rick Trotter and he pointed out that the
test uses = (assignment) where it needs == (comparison), so the
"if (n = 0)" branch was never taken. Duh! I just fixed that and now it
works properly.
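For anyone following along, a receive loop with the corrected comparison might look roughly like the sketch below. It is a standalone illustration, not the m_apache source: `recv_until_closed` is a made-up name, plain `recv()` stands in for `MGWSI_NET_RECV`, and the EINTR handling is my own assumption rather than anything from the original code.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not the m_apache function: read up to `size` bytes,
 * stopping early if the peer closes the connection or an error occurs.
 * Returns the number of bytes read, or -1 on error with nothing read. */
ssize_t recv_until_closed(int sockfd, char *data, size_t size)
{
    size_t len = 0;

    while (len < size) {
        ssize_t n = recv(sockfd, data + len, size - len, 0);
        if (n == 0)                 /* comparison, not assignment: peer closed */
            break;
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal, retry */
                continue;
            return len > 0 ? (ssize_t)len : -1;
        }
        len += (size_t)n;
    }
    return (ssize_t)len;
}
```

The important part is that a return of 0 ends the loop instead of being retried; what the caller then does with a short read is a separate decision.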
Sam
To view this discussion on the web, visit https://groups.google.com/d/msgid/enterprise-web-developer-community/95ac8237-f52c-49dd-9c85-3cba1f163d94n%40googlegroups.com.