apache taking 100% of CPU...


Sam Habiel

Oct 5, 2011, 5:09:05 PM
to enterprise-web-de...@googlegroups.com
This apache server only services EWD calls.

It happens occasionally, and I can't quite put my finger on it. It seems to
happen when one of the machines connecting to the server has a network
problem.

I ran strace on one of the processes: it's in an infinite loop:

recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0
recvfrom(16, "", 8, 0, NULL, NULL) = 0

[root@production ~]# strace -c -p 25981
Process 25981 attached - interrupt to quit
^CProcess 25981 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.006223           0    125325           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.006223                125325           total
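
The trace above is what a stream socket looks like once the other end has gone away: recv() comes straight back with 0, every time, so a loop that only stops on a negative return value never exits. A minimal sketch that reproduces this outside Apache, assuming a Unix-domain socketpair can stand in for the gateway's connection (the function name count_zero_reads is made up for illustration and is not from m_apache.c):

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo, not code from m_apache.c.
 * Creates a connected stream socket pair, closes one end, then calls
 * recv() `attempts` times on the other end.
 * Returns how many of those calls returned 0, or -1 if setup failed. */
int count_zero_reads(int attempts)
{
    int fds[2];
    char buf[8];
    int zeros = 0;
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    close(fds[1]);              /* the peer goes away */

    for (i = 0; i < attempts; i++) {
        ssize_t n = recv(fds[0], buf, sizeof buf, 0);
        if (n == 0)
            zeros++;            /* end of stream: no data, no error, no blocking */
    }

    close(fds[0]);
    return zeros;
}
```

Calling count_zero_reads(5) should return 5 on Linux, which matches the repeated recvfrom(...) = 0 lines in the trace.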


Sam

DL Wicksell

Oct 6, 2011, 4:58:04 PM
to Enterprise Web Developer Community
Hey Sam,

I would need a lot more information to be able to debug this issue.
However, could you send back the output of this command to me:

GTM> zp CHILDE^%ZMGWSIS:CHILDE+6

I assume you are using GT.M. I'd like to see which version of EWD
you are running, judging by what that subroutine contains. Thanks.

DL Wicksell

Oct 6, 2011, 5:06:37 PM
to Enterprise Web Developer Community
Sam,

Could you also send back the output of this call too:

GTM>w $$version^%zewdAPI

Thank you.

Sam Habiel

Oct 6, 2011, 8:09:43 PM
to enterprise-web-de...@googlegroups.com
Progress:

I compiled m_apache myself, and killed off mumps processes to
reproduce the issue, attached gdb and ran backtrace. The output is
enlightening.

#0  0x00007fe90b3b1648 in recv () from /lib64/libc.so.6
#1  0x00007fe90234ce6a in recv (lp_request=0x7fffc01a5630,
    data=0x7fe90e2d1b70 "ORCEDHALT, Image HALTed by MUPIP STOP\nU",
    size=3926088, mode=<value optimized out>) at /usr/include/bits/socket2.h:45
#2  mgwsi_db_receive_ex (lp_request=0x7fffc01a5630,
    data=0x7fe90e2d1b70 "ORCEDHALT, Image HALTed by MUPIP STOP\nU",
    size=3926088, mode=<value optimized out>) at m_apache.c:2939
#3  0x00007fe90234d189 in mgwsi_db_receive (lp_request=0x7fffc01a5630,
    lp_trans_buffer=0x7fffc01a35c0, size=32768,
    mode=<value optimized out>) at m_apache.c:2903
#4  0x00007fe902351515 in mgwsi_handler (r=0x7fe90e2b9528) at m_apache.c:1517
#5  0x00007fe90cdaa8c0 in ap_run_handler ()
#6  0x00007fe90cdae148 in ap_invoke_handler ()
#7  0x00007fe90cdb97a0 in ap_process_request ()
#8  0x00007fe90cdb6668 in ?? ()
#9  0x00007fe90cdb2398 in ap_run_process_connection ()
#10 0x00007fe90cdbe097 in ?? ()
#11 0x00007fe90cdbe3aa in ?? ()
#12 0x00007fe90cdbe6db in ap_mpm_run ()
#13 0x00007fe90cd96840 in main ()

Sam


Sam Habiel

Oct 6, 2011, 8:28:24 PM
to enterprise-web-de...@googlegroups.com
David,

There are no M processes running. So I think I would skip these steps for now.


DL Wicksell

Oct 6, 2011, 9:10:16 PM
to Enterprise Web Developer Community
Sam,

Though you may have killed all your M processes, you were the one who said
that this problem you are having is on a web server which only serves EWD.

And EWD runs M processes. And in fact, if you are on a version of EWD that is
older than a week or two ago, there is a bug in the socket code, on the M side,
which can occur when a client shuts down the connection. The server code can
hang, and will try to keep reading data from a socket which no longer is
sending any data, thus creating an infinite loop.

Since that sounded like exactly what your problem was, and since I knew about
the fix, I thought I'd be nice and help you out. But I didn't want to confuse
you if that was not your issue, so I asked you to send me some very simple
output, so that I would know if this very easy to fix bug was present on your
system.

Have a good day.



Sam Habiel

Oct 6, 2011, 9:41:47 PM
to enterprise-web-de...@googlegroups.com
I might indeed have that problem. It would be nice not to have to find
out about it from you only on the off chance that I happened to ask.

Anyways:

Here's what I did to m_apache.c in mgwsi_db_receive_ex:

n = MGWSI_NET_RECV(lp_request->lp_mgwsicon->sockfd, data + len, size - len, 0);
// smh - Fix a bug in m_apache
// http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Fapis%2Frecv.htm
if (n = 0) { // connection is broken
    // errno is not set when recv returns 0, so there is nothing to check.
    len = n;
    break;
}
//smh - end

I don't have time to recompile and try again today.

Sam


Sam Habiel

Oct 6, 2011, 9:43:41 PM
to enterprise-web-de...@googlegroups.com
FYI: The bug is the fact that m_apache does not check for zero in n.
Zero, according to the document from IBM, means that the connection is
gone (for a stream socket, the other end has closed it). So that's what
I check for in my code.

Sam

Sam Habiel

Oct 7, 2011, 9:19:21 AM
to enterprise-web-de...@googlegroups.com
Nope. My fix doesn't work. recv hangs forever now.

LD 'Gus' Landis

Oct 7, 2011, 1:03:36 PM
to enterprise-web-de...@googlegroups.com
Sam,

Forget changing apache... that is not the issue David is talking about.

Specifically, you need to change some code in EWD's _ZMGWSIS.m.
See below.

If you are using any EWD older than 884, you have this issue. I don't
have 883, but the change was introduced between 882 and 884.

Cheers,
  --ldl

ldl@boGus:~/Downloads$ diff 862/m_apache/_ZMGWSIS.m 885/m_apache/_ZMGWSIS.m 
289a290
>  i $ze["%GTM-E-IOEOF" g HALT
493c494
<  Q $$CRYPT("127.0.0.1",$s(context:80,1:7040),"HMAC-SHA256",string,key,b64,context)
---
>  Q $$CRYPT("127.0.0.1",$s(context=1:80,context>1:context,1:7040),"HMAC-SHA256",string,key,b64,context)
508c509,513
<  Q $$CRYPT("127.0.0.1",$s(context:80,1:7040),"SHA1",string,"",b64,context)
---
>  n port
>  s port=80
>  i context=0 s port=7040
>  i context>1 s port=context,context=1
>  Q $$CRYPT("127.0.0.1",port,"SHA1",string,"",b64,context)
ldl@boGus:~/Downloads$ 

Sam Habiel

Oct 7, 2011, 1:49:21 PM
to enterprise-web-de...@googlegroups.com
Gus,

I discovered that bug and Rob told me about the fix on hardhats; and
the fix is applied on the server. Thank you for telling me about it.

The good news is: I fixed the m_apache gateway!!!!! Now it won't go
into an infinite loop when the M process dies.

I showed the above code to Rick Trotter and he told me: you don't have
==. Duh! I just fixed that and now it works properly.
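
The difference is one character, but it changes the meaning completely: `if (n = 0)` assigns 0 to n and then tests the result, which is always false, so the break never runs, whereas `if (n == 0)` compares. A minimal sketch of a receive loop with the corrected check, assuming plain recv() in place of the MGWSI_NET_RECV macro; the function name recv_until_full_or_closed is hypothetical, and this is not the actual m_apache.c code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not the actual m_apache.c code.
 * Reads until `size` bytes have arrived, the peer closes the
 * connection, or an error occurs.
 * Returns the number of bytes read (fewer than `size` if the peer
 * closed early), or -1 on error. */
ssize_t recv_until_full_or_closed(int sockfd, char *data, size_t size)
{
    size_t len = 0;

    while (len < size) {
        ssize_t n = recv(sockfd, data + len, size - len, 0);

        if (n == 0)             /* comparison, not assignment: peer closed */
            break;
        if (n < 0) {
            if (errno == EINTR) /* interrupted by a signal: retry */
                continue;
            return -1;          /* real error; errno says which */
        }
        len += (size_t)n;
    }
    return (ssize_t)len;
}
```

Compiling with gcc -Wall should flag the `if (n = 0)` form with a warning about an assignment used as a truth value, which is a cheap way to catch this kind of slip.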

Sam

Sam Habiel

Oct 9, 2011, 11:41:06 PM
to enterprise-web-de...@googlegroups.com
The code has been on our production server for a couple of days, and
it seems to work okay. I will try to upload my changes to the git site
once I figure out how to do that.

Sam

Sam Habiel

Oct 13, 2011, 11:39:00 AM
to enterprise-web-de...@googlegroups.com
I have a fork and pull request on github.

Sam

Sam Habiel

Oct 13, 2011, 11:41:13 AM
to enterprise-web-de...@googlegroups.com
FYI: The code has been on production for a week now. *seems* to work fine.

Sam

Jai Singh VistA

Apr 18, 2022, 3:15:49 AM
to Enterprise Web Developer Community
Hi Sam,

Could you please help me, if you can recall the fix you made to resolve the infinite-loop issue?

Sam Habiel

Apr 18, 2022, 7:55:53 AM
to enterprise-web-de...@googlegroups.com
There is an open pull request from 11 years ago. It has the fix.


Wonder why you are asking for my help though. I could have been long gone… it has been 11 years!

—Sam


Jai Singh VistA

Apr 18, 2022, 9:47:28 AM
to Enterprise Web Developer Community
Thanks, Sam

We have compiled the m_apache22.so and m_apache24.so files using your updated code, and the application is now working fine.
Hopefully the issue is gone for good.