We are looking for a tool to detect loops in batch applications;
any suggestions are appreciated.
My shop is running z/OS, and the applications are C/C++.
Bob.
Considering the speed of today's processors, all batch programs that run
for more than a few seconds must of necessity contain one or more logic
loops; so I assume the intended question is how do you determine if the
program is in a non-productive loop.
In general, making this determination automatically with any tool, without
knowing something about the expected or historic behaviour of the
program, is an impossible task. And in the practical world it's not just
applications that will never terminate that are a problem, but also
those which cost more in resources than the end user can afford. You
want to catch a batch job step that is badly designed or poorly tuned
and consumes an order of magnitude more CPU than required, as this is
also a problem even if it is not in an infinite loop.
If the general issue is one of jobs wasting resources (which with
sub-capacity licensing may cost real money), then the simplest first
step is to impose and enforce standards requiring reasonable CPU TIME
limits on jobs and job steps in JCL, and possibly also require OUTLIM to
restrict SYSOUT loops in testing (needless to say "NOLIMIT" for CPU time
should not be allowed for any batch job). Different default limits can
be set for testing vs production via JES2 definitions based on job
classes, and it is possible to use an IEFUTL exit to provide for
unanticipated application growth by allowing Operators the option of
granting CPU time extensions or cancelling production jobs that reach
the limit based on job class. JCL overrides for higher limits could be
allowed for specific job steps that have known higher requirements where
the cost is acceptable to the end user.
Rate of CPU consumption by itself is an unreliable indicator of
problems. A single-threaded program would be limited to 100% of one CP,
but even a solid, infinite CPU loop could show up as a much lower value
on a loaded system, and some very efficiently-designed,
computationally-intense programs might be able to approach 100% of a CP
on a lightly loaded system and still be doing productive work.
Some very simple tools, like the SDSF DA display, that show both CPU and
EXCP resources used, are sometimes sufficient to provide clues. If the
program is known to require periodic I/O to do anything useful and it is
consuming an unusual amount of CPU time with no EXCPs, that would be
strongly suggestive of a problem; or if the program is generating much
more SYSOUT than usual or repetitive SYSOUT lines, again a likely
problem.
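The CPU-versus-EXCP clue can be sketched as a comparison of two successive readings of a job step's counters. This is only an illustration in Python: the `Sample` structure, the function name, and the 60-second threshold are assumptions, not part of SDSF or any product, and the numbers would have to be read off a display such as SDSF DA by whatever means you have.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a job step's counters (hypothetical structure)."""
    cpu_seconds: float  # accumulated CPU time at the moment of sampling
    excp_count: int     # accumulated EXCP count at the moment of sampling


def cpu_without_io(prev: Sample, curr: Sample, cpu_threshold: float = 60.0) -> bool:
    """Return True if the step used at least cpu_threshold CPU seconds
    between the two samples without issuing a single EXCP."""
    cpu_delta = curr.cpu_seconds - prev.cpu_seconds
    excp_delta = curr.excp_count - prev.excp_count
    return cpu_delta >= cpu_threshold and excp_delta == 0
```

A True result is only a hint. A program that legitimately computes for long stretches between I/Os would also trip it, so the threshold has to come from knowledge of the program.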
If the program is using both CPU and EXCPs, but a lot more than expected
and with no other obviously perverse behaviour, it is more difficult to
make a determination. Other tools, like Omegamon, that show EXCPs on
specific DDs in the job step may let you see whether the program is
continuing to progress through sequential data and, if the total number
of blocks is known, estimate whether it will complete in an acceptable
time and at what total cost. Conversely, EXCPs on a file far in excess
of the total number of blocks in the file may point to a poorly tuned or
poorly designed application.
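The arithmetic behind such an estimate is simple enough to write down. The sketch below is Python with hypothetical names (`estimate_completion` and its parameters); it assumes one EXCP per block on a sequentially read file, which will not hold for every access method or buffering setup.

```python
def estimate_completion(excps_on_dd: int, total_blocks: int, cpu_seconds_so_far: float):
    """Estimate progress through a sequential file and the projected total CPU.

    Assumes (for illustration only) one EXCP per block read.
    Returns (fraction_done, projected_total_cpu_seconds), or None when no
    estimate is possible or when EXCPs exceed the block count, which itself
    suggests the file is not being read once from start to end.
    """
    if total_blocks <= 0 or excps_on_dd <= 0:
        return None  # nothing to extrapolate from
    fraction_done = excps_on_dd / total_blocks
    if fraction_done > 1.0:
        return None  # more EXCPs than blocks: not simple sequential progress
    projected_cpu = cpu_seconds_so_far / fraction_done
    return fraction_done, projected_cpu
```

For example, 2,500 EXCPs against a 10,000-block file after 120 CPU seconds gives a quarter of the file processed and a projected 480 CPU seconds in total, which can then be compared with what the end user is willing to pay.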
In our experience, there is no substitute for having human intelligence
in the monitoring loop when resources get tight.
--
Joel C. Ewing, Fort Smith, AR jREMOVEc...@acm.org
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to list...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
I'm sure there might be other methods, but this method works for me.
K. Kripke
kkr...@mindspring.com
"Joel C. Ewing" <jce...@ACM.ORG> wrote in message
news:<4B1A9636...@acm.org>...
I agree. We have been trying for decades to build a good general loop
detector for operations but found it impossible.
Kees.
**********************************************************************
For information, services and offers, please visit our web site:
http://www.klm.com. This e-mail and any attachment may contain
confidential and privileged material intended for the addressee
only. If you are not the addressee, you are notified that no part
of the e-mail or any attachment may be disclosed, copied or
distributed, and that any other action related to this e-mail or
attachment is strictly prohibited, and may be unlawful. If you have
received this e-mail by error, please notify the sender immediately
by return e-mail, and delete this message.
Koninklijke Luchtvaart Maatschappij NV (KLM), its subsidiaries
and/or its employees shall not be liable for the incorrect or
incomplete transmission of this e-mail or any attachments, nor
responsible for any delay in receipt.
Koninklijke Luchtvaart Maatschappij N.V. (also known as KLM Royal
Dutch Airlines) is registered in Amstelveen, The Netherlands, with
registered number 33014286
**********************************************************************
About the only automated solution I can think of would be to set a CPU time limit.
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-...@bama.ua.edu] On Behalf Of Joel C. Ewing
Sent: Saturday, December 05, 2009 11:20 AM
To: IBM-...@bama.ua.edu
Subject: Re: Detect the loop for batch job
On 12/04/2009 10:14 AM, bjbxd wrote:
> Hello List,
> We are looking for a tool to detect the loop for batch application,
> any suggestion are appreciated.
>
> My shop is runing z/OS, application is C/C++.
> Bob.
Detecting an address space in an infinite loop is not an easy job. Our
modern z/OS LPARs often have multiple CPUs and specialty processors like
zIIP and zAAP where instructions can be dispatched. In addition Workload
Manager will try to distribute processor resources equitably based on the
current workload mix and priorities defined in the installation's policy.
Lower priority batch workloads that happen to be looping can easily run
under the radar for long periods of time squandering resources.
OMEGAMON XE on z/OS 4.2.0 with the addition of Interim Feature 1 has a
strategy to help surface these problems. OMEGAMON has had a feature
called Bottleneck Analysis for many years. Bottleneck Analysis builds a
profile over time through periodic sampling of what execution states are
being used by address spaces. These execution states include things like
Using CPU, Using zIIP, Using zAAP, Waiting for CPU, Waiting for zIIP,
Waiting for zAAP, Using I/O, Waiting for I/O, Waiting for Enqueue, Waiting
for HSM, Swapped, etc. A CPU-looping address space will reveal itself by
populating only the Using and Waiting states for CPU resources (including
zIIP and zAAP).
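The idea that a looping address space populates only the CPU states can be illustrated with a few lines of Python. This is not OMEGAMON's implementation: the state names are simplified from the list in this post, and `looks_like_cpu_loop` and its `min_samples` default are hypothetical.

```python
from collections import Counter

# Execution states that involve only processor resources (simplified, assumed names)
CPU_STATES = {
    "USING_CPU", "USING_ZIIP", "USING_ZAAP",
    "WAITING_CPU", "WAITING_ZIIP", "WAITING_ZAAP",
}


def looks_like_cpu_loop(samples, min_samples=30):
    """Return True if every sampled state is a CPU using/waiting state and
    at least one sample shows the address space actually using a processor."""
    if len(samples) < min_samples:
        return False  # too little evidence either way
    profile = Counter(samples)  # execution-state profile built from the samples
    only_cpu_states = set(profile) <= CPU_STATES
    used_cpu = any(state.startswith("USING_") for state in profile)
    return only_cpu_states and used_cpu
```

A single sample in any other state, such as waiting for I/O, is enough to clear the suspicion in this sketch; a real monitor would presumably tolerate some noise.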
OMEGAMON XE on z/OS 4.2.0 has a new attribute called CPU Loop Index that
uses this bottleneck information as its basis. Loops in high priority
workloads can be reliably detected fairly quickly. The real trick is
discriminating between well-behaved low priority work that is just starved
for attention and low priority work that is looping. OMEGAMON XE on z/OS
4.2.0 Interim Feature 1 provides this discrimination by dynamically
extending the observation period required before indicating a likely loop
when the ratio of Waiting for CPU to Using CPU is high. For more
information on the OMEGAMON XE on z/OS approach to CPU loop detection,
please see the article titled
"Detecting CPU looping address spaces using IBM Tivoli OMEGAMON XE on z/OS
version 4.2.0" in the August issue of the z System Advisor at
http://www-01.ibm.com/software/tivoli/systemz-advisor/2009-08/.
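One way to picture an observation period that stretches as the wait-to-use ratio grows is the Python sketch below. The formula, the function name, and the default values are invented for illustration; the post does not describe the actual calculation OMEGAMON uses.

```python
def required_observation_seconds(using_cpu_samples: int, waiting_cpu_samples: int,
                                 base_seconds: float = 120.0,
                                 max_seconds: float = 3600.0) -> float:
    """Scale the observation window by the Waiting-for-CPU to Using-CPU ratio.

    Illustrative assumption: window = base * (1 + ratio), capped at max_seconds.
    Work that is rarely dispatched is therefore watched for longer before
    being reported as a likely loop.
    """
    if using_cpu_samples <= 0:
        return max_seconds  # never seen using a CPU: wait the maximum
    ratio = waiting_cpu_samples / using_cpu_samples
    return min(max_seconds, base_seconds * (1.0 + ratio))
```

With these assumed defaults, an address space that is always dispatched is judged after 120 seconds, while one that waits nine samples for every sample of CPU use is watched for 1,200 seconds.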
Joe Winterton
Release Mgr - OMEGAMON - Development Team
919-224-1328 Cell -914-954-0483 - jos...@us.ibm.com
From: Hal Merritt <HMer...@JACKHENRY.COM>
To: IBM-...@bama.ua.edu
Date: 12/07/2009 09:48 AM
Subject: Re: Detect the loop for batch job
Sent by: IBM Mainframe Discussion List <IBM-...@bama.ua.edu>
Interesting development in Omegamon.
I am still new to Omegamon, but I am going to install V4.2 next month.
How do I check whether I have Interim Feature 1, and how do I install it?
Kees.
"Joseph H Winterton" <jos...@US.IBM.COM> wrote in message
news:<OF614AD680.D9513FFB-ON862576...@us.ibm.com>...
You will have to update the ERBR3WFX CLIST, and you may want to add some
exceptions to bypass conditions that you don't want to catch; otherwise it
will make you a slave to your pager.
In ERBR3WFX, subroutine Wfex_handler_2:
  If SUBSTR(wfxname,1,4) = 'ALL ' &,
     SUBSTR(wfxreasn,1,4) = 'PROC' &,
     INDEX(wfxpcaus,'looping') ¬= 0
  Then Do
    /* add further exception tests to this condition with & as needed */
    If (SUBSTR(wfxreasn,6,4) ¬= 'XXXX')
    Then Do
      /* build the message text and issue it via ERBCSWTO */
      msg = "RMFL00I" SUBSTR(wfxpcaus,1,35)
      "SELECT PGM(ERBCSWTO) PARM("msg")"
    End
  End
John Kim
A friend told me there is a tool named TriTune that can detect batch
loops. Does anyone have experience with this tool?
BOB