Detecting a looping partition


tony.p...@xchanging.com

Jul 28, 2017, 4:15:43 AM
Hi,

I am looking for something I can run to try and detect if a partition is in a loop.

We don’t have any z/VSE batch monitoring software, and unfortunately there is no budget to purchase any.

I have the z/VM Performance Toolkit, so I can detect high CPU usage over a period of time for a z/VSE guest. I was then looking for something that I could submit to check the CPU consumption for each of the partitions and warn the operators of a job potentially looping.

Does anybody have anything they would be willing to share with me?


Thanks

Tony

Billy Bingham

Jul 28, 2017, 6:58:07 AM
The IUI has a System Status display. Fastpath 36 gives you the menu of
options, and selecting option 1 will display system activity. Of course, if
the partition that the IUI is running in is below the one looping, you might
be out of luck.


Billy




_______________________________________________
VSE-L mailing list
VS...@lists.lehigh.edu
https://lists.lehigh.edu/mailman/listinfo/vse-l

tony.p...@xchanging.com

Jul 28, 2017, 7:02:46 AM
Hi,

These are batch-only systems. The other issue is that our Operations have been outsourced, and the people now looking after the systems have no z/VSE experience, so I need a way to identify a potential loop and then generate an alert to on-call support.

Thanks

Tony

Mick Poil

Jul 28, 2017, 8:34:55 AM
Tony,

I have a utility called PTNMON that gives data like this:

END-TIME,SAMPLES,VSECPU%,LPARCPU%,CPCCPU%,SVCK/SEC,I1DISP%,I1NP%,I1CPU%,S1DISP%,S1NP%,S1CPU%,S2DISP%,S2NP%,S2CPU%,S3DISP%,S3NP%,S3CPU%,S4DISP%,S4NP%,S4CPU%,S5DISP%,S5NP%,S5CPU%
2017/05/01-00:02:00,   3635,00008.92,00008.24,00018.61,000019.8,000.66,000.00,00000.62,000.06,000.00,00000.01,000.28,000.11,00000.42,000.00,000.00,00000.01,000.03,000.00,00000.02,000.25,000.17,00000.37
2017/05/01-00:03:00,   5967,00008.99,00007.94,00031.16,000016.1,000.72,000.05,00000.95,000.07,000.00,00000.01,000.79,000.07,00000.63,000.10,000.00,00000.05,000.05,000.02,00000.02,000.74,000.30,00000.55
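Because the output is plain CSV, it can be post-processed off-host. The following is a minimal Python sketch, not part of PTNMON, that scans such a report for per-partition xxCPU% values at or above a threshold. The function name `busy_partitions` is hypothetical, the sample text is the two rows above truncated to the I1 and S1 columns, and the sketch assumes every per-partition CPU column is a two-character partition id followed by `CPU%`.

```python
import csv
import io

# The two sample rows from the thread, truncated to the I1 and S1 columns.
SAMPLE = """\
END-TIME,SAMPLES,VSECPU%,LPARCPU%,CPCCPU%,SVCK/SEC,I1DISP%,I1NP%,I1CPU%,S1DISP%,S1NP%,S1CPU%
2017/05/01-00:02:00,   3635,00008.92,00008.24,00018.61,000019.8,000.66,000.00,00000.62,000.06,000.00,00000.01
2017/05/01-00:03:00,   5967,00008.99,00007.94,00031.16,000016.1,000.72,000.05,00000.95,000.07,000.00,00000.01
"""


def busy_partitions(csv_text, threshold_pct):
    """Return (end_time, partition_id, cpu_pct) for each xxCPU% >= threshold_pct."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Assumption: per-partition columns look like "I1CPU%" (6 characters);
            # the totals (VSECPU%, LPARCPU%, CPCCPU%) are longer and are skipped.
            if len(column) == 6 and column.endswith("CPU%"):
                cpu = float(value)
                if cpu >= threshold_pct:
                    hits.append((row["END-TIME"], column[:2], cpu))
    return hits


if __name__ == "__main__":
    for end_time, partition, cpu in busy_partitions(SAMPLE, 0.5):
        print(f"{end_time} {partition} {cpu:.2f}%")
```

A realistic threshold would be far higher than the 0.5 used here; it is chosen only so that the sample rows produce some output.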
Sample JCL:
// JOB PTNMON
// OPTION NOSYSDUMP
// LIBDEF PHASE,SEARCH=PRD2.CONFIG
// SETPFIX LIMIT=28K    The minimum requirement
/*
/* ************************** IMPORTANT NOTES ************************
/* PTNMON MUST run at a very high VSE PRTY or the output will not
/* be accurate - typically just below POWER. But a busy POWER can
/* affect the results, and hence you may need to run PTNMON in a
/* Dynamic Partition at a higher PRTY than POWER.
/*
/* For PTNMON to return valid CPC cpu utilisation for Native VSE,
/* the LPAR image must have Global Performance Data Control enabled
/* or the CPC cpu utilisation will be the same as the LPAR cpu
/* utilisation.
/*
/* ************************** IMPORTANT NOTE *************************
/*
/* PTNMON is designed to show cpu utilisation and related values
/* for a series of partitions, plus total cpu utilisation, at a user
/* specified interval in units of seconds. This version also shows
/* the number of SVCs executed in the interval in units of K.
/*
/* PTNMON can be used instead of VSE's CPUMON unless you need to use
/* CPUMON to create XML format data for an IBM Capacity Planning tool. 
/*
/* PTNMON output goes to SYSLST and is in CSV format.
/*
/* The use of interval reporting means that it should only be used for
/* long-running jobs such as CICS and not batch jobs. Every time a new
/* job step begins, PTNMON needs to zeroise the partition counters and
/* begin monitoring again from scratch. This may result in no cpu or
/* inaccurate cpu activity being reported in an interval. The longer
/* the interval, the more likely that new job steps will affect the
/* results.
/*
/* The specified partitions do not need to be active when PTNMON
/* starts. Monitoring starts automatically when a partition becomes
/* active and stops when it stops.
/*
/* Optionally, PTNMON can issue a console message whenever a
/* partition's cpu utilisation reaches a user-specified percentage
/* in an interval.
/*
/* PTNMON could be used instead of VSE CPUMON if cpu utilisation and
/* an SVC count are all that you need and you are not collecting data
/* for IBM's Capacity Planning tool.
/*
/* PTNMON uses a very small amount of cpu time, and only needs a very
/* small partition allocation as the size is the same as the
/* required SETPFIX value.
/*
/* Use MSG xx to terminate PTNMON. No other operands are required.
/*
/* *******************************************************************
/* The SYSLST output contains these columns for each interval:
/*
/* END-TIME      Date and time at the end of the reported interval.
/* SAMPLES       Number of samples made in the interval.
/*               The first interval normally has fewer than the others
/*               in order to achieve the interval boundary alignment.
/* VSECPU%       Total VSE CPU% as would be shown by CPUMON.
/* LPARCPU% or,  For Native VSE this is based on LPAR LCP utilisation
/* VMCPU%        data, and includes the LPAR management overhead.
/*               Under VM, it is based on the Total Virtual Machine
/*               cpu time (TTIME) which includes the CP overhead.
/* CPCCPU%       For native z/VSE, this is the total CPC cpu
/*               utilisation if Global Performance Data Control is
/*               enabled in the LPAR image. Otherwise it will be the
/*               same as LPARCPU%.
/*               Under z/VM, it is the total CPC cpu utilisation,
/*               including IFLs and ICFs if they are used in the
/*               LPAR that runs z/VM.
/* SVCK/SEC      Units of 1,000 SVCs per second.
/* xxDISP%       The % of samples where any task in partition xx was
/*               ready-to-run. Or the % where only the CICS QR task
/*               was ready-to-run when UPSI x1x is used.
/* xxNP%         The % of ready-to-run samples where Non-Parallel
/*               code was active.
/* xxCPU%        As seen in the IUI Display System Activity.
/*
/* It is normal to see a small % variation in the number of samples
/* per interval, but anything much more than that suggests that
/* VSE is cpu-constrained. Remember that PTNMON could be affected by
/* higher PRTY VSE partition activity though.
/*
/* Using SYSDEF TD,RESETCNT or changing the number of active VSE cpus
/* during a reporting interval will affect VSECPU%. PTNMON may be able
/* to detect it and report a value of zero, but that is not
/* guaranteed in all circumstances and the reported value will then
/* be inaccurate.
/*
/* Under ideal conditions (at least for CICS), xxDISP% will track
/* xxCPU% reasonably closely. If it does not, it suggests that the
/* partition is cpu-constrained during the interval.
/*
/* Cpu utilisation should be accurate to about +/- 0.02%, assuming
/* that the data given to PTNMON is accurate.
/*
/* xxDISP% and xxNP% are based on sampling, and hence the accuracy
/* is dependent to some degree on the number of samples in an
/* interval.
/*
/* *******************************************************************
// UPSI 00000
/*
/* UPSI 0xxxx  Standard partition cpu utilisation console messages.
/* UPSI 1xxxx  Highlighted partition cpu utilisation console messages.
/* UPSI x0xxx  Normal dispatchable sampling.
/* UPSI x1xxx  Only sample CICS QR task dispatchability. This option
/*             is recommended when MRO is active.
/* UPSI x11xx  Sample QR and the DFHIRPST subtask dispatchability.
/*             Use when requested by IBM.
/* UPSI xxx0x  Capture LPARCPU% or VMCPU%. Requires SETPFIX.
/* UPSI xxx1x  LPARCPU% or VMCPU% data is not captured and zero is
/*             shown in the report.
/* UPSI xxx00  CPCCPU% data is captured if it is available.
/* UPSI xxx01  CPCCPU% data is not captured.
/*
/* *******************************************************************
/* PARM='rrrr,ss,tt,xx,xx,...'
/*
/* rrrr Is the 4-digit reporting interval 0001 to 3600 seconds.
/*      A value greater than 3600 is rounded down to 3600.
/*      PTNMON reports at rrrr second boundaries. For example, using
/*      0030 would report at hh:mm:00 and hh:mm:30 consistently.
/*      Use a short interval when you have performance problems, but
/*      an interval such as 0900 (15 minutes) if you are more 
/*      interested in looking at cpu capacity. 
/* ss   Is the sampling interval 01 to 99 1/300 second units.
/*      03 is normally the smallest value that is recommended, and
/*      gives 100 samples per second. Aim for 1,000+ samples per
/*      interval where possible.
/* tt   Is the partition CPU% threshold 01 to 99, or 00 to disable it.
/*      For example, a value of 50 will cause a message for 50% or
/*      higher utilisation in an interval for any monitored partition.
/* xx   Up to 15 partition ids to monitor. Partition ids that are not
/*      valid will always show zero values in the report.
/*
// EXEC PTNMON,SIZE=PTNMON,PARM='rrrr,ss,tt,xx,xx,..'
/*
// EXEC LISTLOG
/&
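
The PARM layout and the sampling arithmetic described in the JCL comments can be checked with a small helper. This is a rough Python sketch under stated assumptions, not anything shipped with PTNMON: `build_parm` and `expected_samples` are hypothetical names, the partition ids `F4` and `F5` are only illustrative, and the helper rejects an interval above 3600 seconds, whereas the comments say PTNMON itself rounds it down.

```python
def build_parm(interval_s, sample_units, threshold_pct, partitions):
    """Format a PARM value as 'rrrr,ss,tt,xx,xx,...' following the JCL comments."""
    if not 1 <= interval_s <= 3600:
        raise ValueError("rrrr must be 0001 to 3600 seconds")
    if not 1 <= sample_units <= 99:
        raise ValueError("ss must be 01 to 99 (units of 1/300 second)")
    if not 0 <= threshold_pct <= 99:
        raise ValueError("tt must be 00 (disabled) or 01 to 99")
    if len(partitions) > 15:
        raise ValueError("at most 15 partition ids can be monitored")
    fields = [f"{interval_s:04d}", f"{sample_units:02d}", f"{threshold_pct:02d}"]
    return ",".join(fields + list(partitions))


def expected_samples(interval_s, sample_units):
    """Nominal samples per interval: one sample every sample_units/300 seconds."""
    return interval_s * 300 // sample_units


if __name__ == "__main__":
    print(build_parm(60, 3, 50, ["F4", "F5"]))
    print(expected_samples(60, 3))
```

With ss=03 (100 samples per second), a 60-second interval nominally yields 6,000 samples. The 5,967 samples in the second sample row would be consistent with those settings, although the thread does not state the PARM that produced it.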

Mick Poil

Jul 28, 2017, 9:02:31 AM
Sorry, it may not be obvious that PTNMON tracks cpu for the selected partition ids and will send a (highlighted) console message if it exceeds a certain %. It may not be the best fit for you. While I can't send you what is IBM Confidential source code, it might be a candidate for a bit of rework.

mike

Stuart, David

Jul 28, 2017, 10:51:24 AM
Tony,

If you have any flavor of FAQS/ASO, they have a System Activity display that will show CPU activity at the partition level.


Dave


Dave Stuart
Principal Info. Systems Support Analyst
County of Ventura
805-662-6731
David....@ventura.org



K and M

Jul 28, 2017, 11:35:48 AM
FAQS/ASO also comes with sample IMODs that demonstrate a function that can be used to find a loop.
If I remember correctly, there is a more recent z/VSE command that could also help
with this.

Ken

tony.p...@xchanging.com

Jul 31, 2017, 9:47:11 AM
Hi Mike,

Thanks for the offer of PTNMON, but unfortunately I don't think it will work for us: we already have some jobs that are very heavy on CPU and are not looping.

My loop detection would be based on no (or almost no) I/O activity combined with significant CPU usage over, say, a 1-minute period.

Not foolproof, I know, but I just want to raise email alerts and keep these to a minimum.
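
The heuristic described above (high CPU with little or no I/O over an interval) could be expressed roughly as follows. This is only a Python sketch of the decision rule: the `IntervalSample` record, the function name `looks_like_loop`, and the default thresholds are all assumptions, and how the per-partition CPU and I/O figures would be collected on z/VSE is not covered here.

```python
from dataclasses import dataclass


@dataclass
class IntervalSample:
    partition: str
    cpu_pct: float  # partition CPU% over the interval (e.g. one minute)
    io_count: int   # I/O requests the partition started in the interval


def looks_like_loop(samples, cpu_threshold_pct=80.0, max_io=5, min_intervals=1):
    """True if the last min_intervals samples all show high CPU and near-zero I/O.

    The default thresholds are placeholders, not recommendations.
    """
    if min_intervals < 1 or len(samples) < min_intervals:
        return False
    recent = samples[-min_intervals:]
    return all(
        s.cpu_pct >= cpu_threshold_pct and s.io_count <= max_io for s in recent
    )


if __name__ == "__main__":
    history = [IntervalSample("F4", 95.0, 0), IntervalSample("F4", 97.0, 1)]
    if looks_like_loop(history, min_intervals=2):
        print("F4: possible loop, alert on-call support")
```

Requiring more than one consecutive interval (`min_intervals`) is one way to keep alerts to a minimum, at the cost of slower detection; a CPU-heavy job that still performs I/O is not flagged.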

Tony

Mick Poil

Aug 1, 2017, 11:09:41 AM
Tony,

The data that the IUI System Activity Display shows can be hooked into, but I haven't tried using it, so it is a DIY job.

Mike