On 06/07/2018 02:26 AM, Peter Hunkeler wrote:
> There are some statements around zIIP utilization which I read here and there. Statements like:
>
> - "You should not utilize one zIIP more than 30%, two zIIPs more than 60%..."
> - "A task may become delayed for up to 3.2 ms (actually ZIIPAWMT) before the busy zIIP asks for help from a CP".
>
>
>
> For this discussion, let's assume equal speed CPs and zIIPs, a reasonable CP to zIIP ratio, and more than one processor of each kind.
>
>
> It has been a long time strength of IBM Z (and all the predecessors) that the CPs in an LPAR can be utilized way above 90% without major problems arising. I seem to understand that this has changed lately, but still some 85% (?) should be fine.
>
>
> Now, all work running on zIIPs was once work running on CPs (and still is if there are no zIIPs). So the work is no different (apart from much being run under an SRB instead of a TCB), and the response time requirement is no different. Right?
>
>
> If so, how come busy zIIPs are said to be more of a problem than busy CPs? If the work can accept some queueing when run on CPs, why not when run on zIIPs? Queueing theory should apply equally to both.
>
>
>
> When a processor is 50% busy, then 50% of the time there is at least one ready task, the one executing. Maybe there are some more waiting on the work queue. But that 50% says nothing about the delay of the tasks on the work queue.
>
>
> In a simplified case, assume 5 tasks with equal priority, each one quickly, say after 0.5 ms, coming to the point where it has to give up the processor for a very short period of time before being requeued on the work queue. They all constantly work that way for 30 seconds in a row, then become undispatchable for the remaining 30 seconds of that 50% busy minute. During the first 30 seconds, the zIIP is 100% busy, and after 3.2 ms (ZIIPAWMT), the zIIP will ask a CP for help.
>
>
> None of the tasks has been delayed by 3.2 ms, although the zIIP recognized its work queue has not become empty for 3.2 ms and asked for help. On the contrary, the work has gotten better service because two processors are now serving the single work queue. (Again for simplicity, not currently taking priorities into account).
>
>
> Same case, but the tasks are working 1 ms each time. Now it always takes more than 3.2 ms for the last task on the work queue before it is redispatched, as long as the zIIP has not asked for help. But the zIIP will ask for help after 3.2 ms, and the delay for the tasks will shrink.
>
>
> Isn't this a better situation for zIIP work than for non-zIIP work? Same scenario on CPs. There is no-one to help.
>
>
> Any thoughts?
>
>
I suspect the point of the guidelines was primarily to ensure that work
intended for zIIPs didn't get redirected to general CPs and raise z/OS
software charges. But if those guidelines have been observed, zIIP
workloads will have been experiencing minimal CPU queueing delays, and
any increase in delay from higher zIIP utilization and from offloading
part of the workload to CPs may be perceived by users as a significant
performance hit.
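
As a rough illustration of how queueing delay grows with utilization and
shrinks with the number of engines, here is a minimal sketch using the
textbook M/M/c (Erlang C) model. It assumes Poisson arrivals, exponential
service times and one shared queue, none of which is claimed to match the
real z/OS dispatcher. The function names are mine, chosen for illustration.

```python
from math import factorial


def erlang_c(servers: int, utilization: float) -> float:
    """Probability that arriving work has to queue in an M/M/c system."""
    if servers < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need servers >= 1 and 0 <= utilization < 1")
    offered = servers * utilization  # offered load in Erlangs
    top = offered**servers / factorial(servers) / (1.0 - utilization)
    bottom = sum(offered**k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait_in_service_times(servers: int, utilization: float) -> float:
    """Mean queueing delay, expressed in multiples of the mean service time."""
    return erlang_c(servers, utilization) / (servers * (1.0 - utilization))


if __name__ == "__main__":
    # Hypothetical configurations: (engines, per-engine utilization).
    for engines, busy in [(1, 0.3), (2, 0.6), (1, 0.9), (10, 0.9)]:
        wait = mean_wait_in_service_times(engines, busy)
        print(f"{engines:2d} engine(s) at {busy:.0%}: mean wait = {wait:.2f} service times")
```

In this model a single engine at 90% busy queues work far longer than ten
engines at 90% busy, which is one plausible reading of why a small pool of
zIIPs would be held to a lower utilization target than a large pool of CPs.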

How prevalent are installations today where the CPs run at top speed, in
other words at the same speed as the zIIP engines? Is it really valid to
assume equal speed processors? Clearly guidelines for lower zIIP
utilization matter more when there is a difference, as offloading any
zIIP work to a slower CP would elongate processing and response time,
even if there is no delay waiting for a processor.
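
To put a number on the slower-CP effect, here is a small back-of-envelope
sketch. The relative CP speed and the fraction of work that crosses over
are made-up inputs, not measurements from any real machine, and the
function name is hypothetical.

```python
def cpu_time_ms(ziip_cpu_ms: float, cp_relative_speed: float, fraction_on_cp: float) -> float:
    """CPU time for a unit of work when part of it runs on a slower CP.

    ziip_cpu_ms       -- CPU time if everything ran on a full-speed zIIP
    cp_relative_speed -- CP speed as a fraction of zIIP speed (assumed, 0 < x <= 1)
    fraction_on_cp    -- share of the work that crossed over to a CP (0..1)
    """
    if not 0.0 < cp_relative_speed <= 1.0:
        raise ValueError("cp_relative_speed must be in (0, 1]")
    if not 0.0 <= fraction_on_cp <= 1.0:
        raise ValueError("fraction_on_cp must be in [0, 1]")
    on_ziip = ziip_cpu_ms * (1.0 - fraction_on_cp)
    on_cp = ziip_cpu_ms * fraction_on_cp / cp_relative_speed
    return on_ziip + on_cp


if __name__ == "__main__":
    # Hypothetical case: 10 ms of zIIP work, CPs at half speed, 30% crossing over.
    print(cpu_time_ms(10.0, 0.5, 0.3))
```

This counts only the stretch in execution time; any wait for a CP to
become available would come on top of it.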

Another possible issue is that application work that is zIIP-eligible,
at least historically, tended to be things like Java that were more
CPU-intensive. These applications quite possibly are in a service class
of lesser importance. As long as they can run on a zIIP engine, they
only compete with similar workloads. But if the zIIP utilization
reaches the point that work is offloaded to a CP, it would seem logical
to me to expect other "more important" workloads competing for the CP to
get served first, and if that is indeed the case, the queueing time to
actually get service for lower importance zIIP workload from an equally
busy CP could be much greater. It doesn't seem valid to me to assume
that just because a CP is allowed to do zIIP work, a CP will
immediately be free to actually do that work, or that a CP will have
queueing delays similar to the zIIP engines -- the workload on the CP is
totally different.
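
The priority effect can be sketched with the textbook formula for a
single-server, non-preemptive priority queue (Cobham's result), assuming
Poisson arrivals and exponential service times. Real z/OS dispatching is
preemptive and WLM-managed, so this is only directional, and the
workload numbers below are invented for illustration.

```python
def priority_waits(arrival_rates, mean_service_times):
    """Mean queueing wait per class on one engine, highest priority first.

    Uses W_k = W0 / ((1 - sigma_{k-1}) * (1 - sigma_k)), where sigma_k is the
    cumulative utilization of classes 1..k and W0 = sum(lambda_i * E[S_i^2] / 2).
    For exponential service with mean s, E[S^2] = 2 * s**2.
    """
    rhos = [lam * s for lam, s in zip(arrival_rates, mean_service_times)]
    if sum(rhos) >= 1.0:
        raise ValueError("total utilization must be below 1")
    w0 = sum(lam * s * s for lam, s in zip(arrival_rates, mean_service_times))
    waits = []
    sigma_prev = 0.0
    for rho in rhos:
        sigma = sigma_prev + rho
        waits.append(w0 / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


if __name__ == "__main__":
    # Hypothetical CP at 80% busy: 60% important work, 20% crossed-over zIIP work,
    # with a mean service time of 1 ms for both classes.
    high, low = priority_waits([0.6, 0.2], [1.0, 1.0])
    fifo = priority_waits([0.8], [1.0])[0]
    print(f"high importance: {high:.1f} ms, low importance: {low:.1f} ms, no priorities: {fifo:.1f} ms")
```

In this model the low importance class waits several times longer than it
would on an equally busy engine that served all work in arrival order.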
Joel C. Ewing
--
Joel C. Ewing, Bentonville, AR
jce...@acm.org