[PATCH] lparstat: Fix Idle and busy PURR/SPURR

100 views
Skip to first unread message

Saket Kumar Bhaskar

<skb99@linux.ibm.com>
unread,
May 21, 2024, 12:13:06 PM5/21/24
to powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, srikar@linux.ibm.com, tyreld@linux.ibm.com, Saket Kumar Bhaskar
Idle and busy PURR/SPURR values are not adding upto 100% in case of
dedicated-donate and shared partitions, with the present formula.
Because of this, users might get a false impression of resource
utilisation. This is expected because a core can be in either
idle or busy state out of total of 100(core's shared resource can
either be consumed or be left idle). When lpar is in dedicated-donate
or shared,the purr values are not being counted when the CPU is ceded.
The idle_purr is calculated by taking snapshots of purr values at
every idle entry and idle exit. So, when a CPU is ceded, the calculation
for idle_purr will be wrong as purr is not being counted.

Before Change:
|-----------------------------------------------------------------|
| Dedicated-donate (8 cores) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.02 | 0.11 | 0.02 | 0.12 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 30.01 | 32.53 | 32.41 | 35.14 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 50.30 | 24.71 | 54.33 | 26.68 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 58.35 | 16.66 | 63.02 | 17.99 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 65.57 | 9.44 | 70.82 | 10.19 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 80.96 | 19.03 | 87.44 | 20.56 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 90.57 | 9.44 | 97.82 | 10.19 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 96.09 | 3.91 | 103.78 | 4.22 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 100.00 | 0.00 | 103.78 | 4.22 |
|----------------------|----------|----------|---------|----------|

|-----------------------------------------------------------------|
| Shared Capped (8 VP / 4 EC) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.04 | 0.18 | 0.03 | 0.19 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 29.94 | 20.05 | 32.33 | 21.65 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 30.26 | 19.65 | 32.36 | 21.04 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 34.77 | 15.11 | 37.55 | 16.31 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 39.31 | 10.57 | 42.03 | 11.33 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 42.83 | 7.04 | 46.25 | 7.60 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 45.56 | 4.36 | 48.72 | 4.69 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 47.73 | 2.12 | 51.07 | 2.27 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 49.89 | 0.00 | 53.88 | 0.00 |
|----------------------|----------|----------|---------|----------|

This commit, rather than considering delta_idle_purr for calculation of
idle ticks, takes (delta_tb - delta_purr) as ticks for which the CPUs
were in CEDE and hence idle. Following similar logic, delta_purr would
represent busy ticks when CPUs were not ceded. Since, the output was
correct for dedicated capped mode, changes has been made only for
shared and dedicated-donate mode.

After Change:
|-----------------------------------------------------------------|
| Dedicated-donate (8 cores) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.09 | 99.91 | 0.09 | 99.45 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 62.55 | 37.45 | 67.55 | 32.81 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 75.01 | 24.99 | 81.01 | 19.35 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 87.50 | 12.50 | 94.50 | 5.86 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 87.51 | 12.49 | 94.51 | 5.86 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 100.00 | 0.00 | 108.00 | 0.00 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 100.00 | 0.00 | 108.00 | 0.00 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 100.00 | 0.00 | 108.00 | 0.00 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 100.00 | 0.00 | 108.00 | 0.00 |
|----------------------|----------|----------|---------|----------|

|-----------------------------------------------------------------|
| Shared Capped (8 VP / 4 EC) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.18 | 99.82 | 0.19 | 99.44 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 49.99 | 50.01 | 53.99 | 45.89 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 49.66 | 50.34 | 53.63 | 46.71 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 49.93 | 50.07 | 53.42 | 46.08 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 49.92 | 50.08 | 53.41 | 46.08 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 49.89 | 50.11 | 53.88 | 46.52 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 49.88 | 50.12 | 53.87 | 46.41 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 49.90 | 50.10 | 53.40 | 46.10 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 49.90 | 50.10 | 54.39 | 45.36 |
|----------------------|----------|----------|---------|----------|

Before Change:
%busy = (delta_purr - delta_idle_purr) / delta_tb * 100
%idle = delta_idle_purr / delta_tb * 100

After Change:
%busy = delta_purr / delta_tb * 100
%idle = (delta_tb - delta_purr) / delta_tb * 100

Signed-off-by: Saket Kumar Bhaskar <sk...@linux.ibm.com>
---
src/lparstat.c | 83 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 73 insertions(+), 10 deletions(-)

diff --git a/src/lparstat.c b/src/lparstat.c
index d2fdb3f..e74ca17 100644
--- a/src/lparstat.c
+++ b/src/lparstat.c
@@ -491,11 +491,16 @@ void get_cpu_util_purr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_purr, delta_idle_purr;
double physc;
+ char *descr;
+ char mode[32];

delta_tb = get_scaled_tb();
delta_purr = get_delta_value("purr");
delta_idle_purr = get_delta_value("idle_purr");

+ get_sysdata("shared_processor_mode", &descr, mode);
+ if (!strcmp(mode, "Dedicated"))
+ get_sysdata("DedDonMode", &descr, mode);
/*
* Given that these values are read from different
* sources (purr from lparcfg and idle_purr from sysfs),
@@ -505,8 +510,15 @@ void get_cpu_util_purr(struct sysentry *unused_se, char *buf)
if (delta_idle_purr > delta_purr)
delta_idle_purr = delta_purr;

- physc = (delta_purr - delta_idle_purr) / delta_tb;
- physc *= 100.00;
+ if (!strcmp(mode, "Capped")) {
+ /* For dedicated - capped mode */
+ physc = (delta_purr - delta_idle_purr) / delta_tb;
+ physc *= 100.00;
+ } else {
+ /* For shared and dedicated - donate mode */
+ physc = delta_purr / delta_tb;
+ physc *= 100.00;
+ }

sprintf(buf, "%.2f", physc);
}
@@ -515,11 +527,17 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_purr, delta_idle_purr;
double physc, idle;
+ char *descr;
+ char mode[32];

delta_tb = get_scaled_tb();
delta_purr = get_delta_value("purr");
delta_idle_purr = get_delta_value("idle_purr");

+ get_sysdata("shared_processor_mode", &descr, mode);
+ if (!strcmp(mode, "Dedicated"))
+ get_sysdata("DedDonMode", &descr, mode);
+
/*
* Given that these values are read from different
* sources (purr from lparcfg and idle_purr from sysfs),
@@ -529,9 +547,19 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
if (delta_idle_purr > delta_purr)
delta_idle_purr = delta_purr;

- physc = (delta_purr - delta_idle_purr) / delta_tb;
- idle = (delta_purr / delta_tb) - physc;
- idle *= 100.00;
+ if (delta_purr > delta_tb)
+ delta_purr = delta_tb;
+
+ if (!strcmp(mode, "Capped")) {
+ /* For dedicated - capped mode */
+ physc = (delta_purr - delta_idle_purr) / delta_tb;
+ idle = (delta_purr / delta_tb) - physc;
+ idle *= 100.00;
+ } else {
+ /* For shared and dedicated - donate mode */
+ idle = (delta_tb - delta_purr) / delta_tb;
+ idle *= 100.00;
+ }

sprintf(buf, "%.2f", idle);
}
@@ -540,13 +568,29 @@ void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_spurr, delta_idle_spurr;
double physc, rfreq;
+ char *descr;
+ char mode[32];

delta_tb = get_scaled_tb();
delta_spurr = get_delta_value("spurr");
delta_idle_spurr = get_delta_value("idle_spurr");

- physc = (delta_spurr - delta_idle_spurr) / delta_tb;
- physc *= 100.00;
+ get_sysdata("shared_processor_mode", &descr, mode);
+ if (!strcmp(mode, "Dedicated"))
+ get_sysdata("DedDonMode", &descr, mode);
+
+ if (delta_idle_spurr > delta_spurr)
+ delta_idle_spurr = delta_spurr;
+
+ if (!strcmp(mode, "Capped")) {
+ /* For dedicated - capped mode */
+ physc = (delta_spurr - delta_idle_spurr) / delta_tb;
+ physc *= 100.00;
+ } else {
+ /* For shared and dedicated - donate mode */
+ physc = delta_spurr / delta_tb;
+ physc *= 100.00;
+ }

rfreq = round_off_freq();
physc += ((physc * rfreq) / 100);
@@ -559,14 +603,33 @@ void get_cpu_idle_spurr(struct sysentry *unused_se, char *buf)
double delta_tb, delta_spurr, delta_idle_spurr;
double physc, idle;
double rfreq;
+ char *descr;
+ char mode[32];

delta_tb = get_scaled_tb();
delta_spurr = get_delta_value("spurr");
delta_idle_spurr = get_delta_value("idle_spurr");

- physc = (delta_spurr - delta_idle_spurr) / delta_tb;
- idle = (delta_spurr / delta_tb) - physc;
- idle *= 100.00;
+ get_sysdata("shared_processor_mode", &descr, mode);
+ if (!strcmp(mode, "Dedicated"))
+ get_sysdata("DedDonMode", &descr, mode);
+
+ if (delta_idle_spurr > delta_spurr)
+ delta_idle_spurr = delta_spurr;
+
+ if (delta_spurr > delta_tb)
+ delta_spurr = delta_tb;
+
+ if (!strcmp(mode, "Capped")) {
+ /* For dedicated - capped mode */
+ physc = (delta_spurr - delta_idle_spurr) / delta_tb;
+ idle = (delta_spurr / delta_tb) - physc;
+ idle *= 100.00;
+ } else {
+ /* For shared and dedicated - donate mode */
+ idle = (delta_tb - delta_spurr) / delta_tb;
+ idle *= 100.00;
+ }

rfreq = round_off_freq();
idle += ((idle * rfreq) / 100);
--
2.39.3

Srikar Dronamraju

<srikar@linux.ibm.com>
unread,
May 22, 2024, 1:31:32 AM5/22/24
to Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com
> @@ -529,9 +547,19 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
> if (delta_idle_purr > delta_purr)
> delta_idle_purr = delta_purr;
>
> - physc = (delta_purr - delta_idle_purr) / delta_tb;
> - idle = (delta_purr / delta_tb) - physc;
> - idle *= 100.00;
> + if (delta_purr > delta_tb)
> + delta_purr = delta_tb;
> +
> + if (!strcmp(mode, "Capped")) {
> + /* For dedicated - capped mode */
> + physc = (delta_purr - delta_idle_purr) / delta_tb;

Are we missing setting physc in the else case.
Shouldnt physc be common for both cases?

> + idle = (delta_purr / delta_tb) - physc;
> + idle *= 100.00;
> + } else {
> + /* For shared and dedicated - donate mode */
> + idle = (delta_tb - delta_purr) / delta_tb;
> + idle *= 100.00;
> + }
>


> sprintf(buf, "%.2f", idle);
> }

Same here.

> + idle = (delta_spurr / delta_tb) - physc;
> + idle *= 100.00;
> + } else {
> + /* For shared and dedicated - donate mode */
> + idle = (delta_tb - delta_spurr) / delta_tb;
> + idle *= 100.00;
> + }
>
> rfreq = round_off_freq();
> idle += ((idle * rfreq) / 100);
> --
> 2.39.3
>

--
Thanks and Regards
Srikar Dronamraju

Naveen N. Rao

<naveen.n.rao@linux.ibm.com>
unread,
May 23, 2024, 1:58:32 AM5/23/24
to powerpc-utils-devel@googlegroups.com, Saket Kumar Bhaskar, srikar@linux.ibm.com, tyreld@linux.ibm.com
Assuming similar workloads, it is strange seeing 100% busy with >32
threads.
Is there a reason to intentionally ignore delta_idle_purr for the
calculations? Those are supposed to be purr ticks we accululated while
in idle.

- Naveen

Saket Kumar Bhaskar

<skb99@linux.ibm.com>
unread,
May 23, 2024, 5:21:36 AM5/23/24
to Srikar Dronamraju, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com
On 5/22/24 11:01, Srikar Dronamraju wrote:
@@ -529,9 +547,19 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
 	if (delta_idle_purr > delta_purr)
 		delta_idle_purr = delta_purr;
 
-	physc = (delta_purr - delta_idle_purr) / delta_tb;
-	idle = (delta_purr / delta_tb) - physc;
-	idle *= 100.00;
+	if (delta_purr > delta_tb)
+		delta_purr = delta_tb;
+
+	if (!strcmp(mode, "Capped")) {
+		/* For dedicated - capped mode */
+		physc = (delta_purr - delta_idle_purr) / delta_tb;
Are we missing setting physc in the else case.
Shouldnt physc be common for both cases?
Here, physc is not the actual physc that is shown in lparstat, rather a variable name.
Since, we don't need physc to be subtracted from anything in else condition, 
we should be fine with it. 
Thanks for review,
Saket

Srikar Dronamraju

<srikar@linux.ibm.com>
unread,
May 27, 2024, 4:45:13 AM5/27/24
to Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com
* Saket Kumar Bhaskar <sk...@linux.ibm.com> [2024-05-23 14:51:07]:

> On 5/22/24 11:01, Srikar Dronamraju wrote:
>
> > > @@ -529,9 +547,19 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
> > > if (delta_idle_purr > delta_purr)
> > > delta_idle_purr = delta_purr;
> > > - physc = (delta_purr - delta_idle_purr) / delta_tb;
> > > - idle = (delta_purr / delta_tb) - physc;
> > > - idle *= 100.00;
> > > + if (delta_purr > delta_tb)
> > > + delta_purr = delta_tb;
> > > +
> > > + if (!strcmp(mode, "Capped")) {
> > > + /* For dedicated - capped mode */
> > > + physc = (delta_purr - delta_idle_purr) / delta_tb;
> > Are we missing setting physc in the else case.
> > Shouldnt physc be common for both cases?
>
> Here, physc is not the actual physc that is shown in lparstat, rather a variable name.
> Since, we don't need physc to be subtracted from anything in else condition,
> we should be fine with it.
>

Ok, I get it. You can ignore my comment.

Saket Kumar Bhaskar

<skb99@linux.ibm.com>
unread,
Jun 5, 2024, 5:23:47 AM6/5/24
to powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, srikar@linux.ibm.com, tyreld@linux.ibm.com, Saket Kumar Bhaskar
Idle and busy PURR/SPURR values are not adding upto 100% in case of
dedicated-donate and shared partitions, with the present formula.
Because of this, users might get a false impression of resource
utilisation. This is expected because a core can be in either
idle or busy state out of total of 100(core's shared resource can
either be consumed or be left idle). When lpar is in dedicated-donate
or shared,the purr values are not being counted when the CPU is ceded.
The idle_purr is calculated by taking snapshots of purr values at
every idle entry and idle exit. So, when a CPU is ceded, the calculation
for idle_purr will be wrong as purr is not being counted.

Before Change:
|-----------------------------------------------------------------|
| Dedicated-donate (8 cores) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.02 | 0.05 | 0.02 | 0.05 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 32.64 | 17.37 | 35.25 | 18.77 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 58.61 | 16.42 | 63.29 | 17.74 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 78.14 | 21.86 | 84.39 | 23.61 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 83.60 | 16.40 | 90.30 | 17.71 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 91.90 | 6.94 | 98.31 | 7.46 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 96.08 | 3.92 | 102.79 | 4.21 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 98.42 | 1.57 | 105.31 | 1.69 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 100.00 | 0.00 | 106.00 | 0.00 |
|----------------------|----------|----------|---------|----------|

|-----------------------------------------------------------------|
| Shared Capped (8 VP / 4 EC) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.04 | 0.18 | 0.03 | 0.19 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 35.90 | 14.09 | 38.77 | 15.21 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 35.25 | 14.84 | 38.08 | 16.02 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 40.13 | 9.73 | 42.93 | 10.43 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 44.13 | 5.73 | 47.22 | 6.14 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 46.47 | 3.42 | 50.18 | 3.69 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 48.03 | 1.83 | 51.39 | 1.96 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 49.04 | 0.86 | 52.47 | 0.93 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 49.87 | 0.00 | 53.36 | 0.00 |
|----------------------|----------|----------|---------|----------|

This commit, rather than considering delta_idle_purr for calculation of
idle ticks, takes (delta_tb - delta_purr + delta_idle_purr) as total
ticks for which the CPUs were idle. Here, since delta_idle_purr will
also contain some idle ticks, thats why it is added to the formula.

Given that, in case of dedicated capped mode delta_purr will be
approximately equal to delta_idle_purr, the expression:
idle = (delta_tb - delta_purr + delta_idle_purr) / delta_tb

will be effectively:
idle = delta_idle_purr / delta_tb.

Further, no changes has been made for calculation of %busy.

After Change:

|-----------------------------------------------------------------|
| Dedicated-donate (8 cores) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.02 | 99.98 | 0.02 | 100.04 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 35.97 | 64.03 | 38.84 | 61.51 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 58.60 | 41.40 | 63.28 | 37.08 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 78.14 | 21.86 | 84.39 | 23.61 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 83.60 | 16.41 | 90.29 | 17.71 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 92.96 | 7.04 | 100.39 | 7.61 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 96.08 | 3.92 | 103.77 | 4.24 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 98.42 | 1.58 | 105.31 | 1.68 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 100.00 | 0.00 | 107.00 | 0.00 |
|----------------------|----------|----------|---------|----------|

|-----------------------------------------------------------------|
| Shared Capped (8 VP / 4 EC) : |
|----------------------|---------------------|--------------------|
| | Actual | Normalized |
| Stress-ng threads |---------------------|--------------------|
| | %busy | %idle | %busy | %idle |
|----------------------|----------|----------|---------|----------|
| 0 threads | 0.03 | 99.97 | 0.19 | 99.44 |
|----------------------|----------|----------|---------|----------|
| 8 threads | 35.91 | 64.09 | 38.78 | 61.58 |
|----------------------|----------|----------|---------|----------|
| 16 threads | 36.83 | 63.17 | 39.78 | 60.55 |
|----------------------|----------|----------|---------|----------|
| 24 threads | 40.16 | 59.84 | 43.37 | 56.95 |
|----------------------|----------|----------|---------|----------|
| 32 threads | 44.47 | 55.53 | 48.02 | 52.38 |
|----------------------|----------|----------|---------|----------|
| 40 threads | 46.55 | 53.45 | 50.27 | 50.04 |
|----------------------|----------|----------|---------|----------|
| 48 threads | 48.13 | 51.87 | 52.48 | 47.82 |
|----------------------|----------|----------|---------|----------|
| 56 threads | 49.01 | 50.99 | 52.93 | 47.41 |
|----------------------|----------|----------|---------|----------|
| 64 threads | 49.90 | 50.10 | 53.40 | 46.19 |
|----------------------|----------|----------|---------|----------|

Before Change:
%idle = delta_idle_purr / delta_tb * 100

After Change:
%idle = (delta_tb - delta_purr + delta_idle_purr) / delta_tb * 100

Signed-off-by: Saket Kumar Bhaskar <sk...@linux.ibm.com>
---

v1 -> v2:
- Ticks accounted in delta_idle_purr were missed for busy and idle
calculation, as pointed out by Naveen.
- Because of this we were also seeing high busy values for lesser
number of threads.
- Generic formula for %idle calculation in case of dedicated and shared
lpar.

v1: https://groups.google.com/g/powerpc-utils-devel/c/IZs0Norg8nc


src/lparstat.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/lparstat.c b/src/lparstat.c
index d2fdb3f..0a81329 100644
--- a/src/lparstat.c
+++ b/src/lparstat.c
@@ -514,24 +514,26 @@ void get_cpu_util_purr(struct sysentry *unused_se, char *buf)
void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_purr, delta_idle_purr;
- double physc, idle;
+ double idle;

delta_tb = get_scaled_tb();
delta_purr = get_delta_value("purr");
delta_idle_purr = get_delta_value("idle_purr");

/*
* Given that these values are read from different
* sources (purr from lparcfg and idle_purr from sysfs),
* a small variation in the values is possible.
* In such cases, round down delta_idle_purr to delta_purr.
*/
if (delta_idle_purr > delta_purr)
delta_idle_purr = delta_purr;

- physc = (delta_purr - delta_idle_purr) / delta_tb;
- idle = (delta_purr / delta_tb) - physc;
+ if (delta_purr > delta_tb)
+ delta_purr = delta_tb;
+
+ idle = (delta_tb - delta_purr + delta_idle_purr) / delta_tb;
idle *= 100.00;

sprintf(buf, "%.2f", idle);
}
@@ -539,17 +541,20 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_spurr, delta_idle_spurr;
double physc, rfreq;

delta_tb = get_scaled_tb();
delta_spurr = get_delta_value("spurr");
delta_idle_spurr = get_delta_value("idle_spurr");

+ if (delta_idle_spurr > delta_spurr)
+ delta_idle_spurr = delta_spurr;
+
physc = (delta_spurr - delta_idle_spurr) / delta_tb;
physc *= 100.00;

rfreq = round_off_freq();
physc += ((physc * rfreq) / 100);

sprintf(buf, "%.2f", physc);
}
@@ -557,19 +562,24 @@ void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
void get_cpu_idle_spurr(struct sysentry *unused_se, char *buf)
{
double delta_tb, delta_spurr, delta_idle_spurr;
- double physc, idle;
+ double idle;
double rfreq;

delta_tb = get_scaled_tb();
delta_spurr = get_delta_value("spurr");
delta_idle_spurr = get_delta_value("idle_spurr");

- physc = (delta_spurr - delta_idle_spurr) / delta_tb;
- idle = (delta_spurr / delta_tb) - physc;
+ if (delta_idle_spurr > delta_spurr)
+ delta_idle_spurr = delta_spurr;
+
+ if (delta_spurr > delta_tb)
+ delta_spurr = delta_tb;
+
+ idle = (delta_tb - delta_spurr + delta_idle_spurr) / delta_tb;
idle *= 100.00;

rfreq = round_off_freq();
idle += ((idle * rfreq) / 100);

sprintf(buf, "%.2f", idle);
}
--
2.39.3

Srikar Dronamraju

<srikar@linux.ibm.com>
unread,
Jun 6, 2024, 2:23:00 AM6/6/24
to Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com
* Saket Kumar Bhaskar <sk...@linux.ibm.com> [2024-06-05 14:52:27]:

> Idle and busy PURR/SPURR values are not adding upto 100% in case of
> dedicated-donate and shared partitions, with the present formula.
> Because of this, users might get a false impression of resource
> utilisation. This is expected because a core can be in either
> idle or busy state out of total of 100(core's shared resource can
> either be consumed or be left idle). When lpar is in dedicated-donate
> or shared,the purr values are not being counted when the CPU is ceded.
> The idle_purr is calculated by taking snapshots of purr values at
> every idle entry and idle exit. So, when a CPU is ceded, the calculation
> for idle_purr will be wrong as purr is not being counted.
>

Looks good to me.

Reviewed-by: Srikar Dronamraju <sri...@linux.ibm.com>

Saket Kumar Bhaskar

<skb99@linux.ibm.com>
unread,
Jun 10, 2024, 1:03:14 AM6/10/24
to Srikar Dronamraju, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com
Thanks for reviewing, Srikar.

Best Regards,
Saket Kumar Bhaskar

Naveen N. Rao

<naveen.n.rao@linux.ibm.com>
unread,
Jun 12, 2024, 1:30:41 AM6/12/24
to powerpc-utils-devel@googlegroups.com, Saket Kumar Bhaskar, srikar@linux.ibm.com, tyreld@linux.ibm.com
The numbers look good now.
Two minor concerns:
- Do you often see delta_purr to be larger than delta_tb? Is it a
rounding error, or is the difference much bigger?
- You also generalize the calculation here, and is no longer specific to
shared/ded-donate. Not sure if this can hide bugs in our purr
accounting, et al.

Hopefully not big issues, but would be good to get confirmation on the
above.


- Naveen

Saket Kumar Bhaskar

<skb99@linux.ibm.com>
unread,
Jun 19, 2024, 1:35:31 AM6/19/24
to Naveen N. Rao, powerpc-utils-devel@googlegroups.com, srikar@linux.ibm.com, tyreld@linux.ibm.com
I have seen for dedicated-donate, delta_purr > delta_tb and delta_spurr>delta_tb.

I removed the if condition (if (delta_purr > delta_tb)), and dumped the delta
values on a 64 CPUs system:

1. For 64 threads:

#./lparstat -E 1 30

System Configuration
type=Dedicated mode=Donating smt=8 lcpu=8 mem=102544256 kB cpus=0 ent=8.00

---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
delta_purr 4194065168.000000 delta_tb 4193853440.000000 delta_idle_purr 0.000000 delta_spurr 4513401456.000000 delta_tb 4193853440.000000 delta_idle_spurr 0.000000 100.01 -0.01 3.87GHz[108%] 108.01 -7.65
delta_purr 4115058032.000000 delta_tb 4115066880.000000 delta_idle_purr 0.000000 delta_spurr 4428474080.000000 delta_tb 4115066880.000000 delta_idle_spurr 0.000000 100.00 0.00 3.87GHz[108%] 108.00 -7.64
delta_purr 4114827136.000000 delta_tb 4114767872.000000 delta_idle_purr 0.000000 delta_spurr 4430429340.000000 delta_tb 4114767872.000000 delta_idle_spurr 0.000000 100.00 -0.00 3.88GHz[108%] 108.00 -7.69
delta_purr 4117258032.000000 delta_tb 4117172224.000000 delta_idle_purr 0.000000 delta_spurr 4430827912.000000 delta_tb 4117172224.000000 delta_idle_spurr 0.000000 100.00 -0.00 3.87GHz[108%] 108.00 -7.65
delta_purr 4116939616.000000 delta_tb 4116955136.000000 delta_idle_purr 0.000000 delta_spurr 4429904076.000000 delta_tb 4116955136.000000 delta_idle_spurr 0.000000 100.00 0.00 3.87GHz[108%] 108.00 -7.63

2. For 56 threads:

#./lparstat -E 1 30

System Configuration
type=Dedicated mode=Donating smt=8 lcpu=8 mem=102544256 kB cpus=0 ent=8.00

---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
delta_purr 4124239336.000000 delta_tb 4124053504.000000 delta_idle_purr 65320864.000000 delta_spurr 4438340384.000000 delta_tb 4124053504.000000 delta_idle_spurr 70298488.000000 98.42 1.58 3.87GHz[108%] 106.29 -5.94
delta_purr 4115783736.000000 delta_tb 4115742720.000000 delta_idle_purr 64844040.000000 delta_spurr 4430295260.000000 delta_tb 4115742720.000000 delta_idle_spurr 69784284.000000 98.43 1.57 3.88GHz[108%] 106.30 -5.97
delta_purr 4116809568.000000 delta_tb 4117000192.000000 delta_idle_purr 64362656.000000 delta_spurr 4429448892.000000 delta_tb 4117000192.000000 delta_idle_spurr 69267872.000000 98.43 1.57 3.87GHz[108%] 106.31 -5.93
delta_purr 4115649744.000000 delta_tb 4115554304.000000 delta_idle_purr 64738312.000000 delta_spurr 4429501792.000000 delta_tb 4115554304.000000 delta_idle_spurr 69671320.000000 98.43 1.57 3.87GHz[108%] 106.30 -5.96
delta_purr 4115961312.000000 delta_tb 4115902464.000000 delta_idle_purr 64828552.000000 delta_spurr 4429060776.000000 delta_tb 4115902464.000000 delta_idle_spurr 69768592.000000 98.43 1.57 3.87GHz[108%] 106.30 -5.93

We can see negative values for %idle purr/spurr, because delta_(s)purr > delta_tb.

> - You also generalize the calculation here, and is no longer specific to
> shared/ded-donate. Not sure if this can hide bugs in our purr accounting,
> et al.
>
Earlier, too we had generic formula. IMO, I dont see any specific reason for
different formulas for dedicated-donate and shared as of now.

> Hopefully not big issues, but would be good to get confirmation on the
> above.
>
>
> - Naveen
>
Thanks,
Saket Kumar Bhaskar

Naveen N. Rao

<naveen.n.rao@linux.ibm.com>
unread,
Jun 25, 2024, 6:27:08 AM6/25/24
to Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, srikar@linux.ibm.com, tyreld@linux.ibm.com
Saket Kumar Bhaskar wrote:
> On Wed, Jun 12, 2024 at 11:00:29AM +0530, Naveen N. Rao wrote:
>> Saket Kumar Bhaskar wrote:
>> > Idle and busy PURR/SPURR values are not adding upto 100% in case of
>> > dedicated-donate and shared partitions, with the present formula.
>> > Because of this, users might get a false impression of resource
>> > utilisation. This is expected because a core can be in either
>> > idle or busy state out of total of 100(core's shared resource can
>> > either be consumed or be left idle). When lpar is in dedicated-donate
>> > or shared,the purr values are not being counted when the CPU is ceded.
>> > The idle_purr is calculated by taking snapshots of purr values at
>> > every idle entry and idle exit. So, when a CPU is ceded, the calculation
>> > for idle_purr will be wrong as purr is not being counted.
>> >

<snip>
So delta_purr to delta_tb is indeed a rounding off error.

>
> We can see negative values for %idle purr/spurr, because delta_(s)purr > delta_tb.

For spurr though, I think it is reasonable to expect delta_spurr to be
greater than delta_tb since it is a scaled value.

> diff --git a/src/lparstat.c b/src/lparstat.c
> index d2fdb3f..0a81329 100644
> --- a/src/lparstat.c
> +++ b/src/lparstat.c
> @@ -514,24 +514,26 @@ void get_cpu_util_purr(struct sysentry *unused_se, char *buf)
> void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
> {
> double delta_tb, delta_purr, delta_idle_purr;
> - double physc, idle;
> + double idle;
>
> delta_tb = get_scaled_tb();
> delta_purr = get_delta_value("purr");
> delta_idle_purr = get_delta_value("idle_purr");
>
> /*
> * Given that these values are read from different
> * sources (purr from lparcfg and idle_purr from sysfs),
> * a small variation in the values is possible.
> * In such cases, round down delta_idle_purr to delta_purr.
> */
> if (delta_idle_purr > delta_purr)
> delta_idle_purr = delta_purr;
>
> - physc = (delta_purr - delta_idle_purr) / delta_tb;
> - idle = (delta_purr / delta_tb) - physc;
> + if (delta_purr > delta_tb)
> + delta_purr = delta_tb;

This is ok.

> +
> + idle = (delta_tb - delta_purr + delta_idle_purr) / delta_tb;

(delta_tb - delta_purr) should only be considered for shared/ded-donate.

> idle *= 100.00;
>
> sprintf(buf, "%.2f", idle);
> }
> @@ -539,17 +541,20 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
> void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
> {
> double delta_tb, delta_spurr, delta_idle_spurr;
> double physc, rfreq;
>
> delta_tb = get_scaled_tb();
> delta_spurr = get_delta_value("spurr");
> delta_idle_spurr = get_delta_value("idle_spurr");
>
> + if (delta_idle_spurr > delta_spurr)
> + delta_idle_spurr = delta_spurr;
> +

Have you ever seen this happen?

> physc = (delta_spurr - delta_idle_spurr) / delta_tb;
> physc *= 100.00;
>
> rfreq = round_off_freq();
> physc += ((physc * rfreq) / 100);
>
> sprintf(buf, "%.2f", physc);
> }
> @@ -557,19 +562,24 @@ void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
> void get_cpu_idle_spurr(struct sysentry *unused_se, char *buf)
> {
> double delta_tb, delta_spurr, delta_idle_spurr;
> - double physc, idle;
> + double idle;
> double rfreq;
>
> delta_tb = get_scaled_tb();
> delta_spurr = get_delta_value("spurr");
> delta_idle_spurr = get_delta_value("idle_spurr");
>
> - physc = (delta_spurr - delta_idle_spurr) / delta_tb;
> - idle = (delta_spurr / delta_tb) - physc;
> + if (delta_idle_spurr > delta_spurr)
> + delta_idle_spurr = delta_spurr;

Ditto. I don't see how this can happen.

> +
> + if (delta_spurr > delta_tb)
> + delta_spurr = delta_tb;

This is reasonable, and to be expected since SPURR is a scaled value.

> +
> + idle = (delta_tb - delta_spurr + delta_idle_spurr) / delta_tb;
> idle *= 100.00;

This change should be limited to shared/ded-donate, perhaps gated
behind:
if (delta_tb > delta_spurr)

>
> rfreq = round_off_freq();
> idle += ((idle * rfreq) / 100);
>
> sprintf(buf, "%.2f", idle);
> }

>
>> - You also generalize the calculation here, and is no longer specific to
>> shared/ded-donate. Not sure if this can hide bugs in our purr accounting,
>> et al.
>>
> Earlier, too we had generic formula. IMO, I dont see any specific reason for
> different formulas for dedicated-donate and shared as of now.

Is this broken only on shared/dedicated-donate or is there an issue in
other configurations too?

Your commit log indicates there is an issue in shared/dedicated-donate
partitions only. As far as I can see, that is because PURR ticks are not
accounted to those partitions when the cpu is in idle. As such, it makes
sense to account (delta_tb - delta_purr) to idle only on those partition
types. For dedicated partitions, it should effectively be zero. But, it
would be better not to ignore the difference, if any. That will just
hide accounting errors.


- Naveen

Srikar Dronamraju

<srikar@linux.ibm.com>
unread,
Jun 25, 2024, 9:13:22 AM6/25/24
to Naveen N. Rao, Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, tyreld@linux.ibm.com
* Naveen N. Rao <naveen...@linux.ibm.com> [2024-06-25 15:56:55]:

> Saket Kumar Bhaskar wrote:
> > On Wed, Jun 12, 2024 at 11:00:29AM +0530, Naveen N. Rao wrote:
> > > Saket Kumar Bhaskar wrote:
> > if (delta_idle_purr > delta_purr)
> > delta_idle_purr = delta_purr;
> > - physc = (delta_purr - delta_idle_purr) / delta_tb;
> > - idle = (delta_purr / delta_tb) - physc;
> > + if (delta_purr > delta_tb)
> > + delta_purr = delta_tb;
>
> This is ok.
>
> > +
> > + idle = (delta_tb - delta_purr + delta_idle_purr) / delta_tb;
>
> (delta_tb - delta_purr) should only be considered for shared/ded-donate.

On dedicated-capped i.e non shared/dedicated-donate, this delta_tb -
delta_purr will always be 0. So keeping this for dedicated wouldnt cause any
side-effect. We can as well use an if-else condition but essentially we will
be doing the same in both the if and else cases.

>
> > idle *= 100.00;
> > sprintf(buf, "%.2f", idle);
> > }
> > @@ -539,17 +541,20 @@ void get_cpu_idle_purr(struct sysentry *unused_se, char *buf)
> > void get_cpu_util_spurr(struct sysentry *unused_se, char *buf)
> > {
> > double delta_tb, delta_spurr, delta_idle_spurr;
> > double physc, rfreq;
> > delta_tb = get_scaled_tb();
> > delta_spurr = get_delta_value("spurr");
> > delta_idle_spurr = get_delta_value("idle_spurr");
> > + if (delta_idle_spurr > delta_spurr)
> > + delta_idle_spurr = delta_spurr;
> > +
>
> Have you ever seen this happen?
>
> > }
>
> >
> > > - You also generalize the calculation here, and is no longer specific to
> > > shared/ded-donate. Not sure if this can hide bugs in our purr accounting,
> > > et al.
> > >
> > Earlier, too we had generic formula. IMO, I dont see any specific reason for
> > different formulas for dedicated-donate and shared as of now.
>
> Is this broken only on shared/dedicated-donate or is there an issue in other
> configurations too?
>
> Your commit log indicates there is an issue in shared/dedicated-donate
> partitions only. As far as I can see, that is because PURR ticks are not
> accounted to those partitions when the cpu is in idle. As such, it makes
> sense to account (delta_tb - delta_purr) to idle only on those partition
> types. For dedicated partitions, it should effectively be zero. But, it
> would be better not to ignore the difference, if any. That will just hide
> accounting errors.
>

Yes, the problem we observe is only on the shared/dedicated-donate.

However lets take a hypothetical situation where on a dedicated-capped lpar
delta_tb is greater than delta_purr. Without the current change, busy would
be delta_purr/delta_tb and idle would be delta_idle_purr/delta_tb.

Now busy + idle would again not be 100%. Lets say busy+idle comes down to 95%
So what would users wonder about the missing 5%. Where was it spent?

My take, difference of tb - purr, irrespective of dedicated or shared should
be accounted to idle.

lparstat is not a validation tool for tb and purr values. That should be
done outside and independent of lparstat tool.

We should neither consider the users of lparstat tool to be validators nor
expect them to understand the underlying internals.

Tyrel Datwyler

<tyreld@linux.ibm.com>
unread,
Jun 26, 2024, 7:30:08 PM6/26/24
to Saket Kumar Bhaskar, powerpc-utils-devel@googlegroups.com, naveen.n.rao@linux.ibm.com, srikar@linux.ibm.com
On 6/5/24 02:22, Saket Kumar Bhaskar wrote:
> Idle and busy PURR/SPURR values are not adding upto 100% in case of
> dedicated-donate and shared partitions, with the present formula.
> Because of this, users might get a false impression of resource
> utilisation. This is expected because a core can be in either
> idle or busy state out of total of 100(core's shared resource can
> either be consumed or be left idle). When lpar is in dedicated-donate
> or shared,the purr values are not being counted when the CPU is ceded.
> The idle_purr is calculated by taking snapshots of purr values at
> every idle entry and idle exit. So, when a CPU is ceded, the calculation
> for idle_purr will be wrong as purr is not being counted.
>

My only nit pick is that there is no mention that the Actual/Normalized values
of %busy and %idle are coming from the output of lparstat -E.

-Tyrel
Reply all
Reply to author
Forward
0 new messages