[PATCH 0/2] lparstat: Improve since boot reports

21 views
Skip to first unread message

Shrikanth Hegde

<sshegde@linux.ibm.com>
unread,
Apr 5, 2024, 7:07:50 AMApr 5
to tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, sshegde@linux.ibm.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
When there are no options provided to lparstat, it is expected to give
the reports since the LPAR boot. Currently it shows wrong values for
physc, purr/idle_purr utilization values in Dedicated LPARs and app,
physc, purr in Shared Processor LPARs.

For accurate reporting of app, a new field is added to lparcfg called
boot pool idle time. It is pool idle time captured at boot. Using it,
app value is coming out correctly. PATCH 1 does that.

For the other fields, the issue is mainly due to how lparstat gets
"time". It should give either interval time or time since boot.
Using gettimeofday is a wrong interface to get that info, since it gives
time since Epoch. Though that works for intervals, it doesn't work for
since boot reports. Using /proc/uptime seems to be the right interface
that should be used instead. PATCH 2 does that.

PATCH 1 Depends on the lparcfg to have the new field.
*Hence depends on this*
https://lore.kernel.org/all/20240405101340....@linux.ibm.com/

PATCH 2 is independent.

Shrikanth Hegde (2):
lparstat: app: use pic value at boot for accurate boot time reporting
lparstat: use /proc/uptime instead for get_time()

src/lparstat.c | 20 ++++++++++----------
src/lparstat.h | 2 ++
2 files changed, 12 insertions(+), 10 deletions(-)

--
2.39.3

Shrikanth Hegde

<sshegde@linux.ibm.com>
unread,
Apr 5, 2024, 7:07:54 AMApr 5
to tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, sshegde@linux.ibm.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
"time" is used in lparstat.c to find the time elapsed either since boot
or between two intervals. But it is using gettimeofday which returns the
time elapsed since Epoch. This works for intervals calculations but it
doesn't work for since boot reports.

Instead use the /proc/uptime interface to get the elapsed time. This
fixes physc, utilization based on purr being wrong since boot


Test::
reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600


Results::
==================== Shared LPAR ==================================
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

lparstat -E <-- Observe utilization values
====== 6.9-rc1 and lparstat 1.3.10 =============
---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
0.00 0.00 3.87GHz[106%] 0.00 0.00

==== With this patch and patch 2/3 =============
---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
38.72 0.11 3.87GHz[106%] 41.04 0.12

lparstat <-- Observe physc values
====== 6.9-rc1 and lparstat 1.3.10 ===================================
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21

=== With this patch and this patch ================================ ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21


==================== Dedicated LPAR ==================================
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=12 mem=15573248 kB cpus=0 ent=12.00

::lparstat -E:: <-- Observe utilization values.
======= 6.9-rc1 and lparstat 1.3.10 =============
---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
0.00 0.00 3.87GHz[106%] 0.00 0.00

=== With this patch and powerpc-utils patch to do the above equation ===
---Actual--- -Normalized-
%busy %idle Frequency %busy %idle
------ ------ ------------- ------ ------
48.87 51.51 3.87GHz[106%] 51.81 54.60


::lparstat:: <-- Observe physc values.
======= 6.9-rc1 and lparstat 1.3.10 =============
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
48.38 0.01 0.00 51.61 0.03 0.25 48.39 0.00 344661 8

=== With this patch and powerpc-utils patch to do the above equation ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
48.38 0.01 0.00 51.61 12.05 100.42 48.39 0.00 344877 8

=============================================================================

Interval based lparstat values are same. With this patch the physc and
busy purr/idle purr values show correctly for since boot reports.

Note: this patch doesn't fix the idle purr being incorrect. That is
currently being investigated.

Signed-off-by: Shrikanth Hegde <ssh...@linux.ibm.com>
---
src/lparstat.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/lparstat.c b/src/lparstat.c
index 612ffd0..b56fe6d 100644
--- a/src/lparstat.c
+++ b/src/lparstat.c
@@ -255,14 +255,16 @@ long long get_delta_value(char *se_name)

void get_time()
{
- struct timeval t;
+ char *descr, uptime[32];
struct sysentry *se;

- gettimeofday(&t, 0);
+ get_sysdata("uptime", &descr, uptime);
+ if (!strcmp(uptime, SE_NOT_VALID))
+ return;

se = get_sysentry("time");
sprintf(se->value, "%lld",
- (long long)t.tv_sec * 1000000LL + (long long)t.tv_usec);
+ (long long)atof(uptime));
}

int get_time_base()
@@ -304,7 +306,6 @@ double get_scaled_tb(void)
online_cores = atoi(se->value);

elapsed = get_delta_value("time");
- elapsed = elapsed / 1000000.0;

se = get_sysentry("timebase");
timebase = atoi(se->value);
@@ -403,13 +404,12 @@ void get_cpu_physc(struct sysentry *unused_se, char *buf)
delta_purr = get_delta_value("purr");

se = get_sysentry("tbr");
- if (se->value[0] != '\0') {
+ if (se->old_value[0] != '\0') {
delta_tb = get_delta_value("tbr");

physc = delta_purr / delta_tb;
} else {
elapsed = get_delta_value("time");
- elapsed = elapsed / 1000000.0;

se = get_sysentry("timebase");
timebase = atoi(se->value);
@@ -436,7 +436,7 @@ void get_cpu_app(struct sysentry *unused_se, char *buf)
{
struct sysentry *se;
float timebase, app, elapsed_time;
- long long new_app, old_app, delta_time;
+ long long new_app, old_app;
char *descr, uptime[32];

se = get_sysentry("time");
@@ -450,8 +450,7 @@ void get_cpu_app(struct sysentry *unused_se, char *buf)
}
elapsed_time = atof(uptime);
} else {
- delta_time = get_delta_value("time");
- elapsed_time = delta_time / 1000000.0;
+ elapsed_time = get_delta_value("time");
}

se = get_sysentry("timebase");
--
2.39.3

Shrikanth Hegde

<sshegde@linux.ibm.com>
unread,
Apr 5, 2024, 7:07:54 AMApr 5
to tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, sshegde@linux.ibm.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
When there are no options specified for lparstat, it is expected to
give reports since LPAR boot. App is an indicator for available
processor pool in an Shared Processor Logical Partition(SPLPAR). App is
derived using pool_idle_time which is obtained using H_PIC call.

The interval based reports show correct App value while since boot
report shows very high App values. This happens because in that case app
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot. This leads to large App values.

To fix that use the boot pool idle time added newly in the lparcfg as
below.
App = (pool idle time - boot pool idle time) / (uptime * timebase)


**
This depends on "powerpc/pseries: Add pool idle time at LPAR boot" be
merged into kernel
**


Results: (Observe app values)
========================================================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to close to 37-6=31.

====== 6.9-rc1 and lparstat 1.3.10 =============
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21

=== With this patch and this patch to do the above equation ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21

Signed-off-by: Shrikanth Hegde <ssh...@linux.ibm.com>
---

Note: This patch doesn't fix the phsyc. It would be fixed by a
patch which changes the how "time" is being captured.

src/lparstat.c | 3 ++-
src/lparstat.h | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/lparstat.c b/src/lparstat.c
index d2fdb3f..612ffd0 100644
--- a/src/lparstat.c
+++ b/src/lparstat.c
@@ -460,7 +460,8 @@ void get_cpu_app(struct sysentry *unused_se, char *buf)
se = get_sysentry("pool_idle_time");
new_app = strtoll(se->value, NULL, 0);
if (se->old_value[0] == '\0') {
- old_app = 0;
+ se = get_sysentry("boot_pool_idle_time");
+ old_app = strtoll(se->value, NULL, 0);
} else {
old_app = strtoll(se->old_value, NULL, 0);
}
diff --git a/src/lparstat.h b/src/lparstat.h
index b7c88e9..77203e1 100644
--- a/src/lparstat.h
+++ b/src/lparstat.h
@@ -124,6 +124,8 @@ struct sysentry system_data[] = {
.get = &get_percent_entry},
{.name = "pool_idle_time",
.descr = "Shared Processor Pool Idle Time"},
+ {.name = "boot_pool_idle_time",
+ .descr = "Shared Processor Pool Idle Time"},
{.name = "pool_num_procs",
.descr = "Shared Processor Pool Processors"},
{.name = "unallocated_capacity_weight",
--
2.39.3

Nathan Lynch

<nathanl@linux.ibm.com>
unread,
Apr 5, 2024, 8:40:43 AMApr 5
to Shrikanth Hegde, tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, sshegde@linux.ibm.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
Shrikanth Hegde <ssh...@linux.ibm.com> writes:

> "time" is used in lparstat.c to find the time elapsed either since boot
> or between two intervals. But it is using gettimeofday which returns the
> time elapsed since Epoch. This works for intervals calculations but it
> doesn't work for since boot reports.
>
> Instead use the /proc/uptime interface to get the elapsed time.

clock_gettime() with a clock id of CLOCK_MONOTONIC or CLOCK_BOOTTIME is
probably a better option than parsing /proc/uptime.

Nathan Lynch

<nathanl@linux.ibm.com>
unread,
Apr 5, 2024, 8:52:28 AMApr 5
to Shrikanth Hegde, tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, sshegde@linux.ibm.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
Shrikanth Hegde <ssh...@linux.ibm.com> writes:
> diff --git a/src/lparstat.c b/src/lparstat.c
> index d2fdb3f..612ffd0 100644
> --- a/src/lparstat.c
> +++ b/src/lparstat.c
> @@ -460,7 +460,8 @@ void get_cpu_app(struct sysentry *unused_se, char *buf)
> se = get_sysentry("pool_idle_time");
> new_app = strtoll(se->value, NULL, 0);
> if (se->old_value[0] == '\0') {
> - old_app = 0;
> + se = get_sysentry("boot_pool_idle_time");
> + old_app = strtoll(se->value, NULL, 0);

What happens if the kernel doesn't provide the boot_pool_idle_time
entry? Does get_sysentry() return a reasonable/useful value?

Shrikanth Hegde

<sshegde@linux.ibm.com>
unread,
Apr 5, 2024, 10:25:59 AMApr 5
to Nathan Lynch, tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com
In that case se->value returns 0. This is because get_sysentry find the entry
because the its is statically defined and found based on the name.
the name and descr has been added in lparstat.h. So it wont crash and returns 0.

Currently this happens for pool_idle_time. this is exported only if it is shared
LPAR and the file is not present in Dedicated mode. But it is accessed always.

# cat /proc/powerpc/lparcfg | grep pool_idle_time
#

se = get_sysentry("pool_idle_time");
new_app = strtoll(se->value, NULL, 0);
+ printf("new_app: %d\n", new_app);


new_app: 0

Shrikanth Hegde

<sshegde@linux.ibm.com>
unread,
Apr 5, 2024, 11:31:39 AMApr 5
to Nathan Lynch, tyreld@linux.ibm.com, powerpc-utils-devel@googlegroups.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, mahesh@linux.ibm.com


On 4/5/24 6:10 PM, Nathan Lynch wrote:
> Shrikanth Hegde <ssh...@linux.ibm.com> writes:


Thank you Nathan for reviewing this.
Yes. That makes sense. Tried and it works.

while we do this change, we can remove uptime and
related functions as it would not be needed anymore. I will add a patch to
move from uptime to clock_gettime so one could follow the patches easier.

Reply all
Reply to author
Forward
0 new messages