AMD Zen3 energy measurement, ctd.


ewbe...@lbl.gov

Oct 19, 2024, 10:29:02 AM
to likwid-users

Greetings,

I'm having trouble obtaining energy counter data on the AMD Zen3 platform. There is already a thread on this subject from Oct 2023 that provides some useful information but does not reach a resolution.

Starting at the end of that thread, on my system, the following command works and produces what appears to be meaningful data: 

% perf stat -e power/event=0x02/ ./a.out 

 Performance counter stats for 'system wide':

   127,301,255,168      power/event=0x02/                                          

But when I run with LIKWID (5.2.2), the resulting counter data appears to be 0:

% likwid-perfctr -m -g RAPL_PKG_ENERGY:PWR1 -C N:0-0 ./a.out

--------------------------------------------------------------------------------
CPU name: AMD EPYC 7763 64-Core Processor                
CPU type: AMD K19 (Zen3) architecture
CPU clock: 2.45 GHz
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Region foo, Group 1: Custom
+-------------------+------------+
|    Region Info    | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   0.042600 |
|     call count    |          1 |
+-------------------+------------+

+---------------------+---------+--------------+
|        Event        | Counter |  HWThread 0  |
+---------------------+---------+--------------+
| Runtime (RDTSC) [s] |   TSC   | 4.259985e-02 |
|   RAPL_PKG_ENERGY   |   PWR1  |            0 |
+---------------------+---------+--------------+

Additional info:

On this system,

cat /proc/sys/kernel/perf_event_paranoid: 0
cat /sys/devices/power/events/energy-pkg.scale: 2.3283064365386962890625e-10
cat /sys/devices/power/events/energy-pkg.unit: Joules
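
As a rough cross-check of the perf number above, the raw count can be converted to Joules by multiplying it by the energy-pkg.scale value. This is only a sketch: it assumes that power/event=0x02/ is the package energy event and that perf printed the unscaled count. raw_to_joules is a hypothetical helper name, not part of perf or LIKWID.

```shell
# Convert a raw energy count to Joules using the sysfs scale factor.
# Assumption: the count is unscaled and belongs to the energy-pkg event.
raw_to_joules() {
    # $1 = raw counter value, $2 = scale (Joules per count)
    awk -v raw="$1" -v scale="$2" 'BEGIN { printf "%.2f\n", raw * scale }'
}

# Values taken from the perf output and energy-pkg.scale quoted in this message.
raw_to_joules 127301255168 2.3283064365386962890625e-10   # prints 29.64
```

On the system itself, the second argument could be read with "$(cat /sys/devices/power/events/energy-pkg.scale)" instead of being typed in.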

The earlier thread includes this comment: "So this means, the configuration of this event works (eg  RAPL_PKG_ENERGY:PWR1). Why it does not count is out-of-scope of LIKWID. In perf_event mode, it relies on perf_event providing reasonable data".

I'm hoping there may now be more information/insight than there was a year ago.

The system in use is Perlmutter at NERSC, which means I have very limited options for installing 5.3.0 or making other system tweaks.

Thanks,
wes

Thomas Gruber

Oct 24, 2024, 12:09:47 PM
to likwid-users
Hi Wes,

thanks for bringing this back to my attention. The reason is quite simple: in the current implementation, RAPL_PKG_ENERGY has to use PWR0, not PWR1. I attached a patch (branch amd_zen3_perf_power).

For usage on Perlmutter with limited privileges but /proc/sys/kernel/perf_event_paranoid == 0:
$ git clone -b v.5.3 https://github.com/RRZE-HPC/likwid.git likwid-5.3.0-patched
$ cd likwid-5.3.0-patched
$ patch -p1 < likwid_5.3.0_zen3_perf_power.patch
$ make ACCESSMODE=perf_event
$ make ACCESSMODE=perf_event local  # outputs an export LD_LIBRARY_PATH=... line, execute it
$ export LD_LIBRARY_PATH=...
$ ./likwid-perfctr -g RAPL_PKG_ENERGY:PWR0 ...

Note: Instead of applying the patch, you can also clone the patch branch directly. This installation method works with both the master branch and the patch branch.

The only drawback of this installation method is that it tries to load the performance groups from $INSTALL_PREFIX/share/likwid/perfgroup. You can look up the installation path of the current module installation and set PREFIX in config.mk to it, and the message is gone, BUT when you then use the ENERGY group, it will pick up the old one (with RAPL_PKG_ENERGY:PWR1). One option is to copy the new ENERGY group (likwid-5.3.0-patched/groups/zen3/ENERGY.txt) to $HOME/.likwid/groups/zen3/NEWENERGY.txt and use ./likwid-perfctr -g NEWENERGY ... for measurements.
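
The copy step described above could be scripted roughly as follows. This is a sketch only: install_group is a hypothetical helper name, and it assumes that $HOME/.likwid/groups/<arch>/ may not exist yet and has to be created first.

```shell
# Copy a performance group file into the per-user LIKWID group directory
# under a new name. install_group is a hypothetical helper, not a LIKWID tool.
install_group() {
    src="$1"     # path to the group file, e.g. .../groups/zen3/ENERGY.txt
    arch="$2"    # architecture directory name, e.g. zen3
    name="$3"    # new group name, e.g. NEWENERGY
    dest="$HOME/.likwid/groups/$arch"
    # Create the target directory if needed, then copy under the new name.
    mkdir -p "$dest" && cp "$src" "$dest/$name.txt"
}
```

Run from the directory that contains the patched checkout, the call would be: install_group likwid-5.3.0-patched/groups/zen3/ENERGY.txt zen3 NEWENERGY. Using a new name rather than ENERGY keeps the group distinct from the old one shipped with the module installation.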


Best,
Tom
likwid_5.3.0_zen3_perf_power.patch