Regarding uncore registers for Intel Sapphire Rapids

techsun

Apr 10, 2024, 6:54:34 AM
to likwid-users
Hi,
I would like to understand Intel PMON uncore discovery. I noticed that the CBOX registers listed in LIKWID carry the prefix "FAKE". I have some questions about this:
1. Are all the Sapphire Rapids register addresses in this file correct: https://github.com/RRZE-HPC/likwid/blob/master/src/includes/registers.h ?
2. There are two files named intel_perfmon_uncore_discovery.h and intel_perfmon_uncore_discovery.c. I cannot figure out how to use them to discover the unknown CBOX registers.
3. The CBOX registers for cores 0 to 39 (MSR_UNC_SPR_C0_PMON*) are not available for Sapphire Rapids in registers.h.
4. Another file, https://github.com/RRZE-HPC/likwid/blob/master/src/includes/perfmon_sapphirerapids_counters.h, contains the CBOX registers for core 0 in lines 47 to 57. Can you please tell me why these registers are commented out and why the registers for cores 1 to 56 are missing?

Can anyone help me discover the addresses of the CBOX registers on Sapphire Rapids?

Thanks

Thomas Gruber

Apr 10, 2024, 8:08:40 AM
to likwid-users
Hi,

On SPR systems, the uncore register addresses/offsets and the way to access them are no longer documented. Instead, you get this information from the uncore discovery mechanism (the files in 2). This means that all registers with "SPR" in their name in registers.h are leftovers from my efforts to add SPR support; they are not valid and should be deleted (1 and 3).

4. LIKWID's internal logic requires a counter list covering all possible counters for an architecture. The list is then filtered at runtime after register checks. For SPR, this counter list is unknown at startup, so I added a list of "fake" registers (with offsets far outside the common address space). It probably contains more units than SPR actually provides; I added some extra to be safe, and they are filtered out at runtime. When LIKWID wants to access one of those fake registers, an added lookup step (access_x86_translate.c/.h and accessDaemon) resolves the fake offset to the real offset.
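
To illustrate the lookup step described above, here is a minimal sketch in C. It is not LIKWID's actual code: the names FAKE_BASE, RegTranslation and translate_fake_register are hypothetical, and the table is assumed to have been filled by the discovery step beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder range: chosen far outside the range of real
 * register offsets so that a fake offset cannot be mistaken for a real one. */
#define FAKE_BASE 0xF0000000u

/* One entry per counter register in the static counter list. */
typedef struct {
    uint32_t fake_offset; /* placeholder compiled into the counter list */
    uint64_t real_offset; /* offset reported by uncore discovery */
    int      valid;       /* 0 if the unit does not exist on this machine */
} RegTranslation;

/* Resolve a placeholder to the discovered offset.
 * Returns 0 on success, -1 if the placeholder is unknown or the unit is
 * absent (such counters would be filtered out at runtime). */
static int translate_fake_register(const RegTranslation *table, size_t n,
                                   uint32_t fake, uint64_t *real)
{
    assert(table != NULL && real != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].fake_offset == fake) {
            if (!table[i].valid) {
                return -1;
            }
            *real = table[i].real_offset;
            return 0;
        }
    }
    return -1;
}
```

A caller would pass the placeholder from the counter list and use the returned offset only when the function reports success; a failure corresponds to a counter that gets dropped from the list.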

Before you ask why I do not generate this list from the uncore discovery: you need elevated permissions to run the discovery mechanism. With ACCESSMODE=direct this is not a problem, because LIKWID has to be executed as root anyway. But with ACCESSMODE=accessdaemon, only the daemon can run the discovery, and I would need to transfer all the information back to generate the list on the library side. With the current method, no special communication is required: the library sends the fake register offset, the access daemon resolves the real offset, performs the access, and returns the result to the library.
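
The request flow in accessdaemon mode could be sketched roughly as follows. This is only an illustration under assumptions, not the real daemon protocol: ReadRequest, ReadReply, DaemonMapEntry and daemon_handle_read are hypothetical names, and the hardware is simulated by a plain array standing in for a mapped register region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the unprivileged library sends: only the placeholder offset. */
typedef struct {
    uint32_t fake_reg;
} ReadRequest;

/* What the daemon sends back. */
typedef struct {
    int      status; /* 0 on success, -1 on failure */
    uint64_t value;  /* register content, valid only if status == 0 */
} ReadReply;

/* Daemon-side mapping built from the privileged discovery run. */
typedef struct {
    uint32_t fake_reg; /* placeholder known to the library */
    uint64_t real_off; /* byte offset into the mapped register region */
} DaemonMapEntry;

/* Daemon side: resolve the placeholder, perform the access, return the
 * result. The library never learns the real offset. */
static ReadReply daemon_handle_read(const DaemonMapEntry *map, size_t n,
                                    const volatile uint64_t *region,
                                    size_t region_bytes, ReadRequest req)
{
    ReadReply reply = { -1, 0 };
    assert(map != NULL && region != NULL);
    for (size_t i = 0; i < n; i++) {
        if (map[i].fake_reg != req.fake_reg) {
            continue;
        }
        uint64_t off = map[i].real_off;
        /* Reject unaligned or out-of-range offsets before touching memory. */
        if (off % sizeof(uint64_t) == 0 && off + sizeof(uint64_t) <= region_bytes) {
            reply.value = region[off / sizeof(uint64_t)];
            reply.status = 0;
        }
        break;
    }
    return reply;
}
```

The point of this design is that the mapping lives only on the daemon side, so the library and the daemon exchange nothing beyond the usual register request and its result.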

I hope my explanation is clear. Integrating the discovery mechanism for SPR gave me quite a headache.

Best,
Thomas

techsun

May 21, 2024, 3:19:45 AM
to likwid-users
Hey, 
Thank you so much for your detailed explanation. I will try this again.
