likwid-perfctr: Counter register not supported or PCI device not available

Martin Ichilevici de Oliveira

May 7, 2015, 8:13:05 AM
to likwid...@googlegroups.com
Hello,

I'm trying to learn how to use Likwid, but I've encountered some problems with likwid-perfctr, and I'm unsure whether I'm doing something wrong or whether there is indeed a problem with my machine, my installation, or likwid itself. Let me quickly run through my setup, as I had to make some changes to get it compiled.

I compiled likwid from trunk (rev591). During installation (to my HOMEDIR), the make install script could not chown likwid-accessD and likwid-setFreq to root:root (even with sudo), so I left them as user applications. I couldn't change the ownership manually either. I'm not sure what the impact is.

My machine is an AMD Interlagos. It has 4 NUMA nodes, but they are not numbered sequentially (nodes 0, 2, 4 and 6). This triggered this bug: https://code.google.com/p/likwid/issues/detail?id=134. To work around it, I commented out the offending line (again, this might be why I can't get likwid working).

Then I ran:

$ ./likwid-perfctr -a
    BRANCH      Branch prediction miss rate/ratio
     CACHE      Data cache miss rate/ratio
       CPI      Cycles per instruction
      DATA      Load to store ratio
  FLOPS_DP      Double Precision MFlops/s
  FLOPS_SP      Single Precision MFlops/s
FPU_EXCEPTION   Floating point exceptions
    ICACHE      Instruction cache miss rate/ratio
   L2CACHE      L2 cache miss rate/ratio
        L2      L2 cache bandwidth in MBytes/s
   L3CACHE      L3 cache miss rate/ratio
        L3      L3 cache bandwidth in MBytes/s
     LINKS      Bandwidth on the Hypertransport links
       MEM      Main memory bandwidth in MBytes/s
      NUMA      Read/Write Events between the ccNUMA nodes

$ ./likwid-perfctr -e
This architecture has 10 counters.
Counter tags(name, type<, options>):
PMC0, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
PMC1, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
PMC2, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
PMC3, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
PMC4, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
PMC5, Core-local general purpose counters, OPCODE|MATCH0|MATCH1|EDGEDETECT|INVERT|COUNT_KERNEL
UPMC0, Socket-local general/fixed purpose counters
UPMC1, Socket-local general/fixed purpose counters
UPMC2, Socket-local general/fixed purpose counters
UPMC3, Socket-local general/fixed purpose counters
This architecture has 570 events.
Event tags (tag, id, umask, counters<, options>):
(...)

$ ./likwid-perfctr  -C S1:0  -g CACHE ./a.out
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
Counter register PMC0 not supported or PCI device not available
Counter register PMC1 not supported or PCI device not available
Counter register PMC2 not supported or PCI device not available
Counter register PMC3 not supported or PCI device not available
No event in given event string can be configured

This happened for every event group I tried. Running it with sudo made no difference.

Any suggestions?

Thank you,
Martin

Thomas Röhl

May 7, 2015, 9:48:18 AM
to likwid...@googlegroups.com
Hi Martin,

In order to access the MSR and PCI registers you need root privileges. That's why make install tries to chown the two access daemons. I don't know why your attempt with sudo did not work; I don't know much about sudo. For me, sudo make install has always done the job.

sudo chown root:root likwid-accessD
sudo chmod 4755 likwid-accessD
The same applies to likwid-setFreq.
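
To confirm that the chown/chmod commands above took effect, a small check like the following can help. This is a minimal sketch: check_setuid_root is a hypothetical helper name (not part of likwid), the /usr/local/sbin paths assume an install prefix of /usr/local, and `stat -c` assumes GNU coreutils.

```shell
# Report whether a helper binary is owned by root (uid 0) and has the
# setuid bit set, which is what "chown root:root" + "chmod 4755" aim for.
# check_setuid_root is a hypothetical helper, not a likwid tool.
check_setuid_root() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ ! -u "$f" ]; then
        echo "no-setuid"
    elif [ "$(stat -c '%u' "$f")" != "0" ]; then   # GNU stat; uid 0 is root
        echo "not-root-owned"
    else
        echo "ok"
    fi
}

# Assumed install locations; adjust if likwid was installed elsewhere.
for bin in /usr/local/sbin/likwid-accessD /usr/local/sbin/likwid-setFreq; do
    echo "$bin: $(check_setuid_root "$bin")"
done
```

Anything other than "ok" suggests the daemon will run with the caller's privileges only.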

The other way is to use the direct access mode (in config.mk) but you have to call likwid-perfctr and some other likwid tools with sudo all the time. Depending on your system this also might not work because of other security features like POSIX capabilities.

I updated the issue about the non-sequential nodes.
Thomas

Martin Ichilevici de Oliveira

May 7, 2015, 10:27:04 AM
to likwid...@googlegroups.com


On Thursday, May 7, 2015 at 10:48:18 UTC-3, Thomas Röhl wrote:
> Hi Martin,
>
> In order to access the MSR and PCI registers you need root privileges. That's why make install tries to chown the two access daemons. I don't know why your attempt with sudo did not work; I don't know much about sudo. For me, sudo make install has always done the job.
>
> sudo chown root:root likwid-accessD
> sudo chmod 4755 likwid-accessD
> The same applies to likwid-setFreq.
>
> The other way is to use the direct access mode (in config.mk) but you have to call likwid-perfctr and some other likwid tools with sudo all the time. Depending on your system this also might not work because of other security features like POSIX capabilities.


Hi Thomas,

Thank you for the fix on the non-sequential nodes, I think it worked.

I was getting the chown error because I was trying to install likwid in my home directory. Weirdly, I can't chown anything there. When I installed to /usr/local/, it went smoothly.

However, I still get the "Counter register not supported" error. I tried both the access daemon and the direct access mode.

Thank you,
Martin

Thomas Röhl

May 7, 2015, 11:40:57 AM
to likwid...@googlegroups.com
Hi Martin,

You can check with likwid-topology whether the topology code now gets the NUMA node numbering right.

likwid-accessD logs to syslog; maybe there is some helpful information there.
Moreover, please send me some more information:
your OS and kernel version
ls -la /dev/cpu/0/msr
ls -la /usr/local/sbin/likwid-accessD

And try to run likwid-perfctr -V 3 ... for much more output.

You could additionally try the POSIX capabilities for likwid-accessD.
sudo setcap cap_sys_rawio+ep /usr/local/sbin/likwid-accessD
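
To pull the daemon's messages out of syslog, something along these lines may be enough. It is only a sketch: accessd_log_lines is a hypothetical helper, and it assumes that the daemon tags its messages with "accessD:" and that syslog writes to /var/log/messages, which varies between distributions.

```shell
# Print only the access daemon's lines from a syslog file.
# accessd_log_lines is a hypothetical helper; the "accessD:" tag and the
# default path /var/log/messages are assumptions about the syslog setup.
accessd_log_lines() {
    log=${1:-/var/log/messages}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 0
    fi
    grep 'accessD:' "$log" || true   # no match is not an error here
}

# Show the most recent daemon messages, if any.
accessd_log_lines | tail -n 20
```

Reading /var/log/messages usually requires root, so this may need to be run with sudo.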

Greetings,
Thomas




Martin Ichilevici de Oliveira

May 7, 2015, 5:01:17 PM
to likwid...@googlegroups.com
Hi Thomas,



> You can check with likwid-topology whether the topology code now gets the NUMA node numbering right.

Yes! =D

> likwid-accessD logs to syslog; maybe there is some helpful information there.
> Moreover, please send me some more information:
> your OS and kernel version
> ls -la /dev/cpu/0/msr
> ls -la /usr/local/sbin/likwid-accessD
>
> And try to run likwid-perfctr -V 3 ... for much more output.
>
> You could additionally try the POSIX capabilities for likwid-accessD.
> sudo setcap cap_sys_rawio+ep /usr/local/sbin/likwid-accessD


What are POSIX capabilities? I tried setting them as you suggested, but it made no difference.

I'm running CentOS 6.5 with Linux 3.17.1.

Below is the output I got.

Thanks,
Martin

$ ls -la /dev/cpu/0/msr
crw-rw-rw-. 1 root root 202, 0 Apr 19 17:22 /dev/cpu/0/msr

$ ls -la /usr/local/sbin/likwid-accessD
-rwsrwxr-x. 1 root root 25903 May  7 11:33 /usr/local/sbin/likwid-accessD

$ likwid-perfctr  -C S1:0  -g CACHE -V 3 ./a.out
--------------------------------------------------------------------------------

CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
CPU family:     21
CPU model:      1
CPU stepping:   2
CPU features:   MMX SSE SSE2 RDTSCP MONITOR SSSE3 SSE41 SSE42 AES AVX
DEBUG - [startDaemon:157] Socket pathname is /tmp/likwid-24553
DEBUG - [startDaemon:185] Successfully opened socket /tmp/likwid-24553 to daemon
DEBUG - [perfmon_addEventSet:1149] Currently 1 groups of 2 active
DEBUG - [HPMread:130] READ S[4] C[0] DEV[0] R 0xC0010201
DEBUG - [accessClient_read:251] Got error 'failed to read/write register' from access daemon reading reg 0xC0010201 at CPU 0
DEBUG - [HPMread:141] READ S[4] C[0] DEV[0] R 0xC0010201 = 0x0 ERR[-5]
DEBUG - [getIndexAndType:133] Counter PMC0 not readable on this machine

Counter register PMC0 not supported or PCI device not available
DEBUG - [HPMread:130] READ S[4] C[0] DEV[0] R 0xC0010203
DEBUG - [accessClient_read:251] Got error 'failed to read/write register' from access daemon reading reg 0xC0010203 at CPU 0
DEBUG - [HPMread:141] READ S[4] C[0] DEV[0] R 0xC0010203 = 0x0 ERR[-5]
DEBUG - [getIndexAndType:133] Counter PMC1 not readable on this machine

Counter register PMC1 not supported or PCI device not available
DEBUG - [HPMread:130] READ S[4] C[0] DEV[0] R 0xC0010205
DEBUG - [accessClient_read:251] Got error 'failed to read/write register' from access daemon reading reg 0xC0010205 at CPU 0
DEBUG - [HPMread:141] READ S[4] C[0] DEV[0] R 0xC0010205 = 0x0 ERR[-5]
DEBUG - [getIndexAndType:133] Counter PMC2 not readable on this machine

Counter register PMC2 not supported or PCI device not available
DEBUG - [HPMread:130] READ S[4] C[0] DEV[0] R 0xC0010207
DEBUG - [accessClient_read:251] Got error 'failed to read/write register' from access daemon reading reg 0xC0010207 at CPU 0
DEBUG - [HPMread:141] READ S[4] C[0] DEV[0] R 0xC0010207 = 0x0 ERR[-5]
DEBUG - [getIndexAndType:133] Counter PMC3 not readable on this machine

Counter register PMC3 not supported or PCI device not available

$ tail -f /var/log/messages
May  7 17:59:09 node10 accessD: daemon started
May  7 17:59:09 node10 accessD: daemon accepted client
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr0.
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr1.
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr2.
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr3.
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr4.
(...)
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr62.
May  7 17:59:09 node10 accessD: Failed to open device file /dev/msr63.
May  7 17:59:11 node10 accessD: Failed to read data to register 0xc0010201 on core 0
May  7 17:59:11 node10 accessD: Failed to read data to register 0xc0010203 on core 0
May  7 17:59:11 node10 accessD: Failed to read data to register 0xc0010205 on core 0
May  7 17:59:11 node10 accessD: Failed to read data to register 0xc0010207 on core 0
May  7 17:59:11 node10 accessD: ERROR - [accessDaemon.c:822] zero read
May  7 17:59:11 node10 accessD: daemon dropped client
May  7 17:59:11 node10 accessD: daemon exiting

Thomas Röhl

May 8, 2015, 7:11:21 AM
to likwid...@googlegroups.com
Hi Martin,

I'm glad that the patch worked.

POSIX capabilities make it possible to grant selected rights that are normally reserved for root, without giving full root access. Besides the cap_sys_rawio permission flag, there are others, for example for user management.

I don't know the exact reason, but the /dev/cpu/*/msr files don't seem to be readable and writable for the access daemon. It tries these files first, and only if opening them fails does it try the files /dev/msr*. If that fails too, the message "Failed to open device file /dev/msr*" is sent to syslog.
I checked the kernel code and there is not much difference between my 3.11 and your 3.17, nothing that would change the normal behavior.
Try setting the access daemon back to your normal user and setting the capabilities again. Since your msr files are readable and writable by anybody, this could also work.
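
The lookup order described above can be mimicked from the shell to see which device file, if any, a given user can open. This is a rough sketch, not the daemon's actual code (which is written in C): find_msr_device and its optional root-prefix argument are hypothetical, and the check only looks at file permissions. The kernel may additionally require the CAP_SYS_RAWIO capability to open the msr device, so a hit here does not guarantee that the daemon's open succeeds.

```shell
# Try /dev/cpu/<n>/msr first, then fall back to /dev/msr<n>, and print
# the first candidate that is both readable and writable.
# find_msr_device is a hypothetical helper; the optional second argument
# is a directory prefix, useful for trying the function on a fake tree.
find_msr_device() {
    cpu=$1
    root=${2:-}
    for dev in "$root/dev/cpu/$cpu/msr" "$root/dev/msr$cpu"; do
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

find_msr_device 0 || echo "no readable and writable msr device for CPU 0"
```

Running it as the same user that the access daemon runs as gives the most meaningful answer.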

Since POSIX capabilities are sometimes a real pain, can you please also try setting the capabilities directly on the Lua interpreter? If the flags are inherited somehow, this could reduce the permissions of the access daemon even though it has higher ones. This is unlikely, because +ep means "add the permission, even when the calling application does not have it".
sudo setcap cap_sys_rawio+eip /usr/local/bin/likwid-lua
Here you should use +eip so that the flag is inherited by the forked access daemon.
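
To see whether a setcap call actually stuck, the getcap tool can be queried afterwards. The following is a sketch under assumptions: has_rawio_cap is a hypothetical helper, and it only looks for the capability name because the exact layout of getcap output differs between libcap versions.

```shell
# Report whether a line of getcap output mentions cap_sys_rawio.
# has_rawio_cap is a hypothetical helper; it matches on the capability
# name only and does not distinguish between +ep and +eip.
has_rawio_cap() {
    case $1 in
        *cap_sys_rawio*) echo yes ;;
        *)               echo no ;;
    esac
}

# Paths as used in this thread; adjust for other install prefixes.
for f in /usr/local/sbin/likwid-accessD /usr/local/bin/likwid-lua; do
    if command -v getcap >/dev/null 2>&1; then
        echo "$f: $(has_rawio_cap "$(getcap "$f" 2>/dev/null)")"
    else
        echo "$f: getcap not available"
    fi
done
```

A "no" after running setcap may mean the call failed or that the file system does not store file capabilities.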

I've written a blog post about the security mechanisms that can cause problems with likwid. Maybe some of the tips there will help you.
http://likwid-tools.blogspot.de/2014/06/likwid-capabilities-system-and-setuid.html

From my point of view, your configuration looks valid to access the msr device files.

Greetings,
Thomas


Martin Ichilevici de Oliveira

May 14, 2015, 8:31:51 PM
to likwid...@googlegroups.com
Hi Thomas

Sorry for taking so long to reply. I'm not sure what you changed, but I just compiled from trunk and it's almost completely working now. All performance groups but LINKS are making measurements. However,

$ likwid-perfctr  -C S1:0  -g LINKS /bin/ls
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
Event UNC_LINK_TRANSMIT_BW_L0_USE not found for current architecture
Event UNC_LINK_TRANSMIT_BW_L1_USE not found for current architecture
Event UNC_LINK_TRANSMIT_BW_L2_USE not found for current architecture
Event UNC_LINK_TRANSMIT_BW_L3_USE not found for current architecture

No event in given event string can be configured

Any idea? LINKS and NUMA are the two performance groups I'm most interested in.

Thomas Röhl

May 15, 2015, 9:30:27 AM
to likwid...@googlegroups.com
Hi Martin,

I updated the event list for Interlagos. The LINK_TRANSMIT events can be further divided by sublink, but those sublink events are only interesting if your machine is not fully equipped with CPUs.

Greetings,
Thomas