I tried to collect branch statistics with the PMU hardware events for PGO/AutoFDO compiler optimization, to no avail. It appears that the PMU is not exposed to the VMs. Is there any way to enable it?
I tried a few crazy things, like installing different kernels (from 4.19 to 5.10, all pretty recent) and disabling all mitigations on the kernel command line ('mitigations=off', thinking that the Spectre mitigations might interfere with branch stats collection), but none of it helped.
This is the standard Debian 10 image, and perf support for the hardware PMU is compiled into the kernel.
Nevertheless, dmesg shows (N1 and N2 instances; CPU model varies):
[ 0.768027] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
and perf reports that it is unable to collect branch prediction statistics:
$ sudo sysctl kernel.perf_event_paranoid=-1
kernel.perf_event_paranoid = -1
$ perf record -b -- sleep .5
Error:
cpu-clock: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
$ perf record -b -e branches -- sleep .5
Error:
The branches event is not supported.
Intel VTune agrees:
[ 782.787868] vtsspp: Driver version 1.8.237-613804
[ 782.792812] vtsspp: Kernel version 4.19.0-14-cloud-amd64
[ 782.798264] vtsspp: Detected 6 CPUs
[ 782.801870] vtsspp: CPU family: 0x06, model: 0x55, stepping: 03, HT: yes
[ 782.808698] vtsspp: CPU freq: 2000184KHz, timer freq: 1000000KHz
[ 782.814862] vtsspp: PMU: fixed counters: 0, general counters: 0
[ 782.820961] vtsspp: PMU counters are not detected
[ 782.826488] vtsspp: KPTI is enabled
[ 782.830858] vtsspp: KASLR is detected
[ 782.834670] vtsspp: Use sched tracepoints
[ 782.838837] vtsspp: Failed to initialize driver
and the cpu counters symlink is entirely absent from /sys/bus/event_source/devices:
$ sudo ls -l /sys/bus/event_source/devices
total 0
lrwxrwxrwx 1 root root 0 Feb 22 18:29 breakpoint -> ../../../devices/breakpoint
lrwxrwxrwx 1 root root 0 Feb 22 18:29 kprobe -> ../../../devices/kprobe
lrwxrwxrwx 1 root root 0 Feb 22 18:29 msr -> ../../../devices/msr
lrwxrwxrwx 1 root root 0 Feb 22 18:44 power -> ../../../devices/power
lrwxrwxrwx 1 root root 0 Feb 22 18:29 software -> ../../../devices/software
lrwxrwxrwx 1 root root 0 Feb 22 18:29 tracepoint -> ../../../devices/tracepoint
lrwxrwxrwx 1 root root 0 Feb 22 18:29 uprobe -> ../../../devices/uprobe
Compare with my home machine:
$ sudo ls -l /sys/bus/event_source/devices/cpu/events
total 0
-r--r--r-- 1 root root 4096 Feb 17 22:36 branch-instructions
-r--r--r-- 1 root root 4096 Feb 17 22:36 branch-misses
. . . .
On the VM, perf does not list the 'branches' event, which is the essential one for AutoFDO collection:
$ perf list | head -5
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
$ perf list | grep branches
$
Compare with my own machine again:
$ perf list | head -5
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
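For anyone who wants to run the same check on their own instance, here is a minimal sketch that condenses the sysfs listing and the perf list comparison above into two tests. The function names are made up for illustration and are not part of perf. It assumes a non-hybrid x86 machine, where the hardware PMU shows up as "cpu" under /sys/bus/event_source/devices; other architectures name it differently.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of perf.

# Succeeds if a hardware CPU PMU is registered under the given sysfs
# directory (defaults to the real one). Assumes the PMU is named "cpu",
# as on non-hybrid x86 machines.
has_cpu_pmu() {
    [ -e "${1:-/sys/bus/event_source/devices}/cpu" ]
}

# Reads `perf list` output on stdin and succeeds if it advertises
# 'branches' as a hardware event.
has_branches_event() {
    grep -q 'branches.*\[Hardware event]'
}

if has_cpu_pmu; then
    echo "hardware PMU: present"
else
    echo "hardware PMU: absent (software events only)"
fi
```

With perf installed, the second check can be run as `perf list hw | has_branches_event && echo "branches: available"`. On the GCE instances described here I would expect both checks to fail, and on my home machine both to succeed.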
I can run the full perf/AutoFDO build at home, but cannot transmogrify it into an automated Jenkins build on a GCE instance. Help!!! Everything points to this part of the PMU hardware not being virtualized into the VM. Is there a magic flag on the VM to enable it? I can't believe it could be impossible: AutoFDO was invented right here. :)
Thanks,
-kkm