Profiling using hardware counters on Mac OS X


Alexey Pirogov

Dec 11, 2016, 9:04:45 AM
to mechanical-sympathy
I'm new to profiling Java apps with hardware counters, so I may be asking something naive.
I agree that it makes sense to profile an application on the same hardware and software as in production.
Unfortunately, I only have a MacBook Pro to develop and profile my code on.
Is there any way to read hardware counters on Mac OS and correlate them with Java code?

Ideally, I would like to attribute, say, the number of cache misses to a line of Java code. I'm not sure how critical this is in real life.
As I understand it, VTune and Oracle Solaris Studio can provide this kind of information.

What I tried on Mac OS:
1) VTune and Solaris Studio aren't available;
2) JMH: the LinuxPerf*Profiler classes aren't available (because there is no perf). I wonder if there are any plans or possibilities to write similar profilers that use DTrace output;

3) VM on Mac OS (most counters aren't available):
- VirtualBox running Fedora:
[root@localhost ~]# perf stat -da

 Performance counter stats for 'system wide':

      11822.538041      cpu-clock (msec)          #    2.000 CPUs utilized
             1,006      context-switches          #    0.085 K/sec
                12      cpu-migrations            #    0.001 K/sec
             1,113      page-faults               #    0.094 K/sec
   <not supported>      cycles
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses
   <not supported>      L1-dcache-loads
   <not supported>      L1-dcache-load-misses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses


- VMware Fusion running Fedora (better than VirtualBox but still not all counters):
[root@localhost ~]# perf stat -da

 Performance counter stats for 'system wide':

      16614.742348      cpu-clock (msec)          #    2.001 CPUs utilized
             2,537      context-switches          #    0.153 K/sec
                49      cpu-migrations            #    0.003 K/sec
               926      page-faults               #    0.056 K/sec
           693,283      cycles                    #    0.000 GHz                      (66.71%)
                 0      instructions              #    0.00  insn per cycle           (83.38%)
        45,732,264      branches                  #    2.753 M/sec                    (83.31%)
           713,155      branch-misses             #    1.56% of all branches          (83.34%)
                 0      L1-dcache-loads           #    0.000 K/sec                    (83.37%)
         2,784,870      L1-dcache-load-misses     #    0.00% of all L1-dcache hits    (83.29%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses


4) I tried to boot Linux (Fedora, CentOS) from a USB stick to run without Mac OS in the middle, but it didn't work. There is probably an issue with drivers for the new MacBook Pro or with Mac OS;
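
Output like the two `perf stat` runs in item 3 can be sanity-checked mechanically before any of the numbers are trusted. Below is a minimal sketch in Python; it is not part of perf or JMH. The function names `parse_perf_stat` and `suspicious` are made up for illustration, and the two consistency rules (branches should not exceed instructions, L1 load misses should not normally exceed L1 loads) are assumed heuristics rather than anything perf itself checks. The sample is the counter section of the VMware Fusion run.

```python
import re

# One `perf stat` line: a count (or "<not supported>"), then the event name.
LINE = re.compile(r"^\s*(<not supported>|[\d.,]+)\s+([A-Za-z][\w-]*)")

# Counter lines copied from the VMware Fusion run in this thread.
SAMPLE = """
           693,283      cycles                    #    0.000 GHz                      (66.71%)
                 0      instructions              #    0.00  insn per cycle           (83.38%)
        45,732,264      branches                  #    2.753 M/sec                    (83.31%)
           713,155      branch-misses             #    1.56% of all branches          (83.34%)
                 0      L1-dcache-loads           #    0.000 K/sec                    (83.37%)
         2,784,870      L1-dcache-load-misses     #    0.00% of all L1-dcache hits    (83.29%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses
"""


def parse_perf_stat(text):
    """Map event name -> count as float, or None for '<not supported>'."""
    counters = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # header, blank line, or shell prompt
        raw, event = m.groups()
        counters[event] = None if raw.startswith("<") else float(raw.replace(",", ""))
    return counters


def suspicious(counters):
    """Return reasons to distrust the counters (assumed heuristics only)."""
    problems = [f"{name}: not supported" for name, v in counters.items() if v is None]

    def get(name):
        # Treat missing or unsupported counters as zero for the comparisons.
        return counters.get(name) or 0.0

    # Every branch is an instruction, so more branches than instructions is bogus.
    if get("branches") > get("instructions"):
        problems.append("branches exceed instructions")
    # Load misses should normally not outnumber loads.
    if get("L1-dcache-load-misses") > get("L1-dcache-loads"):
        problems.append("L1-dcache misses exceed loads")
    return problems


if __name__ == "__main__":
    for problem in suspicious(parse_perf_stat(SAMPLE)):
        print(problem)
```

On the VMware Fusion sample this flags the two unsupported LLC events plus both inconsistencies, which suggests that the counters that do report a value there are not reliable either.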

So the only solution I see is to run Linux on EC2 or on a bare-metal server and connect to it over VNC, since I would like to have an IDE and the profiling tools on the same box so that I can modify code and check results quickly.

If I missed something (maybe something obvious), please let me know.

Sergey Melnikov

Dec 11, 2016, 11:33:21 AM
to mechanical-sympathy
Hi Alexey,

As far as I can see, there are a few options to try:

1. Use Boot Camp to install Windows on your laptop. VTune should be available in that environment (I used it on HSW laptops).

2. I'm not sure whether dtruss supports the PMU (hardware counters), but if it does, it may be possible to write a wrapper over DTrace and pass it to JMH via the JMH_PERF environment variable. JMH will run your wrapper and parse its output. The wrapper will run dtruss and convert its output to the perf format.

3. Try to find perf/oprofile in the MacPorts or Homebrew software repos.

Anyway, I'm not sure it's a good idea to use the PMU in a virtualized environment.

BTW, performance analysis at such a low level may require a "clean" system (PC/Mac) without any background "noise" and with performance management, updates, etc. disabled.

--Sergey

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Roman Leventov

Dec 11, 2016, 3:10:19 PM
to mechanica...@googlegroups.com
"Starting an EC2 instance" is not a solution, as far as I know, because EC2 instances are virtualized as well.


jake maloney

Dec 11, 2016, 3:58:00 PM
to mechanical-sympathy
I looked into this briefly, and the only two things I found that come close are Instruments (you will need Xcode) and sample.

Avi Kivity

Dec 11, 2016, 4:05:58 PM
to mechanica...@googlegroups.com
The PMU is virtualized too, so some of the functionality is still available.


Sergey Melnikov

Dec 11, 2016, 4:24:29 PM
to Avi Kivity
Yes, the PMU is virtualizable, but the question is how representative the results will be. I mean that other guest OSes (especially heavily loaded ones) may somehow interfere with your guest OS (its virtualized CPU and PMU).

--Sergey

Avi Kivity

Dec 12, 2016, 2:51:53 AM
to mechanica...@googlegroups.com

AWS partitions machines rather than sharing them, so interaction on a CPU core is limited. L3 caches are shared, but if you get a large instance (largest = full machine, next largest = full socket) you avoid that too.

Steve Gury

Dec 12, 2016, 12:51:57 PM
to mechanica...@googlegroups.com
If you're OK with disabling the macOS kernel security that allows only signed kernel modules to load, you can use Intel Performance Counter Monitor.
