It is possible that all of this information is available in perf, or can be derived from the hardware counters perf exposes, but some of the VTune features that I have found really useful in the past (and haven't easily found in perf) are:
1. Analysis of backend-bound programs. You can check the port utilization of your program. Code that seems to have some ILP can in fact be bottlenecked on one or two execution ports; rewriting it to spread work across the other ports can help.
2. Looking at stalls as opposed to just cache misses. VTune lets you see what portion of cache misses actually cause stalls. On an out-of-order (OoO) machine, not all cache misses are bad: if your program has enough overlap and ILP, miss latency can be hidden to some extent. For example, request a bunch of independent reads and process them all in independent streams. The first few misses may cause stalls while the OoO machinery gets rolling, but the later ones still miss without stalling, provided the processing of data from the earlier reads overlaps with the misses for the later reads.
3. SIMD-to-memory-fetch ratio. People often move from scalar to vector code without changing their memory layout much; you usually see classes like Vec3 (AoS layouts) when this happens. SIMD is of limited use there because the program remains memory bound, and switching to an SoA layout often helps. The SIMD-to-memory-fetch ratio needs to be fairly large for SIMD to pay off, and this metric helps you figure that out.
There are a lot of other esoteric VTune features which I haven't seen in other profilers. As I said before, that may just be my unfamiliarity with those profilers rather than the features actually being absent.