- Use binary rewriting to generate a detailed trace of execution, and then run that through a cache simulator (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.4016&rep=rep1&type=pdf)
While perhaps doable it sounds like a heavyweight approach. You might get away with doing it once for some really important lock in the RTS but it's not something I'd like to do on a daily basis.
- Use GHC events to count operations on particular IORefs. Then put that trace through our own model that reports whether the IORef is being used acceptably, or is "hot".
The trick with that third option is generating the model.
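A minimal sketch of the event-emitting half of that option. The wrapper type `TracedIORef`, its functions and the label strings are hypothetical, not something from this thread or from base. It relies on `Debug.Trace.traceEventIO`, which writes a user event to the eventlog when event logging is enabled (for example with `+RTS -l`). The model that would consume the trace is not shown.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import Debug.Trace (traceEventIO)

-- | Hypothetical wrapper: an IORef paired with a programmer-chosen label.
data TracedIORef a = TracedIORef String (IORef a)

newTracedIORef :: String -> a -> IO (TracedIORef a)
newTracedIORef label x = TracedIORef label <$> newIORef x

-- | Emit one user event per operation, then perform the atomic update.
-- Counting these events per label afterwards gives the number of
-- operations on each instrumented IORef.
atomicModifyTraced :: TracedIORef a -> (a -> (a, b)) -> IO b
atomicModifyTraced (TracedIORef label ref) f = do
  traceEventIO ("atomicModifyIORef " ++ label)
  atomicModifyIORef' ref f
```

Note that every call marshals the message string, so the instrumentation itself adds a cost to each operation.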
I'd like to see us output information about blocking on MVars/IORefs etc from the RTS so we can look for hot locks in threadscope. I've seen such lock contention analysis systems in C++ before and I'd love to have them in Haskell. The basic version would be to tell the user (of threadscope) that many threads are blocked on MVar 123, where 123 is some opaque ID. That will not help them find the MVar in the source but at least they'd know that contention is a problem. The Rolls-Royce version would be to map this ID back to a source location.
Try it. Let us know.
BTW, this is probably a good moment to advertise that ghc-7.4 will
come with a new and improved traceEvent. Instead of just being exported
  traceEvent :: String -> IO ()
  traceEvent msg = do
    withCString msg $ \(Ptr p) -> IO $ \s ->
      case traceEvent# p s of s' -> (# s', () #)
Very nice. Can we take a moment to confirm a few things about the performance of traceEvent? In the above example I'm sending it the same string repeatedly. But based on the definition of traceEvent:
Cost-centre stack profiling doesn’t currently work with multiple processors (+RTS -N2 and greater). I’m going to look into this as part of the profiling overhaul I’m currently working on.
Cheers,
Simon
From: parallel...@googlegroups.com [mailto:parallel...@googlegroups.com]
On Behalf Of Johan Tibell
Sent: 28 October 2011 16:26
To: parallel...@googlegroups.com
Cc: mona...@googlegroups.com
Subject: Re: Estimating contention on an IORef hammered with atomicModifyIORef
On Fri, Oct 28, 2011 at 6:15 AM, Ryan Newton <rrne...@gmail.com> wrote:
Is your reluctance because of the 20-100X+ runtime overhead? (Disclosure: the group I was in at Intel develops the "Pin" binary instrumentation tool which is the basis for the heavyweight (rewriting based) performance/parallelism analysis tools provided by Intel. But valgrind/cachegrind does similar stuff in OSS.)
I was more thinking of manual work on the part of the programmer. If there are tools that do it automatically that's much better.
- Use GHC events to count operations on particular IORefs. Then put that trace through our own model that reports whether the IORef is being used acceptably, or is "hot".
The trick with that third option is generating the model.
I'd like to see us output information about blocking on MVars/IORefs etc from the RTS so we can look for hot locks in threadscope. I've seen such lock contention analysis systems in C++ before and I'd love to have them in Haskell. The basic version would be to tell the user (of threadscope) that many threads are blocked on MVar 123, where 123 is some opaque ID. That will not help them find the MVar in the source but at least they'd know that contention is a problem. The Rolls-Royce version would be to map this ID back to a source location.
Is the idea that by logging only blocked threads, rather than all IORef accesses, it wouldn't be so terrible in the average case to do this for all MVars/IORefs? (Rather than just specific ones that are instrumented by the programmer.)
I couldn't think of a good use case for logging all accesses, that's all. At work we automatically track contention on all locks (in C++) and that seems to work fine.
As for the MVar 123 problem -- within individual programs I tend to use StableNames for figuring out when one MVar is the same as another during debugging. But I know of no good trick to create a stable identity for an MVar between multiple executions of the same (deterministic) program. Correlating back to source location would help, but of course many MVars could be coined from the same static newEmptyMVar occurrence. If the "fork" mechanism could be hijacked it would perhaps be possible to give an MVar a deterministic identity based on a counter and its position in the fork tree.
What would be the best path forward for tracking source locations? I take it that the simplest way is to use Template Haskell and replace "newEmptyMVar" with something like $(newEmptyMVar) which would grab the source location and generate trace events tagged with that location:
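One way the Template Haskell idea might look, as a sketch only. The names `newEmptyMVarAt` and `formatLoc` are hypothetical; `location`, `Loc` and `traceEventIO` are the real template-haskell and base APIs. The splice must live in a different module from its use site because of the stage restriction.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module TracedMVar (newEmptyMVarAt, formatLoc) where

import Control.Concurrent.MVar (newEmptyMVar)
import Debug.Trace (traceEventIO)
import Language.Haskell.TH (Exp, Loc (..), Q, location)

-- | Render a splice location as "file:line:column".
formatLoc :: Loc -> String
formatLoc loc = loc_filename loc ++ ":" ++ show line ++ ":" ++ show col
  where
    (line, col) = loc_start loc

-- | Hypothetical splice: expands to an IO action that emits a trace event
-- tagged with the location of the splice and then creates the MVar.
newEmptyMVarAt :: Q Exp
newEmptyMVarAt = do
  loc <- location
  let tag = formatLoc loc
  [| traceEventIO ("newEmptyMVar at " ++ tag) >> newEmptyMVar |]
```

In a client module with TemplateHaskell enabled this would be used as `mv <- $(newEmptyMVarAt)`. As noted above, every MVar created from the same static occurrence gets the same tag.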
Here's an approach:
For blocking operations (e.g. takeMVar) record the amount of time spent waiting for a lock together with the stack trace* of the blocked thread. Saving the stack trace is almost as good as tracking the source location of the lock and can sometimes be more useful. To make this less expensive we can sample these events at a frequency of, say, 1%. Using this information we can show a contention profile like so:
total time   total time (%)   cost center
0.005        50%              foo
0.003        30%              bar
and even make hierarchical charts like we currently do for CPU profiling.
* We can either try to use the current profiling cost stacks to get a stack trace or we could just get the innermost function using the same trick we use to get assert to output the current file and line number.
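A rough user-level sketch of the bookkeeping behind such a profile, under several assumptions: the waits are keyed by a label passed in by hand rather than by a stack trace, every wait is recorded instead of a 1% sample, and all names (`Profile`, `timedTakeMVar`, `report`, `printReport`) are made up for illustration. A real implementation would presumably live in the RTS.

```haskell
import Control.Concurrent.MVar (MVar, takeMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Data.List (sortOn)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..))
import GHC.Clock (getMonotonicTime)
import Text.Printf (printf)

-- | Seconds spent waiting, accumulated per label.
type Profile = IORef (Map.Map String Double)

newProfile :: IO Profile
newProfile = newIORef Map.empty

-- | takeMVar that charges the time spent waiting to the given label.
timedTakeMVar :: Profile -> String -> MVar a -> IO a
timedTakeMVar prof label mv = do
  t0 <- getMonotonicTime
  x <- takeMVar mv
  t1 <- getMonotonicTime
  atomicModifyIORef' prof (\m -> (Map.insertWith (+) label (t1 - t0) m, ()))
  pure x

-- | Rows of (total time, share of all waiting time in percent, label),
-- largest first.
report :: Map.Map String Double -> [(Double, Double, String)]
report m =
  [ (t, if total > 0 then 100 * t / total else 0, label)
  | (label, t) <- sortOn (Down . snd) (Map.toList m)
  ]
  where
    total = sum (Map.elems m)

printReport :: Profile -> IO ()
printReport prof = do
  m <- readIORef prof
  putStrLn "total time   total time (%)   cost center"
  mapM_
    (\(t, pct, label) ->
       printf "%-12.3f %-16s %s\n" t (printf "%.0f%%" pct :: String) label)
    (report m)
```

The shared `Profile` is itself updated with atomicModifyIORef', so in a heavily contended program the profiler would add contention of its own; sampling would reduce that.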
-- Johan