Biographical profiling and void

Adam Conner-Sax

May 26, 2015, 4:32:20 PM

to haskel...@googlegroups.com

Hello,

I'm a relative Haskell beginner, though perhaps just in the stage where a little knowledge is a dangerous thing.

I've built my first application entirely in Haskell (FWIW, a personal finance Monte Carlo simulation) and it's been an amazing experience. I come from a C++, then C# background, and Haskell is a great relief and joy. Haskell is much more fun and interesting to think about, at least for me.

The main work of the code is a completely parallelizable calculation. There is some serial set-up first (read in configuration XML, set up initial state data structures, calculate as many random seeds as required for the number of paths), then each actual calculation path is entirely independent (each starts with its own seed and copy of the initial state, etc.).
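A minimal sketch of that shape, using base only. Every name here (`SimState`, `pathSeeds`, `runPath`) and the toy generator are hypothetical stand-ins for illustration, not the real application code:

```haskell
import Data.List (foldl')

-- Hypothetical stand-ins: the real state holds assets and cash flows.
type Seed = Int

newtype SimState = SimState { balance :: Double }
  deriving (Show, Eq)

-- Toy linear congruential generator, only to keep the sketch self-contained.
nextSeed :: Seed -> Seed
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Serial set-up: derive one seed per path from a master seed.
pathSeeds :: Seed -> Int -> [Seed]
pathSeeds master n = take n (drop 1 (iterate nextSeed master))

-- One independent path: it evolves its own copy of the initial state.
runPath :: Int -> SimState -> Seed -> SimState
runPath steps s0 seed = fst (foldl' step (s0, seed) [1 .. steps])
  where
    step (SimState b, g) _ =
      let g' = nextSeed g
          r  = fromIntegral (g' `mod` 2001) / 1000 - 1  -- roughly in [-1, 1]
          b' = b * (1 + 0.01 * r)
      in b' `seq` (SimState b', g')  -- force the new balance at every step

main :: IO ()
main = do
  let initial = SimState 1000
      seeds   = pathSeeds 42 4
      results = map (runPath 10 initial) seeds  -- paths share nothing
  mapM_ print results
```

Because the paths share nothing, the `map` in `main` is the natural place to introduce parallelism.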

I have several puzzles, but the most confusing is that almost the entire heap is, biographically, void. It also grows more or less linearly as the program runs. The latter fact is, I realize, a sign that each path retains some data. I was able to vastly reduce the heap size and growth by requiring one bit of strictness at the end of each path. Once I understood a little bit about retainer profiling, this was straightforward. But the fact that most of the data is void remains, and I'm not sure what that tells me.
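The kind of end-of-path strictness meant here can be sketched as follows. This is an illustration under assumptions, not the actual fix: `Summary` and `simulate` are hypothetical, and the forcing is done with `evaluate` from `Control.Exception`.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (evaluate)

-- Hypothetical per-path result. The strict fields mean that evaluating a
-- Summary to weak head normal form also evaluates both numbers.
data Summary = Summary
  { finalWealth :: !Double
  , stepsRun    :: !Int
  } deriving (Show, Eq)

-- Toy path: compound a starting wealth for a number of steps.
simulate :: Int -> Int -> Summary
simulate start steps = go (fromIntegral start) 0
  where
    go !w !n
      | n >= steps = Summary w n
      | otherwise  = go (w * 1.01) (n + 1)

-- Lazy collection: the list holds one unevaluated thunk per path, and each
-- thunk keeps alive whatever that path needs to compute its result.
lazyResults :: [Int] -> [Summary]
lazyResults = map (\s -> simulate s 100)

-- Strict collection: each path is reduced to its Summary before the next
-- one starts, so nothing from the path outlives it except the Summary.
strictResults :: [Int] -> IO [Summary]
strictResults = mapM (\s -> evaluate (simulate s 100))

main :: IO ()
main = strictResults [1 .. 5] >>= mapM_ print
```

`evaluate` only goes as far as weak head normal form, which is enough here because the fields are strict. For a nested result, something like `force` from the deepseq package would be needed instead.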

The void heap starts below 500k and then sawtooths up to about 2000k by the end of the run, 10s later.

I could ignore it, but once I'm running on 8 cores, productivity drops to 20% or so (it is around 68% on one core), though the elapsed-time productivity is better. Still, I think GC is costly, and all that void heap might be driving more GC than I would otherwise need. So I'd like to track it down.

I've been able, by running with different heap profiler flags, to figure out what that void heap is made of and, sort of, what makes it. It's made of small pieces of data representing assets and cash flows, each an existential type wrapping a record with two or three strings and two or three numbers. Much of it seems to be coming from functions which take those pieces of data and update them as the simulation progresses, or from the functions which traverse the structures holding the assets and cash flows (Data.Map.Strict and []) and apply the update functions to them.
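One pattern that fits this description, offered as a guess rather than a diagnosis: Data.Map.Strict evaluates values only to weak head normal form, and for an existential wrapper with a lazy field that stops at the wrapper's constructor. The sketch below uses hypothetical names (`Evolvable`, `Account`, `AnyAsset`) to show a lazy and a strict wrapper side by side. Records that are rebuilt on every step but never read before being replaced are created and never used, which is what the biographical profile calls void.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import qualified Data.Map.Strict as M

-- Hypothetical interface for anything that evolves over time.
class Evolvable a where
  evolve :: Double -> a -> a
  value  :: a -> Double

data Account = Account
  { acctName    :: !String
  , acctBalance :: !Double
  }

instance Evolvable Account where
  evolve rate (Account n b) = Account n (b * (1 + rate))
  value = acctBalance

-- Lazy wrapper: weak head normal form stops at the AnyAsset constructor,
-- so the wrapped record can remain an unevaluated thunk.
data AnyAsset = forall a. Evolvable a => AnyAsset a

-- Strict wrapper: building the value forces the wrapped record as well.
data AnyAsset' = forall a. Evolvable a => AnyAsset' !a

-- Each call wraps a new thunk (evolve r a) around the previous one.
stepLazy :: Double -> M.Map String AnyAsset -> M.Map String AnyAsset
stepLazy r = M.map (\(AnyAsset a) -> AnyAsset (evolve r a))

-- Each call evaluates the updated record before storing it.
stepStrict :: Double -> M.Map String AnyAsset' -> M.Map String AnyAsset'
stepStrict r = M.map (\(AnyAsset' a) -> AnyAsset' (evolve r a))

lazyValue :: AnyAsset -> Double
lazyValue (AnyAsset a) = value a

strictValue :: AnyAsset' -> Double
strictValue (AnyAsset' a) = value a

main :: IO ()
main = do
  let lazy0   = M.fromList [("cash", AnyAsset  (Account "cash" 100))]
      strict0 = M.fromList [("cash", AnyAsset' (Account "cash" 100))]
      lazyN   = iterate (stepLazy 0.01) lazy0 !! 1000
      strictN = iterate (stepStrict 0.01) strict0 !! 1000
  print (sum (map lazyValue (M.elems lazyN)))
  print (sum (map strictValue (M.elems strictN)))
```

Both versions compute the same numbers; the difference is only in when the work is done and how much unevaluated structure sits on the heap in the meantime.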

I am not surprised that memory is being retained. I can see how laziness would lead to that and my understanding of where that's good and where it's bad is pretty limited. But that most of the data is void seems suspicious to me.

Are there some obvious things to look for when you see a heap of mostly void memory?

If it's useful: each path of the simulation runs in a monad transformer stack, basically a ResultT w (StateT s (ReaderT r (EitherT String Identity))). The ResultT is basically a custom-built WriterT. It uses the WriterT.Stricter semantics. The ResultT gets added to the stack and then removed (run) often on each path, as it's used during small parts of the update code to collect, e.g., cash flows that result from the time evolution of an asset.
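A rough sketch of a stack with that shape, built from the transformers package only. Two substitutions are assumptions made for illustration: `ExceptT` stands in for `EitherT`, and the stock `WriterT` from Control.Monad.Trans.Writer.Strict stands in for the custom `ResultT`, so its strictness properties are not the same. The names `PathM`, `CollectM`, `accrue` and `stepPath` are hypothetical.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.Reader (ReaderT, runReaderT, ask)
import Control.Monad.Trans.State.Strict (StateT, runStateT, get, put)
import Control.Monad.Trans.Writer.Strict (WriterT, runWriterT, tell)
import Data.Functor.Identity (Identity, runIdentity)

type Rate     = Double
type Balance  = Double
type CashFlow = Double

-- The long-lived part of the stack, used for the whole path.
type PathM = StateT Balance (ReaderT Rate (ExceptT String Identity))

-- The collecting layer, added only around small pieces of update code.
type CollectM = WriterT [CashFlow] PathM

-- A small update that emits a cash flow and changes the state.
accrue :: CollectM ()
accrue = do
  rate <- lift (lift ask)
  bal  <- lift get
  let interest = bal * rate
  tell [interest]
  lift (put $! bal + interest)  -- force the new balance before storing it

-- Add the writer layer, run it, and drop back to the base stack.
stepPath :: PathM [CashFlow]
stepPath = do
  (_, flows) <- runWriterT accrue
  bal <- get
  if bal < 0
    then lift (lift (throwE "negative balance"))
    else return flows

runPathM :: Rate -> Balance -> PathM a -> Either String (a, Balance)
runPathM rate bal m =
  runIdentity (runExceptT (runReaderT (runStateT m bal) rate))

main :: IO ()
main = do
  print (runPathM 0.5 100 stepPath)
  print (runPathM 0.1 (-10) stepPath)
```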

Anyway, I'm not even sure what details might be helpful.

I'm just looking for general advice for how to hunt in a heap which is full of void.

Any help would be most appreciated!!

Adam
