Memory leak in binary serialisation

AlanKim Zimmerman

Apr 11, 2013, 3:33:35 PM
to parallel...@googlegroups.com
I am scratching my head around a memory leak using Control.Distributed.Process.Platform.ManagedProcess.

The code exhibiting the behaviour can be seen in leak.hs at https://github.com/alanz/hroq/tree/0cd76a99ce18ece2f8ec5706072dda4df341f779

I have tried stripping out the actual write to disk, and looping over the call to the write directly.

It only manifests itself if the handleCall is used.

Hopefully it is something stupid I have overlooked.

It is compiled with GHC 7.4.2 / binary-0.5.1.0

Alan

Tim Watson

Apr 11, 2013, 6:48:25 PM
to alan...@gmail.com, parallel...@googlegroups.com
Hi!

On 11 Apr 2013, at 20:33, AlanKim Zimmerman wrote:

> I am scratching my head around a memory leak using Control.Distributed.Process.Platform.ManagedProcess.
>
> The code exhibiting the behaviour can be seen in leak.hs at https://github.com/alanz/hroq/tree/0cd76a99ce18ece2f8ec5706072dda4df341f779

I won't have time to build and run the profiling example until the weekend (at the earliest), but I will try and do so unless someone more familiar with Binary is able to help first.

> I have tried stripping out the actual write to disk, and looping over the call to the write directly.
>
> It only manifests itself if the handleCall is used.


I wouldn't be too quick to assume this is specifically to do with handleCall - the other two functions don't invoke any message passing APIs (i.e., they don't end up calling the `send` function in distributed-process) so there's no serialisation going on when you call them directly. You've also /said/ that this has something to do with Binary serialisation, so I'm assuming that if I go through the profiling runs later on, it'll become clear that that's where the leak is occurring.

With the caveat that I'm not an expert when it comes to this stuff, I think you might be able to boil this problem down to a loop that just does this repeatedly:

`createMessage $ (QE (QK "a") (qval $ "bar" ++ (show n)))`

At least, if we're right that this is really a problem with the binary serialisation: As I understand it, bytestrings will be pinned by default, so if that really minimal example (above) still leaks, then I'd guess there's something wrong with one of your binary instances. Maybe. If not we'll have to look elsewhere.

> Hopefully it is something stupid I have overlooked.


I hope so too! ;)

One thing you might like to do is enable template haskell support and generate all your binary instances using `$(derive makeBinary ''<Type>)` instead of rolling them by hand. If that immediately fixes the leak, then you can switch them back one by one until you've found the culprit. If it doesn't (fix the leak) then we can take a closer look at ManagedProcess (and Async, which the call API relies on).

As I said, I'm a bit short on time atm so I might not get a chance to look at this until late into the weekend. I will do my best to get around to it though - and I've added a ticket at https://cloud-haskell.atlassian.net/browse/DPP-72 to track its progress.

Cheers!

Tim

Bryan O'Sullivan

Apr 11, 2013, 6:50:31 PM
to Tim Watson, alan...@gmail.com, parallel-haskell

On Thu, Apr 11, 2013 at 3:48 PM, Tim Watson <watson....@gmail.com> wrote:
> One thing you might like to do is enable template haskell support and generate all your binary instances using `$(derive makeBinary ''<Type>)` instead of rolling them by hand.

Or you can use the GHC generics support in recent versions of binary, and skip the TH nastiness.
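
For reference, a minimal sketch of what Generic-derived instances look like. It assumes a binary release that ships the GHC.Generics defaults (binary-0.5.1.0 does not), and `QKey`/`QEntry` are hypothetical stand-ins for the hroq types behind `QK` and `QE`, not the real definitions.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical stand-ins for the queue key/entry types in the thread.
newtype QKey = QK String
  deriving (Eq, Show, Generic)

data QEntry = QE QKey String
  deriving (Eq, Show, Generic)

-- Empty instance bodies: put/get come from the Generic defaults.
instance Binary QKey
instance Binary QEntry

-- Serialise to a lazy ByteString and decode it again.
roundTrip :: QEntry -> QEntry
roundTrip = decode . encode

main :: IO ()
main = do
  let entry = QE (QK "a") "bar1"
  print (roundTrip entry == entry)
```

Swapping one type at a time between a derived and a hand-written instance, then re-running the heap profile, is one way to narrow down a suspect instance.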

Tim Watson

Apr 11, 2013, 6:54:45 PM
to Bryan O'Sullivan, alan...@gmail.com, parallel-haskell
Oh yes, that'd be much better, providing you've got a recent enough version of binary and the various dependencies don't go bananas about upper bounds. I think d-p-platform should be able to cope with that.

AlanKim Zimmerman

Apr 12, 2013, 2:16:12 AM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
Thanks for the input, I will start with using generics, if that does not work will go TH.

AlanKim Zimmerman

Apr 12, 2013, 3:30:51 AM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
First quick result

1. Deriving via Generics is not available for binary-0.5.1.0
2. Reverted to TH, using messy two-stage compile and -osuf, and the result is basically unchanged.
3. Running the main loop as

  let x = map (\n -> createMessage $ (QE (QK "a") (qval $ "bar" ++ (show n))) ) [1..800]

  logm $ "messages=" ++ (show (x)) -- Force evaluation of x

Shows memory being allocated and then reclaimed




Tim Watson

Apr 12, 2013, 2:54:51 PM
to AlanKim Zimmerman, Bryan O'Sullivan, parallel-haskell
This (broken binary instance) was my first suspicion. I wonder if using the combinators from Control.Applicative instead of liftM/liftM2 would alleviate the leak?
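
To make the comparison concrete, here is a small sketch of the same decoder written once with liftM2 and once with the Applicative combinators. `QEntry` is a hypothetical two-field record, not the hroq type. Both decoders read the same bytes; whether the Applicative form changes the heap profile is an open question that only profiling can answer.

```haskell
import Control.Applicative ((<$>), (<*>))
import Control.Monad (liftM2)
import Data.Binary (Binary (..), encode)
import Data.Binary.Get (Get, runGet)

-- Hypothetical record with a key and a value.
data QEntry = QE String String
  deriving (Eq, Show)

-- Decoder in the monadic style used by the hand-rolled instances.
getMonadic :: Get QEntry
getMonadic = liftM2 QE get get

-- The same decoder using Applicative combinators.
getApplicative :: Get QEntry
getApplicative = QE <$> get <*> get

instance Binary QEntry where
  put (QE k v) = put k >> put v
  get = getApplicative

main :: IO ()
main = do
  let bytes = encode (QE "a" "bar1")
  -- Both decoders should produce the same value from the same input.
  print (runGet getMonadic bytes == runGet getApplicative bytes)
```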


AlanKim Zimmerman

Apr 12, 2013, 3:18:19 PM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
It is stripped down to passing through a String, with a serverDefinition of

handleCall ((\s v -> reply () s) :: State -> String -> Process (ProcessReply State ()))

It still leaks memory, which is reported as PINNED

I did a very quick dive into the relevant code of GenProcess, and followed the rabbit hole until I saw ForeignPtrs.

In the process I realised that this is a very complex piece of code.

My limited knowledge of GHC indicates that ForeignPtrs result in PINNED memory?

Alan

Tim Watson

Apr 12, 2013, 6:19:28 PM
to AlanKim Zimmerman, Bryan O'Sullivan, parallel-haskell
Hi Alan,


On 12 April 2013 20:18, AlanKim Zimmerman <alan...@gmail.com> wrote:
> It is stripped down to passing through a String, with a serverDefinition of
>
> handleCall ((\s v -> reply () s) :: State -> String -> Process (ProcessReply State ()))

I'm still not sure whether the whole ManagedProcess/handleCall thing is a red herring. Can you please confirm whether the leak still occurs with a simple client/server pair like so:

server <- spawnLocal $ forever' $ do
  receive [ match (\(s :: String) -> return ()) ]

mapM_ (\n -> (send server ("bar" ++ (show n))) :: Process ()  ) [1..800]


And let us know if it's still present?

> It still leaks memory, which is reported as PINNED

If you allocate all those bytestrings then a bunch of pinned memory *is* going to be created, because bytestrings are always pinned. Or have I misunderstood? Although I really expected to see the same thing happen just with `createMessage` TBH, so maybe I've not understood this properly myself.

Also, this is only a leak if that memory can (and will) never be reclaimed. I'm not clear on whether pinned objects will never get GC'ed, or simply won't move around during GC and screw things up during FFI calls - I hope/suspect it's only the latter though.
 
> I did a very quick dive into the relevant code of GenProcess, and followed the rabbit hole until I saw ForeignPtrs.
>
> In the process I realised that this is a very complex piece of code.
>
> My limited knowledge of GHC indicates that ForeignPtrs result in PINNED memory?

They do, but then I suspect the code you're talking about is in "encodeFingerprint" and "decodeFingerprint" right? We're only talking about one ForeignPtr per type fingerprint, so this (a) shouldn't be growing much - at least, not unbounded - and (b) shouldn't take up all that much space. Also, as mentioned before, bytestrings are always pinned anyway - according to Data.ByteString.Internal, the bytestring is internally constructed using foreign pointers already:

-- Copied from "bytestring:Data.ByteString.Internal":
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length

I'm not surprised you're seeing a steady increase in pinned objects if you're repeatedly calling the 'send' primitive (which ultimately ManagedProcess#call will do) because each invocation of 'send' will result in fully serialising the message/data to a bytestring and that will result in an ever increasing quantity of pinned memory.... I'm surprised (and somewhat confused) that you're not seeing that growth with `createMessage` though.

I'm also unclear on whether this does represent a leak or not. What happens to pinned objects (i.e., bytestrings) once they become eligible for collection? It's not like "pinned = never collected" after all, is it? Can some proper experts chip in and provide some clarification for Alan and me please!?
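
For what it's worth, GHC's collector never moves pinned objects but does free them once they are unreachable. The known caveat is that a heap block holding pinned objects stays allocated for as long as any object in it is live, which can show up as fragmentation. Below is a minimal sketch for observing the first part, assuming only base: `allocateTracked` and `countFinalised` are hypothetical helper names, and `mallocForeignPtrBytes` stands in for the pinned buffers that back a ByteString. Finalizer timing is not guaranteed, so the count printed can vary from run to run.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.Concurrent (addForeignPtrFinalizer)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, touchForeignPtr)
import System.Mem (performGC)

-- Allocate one pinned 1 KiB buffer, attach a finalizer that bumps the
-- counter, and then let the only reference to the buffer go out of scope.
allocateTracked :: IORef Int -> IO ()
allocateTracked freed = do
  fp <- mallocForeignPtrBytes 1024 :: IO (ForeignPtr Word8)
  addForeignPtrFinalizer fp (atomicModifyIORef' freed (\n -> (n + 1, ())))
  touchForeignPtr fp -- last use of the buffer

-- Allocate `count` buffers, request a major GC, and report how many
-- finalizers have run by the time we look.
countFinalised :: Int -> IO Int
countFinalised count = do
  freed <- newIORef 0
  replicateM_ count (allocateTracked freed)
  performGC
  threadDelay 100000 -- finalizers run on their own thread; give them time
  readIORef freed

main :: IO ()
main = do
  n <- countFinalised 100
  putStrLn ("finalizers run: " ++ show n ++ " of 100")
```

A non-zero count means the collector did release pinned buffers that were no longer referenced, which is the behaviour the heap profiles in this thread would need in order to tail off.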

Alan - I'm also aware we've probably got lots of memory use optimisations to be done here, like memoizing the encoded value for each unique Fingerprint and using a single (shared) mutable buffer for small message serialisation (which the latest bytestring library apparently supports) - those are all on the TODO list as well.

Cheers,
Tim

AlanKim Zimmerman

Apr 12, 2013, 6:35:18 PM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
I will respond properly in the morning, going to bed now.

But to respond to one question, when running this code (commented out in my leak.hs)

let x = [] `seq` map (\n -> messageToPayload $ createMessage $ ("bar" ++ (show n)) ) [1..800]
let y = [] `seq` map (\m -> payloadToMessage m) x
say $ "messages=" ++ (show (x)) -- Force evaluation of x
say $ "messages=" ++ (show (y)) -- Force evaluation of y

The memory profile shows PINNED memory rising to a peak, then falling symmetrically.

So it does get reclaimed. I will run your simple version in the morning, and see what happens.

Alan

Tim Watson

Apr 12, 2013, 8:14:14 PM
to AlanKim Zimmerman, Bryan O'Sullivan, parallel-haskell
Alan...

On 12 Apr 2013, at 23:35, AlanKim Zimmerman wrote:

> I will respond properly in the morning, going to bed now.
>

Sure thing, thanks for keeping in touch!
> let x = [] `seq` map (\n -> messageToPayload $ createMessage $ ("bar" ++ (show n)) ) [1..800]
Ah yes, I keep forgetting that you need to force the bytestring (because it is lazy) ...
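
A minimal sketch of forcing a lazily built payload, using binary's `encode` as a stand-in for `messageToPayload . createMessage` (an assumption; the real functions live in distributed-process). `encodeForced` is a hypothetical helper name. `BL.length` walks every chunk, so running `evaluate` on it makes the whole encoding happen at that point rather than when the result is shown later.

```haskell
import Control.Exception (evaluate)
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL

-- Encode a String and force the entire lazy ByteString before returning.
encodeForced :: String -> IO BL.ByteString
encodeForced s = do
  let bytes = encode s
  _ <- evaluate (BL.length bytes) -- visits every chunk of the payload
  return bytes

main :: IO ()
main = do
  payloads <- mapM (\n -> encodeForced ("bar" ++ show (n :: Int))) [1 .. 800]
  -- All 800 payloads are already materialised at this point.
  print (sum (map BL.length payloads))
```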

> The memory profile shows PINNED memory rising to a peak, then falling symmetrically.
>
Great - that's what you'd expect I think
> So it does get reclaimed. I will run your simple version in the morning, and see what happens.
>
Ok cool. You'll probably need to add some synchronisation so that the outer Process waits for the spawnLocal'ed one to complete though. Your code (above) does pretty much the right thing, i.e., what send does: `messageToPayload . createMessage` and so on.
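
One way to do that synchronisation is to monitor the spawned process and block until its monitor notification arrives. The sketch below assumes the server stops after a known number of messages (a `forever'` loop never finishes, so there would be nothing to wait for), and it uses the in-memory transport from network-transport-inmemory purely to keep the example self-contained. `sendAndWait` is a hypothetical name.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Distributed.Process
  ( Process
  , ProcessMonitorNotification (..)
  , expect
  , matchIf
  , monitor
  , receiveWait
  , send
  , spawnLocal
  )
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad (forM_, replicateM_)
import Network.Transport.InMemory (createTransport)

-- Spawn a server that consumes `count` strings and then exits, send it
-- that many messages, and wait until the server has terminated.
sendAndWait :: Int -> Process ()
sendAndWait count = do
  server <- spawnLocal $ replicateM_ count $ do
    (_ :: String) <- expect
    return ()
  ref <- monitor server
  forM_ [1 .. count] $ \n -> send server ("bar" ++ show n)
  -- Block until the monitor reports that this particular server died.
  receiveWait
    [ matchIf (\(ProcessMonitorNotification r _ _) -> r == ref)
              (\_ -> return ())
    ]

main :: IO ()
main = do
  transport <- createTransport
  node <- newLocalNode transport initRemoteTable
  runProcess node (sendAndWait 800)
```

Waiting on the monitor rather than sleeping for a fixed time means the heap profile ends when the work ends, which makes the tail of the graph easier to interpret.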

Cheers,
Tim

AlanKim Zimmerman

Apr 13, 2013, 10:11:06 AM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
I managed to do the Process level test, which I had to modify to be


  server <- spawnLocal $ forever' $ do
    -- receiveWait [ match (\(s :: String) -> return ()) ]
    receiveWait [ match (\(s :: String) -> do { say $ "got:" ++ s;return ()}) ]


  mapM_ (\n -> (send server ("bar" ++ (show n))) :: Process ()  ) [1..800]

Without the "say" it did not use any memory at all.

For this case, it shows a rise and then fall in memory, pretty much as expected. I have attached the memory profile report. I assume the tail is due to the sleep at the end of the code.

Running it again with a sleep in the send, as


  server <- spawnLocal $ forever' $ do
    -- receiveWait [ match (\(s :: String) -> return ()) ]
    receiveWait [ match (\(s :: String) -> do { say $ "got:" ++ s;return ()}) ]

  mapM_ (\n -> (do {waitMs 5;((send server ("bar" ++ (show n))) :: Process () ) } )) [1..800]

where
  waitMs x  = liftIO $ threadDelay (1000 * x)

results in a flat memory usage, as the messages can then be consumed as fast as they are produced.

Alan

hroq.ps

AlanKim Zimmerman

Apr 13, 2013, 11:03:39 AM
to Tim Watson, Bryan O'Sullivan, parallel-haskell
Another test: it does not seem to make a difference whether call or cast is used, or if the server is stateful or stateless


AlanKim Zimmerman

Apr 13, 2013, 2:12:24 PM
to Tim Watson, parallel-haskell, Bryan O'Sullivan

Correction: cast does not leak, call does

Tim Watson

Apr 13, 2013, 2:12:33 PM
to AlanKim Zimmerman, Bryan O'Sullivan, parallel-haskell
So as soon as you introduce ManagedProcess, the leak starts occurring again? And that's true even with all your binary instances generated via makeBinary?

Tim Watson

Apr 13, 2013, 2:14:09 PM
to AlanKim Zimmerman, parallel-haskell, Bryan O'Sullivan
Even with the TH based binary instances?

AlanKim Zimmerman

Apr 13, 2013, 2:22:43 PM
to Tim Watson, parallel-haskell, Bryan O'Sullivan

This result is with sending a String only, so no additional Binary instances.

Tim Watson

Apr 13, 2013, 2:32:06 PM
to AlanKim Zimmerman, parallel-haskell, Bryan O'Sullivan
Hmm that is worrying.
Thanks for reporting this btw - i'll run through the code and see if there's something I'm doing in ManagedProcess.Client or AsyncSTM that might be causing this.
Can't promise an eta though, as I've got quite a few other open tickets at the moment, but I'll look at this tomorrow afternoon and see how I get on.

Cheers,
Tim

Tim Watson

Apr 13, 2013, 2:54:23 PM
to AlanKim Zimmerman, parallel-haskell, Bryan O'Sullivan
Alan,

I notice in one of your earlier comments that you're using a version of the platform that includes GenProcess. That's very old now - the current API is called ManagedProcess. Can you try with the latest HEAD revision please? A lot of code has changed since we called it GenProcess.

Cheers,
Tim


AlanKim Zimmerman

Apr 13, 2013, 3:08:33 PM
to Tim Watson, parallel-haskell, Bryan O'Sullivan
I installed the version from github, current head is 54c350bd266dacfd01420a025abcd728fc7e3adf from Feb 06

My reference to code is from that.

Alan

Tim Watson

Apr 14, 2013, 5:40:47 AM
to AlanKim Zimmerman, parallel-haskell, Bryan O'Sullivan
On 13 Apr 2013, at 20:08, AlanKim Zimmerman wrote:

> I installed the version from github, current head is 54c350bd266dacfd01420a025abcd728fc7e3adf from Feb 06
>
> My reference to code is from that.
>

Ah yes, I see what you mean - the internal module is still called GenProcess. I'm attempting to install everything with profiling enabled under cabal-dev. Providing I get that done without incident, I'll have a go at reproducing the leak this afternoon.

Cheers,
Tim

Tim Watson

Apr 14, 2013, 6:13:54 AM
to AlanKim Zimmerman, parallel-haskell, Bryan O'Sullivan
Reproduced - profiler report attached. The code I used to exercise this leak is very minimal - see https://github.com/haskell-distributed/distributed-process-platform/blob/misc-fixes/regressions/LeakByteStrings.hs. I'll attempt to track this down and fix.
call-leaks.ps

Tim Watson

Apr 14, 2013, 7:32:03 AM
to AlanKim Zimmerman, parallel-haskell
On 14 Apr 2013, at 11:46, AlanKim Zimmerman wrote:
> Great.
>
> I started diving into the code for callAsync and wait, but then my head nearly exploded :)
>

He he he - yeah it's a nice simple API for the end user, but the insides are a bit complicated. There seems to be a leak here though, and the leak appears to be in the Async code rather than ManagedProcess itself. I'm able to reproduce the leak in pinned byte strings with the following server and client code (which is derived from how ManagedProcess#call works) - taken from https://github.com/haskell-distributed/distributed-process-platform/blob/misc-fixes/regressions/LeakByteStrings.hs:

call :: ProcessId -> String -> Process ()
call pid msg = do
  asyncRef <- async $ do
    mRef <- monitor pid
    self <- getSelfPid
    send pid (self, msg)
    r <- receiveWait [
          match (\() -> return Nothing)
        , matchIf
              (\(ProcessMonitorNotification ref _ _)    -> ref == mRef)
              (\(ProcessMonitorNotification _ _ reason) -> return (Just reason))
        ]
    unmonitor mRef
    case r of
      Nothing  -> return ()
      Just err -> die $ "ServerExit (" ++ (show err) ++ ")"
  asyncResult <- wait asyncRef
  case asyncResult of
    (AsyncDone ()) -> return ()
    _              -> die "unexpected async result"

startServer :: Process ProcessId
startServer = spawnLocal listen
  where listen = do
          receiveWait [
              match (\(pid, _ :: String) -> say "got string" >> send pid ())
            , match (\() -> die "terminating")
            ]
          listen

I've attached the profiling report - I suspect that there's something wrong in the underlying Async implementation, which as you've noticed is a bit complicated. A diagnosis will follow.

Cheers,
Tim

async-leaks.ps
call-leaks.ps

Tim Watson

Apr 14, 2013, 12:42:18 PM
to AlanKim Zimmerman, parallel-haskell
Hi Alan,

TL;DR - I'm not at all sure what kind of heap profile we should expect to see here. I can confirm that if you're trying to make a synchronous call (i.e., waiting for a result/answer) then this problem will occur with both the basic 'send' APIs as well as ManagedProcess#call, so any fixes required will need to be made to the base library rather than the platform afaict. I'm also probably not going to get much further with this without some input from the other devs, but below I've listed some of my findings thus far.

If I copy your first simple message passing example, like so:

doWorkSimple :: Process ()
doWorkSimple = do
  server <- spawnLocal $ forever' $ do
    receiveWait [ match (\(s :: String) -> do { say $ "got:" ++ s;return ()}) ]

  mapM_ (\(n :: Int) -> (send server ("bar" ++ (show n))) :: Process () ) [1..800]
  sleep $ seconds 2

Then I get a much shorter tail - I don't know why this is, but the output I see is as attached below (simple.ps). Either way, you can see that the pinned objects tail off along with the rest of the data allocated in main. I'm not sure if that really looks like a leak.

If we put Async into the mix, the same thing occurs, given a simple server and call mechanism defined like so:

callAsync :: ProcessId -> Process ()
callAsync pid = do
  asyncRef <- async $ AsyncTask $ do
    getSelfPid >>= send pid
    expect :: Process String
  (AsyncDone _) <- wait asyncRef
  return ()

startServer :: Process ProcessId
startServer = spawnLocal listen
  where listen = do
          receiveWait [
              match (\pid -> do
                        now <- liftIO getCurrentTime
                        let n = formatTime defaultTimeLocale "[%d-%m-%Y] [%H:%M:%S-%q]" now
                        send pid n)
            , match (\() -> die "terminating")
            ]
          listen

Note in that example I'm not actually sending a string, but returning one. We see a similar memory profile, with the pinned memory tailing off quickly once the server stops. Now if we switch from sending a string to sending our pid and receiving a string, then we get the same profile (see async-leaks.ps below). I *think* the essential point here is that the pinned memory use starts to tail off right after main, in *both* cases. It's just that main is holding out for longer in the Async case - there are a variety of reasons why that might be, but I can get a similar effect without async anyway. If we redefine `call' and our worker like so:

doWorkSimple :: Process ()
doWorkSimple = do
  server <- spawnLocal $ forever' $ do
    receiveWait [ match (\(pid, _ :: String) -> send pid ()) ]

  mapM_ (\(n :: Int) -> (call server (show n)) :: Process () ) [1..800]
  sleep $ seconds 4
  say "done"
  sleep $ seconds 1

call :: ProcessId -> String -> Process ()
call pid s = do
  self <- getSelfPid
  send pid (self, s)
  expect :: Process ()


We get the same kind of profile here, despite none of the Async or ManagedProcess machinery being in the way (see simple-leaks.ps for the version that received a string and simple-leaks2.ps for the version above). I also see the same profile with both your leak.hs code and a simple server defined thus:

startGenServer :: Process ProcessId
startGenServer = do
  sid <- spawnLocal $ do
    catch (start () (statelessInit Infinity) serverDefinition >> return ())
          (\(e :: SomeException) -> say $ "failed with " ++ (show e))
  return sid

serverDefinition :: ProcessDefinition ()
serverDefinition =
  statelessProcess {
    apiHandlers = [
        handleCall_ (\(s :: String) -> return s)
      , handleCast  (\s (_ :: String) -> continue s)
      ]
  }


If you look at call-leaks2.ps and hroq.ps you'll see pretty much the same profile. This is consistent with the approaches above, using either send + expect, or typed channels (e.g., my Async example) - I'm wondering if perhaps this is indicative of the pinned memory being reclaimed after all? As I said earlier, I'm not sure whether I'm reading the profiles correctly, and my attempts to dig into a more complete analysis have been thwarted because my attempts to run a biographical profile (for drag,void) followed by a retainer profile have failed due to some kind of bug - see http://hackage.haskell.org/trac/ghc/ticket/7836 which I filed this afternoon to report that issue.

If anyone else is able to provide a bit more insight into what we *ought* to be seeing here, that'd be really helpful. I'll try profiling with a more recent version of GHC when I get a chance, but I'm going to have to park this for now as any spare time I get outside of work I need to focus on finishing the outstanding bugs for the next release. If this is a serious leak, then I'd definitely like to fix it in the 0.5.0 release, but I might need some more help tracking it down.

Cheers,
Tim
simple.ps
hroq.ps
async-leaks.ps
simple-leaks.ps
simple-leaks2.ps
call-leaks.ps
call-leaks2.ps

AlanKim Zimmerman

Apr 14, 2013, 12:55:22 PM
to Tim Watson, parallel-haskell
Hi Tim

Thanks for the investigation.

Another data point: I only started paying attention to the memory situation because I found my app getting progressively slower as more messages were sent. This put me on to profiling, and the memory situation.

So it is not just affecting memory, it is affecting performance too.

Alan



Tim Watson

Apr 15, 2013, 3:07:08 PM
to AlanKim Zimmerman, parallel-haskell
Right, I had a quick hack around this after work and I think we *might* be getting somewhere now...

Attached are two heap profiles, leaks2.ps and leaks3.ps - they both run a simple (string passing) test against a ManagedProcess (using handleCall) and then against a regular (plain old) Process using Async + send/receive. In the first run, we *shutdown* the server before proceeding with the second test, whilst in the second we do not - it's clear that the allocated (i.e., pinned) memory is associated either with the process' internal state (viz the process mailbox/message queue) or the "server state" - i.e., the data structures I am using in Async/ManagedProcess.

I will proceed to try and replicate this with just plain-old message passing, no Async and no ManagedProcess, which should tell us if Binary and/or the infrastructure behind the lightweight process/thread is leaking memory, or whether it's in my code. I really hope the latter, as it'll hopefully be easier to fix (though possibly harder to isolate), but either way, I'll report back once I've tracked it down.

Cheers,
Tim

leaks2.ps
leaks3.ps

Tim Watson

Apr 15, 2013, 5:09:57 PM
to AlanKim Zimmerman, parallel-haskell
On 15 Apr 2013, at 20:49, Tim Watson wrote:

>> Something I was thinking of checking was to run the client and server in different executables, and profile them separately, to see which side was holding the memory.
>>
>
> That's a good idea, I'll give it a go. I want to run through a few other test profiles first though.

Actually there's no need - I think the following profiles prove that the problem lies solely in Control.Distributed.Process.Platform.Async, which is somehow hanging on to memory as long as the destination (server) is running.

The essentials (for those who're interested) are thus: we have a simple server that takes (sender :: ProcessId, _ :: String) pairs and replies to the sender with (). We interact with the server in one of two ways, either via *plain* message passing, or via the Async library - ManagedProcess/GenProcess uses the latter - like so:

call :: ProcessId -> String -> Process ()
call pid s = do
  self <- getSelfPid
  send pid (self, s)
  expect :: Process ()

callAsync :: ProcessId -> String -> Process ()
callAsync pid s = do
  asyncRef <- async $ AsyncTask $ do
    self <- getSelfPid
    send pid (self, s)
    expect :: Process ()
  (AsyncDone _) <- wait asyncRef
  return ()

Our test harness starts the server, issues the calls, sleeps a while and then optionally kills the server process.

  server <- spawnLocal $ forever' $ do
    receiveWait [ match (\(pid, _ :: String) -> send pid ()) ]

  mapM_ (\(n :: Int) -> (doCall useAsync server (show n)) :: Process () ) [1..800]
  sleep $ seconds 4
  say "done"
  maybeKill shouldKill server
  sleep $ seconds 1

The versions that use 'call' instead of Async appear to reclaim memory whether the server is killed or not (see leaks-kill.ps and leaks-nokill.ps, below), whereas the versions that use Async allow GC to reclaim the pinned memory when the server is killed (leaks-kill-async.ps below), but when the server remains running they do not (leaks-nokill-async.ps below). There could be several reasons for this - use of typed channels is common to the Async implementations, as is the spawning of an "insulator process" that monitors the health (and completion) of the async task - in this case, the interaction between client and server - as well as a "worker process" that performs the actual task (viz sending a message in our case).

I'll have to dig into this some more to identify exactly where the leak occurs, but I suspect it's somewhere in Control.Distributed.Process.Async.AsyncChan (or the STM variant, AsyncSTM). I've updated the ticket at https://cloud-haskell.atlassian.net/browse/DPP-72 and will try and get this resolved asap.

Cheers,
Tim

leaks-kill-async.ps
leaks-kill.ps
leaks-nokill-async.ps
leaks-nokill.ps

Tim Watson

Apr 15, 2013, 6:27:35 PM
to AlanKim Zimmerman, parallel-haskell, cloud-haskell-developers
Urgh, and I thought I was getting on top of this - now cc-ing the ch-dev mailing list as I'm /really/ getting a bit out of my depth here. The first attached profile report (server.ps) was produced by running https://github.com/haskell-distributed/distributed-process-platform/blob/misc-fixes/regressions/LeakByteStrings.hs - this uses `handleCall' and does not appear to be leaking memory - afaict anyway. The second was run with this instead:

runProcess node $ doWork True False -- use async, don't kill the server afterwards
runProcess node $ doWork False False -- don't use async, don't kill...

and that produced async.ps, below. That also doesn't look like what we were seeing before.

The thing is, for this test run, I built the distributed-process library from the development branch (1659f26) - could it be Alan, that you're using some other version of distributed-process - ISTR you're using the one from hackage right? Would you mind trying again with the same revision as me?

My working hypothesis is that the change in heap behaviour is due to commit d9fcd8d in distributed-process (optimising local sends to avoid hitting the network), but we'll see. Quite frankly, I'm rather baffled by the changing results. I guess this *could* mean that there was some leak when using the network-transport/socket layer to communicate with processes on the same node. Or it could manifest again with processes on different nodes - now I'm going to *have* to take your advice about profiling calls between different executables. I'm also *completely* bemused why

(a) killing the server process made any difference before (but doesn't now) and
(b) using "bare send" versus Async/handleCall made any difference if the leak was somehow situated below d-p-platform in d-process

I think from here, we need to take some steps:

1. confirm whether or not these profiles (attached) look any better (?)
2. profile two local nodes running in the same executable (is this a leak in the node controller, event loop listener or some such?)
3. profile two nodes running in discrete executables (one client, one server)

I will probably not have time for that for the next couple of days though. Any further feedback, advice or other information in the meanwhile would be greatly appreciated.

Cheers,
Tim

async.ps
server.ps

AlanKim Zimmerman

Apr 16, 2013, 3:53:38 AM
to Tim Watson, parallel-haskell, cloud-haskell-developers
I have been running distributed-process from master on github, (6785222a), and distributed-process-platform 54c350b, which shows the leak.

If I switch to distributed-process 1659f and distributed-process-platform eabd75ef and run my github.com/alanz/hroq 4f1e2f, I get the following, which is what I would hope for. Max memory usage 230k

If I run my original app test against it, it performs as expected. The sawtooth on hroq.app.ps is from dropping buckets from memory to disk as the queue fills.

It will be interesting to see what happens in the actual distributed mode.

Alan
hroq.ps
hroq.app.ps

Tim Watson

Apr 16, 2013, 5:54:30 AM
to AlanKim Zimmerman, parallel-haskell, cloud-haskell-developers
Alan,

On 16 Apr 2013, at 08:53, AlanKim Zimmerman wrote:

> I have been running distributed-process from master on github, (6785222a), and distributed-process-platform 54c350b, which shows the leak.
>
> If I switch to distributed-process 1659f and distributed-process-platform eabd75ef and run my github.com/alanz/hroq 4f1e2f, I get the following, which is what I would hope for. Max memory usage 230k


Excellent - I'm *very* relieved to hear it! 

> If I run my original app test against it, it performs as expected. The sawtooth on hroq.app.ps is from dropping buckets from memory to disk as the queue fills.


That makes sense, and I'm able to replicate your results with GHC-7.6.2. Are you using the -threaded runtime in your main application btw?

> It will be interesting to see what happens in the actual distributed mode.


Yes, I'm going to do some testing around that this week. I do have other (cloud haskell) priority tickets though, not to mention other project maintainer's responsibilities and a day job. I will not release distributed-process-0.5.0 (nor the first hackage release of distributed-process-platform-0.1.0) without investigating this though, which might mean that the release I was planning for March (!) ends up happening late May to early June.

Thank you very much for raising this and participating in the investigation.

Cheers,
Tim

AlanKim Zimmerman

Apr 16, 2013, 6:08:43 AM
to Tim Watson, parallel-haskell
I am running with -threaded, and also glad to see the leak out of the way.

I am currently struggling with why my app slows down the more messages are inserted into it, but I suspect I am doing something silly with list membership tests.

Alan

Tim Watson

Dec 9, 2013, 4:09:12 PM
to parallel...@googlegroups.com, Tim Watson
For those who're still following this thread, we've just had a very similar (and possibly related) bug report of PINNED memory leaks when using the Supervisor module...

See https://cloud-haskell.atlassian.net/browse/DPP-84 for the gory details.

Cheers,
Tim