Memory leaks in iteratees

916 views
Skip to first unread message

James Roper

unread,
Apr 5, 2013, 8:08:08 AM4/5/13
to play-fram...@googlegroups.com
Hi all,

Try this:

Enumerator.repeat("hello") |>> Iteratee.ignore[String]

You'll get an OOME pretty soon.  That memory will of course be reclaimed pretty quickly after the OOME.  I've analysed it in a profiler, basically there's a big long chain of flatMap callbacks, each time another element goes into an iteratee/enumerator and that iteratee/enumerator calls Future.flatMap, that chain grows, indefinitely, until the iteratee is eventually done.  For infinite streams, that will be never.  This means an application that uses a global broadcast enumerator for example (eg for a chatroom), like this:

val (enumerator, channel) = Concurrent.broadcast[String]

will eventually always run out of memory.  And we have a user that's encountering this problem:


Any thoughts on how or whether this can be solved?

Cheers,

James

--
James Roper
Software Engineer

Typesafe - The software stack for applications that scale
Twitter: @jroper

Yann Simon

unread,
Apr 5, 2013, 8:28:29 AM4/5/13
to James Roper, play-fram...@googlegroups.com


2013/4/5 James Roper <james...@typesafe.com>

--
You received this message because you are subscribed to the Google Groups "Play framework dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framework-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Sadek Drobi

unread,
Apr 5, 2013, 9:02:18 AM4/5/13
to James Roper, play-fram...@googlegroups.com
Strange, I am getting a SOE rather and I am on 2.1.1


--

Luis Ángel Vicente Sánchez

unread,
Apr 5, 2013, 9:25:55 AM4/5/13
to Sadek Drobi, James Roper, play-fram...@googlegroups.com
I think that have been fixed in scala 2.10.1 but play 2.1.1 is using 2.10.0. I think that also master is being compiled using 2.10.0.

Regards,

Luis


2013/4/5 Sadek Drobi <s...@zenexity.com>

Julien Richard-Foy

unread,
Apr 5, 2013, 9:28:10 AM4/5/13
to Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
It should be possible to use 2.10.1, just set the following sbt setting to your project:

scalaVersion := "2.10.1"

Luis Ángel Vicente Sánchez

unread,
Apr 5, 2013, 9:42:59 AM4/5/13
to Julien Richard-Foy, Sadek Drobi, James Roper, play-fram...@googlegroups.com
Yes, I know that and I have done that myself on my projects. I wonder if people is aware of that. Maybe that's why James see a OOME and Sadek a SOE :-S


2013/4/5 Julien Richard-Foy <j...@zenexity.com>

Olivier NOUGUIER

unread,
Apr 5, 2013, 11:12:42 AM4/5/13
to Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
AFAIU the leak is still there 
Using:
    Scala 2.10.1 and play 2.1.1
    Scala 2.10.1 and play 2.2-SNAPSHOT (compiled with scala 2.10.1).

"Computers are useless. They can only give you answers."
- Pablo Picasso -

Stephane Godbillon

unread,
Apr 5, 2013, 11:16:34 AM4/5/13
to Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
Hi,

The problem is in Scala Futures (2.10.0 and 2.10.1). Using a lot of flatMap calls leaks. Here is a snippet of code that shows the problem:

import scala.concurrent._

import scala.concurrent.duration._

import ExecutionContext.Implicits.global

import scala.util._


object FutureLeak extends App {

  val step  = 100000

  val upper = 1000000 //1500000000

  

  // code that leaks

  def loop(future: Future[Int]): Future[Int] = {

    future.flatMap { i =>

      if (i % step == 0) println(i)

      if (i < upper)

        loop(Future(i + 1))

      else Future(i)

    }

  }


  // code that does not leak

  def loop2(future: Future[Int]): Future[Int] = {

    val promise = Promise[Int]

    def inloop(future: Future[Int]): Unit = {

      future.onComplete {

        case Success(iif i < upper =>

          if (i % step == 0) println(i)

          inloop(Future(i + 1))

        case Success(i) =>

          println("done with " + i)

          promise.success(i)

      }

    }

    inloop(future)

    promise.future

  }

  Await.result(loop(Future(0)), 100 seconds) // leads to java.lang.OutOfMemoryError: Java heap space with -Xms32m -Xmx32m

  Await.result(loop2(Future(0)), 100 seconds) // runs fine with -Xms32m -Xmx32m

}


I ran into this problem with ReactiveMongo cursors (which use Enumerators - and so Scala Futures).


Hope that it helps,




2013/4/5 Olivier NOUGUIER <olivier....@gmail.com>

Luis Ángel Vicente Sánchez

unread,
Apr 5, 2013, 11:33:31 AM4/5/13
to Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Sadek Drobi, James Roper, play-fram...@googlegroups.com
Did you reported it/created an issue related to this bug? Using flatMap to transform a Future coming from a reactivemongo iteratee into a future coming from a WS is something I was planning to do in an application I'm working on and that would make this problem even worst.


2013/4/5 Stephane Godbillon <stephane....@gmail.com>

Viktor Klang

unread,
Apr 5, 2013, 11:40:38 AM4/5/13
to Stephane Godbillon, Sadek Drobi, play-fram...@googlegroups.com, Olivier NOUGUIER, Luis Ángel Vicente Sánchez, Julien Richard-Foy, James Roper

Your creating a looooong chain of futures that depend on eachother and of course they cant be collected until they start resolving.

What would a solution look like?

Cheers,
V

Sadek Drobi

unread,
Apr 5, 2013, 1:36:19 PM4/5/13
to Viktor Klang, Stephane Godbillon, play-fram...@googlegroups.com, Olivier NOUGUIER, Luis Ángel Vicente Sánchez, Julien Richard-Foy, James Roper
On Fri, Apr 5, 2013 at 5:40 PM, Viktor Klang <viktor...@typesafe.com> wrote:

Your creating a looooong chain of futures that depend on eachother


Actually not, the next future is NOT created until the first is fulfilled.
 

and of course they cant be collected until they start resolving.

They are resolved one at a time.

Viktor Klang

unread,
Apr 5, 2013, 2:21:30 PM4/5/13
to Sadek Drobi, Stephane Godbillon, play-fram...@googlegroups.com, Olivier NOUGUIER, Luis Ángel Vicente Sánchez, Julien Richard-Foy, James Roper
On Fri, Apr 5, 2013 at 7:36 PM, Sadek Drobi <s...@zenexity.com> wrote:



On Fri, Apr 5, 2013 at 5:40 PM, Viktor Klang <viktor...@typesafe.com> wrote:

Your creating a looooong chain of futures that depend on eachother


Actually not, the next future is NOT created until the first is fulfilled.

I stand corrected, clearly I shouldn't be reading code on my smartphone.
 
 

and of course they cant be collected until they start resolving.

They are resolved one at a time.

Has anybody run it through a profiler?

Cheers,



--
Viktor Klang
Director of Engineering - Typesafe
Twitter: @viktorklang

See you at Scala Days 2013 in NYC!
June 10th - June 12th

Viktor Klang

unread,
Apr 5, 2013, 2:49:08 PM4/5/13
to Sadek Drobi, Olivier NOUGUIER, play-fram...@googlegroups.com, Stephane Godbillon, Luis Ángel Vicente Sánchez, Julien Richard-Foy, James Roper

One thought is whether Scala closures play in here since they always seem to have an outer-pointer. Will have a look next week if this is the case.

Rich Dougherty

unread,
Apr 5, 2013, 5:39:56 PM4/5/13
to Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
On Sat, Apr 6, 2013 at 4:16 AM, Stephane Godbillon <stephane....@gmail.com> wrote:
>   // code that leaks
>   def loop(future: Future[Int]): Future[Int] = {
>     future.flatMap { i =>
>       if (i % step == 0) println(i)
>       if (i < upper)
>         loop(Future(i + 1))
>       else Future(i)
>     }
>   }

>   Await.result(loop(Future(0)), 100 seconds) // leads to java.lang.OutOfMemoryError: Java heap space with -Xms32m -Xmx32m

There are three futures involved in a flatMap.
1. The future that flatMap is called on, the callee-future.
2. The future returned immediately by flatMap, the flatMap-future.
3. The future returned by f when it is eventually called, the f-future.

trait Future[T] {
  def flatMap[S](f: T => Future[S])(implicit executor: ExecutionContext): Future[S]
}

Execution proceeds in the following way.
1. Assume the callee-future object exists but may not be complete.
2. Call flatMap on it with f.
3. This creates a flatMap-future (a Promise internally) that will only complete when the f-future eventually completes.
4. Once the callee-future is complete it calls f which creates an f-future.
5. Once the f-future is complete it completes the flatMap-future.
6. This will in turn kick off anything waiting on completion of the flatMap future.

Notice that the flatMap-future can only complete when its associated f-future is complete. Now what happens when we have a large chain of flatMaps going on, i.e. where f calls flatMap recursively?

The first flatMap-future waits on the first f-future to complete. But the first f-future creates a second flatMap-future which in turn waits on a second f-future. The second f-future creates a third flatMap-future which waits on a fourth f-future…

By now the first flatMap-future is effectively waiting on the fourth f-future to complete. Internally there is a chain of onComplete handlers and flatMap-futures (promises awaiting completion) going back from the latest f-future to the first flatMap-future. This is the cause of the memory leak.

In the example code above you are calling Await.result() on the first flatMap-future so there is no chance of garbage collecting this chain. You hold a chain of promises from the first flatMap-future to the last f-future. But even if you didn't hold a references to the first flatMap-future away, there would still be a memory leak. This is because each f-future's completion handler holds a strong reference back to its associated flatMap-future, whether or not that flatMap-future is actually referenced from anywhere else.

So what is the solution?

We could improve garbage collection in some cases by using weak references. But that's quite expensive and still wouldn't fix the problem in this example program.

We really need a way to avoid creating an unlimited length f-future to flatMap-future chain. Could we somehow propagate the initial flatMap-future's promise through the chain? It seems wasteful to create a chain of promises and onComplete handlers which will all eventually contain the same value.

Here's how a single promise can be used if it is coded explicitly. Maybe there's an automatic way to do the same thing somehow?

def loop(future: Future[Int], p: Promise[Int]) {
  future.foreach { i =>
    if (i % step == 0) println(i)
    if (i < upper)
      loop(Future(i + 1), p)
    else p.success(i)
  }
}

Cheers
Rich

Stephane Godbillon

unread,
Apr 5, 2013, 5:39:26 PM4/5/13
to Viktor Klang, Sadek Drobi, Olivier NOUGUIER, play-fram...@googlegroups.com, Luis Ángel Vicente Sánchez, Julien Richard-Foy, James Roper
I reported the bug here: https://issues.scala-lang.org/browse/SI-7336

Cheers,
2013/4/5 Viktor Klang <viktor...@typesafe.com>

Rich Dougherty

unread,
Apr 5, 2013, 8:07:15 PM4/5/13
to Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
On Sat, Apr 6, 2013 at 10:39 AM, Rich Dougherty <ri...@rd.gen.nz> wrote:
So what is the solution?

We could improve garbage collection in some cases by using weak references. But that's quite expensive and still wouldn't fix the problem in this example program.

We really need a way to avoid creating an unlimited length f-future to flatMap-future chain. Could we somehow propagate the initial flatMap-future's promise through the chain? It seems wasteful to create a chain of promises and onComplete handlers which will all eventually contain the same value.

Here's how a single promise can be used if it is coded explicitly. Maybe there's an automatic way to do the same thing somehow?

def loop(future: Future[Int], p: Promise[Int]) {
  future.foreach { i =>
    if (i % step == 0) println(i)
    if (i < upper)
      loop(Future(i + 1), p)
    else p.success(i)
  }
}

Actually a solution might be possible, if flatMap returned a special kind of future. If it did, we could detect recursive flatMaps and take action to break the chain. We'd want to expose the completion chain in this future so we could manipulate it. We'd want to use weak refs from completion handlers back to flatMap futures so that we could remove those futures from the chain that are no longer referenced from outside the chain.

Cheers
Rich

Viktor Klang

unread,
Apr 6, 2013, 8:50:10 AM4/6/13
to Rich Dougherty, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
Lets just first check what is leaking what, it might be that the closure leaks through the outer pointer and hten moving the function creation out to a companion object might alleviate that as thisFuture wouldn't be captured in the closure attached to thatFuture.

Cheers,
 

Cheers
Rich

--
You received this message because you are subscribed to the Google Groups "Play framework dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framework-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--

Rich Dougherty

unread,
Apr 6, 2013, 4:16:47 PM4/6/13
to Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
I profiled the memory leak yesterday. Here's the chain in more detail, beginning with the promise that is returned by flatMap.

p$4: Promise$DefaultPromise ← promise n, created by flatMap
  _ref: $colon$colon ← promise n's internal state
    hd: CallbackRunnable ← the list contained in the promise n's internal state
      onComplete: Future$$anonfun$flatMap$1$$anonfun$apply$3 ← runs on completion of promise n, attached by flatMap
        $outer: Future$$anonfun$flatMap$1 ← outer pointer, referencing promise n-1 created by flatMap
          …

There's an outer pointer here, referencing promise n-1. But (as far as I can tell) we need to reference promise n-1 so we can complete it when promise n is complete. Restructuring the closure won't help, because we somehow need to complete promise n-1 on completion of promise n, and we presumably need some sort of reference from pormise n to promise n-1 to achieve that.

More concretely, consider the following snippet.

Future(1).flatMap(a =>  Future(2).flatMap(b => Future(3)))

There are five futures here. Three futures are explicitly created, and two are created when flatMap is called. Let's call these objects f1, f2, f3, fm1 and fm2.

Working backwards, here is the chain that causes the leak.
1. The promise fm2 is created by the second call to flatMap. It can only complete when f3 is complete. So there is a pointer from f3's completion handler to fm2.
2. The promise fm1 is created by the first call to flatMap. It can only complete when fm2 is complete. So there is a pointer from fm2's completion handler back to fm1.
3. This chain remains in place until f3 eventually completes.

Now consider Stephane's snippet.


def loop(future: Future[Int]): Future[Int] = {
  future.flatMap { i =>
    if (i < upper) loop(Future(i + 1)) else Future(i)
  }
}

Let's call the futures returned by each invocation of loop fm1, fm2, fm3, etc.

So, upon calling loop, fm1 is returned immediately. Eventually loop will be called again and fm2 will be returned. The eventual value of fm1 depends on that of fm2, so a completion handler is attached to fm2 pointing back to fm1. This is the first link of the chain, a reference from fm2 back to fm1, via fm2's completion handler.

Each call to loop adds another link to the chain, from fm(n) back to fm(n-1). The chain remains in place until the future produced by the final iteration of loop completes. Make n big enough and an OOME is the result. :(

Viktor Klang

unread,
Apr 6, 2013, 4:33:38 PM4/6/13
to Rich Dougherty, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
On Sat, Apr 6, 2013 at 10:16 PM, Rich Dougherty <ri...@rd.gen.nz> wrote:
I profiled the memory leak yesterday. Here's the chain in more detail, beginning with the promise that is returned by flatMap.

p$4: Promise$DefaultPromise ← promise n, created by flatMap
  _ref: $colon$colon ← promise n's internal state
    hd: CallbackRunnable ← the list contained in the promise n's internal state
      onComplete: Future$$anonfun$flatMap$1$$anonfun$apply$3 ← runs on completion of promise n, attached by flatMap
        $outer: Future$$anonfun$flatMap$1 ← outer pointer, referencing promise n-1 created by flatMap
          …

There's an outer pointer here, referencing promise n-1. But (as far as I can tell) we need to reference promise n-1 so we can complete it when promise n is complete. Restructuring the closure won't help, because we somehow need to complete promise n-1 on completion of promise n, and we presumably need some sort of reference from pormise n to promise n-1 to achieve that.

More concretely, consider the following snippet.

Future(1).flatMap(a =>  Future(2).flatMap(b => Future(3)))

There are five futures here. Three futures are explicitly created, and two are created when flatMap is called. Let's call these objects f1, f2, f3, fm1 and fm2.

Working backwards, here is the chain that causes the leak.
1. The promise fm2 is created by the second call to flatMap. It can only complete when f3 is complete. So there is a pointer from f3's completion handler to fm2.
2. The promise fm1 is created by the first call to flatMap. It can only complete when fm2 is complete. So there is a pointer from fm2's completion handler back to fm1.
3. This chain remains in place until f3 eventually completes.

Now consider Stephane's snippet.


def loop(future: Future[Int]): Future[Int] = {
  future.flatMap { i =>
    if (i < upper) loop(Future(i + 1)) else Future(i)
  }
}

Let's call the futures returned by each invocation of loop fm1, fm2, fm3, etc.

So, upon calling loop, fm1 is returned immediately. Eventually loop will be called again and fm2 will be returned. The eventual value of fm1 depends on that of fm2, so a completion handler is attached to fm2 pointing back to fm1. This is the first link of the chain, a reference from fm2 back to fm1, via fm2's completion handler.

Each call to loop adds another link to the chain, from fm(n) back to fm(n-1). The chain remains in place until the future produced by the final iteration of loop completes. Make n big enough and an OOME is the result. :(

Thanks for the hard work Rich!

Alright, so if I interpret this correctly: We're being killed by the chain of Promises created by the loop.
The question is if this is fixable without breaking any semantics currently provided, and what such a fix would look like.

Ideas?

Cheers,
 

On Sun, Apr 7, 2013 at 1:50 AM, Viktor Klang <viktor...@typesafe.com> wrote:
Lets just first check what is leaking what, it might be that the closure leaks through the outer pointer and hten moving the function creation out to a companion object might alleviate that as thisFuture wouldn't be captured in the closure attached to thatFuture.

Will Sargent

unread,
Apr 6, 2013, 6:11:17 PM4/6/13
to Viktor Klang, Rich Dougherty, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
> Thanks for the hard work Rich!
>
> Alright, so if I interpret this correctly: We're being killed by the chain
> of Promises created by the loop.
> The question is if this is fixable without breaking any semantics currently
> provided, and what such a fix would look like.

It's interesting this is coming up now, as I just read a blog post a
couple days ago about the memory problem with Promises:

http://sealedabstract.com/code/broken-promises/

Will.

Rich Dougherty

unread,
Apr 6, 2013, 7:46:35 PM4/6/13
to Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, James Roper, play-fram...@googlegroups.com
On Sun, Apr 7, 2013 at 8:33 AM, Viktor Klang <viktor...@typesafe.com> wrote:
On Sat, Apr 6, 2013 at 10:16 PM, Rich Dougherty <ri...@rd.gen.nz> wrote:
Each call to loop adds another link to the chain, from fm(n) back to fm(n-1). The chain remains in place until the future produced by the final iteration of loop completes. Make n big enough and an OOME is the result. :(

Thanks for the hard work Rich!

Alright, so if I interpret this correctly: We're being killed by the chain of Promises created by the loop.
The question is if this is fixable without breaking any semantics currently provided, and what such a fix would look like.

Ideas?

So here's a formulation of the problem.

1. Each call to flatMap immediately returns a future. For the nth call to flatMap, we can call this future fmN.
2. The function passed to flatMap is eventually called, and will eventually result in a future. For the nth call to flatMap, we can call this future fN.
3. When fN completes, fmN must complete with the same value. We can say that fmN is bound to fN.
4. In the case of recursive calls to flatMap then fN is actually fmN+1. This means that fmN is bound to fmN+1.
5. For large values of N, we have fm0 bound to fm1, fm1 bound to fm2, … , fmN-1 bound to fmN.
6. In the example provided by Stephane, only fm0 is of interest. However, with our current implementation, all of fm1 through to fmN remain in memory in a binding chain until fmN is complete. If N is large then memory is exhausted before fmN can complete, resulting in an OOME.

First, a few workarounds. They require changes to user code but they trivially allow us to avoid building a binding chain entirely.

Users can write their loop in a pseudo continuation-passing style, where they explicitly pass in a promise to capture the result of the computation. Only one future for the result (explicitly passed in as k), so no binding chain needed.

  def loop(f: Future[Int]): Future[Int]
becomes
  def loop(f: Future[Int], result: Promise[Int]): Unit

Alternatively they can make their recursion explicit. In flatMapRec the user function can recurse by returning FlatMap. The flatMapping will be executed inside flatMapRec, so we can optimise its execution to avoid creating a binding chain.

  Future(1).flatMap(a => Future(2).flatMap(b => Future(3)))
becomes
  Future(1).flatMapRec(a => FlatMap(Future(2), b => Done(Future(3))))

(It might be possible (but is probably not worth the effort) for the compiler to translate some flatMaps to flatMapRecs automatically, similarly to how the compiler can perform limited tail-call optimisation.)

But we need to get more creative if we want to make things "just work" for the user.

I'd start by refining the problem a bit.
1. When flatMap is called it creates two futures fmN and fN and binds them together. If we could use a single future then we'd avoid the binding chain entirely and couldn't leak memory. However, I don't think we can avoid creating separate futures, because fmN needs to be returned immediately, whereas fN is only available after the function passed to flatMap is executed. Therefore we will always need to bind two separate futures together for each call to flatMap.
2. User code may hold strong references to fmN, fN or both fmN and fN. We can't know in advance. Therefore we need to bind them together.
3. Also we cannot know if fN will be fmN+1 in advance, i.e. we don't know if a call is a recursive flatMap call in advance. The future fN may be produced by flatMap or in some other way entirely. Therefore we need to handle both cases. (Are there other common cases of recursion that we need to handle?)
4. If user code holds strong references to fm0…fmN then we can't expect to save any memory. But in Stephane's example the only strong reference was to fm0. We have two problems to solve here. One is allowing collection of any fm that isn't referenced. The other is making sure fm0 is still bound to fmN (transitively or otherwise) even if fm1…fmN-1 is removed.
5. In general user code might refer to any subset of fm0…fmN. For example, user code might retain a reference to every 6th fm, for some reason! Our goal is to avoid OOME errors where possible, by allowing garbage collection of unused futures.

So here's a rough idea.

At the moment we use completion handlers to do our bindings. Completion values flow like this.

fm0 ← fm1 ← … ← fmN-1 ← fmN

First, we could changing our binding implementation to use a weak reference (⤎).

fm0 ⤎ fm1 ⤎ … ⤎ fmN-1 ⤎ fmN

But, after garbage collection, the chain would be broken.

fm0    fmN

An alternative would be to propagate the binding information forward and attach it to the final future in the chain.

{fm0 ⤎, fm1 ⤎, … fmN-1 ⤎} ← fmN

The final future would hold a collection of weak references to all other futures bound to it.

Now that the chain has been flattened out, it will not be broken even if intermediate futures are garbage collected. Taking Stephane's example, after garbage collection, we'd still have a working chain.

{fm0 ⤎} ← fmN

The rule for binding is simple.

Consider adding another future to the binding chain. We now want to bind fmN to fmN+1.

Before binding we have the following independent binding chains.

Ø ← fmN+1
{fm0 ⤎} ← fmN

To perform the binding we move fmN's bindings into fmN+1s.

{fm0 ⤎, fmN ⤎} ← fmN+1

That all looks quite nice to me so far.

But a more complicated problem arises when we want to bind another future to a future into the middle of the chain.

One way to do this is for each future to maintain a reference to the current head of the chain. But that probably requires a lot of maintenance as the head changes. Another idea is to maintain a reference to the chain data structure object (the {} above), and then try to keep the chain object the same where possible, minimise needing to update. In the case where the heads of two chains are bound together, we can migrate all futures from the smaller chain to the larger one.

A simple example, binding fmX to fm0:

Ø ← fmX
{fm0 ⤎, fmN ⤎} ← fmN+1

becomes

{fm0 ⤎, fmN ⤎, fmX ⤎} ← fmN+1 [and fmX is updated to point to the new chain object]

A more complex example involving merging two chains.

{fmY ⤎} ← fmX
{fm0 ⤎, fmN ⤎} ← fmN+1

becomes

{fm0 ⤎, fmN ⤎, fmX ⤎, fmY ⤎} ← fmN+1 [both fmY and fmX are updated to point to the new chain object]

So, assuming using weak references and flattening out the binding chain is roughly the right approach, how do we implement it?

One option is to generalise all types of futures to support binding chains. However there is quite a bit of overhead to all of this. Do we want to incur that overhead for every type of future?

A second option is to make a special type of future, perhaps called a BoundPromise, that supports efficient binding, and return that type of future from flatMap. The bind operation for BoundPromises would use an approach like the one described above. Other futures could implement binding by attaching a completion handler.

So a call to flatMap would create fm0, a BoundPromise. It would then call the user-supplied function which would return f0, which may or may not be a BoundPromise. If it was a BoundPromise (as in the case of a recursive call to flatMap) then our new efficient binding chain would kick in to action. However if f0 is another type of future then the existing completion semantics could be used.

This obviously leaves open the possibility of memory leaks when non-flatMapped futures are supplied, however the simple flatMap case should just work without the user being aware of it.

What do you think? (Assuming you made it this far!)

Cheers
Rich

James Roper

unread,
Apr 6, 2013, 10:33:58 PM4/6/13
to Rich Dougherty, Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, play-fram...@googlegroups.com
A few considerations here:

1) In practice, we control nearly all the types of futures used by iteratees.  The only type we don't is things like calls returned by Enumerator.generateM, but those futures should be short lived anyway.  When you flatMap an iteratee, we control the future.  This means we can create our own future implementations that do this.  We've already got one of our own, play.core.server.netty.NettyPromise.  We could also create our own Future.successful, and Promise.

2) Again in practice, the only time when this memory leak really matters is for infinite streams.  Since the stream is infinite, the result of the iteratee is irrelevant (and is often Unit anyway).  In these cases, it is not necessary to carry to the result of the futures through the chain, therefore I think our implementations here (eg the enumerators in PlayDefaultUpstreamHandler) can probably be modified to not strictly implement their specs, because they don't need to.

Rich Dougherty

unread,
Apr 7, 2013, 2:03:39 AM4/7/13
to James Roper, Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, Sadek Drobi, play-fram...@googlegroups.com
On Sun, Apr 7, 2013 at 2:33 PM, James Roper <james...@typesafe.com> wrote:
A few considerations here:

1) In practice, we control nearly all the types of futures used by iteratees.  The only type we don't is things like calls returned by Enumerator.generateM, but those futures should be short lived anyway.  When you flatMap an iteratee, we control the future.  This means we can create our own future implementations that do this.  We've already got one of our own, play.core.server.netty.NettyPromise.  We could also create our own Future.successful, and Promise.

2) Again in practice, the only time when this memory leak really matters is for infinite streams.  Since the stream is infinite, the result of the iteratee is irrelevant (and is often Unit anyway).  In these cases, it is not necessary to carry to the result of the futures through the chain, therefore I think our implementations here (eg the enumerators in PlayDefaultUpstreamHandler) can probably be modified to not strictly implement their specs, because they don't need to.

In other words… how can we solve the problem right now for Play? Good question!

Are you proposing we make our own future implementation that handles recursive flatMap safely?

The easy route, in my opinion, is to make changes at the iteratee layer. For example, add methods like foldRec, flattenRec, flatMapRec that take a loop function and instead of directly calling dangerous flatMap methods. Then change our recursive code to use those methods. I think we can still have all code strictly implement proper iteratee specs, but, like you say, we could cut corners on spec compliance if it's not observable to users.

Cheers
Rich

Sadek Drobi

unread,
Apr 7, 2013, 4:54:50 AM4/7/13
to James Roper, Rich Dougherty, Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
On Sun, Apr 7, 2013 at 4:33 AM, James Roper <james...@typesafe.com> wrote:
A few considerations here:

1) In practice, we control nearly all the types of futures used by iteratees.  The only type we don't is things like calls returned by Enumerator.generateM, but those futures should be short lived anyway.  When you flatMap an iteratee, we control the future.  This means we can create our own future implementations that do this.  We've already got one of our own, play.core.server.netty.NettyPromise.  We could also create our own Future.successful, and Promise.

Possible, but that will make it extremely hard to maintain. It is very easy to get this memory leak without knowing.
 

2) Again in practice, the only time when this memory leak really matters is for infinite streams.  Since the stream is infinite, the result of the iteratee is irrelevant (and is often Unit anyway).  In these cases, it is not necessary to carry to the result of the futures through the chain, therefore I think our implementations here (eg the enumerators in PlayDefaultUpstreamHandler) can probably be modified to not strictly implement their specs, because they don't need to.

Not really. consider Enumerator.repeat(...) &> Enumeratee.take(1000)
Here the result is not unit.

Solving the problem only for Play is a good alternative. But it should be clear how to avoid it. Or otherwise we get into a maintenance hell.

Viktor Klang

unread,
Apr 7, 2013, 12:18:01 PM4/7/13
to Sadek Drobi, James Roper, Rich Dougherty, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
On Sun, Apr 7, 2013 at 10:54 AM, Sadek Drobi <s...@zenexity.com> wrote:



On Sun, Apr 7, 2013 at 4:33 AM, James Roper <james...@typesafe.com> wrote:
A few considerations here:

1) In practice, we control nearly all the types of futures used by iteratees.  The only type we don't is things like calls returned by Enumerator.generateM, but those futures should be short lived anyway.  When you flatMap an iteratee, we control the future.  This means we can create our own future implementations that do this.  We've already got one of our own, play.core.server.netty.NettyPromise.  We could also create our own Future.successful, and Promise.

Possible, but that will make it extremely hard to maintain. It is very easy to get this memory leak without knowing.

Totally agree.
 
 

2) Again in practice, the only time when this memory leak really matters is for infinite streams.  Since the stream is infinite, the result of the iteratee is irrelevant (and is often Unit anyway).  In these cases, it is not necessary to carry to the result of the futures through the chain, therefore I think our implementations here (eg the enumerators in PlayDefaultUpstreamHandler) can probably be modified to not strictly implement their specs, because they don't need to.

Not really. consider Enumerator.repeat(...) &> Enumeratee.take(1000)
Here the result is not unit.

Solving the problem only for Play is a good alternative. But it should be clear how to avoid it. Or otherwise we get into a maintenance hell.

Representing infinities using Promise-flatMap-chains might not be the optimal way to encode that. So encoding it with a Promise and a recursive call as suggested before seems like the way to go forward.


Of course promises isn't a silver bullet, but he's missing so many points…

1) If you cannot fit things in memory, you'll have to reduce memory consumption, if that means storing URLs to files instead of the files themselves, I think that's definitely a Good Thing (tm)
2) As for debugging, of course you'll need a good tracing implementation, just like Java without stack traces would be a PITA.
3) I didn't get the part about the locks, just represent your database operations as Promises and let the database API deal with what gets executed when.
4) Of course promises need garbage collection.
5) Of course promises is not a silver bullet.

Cheers,

Sadek Drobi

unread,
Apr 10, 2013, 3:56:40 AM4/10/13
to Viktor Klang, James Roper, Rich Dougherty, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
Here is a sample solving memory leaks in Enumerators and applying a simple buffer strategy which buffers messages if socket is not ready.


I will collect the fixes and push them into a pull request.

Rich Dougherty

unread,
Apr 12, 2013, 12:41:46 PM4/12/13
to Sadek Drobi, Viktor Klang, James Roper, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
Hi Sadek

Here's another approach to the memory leak problem, based on the idea (mentioned in an earlier email) of building futures out of FlatMap objects. We create a mini language describing how futures can be composed together.

e.g.
  Future(1).flatMap(a => Future(2).flatMap(b => Future(3)))
becomes
  FlatMap(CompleteWith(Future(1)), a => FlatMap(CompleteWith(Future(2)), b => CompleteWith(Future(3))))

The motivation for a mini language is that, by restricting how futures are composed, we can also restrict how they are evaluated. In particular we restrict access to the evaluation of FlatMap so that the user cannot access the result of every flatMap operation, only the final one. This means we don't need to make a promise for each evaluation of flatMap, nor do we need to bind our promises together. That means we cannot have a memory leak. (I hope!)

The mini language describing future composition has type FutureC. So long as futures are built out of FutureC objects they cannot leak. That means we can also build a typesafe subset of iteratee/enumerator operations that cannot leak.

I've converted a bit of code in Iteratee and Enumerator to use this approach and it seems to work, e.g. see foldC, flattenC and repeatC. (Please forgive the ugly API for now.)


Cheers
Rich

Rich Dougherty

unread,
Apr 12, 2013, 8:26:01 PM4/12/13
to Sadek Drobi, Viktor Klang, James Roper, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
Here's a tidied up version with a few more enumerators converted.

https://github.com/richdougherty/Play20/commit/396e4256531029352f9fae44908540e38b532eae

The conversion is very easy. Generally you just need to replace Future with FutureC.

To turn allow a Future to participate in a FutureC call FutureC.wrap(f). To begin evaluating a FutureC as a real Future, call FutureC.eval(fc).

Cheers
Rich

James Roper

unread,
Apr 12, 2013, 11:16:18 PM4/12/13
to Rich Dougherty, Sadek Drobi, Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
Hi Rich,

So this is something along the lines of what I was thinking earlier.  Would it be possible to do this in such a way that FutureC extends Future?

Cheers,

James

Rich Dougherty

unread,
Apr 13, 2013, 4:15:52 AM4/13/13
to James Roper, Sadek Drobi, Viktor Klang, Stephane Godbillon, Olivier NOUGUIER, Julien Richard-Foy, Luis Ángel Vicente Sánchez, play-fram...@googlegroups.com
Hi James

I could be wrong, but my instinct is that FutureC and Future need to be distinct types. A FutureC is a recipe for making Futures, and it needs to not be a future, to avoid the memory leak issue. (I think!)

In the traditional flatMap implementation, user code gets a reference to a promise on every call to flatMap. This promise is bound to flatMap's eventual value. Recursive flatMaps build chains of promises, with each promise bound to the next flatMap's promise. These chains of promises cause our memory leaks.

We really want to avoid creating chains of promises. We can do that by avoiding creating a promise for each flatMap.

Because FutureC is not a Future we do not need to create a promise for each flatMapped FutureC, and we can therefore avoid a chain of bound promises when flatMap is used recursively.

If a FutureC were a Future then, when flatMapped, it would have to contain a promise internally, to capture the flatMap's eventual value. And recursive flatMaps would cause these promises to be bound together… and we'd end up with the same memory leak problem all over again. 

We can therefore avoid creating chains of promises by asking users to flatMap FutureCs rather than flatMapping Futures.

Instead of being a Future, a FutureC is a recipe for making Futures. A FutureC is a tree structure containing instructions about how it is to be evaluated. A FutureC can CompleteWith a future, Map a future using a function or FlatMap a future with a function. A FutureC structure is evaluated using FutureC.eval(), at which point execution starts and a Future is generated to contain the result of the execution.

When FutureC.eval() is called, we only need to create a single promise to capture the eventual value of the FutureC computation. Single promise = no binding chain = no memory leaks.

In conclusion, FutureC and its eval() method allow us to use recursive flatMaps without memory leaks, and this can only happen because FutureC is not a Future. If FutureC was a future then, when flatMapped, we'd need to create promises for each flatMapped FutureC to capture their eventual values. And, when flatMapped recursively, we would end up with memory leaks due to binding chains.

I hope that makes sense. This stuff is pretty complicated. It feels hard to communicate without a whiteboard!

Cheers
Rich
Reply all
Reply to author
Forward
0 new messages