[2.2.x] Proper propagation of ThreadLocals through Play execution

258 views
Skip to first unread message

Bryan Barkley

unread,
Jul 18, 2014, 8:06:59 PM7/18/14
to play-fr...@googlegroups.com
Our company has integrated Play with existing Java infrastructure code that makes heavy use of ThreadLocals to propagate context throughout a request's execution. Unfortunately Play isn't very ThreadLocal friendly, and while we have done a number of hacks to propagate the values across different execution branching points we continue to run into cases that we missed which lead to some serious bugs. We're now looking for a holistic solution at the framework level.

James Roper was kind enough to look into this initially and suggested creating a custom ExecutorServiceConfigurator for configuring the default Akka ForkJoinExecutor and propagating things using a custom ExecutorService. This seems to work for a majority of the execution as long as things stay within the Play/Akka world. We've found one instance where it breaks though; when using WS.* to make remote requests. Any work done within a map/recover/etc loses the context. It looks like to get around this issue we need to modify Play to properly propagate things; WS uses the play.core.Execution.Implicits.internalContext instead of the play.api.libs.concurrent.Execution.Implicits.defaultContext, and the internalContext isn't configurable as far as I can tell. I'm guessing that customizing the internalContext with a ExecutorService that does propagation will work, but if so I wonder why Play didn't do this to copy the Java Http.Context around. I've also prototyped a solution similar to what is done in F.Promise when the Java WS is used which basically seems to grab the current context whenever map/flatMap/recover/etc are invoked. This seems like a messy solution and would need to be implemented on a case by case basis wherever Futures/Promises are being used outside of the defaultContext.

I guess at this point I'm looking for any comments/answers for the following:
  1. Will modifying the internalContext to copy things around work? This seems to be the cleanest approach, but I wonder why this isn't done to handle the Java Context.
  2. Aside from WS are there other areas we need to be aware of that may be executing outside of the defaultContext and calling back in to user code?
  3. What is the proper way to make this configurable? Add methods to Global to build ExecutionContexts/ExecutorServices? Specify FQCNs in config? Perhaps something more fine grained such as a class that defines what it means to capture context, restore context, and clear context? Eventually we'd like to contribute this work back as a pull request.
  
Thanks,
  Bryan

Bryan Barkley

unread,
Jul 30, 2014, 4:27:50 PM7/30/14
to play-fr...@googlegroups.com
As a brief followup to this I’ve found that my initial impression of how the Scala WS code was using the internalContext was (mostly) incorrect. I’ve fallen back to the approach of wrapping the Future returned by WSRequests’s execute() to delegate to an ExecutionContext which wraps the ExecutionContext in onComplete and captures the context when the EC wrapper is created. So my first question is resolved.

James Roper

unread,
Jul 30, 2014, 11:39:18 PM7/30/14
to play-fr...@googlegroups.com


On Saturday, July 19, 2014 10:06:59 AM UTC+10, Bryan Barkley wrote:
Our company has integrated Play with existing Java infrastructure code that makes heavy use of ThreadLocals to propagate context throughout a request's execution. Unfortunately Play isn't very ThreadLocal friendly, and while we have done a number of hacks to propagate the values across different execution branching points we continue to run into cases that we missed which lead to some serious bugs. We're now looking for a holistic solution at the framework level.

James Roper was kind enough to look into this initially and suggested creating a custom ExecutorServiceConfigurator for configuring the default Akka ForkJoinExecutor and propagating things using a custom ExecutorService. This seems to work for a majority of the execution as long as things stay within the Play/Akka world. We've found one instance where it breaks though; when using WS.* to make remote requests. Any work done within a map/recover/etc loses the context. It looks like to get around this issue we need to modify Play to properly propagate things; WS uses the play.core.Execution.Implicits.internalContext instead of the play.api.libs.concurrent.Execution.Implicits.defaultContext, and the internalContext isn't configurable as far as I can tell. I'm guessing that customizing the internalContext with a ExecutorService that does propagation will work, but if so I wonder why Play didn't do this to copy the Java Http.Context around. I've also prototyped a solution similar to what is done in F.Promise when the Java WS is used which basically seems to grab the current context whenever map/flatMap/recover/etc are invoked. This seems like a messy solution and would need to be implemented on a case by case basis wherever Futures/Promises are being used outside of the defaultContext.

I guess at this point I'm looking for any comments/answers for the following:
  1. Will modifying the internalContext to copy things around work? This seems to be the cleanest approach, but I wonder why this isn't done to handle the Java Context.

The internal context is for use by Play to execute Play code.  If it is ever used to execute application code, that's a bug in Play.  Report it to us, and we'll fix it.
 
  2. Aside from WS are there other areas we need to be aware of that may be executing outside of the defaultContext and calling back in to user code?

No, this should never happen.  If Play is executing your code on anything but the default context, or a context that you have explicitly passed, then that's a bug.  Of course, if you use an external library that executes asynchronous code, then that's a different matter, how you manage thread locals then will depend on that particular library.
 
  3. What is the proper way to make this configurable? Add methods to Global to build ExecutionContexts/ExecutorServices? Specify FQCNs in config? Perhaps something more fine grained such as a class that defines what it means to capture context, restore context, and clear context? Eventually we'd like to contribute this work back as a pull request.

There is some interesting discussion happening around this - there is a SIP in the works that may change the way thread local state is transferred in ExecutionContexts.  So at this stage, it's hard to comment.  I think the approach you're taking now of creating a custom ExecutorServiceConfigurator is the way to go - it may make sense to contribute some of this back into Play and document how you've achieved what you've done.

Another thing to note is that we're looking to get rid of Play's global state in Play 3.0.  This will most likely mean getting rid of the default execution context as it currently stands, and instead having it injected into components that need it (including Play components).  Play 3.0 will give you the necessary control you need over how all this is setup, which will probably make it a lot easier for you to provide a custom execution context.  If you're interested in following this work, then I'd suggest joining the dev mailing list, in particular this thread is where a lot of the discussion has been happening:


But there are a few other threads there as well.

Thanks,
  Bryan

Bryan Barkley

unread,
Jul 31, 2014, 12:30:59 AM7/31/14
to play-fr...@googlegroups.com
On Wednesday, July 30, 2014 8:39:18 PM UTC-7, James Roper wrote:

The internal context is for use by Play to execute Play code.  If it is ever used to execute application code, that's a bug in Play.  Report it to us, and we'll fix it.

Is there any situation where it would be used to bridge application code though? For example something executing on the default context calls out to Play code which uses the internal context and then executes additional work back in the default context. The context propagation would break down when it hits the internal context execution.
  
No, this should never happen.  If Play is executing your code on anything but the default context, or a context that you have explicitly passed, then that's a bug.  Of course, if you use an external library that executes asynchronous code, then that's a different matter, how you manage thread locals then will depend on that particular library.

Understood, but Play can execute code on the default context that doesn’t originate from another thread in the default context making TL propagation from one thread to another impossible. This is the problem I’ve hit using WS in Scala. For example:
WS.url(“http://www.google.com”).get.map(..context lost here..) // this is called from a netty thread so propagation using just the ExecutorServiceConfigurator doesn’t work
 
There is some interesting discussion happening around this - there is a SIP in the works that may change the way thread local state is transferred in ExecutionContexts.  So at this stage, it's hard to comment.  I think the approach you're taking now of creating a custom ExecutorServiceConfigurator is the way to go - it may make sense to contribute some of this back into Play and document how you've achieved what you've done.
The approach using the ExecutorServiceDecorator doesn’t fully work as mentioned in the WS example above. At this point we’re having to modify the WS code to achieve correct propagation. Thus far I’ve only tested the Scala side of things and haven’t evaluated any gaps that decorating the executor service has with Java’s async APIs. I’ve also bumped into an issue where dev-time hot reload breaks down with a fairly straightforward ExecutorServiceDecorator implementation; I’m guessing I’ll need to add some classloader propagating code similar to what HttpExecutionContext is doing.
 
Another thing to note is that we're looking to get rid of Play's global state in Play 3.0.  This will most likely mean getting rid of the default execution context as it currently stands, and instead having it injected into components that need it (including Play components).  Play 3.0 will give you the necessary control you need over how all this is setup, which will probably make it a lot easier for you to provide a custom execution context.  If you're interested in following this work, then I'd suggest joining the dev mailing list, in particular this thread is where a lot of the discussion has been happening:


But there are a few other threads there as well.
Thanks for the reply and the pointers on future work.
 

James Roper

unread,
Jul 31, 2014, 2:15:54 AM7/31/14
to play-framework
On Thu, Jul 31, 2014 at 2:30 PM, Bryan Barkley <bbar...@gmail.com> wrote:
On Wednesday, July 30, 2014 8:39:18 PM UTC-7, James Roper wrote:

The internal context is for use by Play to execute Play code.  If it is ever used to execute application code, that's a bug in Play.  Report it to us, and we'll fix it.

Is there any situation where it would be used to bridge application code though? For example something executing on the default context calls out to Play code which uses the internal context and then executes additional work back in the default context. The context propagation would break down when it hits the internal context execution.

It's still a bug.  Anywhere in Play that takes a callback that will be executed asynchronously should either take an execution context to execute it, or use the default context.  There are probably still places in Play where this doesn't happen.  We have been (slowly) improving Play over time to ensure that this is the case.
 
No, this should never happen.  If Play is executing your code on anything but the default context, or a context that you have explicitly passed, then that's a bug.  Of course, if you use an external library that executes asynchronous code, then that's a different matter, how you manage thread locals then will depend on that particular library.

Understood, but Play can execute code on the default context that doesn’t originate from another thread in the default context making TL propagation from one thread to another impossible. This is the problem I’ve hit using WS in Scala. For example:
WS.url(“http://www.google.com”).get.map(..context lost here..) // this is called from a netty thread so propagation using just the ExecutorServiceConfigurator doesn’t work

The WS API uses plain futures - it does not use the default context, every call to a plain future, including the map call you have above there, takes an implicit execution context to execute on.  You need to ensure that the context that you pass there (ie, the implicit execution context that is in scope) is your thread local propagating context.  If the callbacks are executing on the Netty thread, then my guess is that that means you have imported a trampoline execution context.  Never use a trampoline execution context for mapping a future from a WS call, this is dangerous especially if you're doing blocking in the callbacks.  The trampoline execution context won't have any significant performance improvement on your code unless you're deep in iteratee code.

So, when you have this code:

WS.url(“http://www.google.com”).get.map(..context lost here..) 

you need to somewhere before that:

import my.Contexts.myImplicitThreadLocalPropgatingContext

and not:

import play.api.libs.iteratee.Execution.Implicits.trampoline
 
 
There is some interesting discussion happening around this - there is a SIP in the works that may change the way thread local state is transferred in ExecutionContexts.  So at this stage, it's hard to comment.  I think the approach you're taking now of creating a custom ExecutorServiceConfigurator is the way to go - it may make sense to contribute some of this back into Play and document how you've achieved what you've done.
The approach using the ExecutorServiceDecorator doesn’t fully work as mentioned in the WS example above. At this point we’re having to modify the WS code to achieve correct propagation. Thus far I’ve only tested the Scala side of things and haven’t evaluated any gaps that decorating the executor service has with Java’s async APIs. I’ve also bumped into an issue where dev-time hot reload breaks down with a fairly straightforward ExecutorServiceDecorator implementation; I’m guessing I’ll need to add some classloader propagating code similar to what HttpExecutionContext is doing.

Anything that uses plain futures won't use the default context, since plain futures require an execution context to be passed in via the implicit parameter.
 
 
Another thing to note is that we're looking to get rid of Play's global state in Play 3.0.  This will most likely mean getting rid of the default execution context as it currently stands, and instead having it injected into components that need it (including Play components).  Play 3.0 will give you the necessary control you need over how all this is setup, which will probably make it a lot easier for you to provide a custom execution context.  If you're interested in following this work, then I'd suggest joining the dev mailing list, in particular this thread is where a lot of the discussion has been happening:


But there are a few other threads there as well.
Thanks for the reply and the pointers on future work.
 

--
You received this message because you are subscribed to the Google Groups "play-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framewor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
James Roper
Software Engineer

Typesafe – Build reactive apps!
Twitter: @jroper

Bryan Barkley

unread,
Jul 31, 2014, 3:17:51 PM7/31/14
to play-fr...@googlegroups.com
On Wednesday, July 30, 2014 11:15:54 PM UTC-7, James Roper wrote:
The WS API uses plain futures - it does not use the default context, every call to a plain future, including the map call you have above there, takes an implicit execution context to execute on.  You need to ensure that the context that you pass there (ie, the implicit execution context that is in scope) is your thread local propagating context.  If the callbacks are executing on the Netty thread, then my guess is that that means you have imported a trampoline execution context.  Never use a trampoline execution context for mapping a future from a WS call, this is dangerous especially if you're doing blocking in the callbacks.  The trampoline execution context won't have any significant performance improvement on your code unless you're deep in iteratee code.

So, when you have this code:

WS.url(“http://www.google.com”).get.map(..context lost here..) 

you need to somewhere before that:

import my.Contexts.myImplicitThreadLocalPropgatingContext

and not:

import play.api.libs.iteratee.Execution.Implicits.trampoline


I’m afraid I’m not explaining the problem well enough. There is no trampoline context being used. Instead of a custom myImplicitThreadLocalPropagatingContext I’m using Play’s default context. Let’s say I have the following in a controller:

import play.api.libs.concurrent.Execution.Implicits._ 

object Application extends Controller {
  def index = Action.async {
    SomeThreadLocal.INSTANCE.set(SomeDataHolder())
    WS.url(“http://www.foo.com”).get.map { result =>
      println(SomeThreadLocal.INSTANCE.get())  // this will be null
      Ok(“sort of”)
    }   
  }
}


I’ve set up an ExecutorService similar to the following through an Akka executor configurator:
class ContextPropagatingExecutorService(delegate: ExecutorService) extends AbstractExecutorService {
  // other methods omitted - most just call the delegate
  override def execute(command: Runnable) = {
    delegate.execute(new Runnable() {
       val snapshotContext = SomeThreadLocal.INSTANCE.get()
       override def run() = {
          SomeThreadLocal.INSTANCE.set(snapshotContext)
          command.run()
       }
    }
  }
}


This works fine as long as control transfers between threads running within the default context. As soon as something comes in from elsewhere though the snapshotContext doesn’t exist on the incoming thread, and so it can’t be propagated along to our thread running in the default context.

This flow works fine:
setValueOnThreadLocal()
Future{
  getValueOnThreadLocal()
  doWork()
  setValueOnThreadLocal()
}.map {
  getValueOnThreadLocal()
}

The WS case doesn’t work because in WSRequest.execute the Promise is being resolved from a Netty thread (called in the AsyncCompletionHandler). So what I see in this case is that the thread where snapshotContext is taken is on a Netty thread (e.g. “New I/O worker #20”) whereas normally it comes from a thread from the default context (e.g. "play-akka.actor.default-dispatcher-5”).

The workaround I’m using is to wrap the Future returned by WSRequest.execute:
class ContextPropagatingFuture[T](wrapped: Future[T]) extends Future[T] {
  // most other methods delegate to wrapped
  override def onComplete[U](func: (Try[T]) => U)(implicit executor: ExecutionContext) = wrapped.onComplete(func)(new WrappingExecutionContext(executor))
}
class WrappingExecutionContext(delegate: ExecutionContext) extends ExecutionContextExecutor {
  val snapshotContext = SomeThreadLocal.INSTANCE.get() // snapshot is taken from the thread making call to map/flatMap/recover/etc
  
  override def execute(runnable: Runnable) {
    delegate.execute(new Runnable() {
      SomeThreadLocal.INSTANCE.set(snapshotContext)
      runnable.run()
    }
  }
}


Ideally we’d like this in a state where custom execution contexts are not needed and the default context does the right thing.

I’ve done a quick search through the rest of the Play code and don’t see any other cases that stand out that would cause problems similar to WS (basically something returning a Future that is resolved by a callback or from another execution context), but am looking for any further feedback from you or others more familiar with the codebase.

James Roper

unread,
Jul 31, 2014, 11:08:38 PM7/31/14
to play-framework
Hi Bryan,

I see what the problem is now.  It comes back to me giving you guys advice to use the ExecutorServiceConfigurator.  I had assumed that it returns a scala.concurrent.ExecutionContext.  It doesn't, it returns java.util.concurrent.ExecutorService.  In order to propogate the thread locals, it needs to be an execution context that implements the prepare method, like this:

val tl = new ThreadLocal[String]
val delegate: ExcutionContext = ...

val ec = new ExecutionContext {
  override def prepare = {
    val threadLocalState = tl.get()
    new ExecutionContext {
      def execute(r: Runnable) = delegate.execute(new Runnable { 
        def run = {
          val old = tl.get();
          tl.set(threadLocalState)
          try {
            r.run()
          } finally {
            tl.set(old)
          }
        } 
      })
      def reportFailure(t: Throwable) = delegate.reportFailure(t)
    }
  }
  def execute(r: Runnable) = delegate.execute(r)
  def reportFailure(t: Throwable) = delegate.reportFailure(t)
}

So you can wrap the Play default execution context in that and it will work, but you can't use that of course to provide the default execution context to Akka.

So, looking through the Akka code, it looks to me like what needs to be configured is a MessageDispatcherConfigurator, if you make that extend the default Akka Dispatcher, I think it will work.  Try this:

import akka.dispatcher._

class DispatcherConfigurator(config: Config, prerequisites: DispatcherPrerequisites)
  extends MessageDispatcherConfigurator(config, prerequisites) {

  private val instance = new ContextPropogatingDispatcher(
    this,
    config.getString("id"),
    config.getInt("throughput"),
    config.getNanosDuration("throughput-deadline-time"),
    configureExecutor(),
    config.getMillisDuration("shutdown-timeout"))

  override def dispatcher(): MessageDispatcher = instance
}

class ContextPropogatingDisptacher(
  _configurator: MessageDispatcherConfigurator,
  val id: String,
  val throughput: Int,
  val throughputDeadlineTime: Duration,
  executorServiceFactoryProvider: ExecutorServiceFactoryProvider,
  val shutdownTimeout: FiniteDuration) extends MessageDispatcher(
    _configurator, id, throughput, throughputDeadlineTime, executorServiceFactoryProvider, shutdownTimeout
  ) { self =>

  override def prepare = {
    val threadLocalState = threadLocal.get()
    new ExecutionContext {
      def execute(r: Runnable) = self.execute(new Runnable { 
        def run = {
          val old = threadLocal.get();
          threadLocal.set(threadLocalState)
          try {
            r.run()
          } finally {
            threadLocal.set(old)
          }
        } 
      })
      def reportFailure(t: Throwable) = self.reportFailure(t)
    }
  }
}

Let me know if that works for you.

--
You received this message because you are subscribed to the Google Groups "play-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framewor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bryan Barkley

unread,
Aug 3, 2014, 6:46:10 PM8/3/14
to play-fr...@googlegroups.com
Yes, this looks like it’s working correctly and doesn’t require the WS hack I resorted to earlier. It also works nicely for our needs as some of the context we’re propagating has very specific visibility rules from parent to child tasks (it’s not a shared instance). The ability to access the state during the handoff from one task to another via execute but also fall back to what was captured at the call site during prepare fits the requirements. The only change from the code you outlined I made was to have ContextPropagatingDispatcher extend Dispatcher instead of MessageDispatcher since that’s what the DefaultDispatcher seems to do.

Out of curiosity is there a reason why the Java http context propagation is not implemented this way?

Thank you very much for all the assistance with this!

  - Bryan

James Roper

unread,
Aug 3, 2014, 10:28:03 PM8/3/14
to play-framework
On Mon, Aug 4, 2014 at 8:46 AM, Bryan Barkley <bbar...@gmail.com> wrote:
Yes, this looks like it’s working correctly and doesn’t require the WS hack I resorted to earlier. It also works nicely for our needs as some of the context we’re propagating has very specific visibility rules from parent to child tasks (it’s not a shared instance). The ability to access the state during the handoff from one task to another via execute but also fall back to what was captured at the call site during prepare fits the requirements. The only change from the code you outlined I made was to have ContextPropagatingDispatcher extend Dispatcher instead of MessageDispatcher since that’s what the DefaultDispatcher seems to do.

Yes, that was my mistake.  I've posted a full working solution here now:


Out of curiosity is there a reason why the Java http context propagation is not implemented this way?

Yes and no.  Setting/clearing thread locals every time you send a task to a thread pool is not going to help performance.  So we only really want to do it explicitly when working with Java promises.  Personally I'd also rather get rid of the thread locals in Java, or at least make them optional.  When you don't have to deal with thread locals, code becomes a lot easier to test and understand.
 
--
You received this message because you are subscribed to the Google Groups "play-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framewor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages