Does SparkEnv need to be thread-local?


Josh Rosen

May 13, 2013, 2:38:00 PM
to spark-de...@googlegroups.com
Does SparkEnv need to be thread-local?  The current implementation causes thread-safety issues in SparkContext (https://spark-project.atlassian.net/browse/SPARK-534), and I'm not sure that it's properly thread-safe in all circumstances.

I'm not sure that we can absolutely guarantee that the threads calling into different Spark components will have properly-initialized SparkEnv thread-locals.  What if the calling thread is re-used in an Akka thread pool?  What if there's a source of thread creation that we don't know about and don't initialize properly?  What if a thread in a thread pool is recycled and used with two different SparkContexts?

Take TaskSetManager, for example: SparkEnv is assumed to be set properly by whatever thread called TaskSetManager's constructor, but from looking at the code I have no way of knowing whether it will be safe to call SparkEnv.get inside of methods in TaskSetManager.  I can work around this by storing a reference to SparkEnv at the time that TaskSetManager is constructed, but this is brittle and confusing (especially if I don't document my code).
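
A minimal sketch of that brittle workaround, with hypothetical class and method names (Scala, against the 0.7-era API where SparkEnv lives in the plain "spark" package): capture whichever env is current on the constructing thread, instead of hoping SparkEnv.get resolves a thread-local later on whatever thread happens to call the method.

    import spark.SparkEnv

    class MyTaskSetManager {
      // Captured once, on whatever thread runs the constructor.
      private val env = SparkEnv.get

      def onTaskFinished() {
        // Safe even if this method runs on a thread that never called SparkEnv.set.
        val serializer = env.serializer
        // ...
      }
    }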

Reynold Xin

May 13, 2013, 2:41:01 PM
to spark-de...@googlegroups.com, Matei Zaharia
Matei, can you comment on this?

There might've been reasons previously for this to be thread local, but I am not sure if those reasons exist anymore. And if they do, I think we can certainly get rid of them to remove the thread local nature of this ...

Also, SparkEnv now serves as a configuration object, and it might be good to actually pass instances of it around for configuration in the future ..



Evan Chan

May 14, 2013, 6:35:02 PM
to spark-de...@googlegroups.com, Matei Zaharia
Hey guys,

I'm now running into this issue. One design I played with for sharing a SparkContext amongst multiple jobs was to have one SparkContext created by an actor, but have the actor run each job in a future. (I understand the current scheduler is FIFO, but there will be a fair scheduler coming, right?) When you run a job in a Future, which is on a separate thread, and the job calls newAPIHadoopRDD(), it leads to the following stack trace:

  "status": "ERROR: null
    spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:24)
    spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcast.scala:54)
    spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcast.scala:50)
    spark.broadcast.BroadcastManager.newBroadcast(Broadcast.scala:50)
    spark.SparkContext.broadcast(SparkContext.scala:445)
    spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:33)
    spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:337)
    ooyala.rookery.spark.SparkUtils$SparkContextWrapper.rookeryQueryFromConfig(SparkUtils.scala:53)
    ooyala.rookery.spark.RookeryJob$class.runJob(RookeryJob.scala:12)
    ooyala.spark.examples.MyExample$.runJob(MyExample.scala:26)
    ooyala.rookery.jobserver.Supervisor$$anonfun$wrappedReceive$1$$anonfun$1.apply(Supervisor.scala:64)
    akka.dispatch.Future$$anon$3.liftedTree1$1(Future.scala:195)
    akka.dispatch.Future$$anon$3

(sorry for formatting, this is on stock 0.7.0)

Running the job in a future is really nice as I can return right away.  

-Evan

Reynold Xin

May 14, 2013, 8:32:23 PM
to spark-de...@googlegroups.com
This is a hack, but to get it to work ...

keep a reference to SparkEnv.get in a static object, and do a SparkEnv.set(MyStaticObject.sparkEnv) in your actor call.
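
Roughly, something like this (object and program names are made up; shown against the 0.7-era "spark" package):

    import spark.{SparkContext, SparkEnv}

    object MyStaticObject {
      @volatile var sparkEnv: SparkEnv = _   // saved once, readable from any thread
    }

    object JobServer {                        // hypothetical driver program
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "job-server")
        MyStaticObject.sparkEnv = SparkEnv.get   // capture on the constructing thread

        // Later, at the top of the actor receive / Future body that runs a job:
        //   SparkEnv.set(MyStaticObject.sparkEnv)
        //   sc.newAPIHadoopRDD(...)            // now finds a SparkEnv on this thread
      }
    }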

--
Reynold Xin, AMPLab, UC Berkeley

Evan Chan

May 17, 2013, 7:38:15 PM
to spark-de...@googlegroups.com
Thanks Reynold, the hack worked.


--
Evan Chan
Staff Engineer
e...@ooyala.com


Matei Zaharia

May 20, 2013, 6:12:13 PM
to spark-de...@googlegroups.com
SparkEnv was thread-local mostly because our RDD.compute() method didn't receive it as a parameter. If we pass it through as part of TaskContext, we should be able to eliminate that requirement. It's something I've been meaning to do eventually, but if someone wants to step in and do it, go ahead.
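
As a rough sketch (not actual Spark code; the field and signature shapes are only illustrative of the 0.7-era API), the idea is to carry the env in the TaskContext that tasks already receive, so compute reads it from the context instead of from a thread-local:

    import spark.SparkEnv

    class TaskContext(
        val stageId: Int,
        val splitId: Int,
        val attemptId: Long,
        val env: SparkEnv)          // hypothetical extra field carrying the env

    // Inside an RDD implementation, compute would then look roughly like:
    //   override def compute(split: Split, context: TaskContext): Iterator[T] = {
    //     val serializer = context.env.serializer   // no SparkEnv.get thread-local lookup
    //     ...
    //   }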

Matei

Jason Dai

Jun 1, 2013, 2:10:54 AM
to spark-de...@googlegroups.com
Is there any reason for making SparkEnv a thread-local variable instead of a static/singleton variable? We changed SparkEnv to a singleton (https://github.com/jason-dai/spark/tree/thread-safe), which ensures the thread-safety of SparkContext. Maybe I should send a pull request?
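
For clarity, a simplified sketch of the two alternatives (not the real companion object; "Env" stands in for spark.SparkEnv):

    class Env   // stand-in for the real spark.SparkEnv (fields elided)

    // Thread-local (current behavior): every thread must have called set() first.
    object ThreadLocalEnv {
      private val local = new ThreadLocal[Env]
      def set(e: Env) { local.set(e) }
      def get: Env = local.get()      // null on a thread that never called set()
    }

    // Singleton (roughly what the change above amounts to): one process-wide
    // instance, visible from every thread, which also ties it to one SparkContext per JVM.
    object SingletonEnv {
      @volatile private var env: Env = _
      def set(e: Env) { env = e }
      def get: Env = env
    }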

Thanks,
-Jason

Felipe

Jun 1, 2013, 2:34:37 AM
to spark-de...@googlegroups.com

Mridul Muralidharan

Jun 1, 2013, 5:59:24 AM
to spark-de...@googlegroups.com

If it's a singleton, then you remove the ability to use multiple SparkContexts ...

Regards
Mridul

Jason Dai

Jun 1, 2013, 8:34:40 AM
to spark-de...@googlegroups.com
I don't think you can create multiple SparkContexts in the same driver program (or JVM) today - e.g., each context will try to create the same actor system, which cannot be created twice.

Thanks,
-Jason

Mridul Muralidharan

Jun 1, 2013, 9:37:17 AM
to spark-de...@googlegroups.com

You can ... If it does not work, it is due to misconfiguration or bugs, not a design choice.
Refer to the 'spark and concurrency' thread in the user group for an example.

Regards
Mridul

Jason Dai

Jun 1, 2013, 9:01:08 PM
to spark-de...@googlegroups.com
While running multiple SparkContexts in the same JVM is not explicitly "designed out", I doubt it is actually "designed in". It seems we cannot create the same actor system twice, for instance (even if we set the driver port to 0).

We may be able to make multiple SparkContexts in the same JVM work; I'm not sure it's worth it, though.

Thanks,
-Jason

Matei Zaharia

Jun 1, 2013, 9:34:14 PM
to spark-de...@googlegroups.com
Jason, our goal is actually to support multiple SparkContexts in the same JVM eventually. It's extremely useful for testing. The main problem stopping it right now is that we use system properties for configuration, and those are global; but this will change with a config system.

That said, SparkEnv should ideally be passed to RDD.compute as part of a TaskContext instead of being a thread-local. It will take just some refactoring to do it.

Matei

Mark Hamstra

Jun 1, 2013, 9:58:44 PM
to spark-de...@googlegroups.com
> It's extremely useful for testing.

Or any time you want to access more than one cluster from a single driver process, combining or comparing in the driver the results of jobs run concurrently on those separate clusters.

Evan Chan

Jun 2, 2013, 11:45:03 AM
to spark-de...@googlegroups.com
I was able to create multiple SparkContexts in the same JVM. You just need to set spark.driver.port to a different number every time, instead of "0". "0" does not work: for some reason, the successive port number scheme used with "0" finds ports that have already been taken.
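
For example, something along these lines (a sketch, not my exact code; the port numbers are arbitrary, just pick free ones):

    import spark.SparkContext   // 0.7-era package name

    object TwoContexts {
      def main(args: Array[String]) {
        System.setProperty("spark.driver.port", "50001")
        val sc1 = new SparkContext("local", "app-one")

        System.setProperty("spark.driver.port", "50002")   // a different free port
        val sc2 = new SparkContext("local", "app-two")

        // ... run jobs on sc1 and sc2 ...
        sc1.stop()
        sc2.stop()
      }
    }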

If/when we switch to a proper config scheme, instead of using system properties, that will make this much less hacky.

-Evan

Jason Dai

Jun 3, 2013, 7:46:04 AM
to spark-de...@googlegroups.com
In this case, it may make the most sense to pass the SparkEnv around by reference - on the slave side we can pass it through TaskContext, as you mentioned; on the driver side, maybe we can actually pass the SparkContext around (which contains the SparkEnv instance), as we already do in many cases?

Thanks,
-Jason

Matei Zaharia

Jun 4, 2013, 1:27:14 AM
to spark-de...@googlegroups.com
Yup, this would be great. The only problem is that some places in the code don't have a great way to get it through a reference yet.

Matei

Christopher Nguyen

Aug 2, 2013, 2:47:14 PM
to spark-de...@googlegroups.com
We ran into this on v0.7.2 through yet another path: the master's TaskSetManager.taskFinished deserializing an object that contains an HttpBroadcast, whose readObject() assumes it always has access to a non-null SparkEnv.get. This will start to happen more often as people pass around more than very simple closures. The problem is that it's really esoteric to diagnose, because the chain of responsibility is so distributed.

There are a few orthogonal workarounds, none of which is as satisfying as removing this risky requirement. The least unsatisfying for us is to tweak SparkEnv.get to return a previously saved SparkEnv singleton if SparkEnv.set hasn't been called on the calling thread. This adds some thread safety, since getters are unlikely to be modifiers, while partially defeating the ThreadLocal. If you are very concerned, we could return a clone, which would make everyone happy 99% of the time.
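
Roughly, the tweak looks like this (a simplified sketch, not the actual patch; "Env" stands in for spark.SparkEnv):

    class Env   // stand-in for the real spark.SparkEnv (fields elided)

    object EnvHolder {
      private val local = new ThreadLocal[Env]
      @volatile private var lastSet: Env = _   // most recently set env, shared across threads

      def set(e: Env) {
        lastSet = e
        local.set(e)
      }

      // Fall back to the last env that was set anywhere, instead of returning null.
      def get: Env = {
        val e = local.get()
        if (e != null) e else lastSet
      }
    }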

Would you agree to put in this SparkEnv tweak until a larger refactoring can take place? 

Reynold Xin

Aug 5, 2013, 2:36:03 AM
to spark-de...@googlegroups.com
I think that's fine. But why not just make SparkEnv non-thread-local?




Evan Chan

Aug 5, 2013, 3:13:16 AM
to spark-de...@googlegroups.com, spark-de...@googlegroups.com
If nobody else gets around to this, I'm going to attempt to fix SparkEnv to be non-thread-local. Can't be that hard.

-Evan
To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)

Christopher Nguyen

Aug 5, 2013, 4:01:41 AM
to spark-de...@googlegroups.com
I'll do a PR from our existing fix.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao