Vars as global thread-locals?

Ernesto Garcia

Feb 8, 2017, 9:34:06 AM
to Clojure
https://clojure.org/reference/vars says

Clojure is a practical language that recognizes the occasional need to maintain a persistent reference to a changing value and provides 4 distinct mechanisms for doing so in a controlled manner - Vars, Refs, Agents and Atoms. Vars provide a mechanism to refer to a mutable storage location that can be dynamically rebound (to a new storage location) on a per-thread basis.


Using def to modify the root value of a var at other than the top level is usually an indication that you are using the var as a mutable global, and is considered bad style. Consider either using binding to provide a thread-local value for the var, or putting a ref or agent in the var and using transactions or actions for mutation. 

Clojure encourages avoiding the use of vars as global thread-local storage, by restricting their use to dynamic binding only.

What is so bad about global thread-locals? It can't be the fact that they are global, as refs are typically made global. They also have a good thread-safe behavior.

Thanks,
Ernesto

Alex Miller

Feb 8, 2017, 11:12:27 AM
to Clojure


On Wednesday, February 8, 2017 at 8:34:06 AM UTC-6, Ernesto Garcia wrote:
https://clojure.org/reference/vars says

Clojure is a practical language that recognizes the occasional need to maintain a persistent reference to a changing value and provides 4 distinct mechanisms for doing so in a controlled manner - Vars, Refs, Agents and Atoms. Vars provide a mechanism to refer to a mutable storage location that can be dynamically rebound (to a new storage location) on a per-thread basis.


Using def to modify the root value of a var at other than the top level is usually an indication that you are using the var as a mutable global, and is considered bad style. Consider either using binding to provide a thread-local value for the var, or putting a ref or agent in the var and using transactions or actions for mutation. 

This is saying that if you are doing something like this:

(def my-value 10)

(defn inc-value []
  (def my-value (inc my-value)))

Then you are not using vars as intended. Here you have an identity (my-value) that refers to a var, that holds the value 10. The inc-value function is going to locate the existing var and reset its root value. This is not following the Clojure update model for identities which expects that you update an identity by providing a function that can be applied to the old value to produce a new value. If you really wanted to do this, you could do so by using alter-var-root explicitly:

(defn inc-value []
  (alter-var-root #'my-value inc))

However, the point here is that keeping your state in a global var and bashing on it is generally bad style (particularly using def). Using a dynamic var and bindings for thread-local values effectively switches to a model where you are not sharing state across threads. 

Probably if you're doing the code above, that's not really satisfying your needs. Dynamic bindings are most commonly used to create per-thread ambient context that can be passed down the call stack without explicitly passing it as arguments. This should be used sparingly as you are creating implicit state - any code that calls into it (like tests) has to be aware of the ambient state and properly set it up.
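
As a minimal sketch of that "ambient context" use (the names *request-id*, log-line and handle-request are hypothetical, chosen only for illustration), a dynamic var can carry a value down the call stack without being passed as an argument:

```clojure
;; Hypothetical names for illustration only.
(def ^:dynamic *request-id* "none")   ; the root value acts as the default

(defn log-line [msg]
  ;; Reads the ambient value; no request-id parameter is needed.
  (str "[" *request-id* "] " msg))

(defn handle-request [id]
  ;; The binding is visible to everything called below, on this thread only.
  (binding [*request-id* id]
    (log-line "handling")))

(handle-request "r-42") ; => "[r-42] handling"
(log-line "outside")    ; => "[none] outside"
```

Note that any caller of log-line, tests included, has to know that *request-id* exists and set it up, which is the implicit-state cost described above.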

Alternately, you can make the value held by your var a stateful thing (atom or ref) itself and thus avoid altering the global var ever:

(def my-value (atom 0))
(defn inc-value []
  (swap! my-value inc))

This is strongly preferred over the var-bashing in the first example.
 
Clojure encourages avoiding the use of vars as global thread-local storage, by restricting their use to dynamic binding only.

I don't think that's really the message here. It's saying that while vars are global stateful constructs, it is not idiomatic to modify them unless they are dynamic and you are doing so in a thread-local context. If you want a global thread-local state, then do that. If you want a global stateful construct, it's better to use an atom or ref held in a non-dynamic var. 

Generally my advice would be to avoid global stateful constructs as much as possible and not do any of these. :) Global state has all the same problems of implicit state I mentioned above with bindings.
 
What is so bad about global thread-locals?

Nothing - dynamic vars are effectively global thread locals (in combination with a global root value).
 
It can't be the fact that they are global, as refs are typically made global.

As I said above, atoms/refs can and often are created as part of your application state and passed around. Generally, I think you should strongly prefer this over creating a (global) var holding a ref. 
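
A small sketch of what "created as part of your application state and passed around" might look like, assuming hypothetical helpers make-counter and inc-counter!:

```clojure
;; Hypothetical helpers: the atom is created by the caller and passed in,
;; so no global var holds the state.
(defn make-counter []
  (atom 0))

(defn inc-counter! [counter]
  (swap! counter inc))

(let [c (make-counter)]
  (inc-counter! c)
  (inc-counter! c)
  @c) ; => 2
```

Each caller (or test) can create its own counter, so nothing is shared unless it is shared on purpose.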
 
They also have a good thread-safe behavior.

Agreed. :)
 


Ernesto Garcia

Feb 8, 2017, 5:39:59 PM
to Clojure
Hi Alex, thanks for your thorough response.

It seems to me that Clojure vars are just not intended to be used as thread-locals in general. They happen to use thread-local storage in order to implement dynamic scoping, which is the original intent.

That is why vars are either global (interned in a namespace), or confined to a dynamic scope (via with-local-vars).

If one needs thread-local storage, you use a Java ThreadLocal directly.

Didier

Feb 26, 2017, 12:10:46 AM
to Clojure
"If one needs thread-local storage, you use a Java ThreadLocal directly."

No, you just use a dynamic Var for that or with-local-vars.

Normally, in Clojure, values are bound to symbols. This means that a symbol is mapped to a value. In general, you can not change this mapping once it is set. This is true in local contexts, such as when using (let) or with function arguments.

In cases where you need to change the mapping, you'll need to have an extra layer of indirection, since you can not actually change it. To do this, Clojure gives you a few constructs such as Vars, Refs, Agents and Atoms.

A Var is a mapping from thread to value, with support for an optional root value (like a default value if no value for the current thread exists on the Var). So instead of doing Symbol -> Value, you will do Symbol -> Var(CurrentThread) -> Value.

By default, Vars just have a root, and you can not dynamically add new thread to value mappings to them unless they are declared dynamic.
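
To illustrate, here is a short sketch (plain, *dyn* and attempt are made-up names): binding a var that was not declared ^:dynamic fails at runtime with an IllegalStateException.

```clojure
(def plain 1)            ; ordinary var: root value only
(def ^:dynamic *dyn* 1)  ; dynamic var: may also get per-thread bindings

(binding [*dyn* 2] *dyn*) ; => 2

;; Trying the same with a non-dynamic var throws
;; "Can't dynamically bind non-dynamic var".
(def attempt
  (try
    (binding [plain 2] plain)
    (catch IllegalStateException _ :cannot-bind)))

attempt ; => :cannot-bind
```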

Now, when you do "def" or what is called "intern", you are creating a new mapping from Symbol to Var on the current Namespace. This can be thought of as the global context. When you use a symbol, Clojure will first look for it inside local contexts, and if it does not find it there, it will look for it in the current namespace. If it is a fully qualified symbol, it'll go looking for it directly inside the namespace you qualified. What this means is that, on a Namespace, you can not map Symbols to Values, you can only map Symbols to Vars. In local contexts, you can map symbols to whatever you want, but not in the global context.
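
The Symbol -> Var -> Value indirection can be seen at the REPL. A small sketch, using a made-up var named answer:

```clojure
(def answer 42)          ; interns the symbol answer in the current namespace

(var answer)             ; the Var object itself, also written #'answer
(class #'answer)         ; => clojure.lang.Var
(deref #'answer)         ; => 42, the value the Var currently holds
answer                   ; => 42, the compiler inserts the deref for you

(let [answer 1] answer)  ; => 1, a local maps the symbol straight to a value
```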

The reason Clojure always forces you to map symbols to vars at the namespace level is unknown to me, but that's what it does.

All that to say, if you want a per-thread value, go ahead and use a dynamic Var. You can choose to make it global, by using def, or to make it local, by creating a local Var using (with-local-vars). No need to use Java's ThreadLocals directly.

Keep in mind that if you have sub-threads, they won't always inherit their parent's bindings. They only do inside (future) and if using (bound-fn).
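
A sketch of that difference, assuming a hypothetical helper run-in-thread that runs a function on a fresh java.lang.Thread and waits for its result:

```clojure
(def ^:dynamic *user* :root)

(defn run-in-thread
  "Hypothetical helper: runs f on a new Thread and returns its result."
  [f]
  (let [result (promise)]
    (doto (Thread. #(deliver result (f)))
      (.start)
      (.join))
    @result))

(def seen
  (binding [*user* :alice]
    {:plain-thread (run-in-thread (fn [] *user*))        ; bindings not conveyed
     :bound-fn     (run-in-thread (bound-fn [] *user*))  ; captured by bound-fn
     :future       @(future *user*)}))                   ; conveyed by future

seen ; => {:plain-thread :root, :bound-fn :alice, :future :alice}
```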

Didier

Feb 26, 2017, 12:23:28 AM
to Clojure
Re-reading your reply, sounds like I might have explained what you already know. So to better answer your question:

Dynamic scoping and Java ThreadLocals give you equal functionality, so I'd use them equally. This is because Clojure supports thread-bound dynamic scope.


Ernesto Garcia

Mar 3, 2017, 10:02:21 AM
to Clojure
On Sunday, February 26, 2017 at 6:23:28 AM UTC+1, Didier wrote:
Dynamic scoping and Java ThreadLocals give you equal functionality, so I'd use them equally. This is because Clojure supports thread-bound dynamic scope.

I wouldn't say that. While dynamically scoped Vars are meant to be context that you implicitly pass down to your call stack, ThreadLocals are references that you can pass around explicitly at will.

Java ThreadLocals also provide an .initialValue() callback to override, which is not provided by Vars. Vars can't even be (idiomatically) initialized to a thread-bound value.
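
For reference, a sketch of what overriding initialValue can look like from Clojure via proxy. The names next-id and thread-id are hypothetical; the point is only that the first get on each thread lazily computes a value that later gets on the same thread see again.

```clojure
(import 'java.util.concurrent.atomic.AtomicInteger)

;; Hypothetical source of distinct values.
(def ^AtomicInteger next-id (AtomicInteger. 0))

;; A ThreadLocal whose initialValue hands each thread a fresh id
;; the first time that thread calls .get.
(def ^ThreadLocal thread-id
  (proxy [ThreadLocal] []
    (initialValue []
      (.incrementAndGet next-id))))

(.get thread-id)            ; first get on this thread runs initialValue
(.get thread-id)            ; same value again on the same thread
@(future (.get thread-id))  ; another thread gets its own, different value
```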

I think that the fact that dynamically scoped Vars are implemented by ThreadLocals is an implementation detail, and it is not a feature, it is more of a limitation. (I guess implementing it otherwise is difficult, if viable at all).

Thanks

Didier

Mar 4, 2017, 4:59:12 PM
to Clojure
Hum, you're having me question myself.

See, I don't think dynamically scoped Vars are intended for anything in particular; they are what they are. Yes, they are useful when you need to pass configuration down a call stack, as long as you handle the thread boundaries along the way. They can also be used to share per-thread data throughout a thread's execution, which is what ThreadLocal does in Java. So basically, I see two use cases. One, you want shared read-only data accessible throughout a call stack (for that, they aren't as good as true dynamics, because you need to handle the thread boundaries manually). Two, you want shared writable state per thread. Making that state visible across threads would be dangerous, though it can be done if you re-bind it manually across threads or use constructs that do so, like future.

The reason they are made per-thread, and not truly dynamic (as in, taking on the value down the stack no matter what), is that threads could contend for the value and cause bugs, so it's made per-thread for safety. Here's an example of such a caveat:

(def ^:dynamic *dd*)

(binding [*dd* (atom "John")]
  (let [a (future (reset! *dd* "Bob"))
        b (future (reset! *dd* "Mike"))]
    @a @b   ; wait for both futures to finish before reading
    @*dd*))

Run this many times, and the value held in *dd* could be either Mike or Bob, depending on which thread wins the race.

So the limitations of dynamic scope are inherent to dynamic scoping where you have threads. Clojure's attempt at solving them is to make bindings per-thread, with explicit cross-thread management through (bound-fn). After a few releases, Clojure made future and a few other constructs implicitly propagate dynamic bindings across threads, as that was often what people were expecting. All to say, when using dynamic Vars in Clojure, you must be thread-aware and understand the caveats. There is no better way that I'm aware of to handle dynamic scope in a multi-threaded environment.

Clojure has root bindings, which Java ThreadLocals do not have. Clojure lets you say: if this thread does not have a value for the dynamic Var, fetch the root value instead. This is really useful in the use case of read-only data, like configuration, because you can set a default config. You can simulate this in Java with initialValue.
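
A quick sketch of the root-value fallback, using a hypothetical *config* var; thread-bound? reports whether the current thread has its own binding:

```clojure
(def ^:dynamic *config* {:level :info})  ; root value = default config

(thread-bound? #'*config*)               ; => false, only the root exists
(:level *config*)                        ; => :info

(binding [*config* {:level :debug}]
  [(thread-bound? #'*config*) (:level *config*)]) ; => [true :debug]

(:level *config*)                        ; => :info again after the block
```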

Now, the initialValue lets you do one more thing, it can let you generate per-thread values on get, without needing to set it. Such as setting a random int on the first get, and subsequent get from that thread will from then on return that same generated random int. In Clojure, this is what I've seen for such use case:

(def ^:dynamic *id*)

(defmacro with-id [& body]
  `(binding [*id* (rand-int 10000)]
    ~@body))

@(future (with-id (println (str "My request id is: " *id*))
                  "Success"))

It's not as convenient per se, since you have to be explicit and call the macro from each thread before getting the value, but it solves the use case.

Remember that ThreadLocals are more inconvenient to get though, since you need to get the ThreadLocal and then get the value out of it. So in some ways, Clojure dynamics can also be seen as more convenient.

Up to now, I feel like there is nothing I cannot do with a Clojure dynamic Var that I could with a ThreadLocal. They're not identical, but to me, the two fulfill the same use cases. And since, under the hood, Clojure dynamic Vars are implemented with ThreadLocal, they should perform similarly too.

You say you can pass around references of ThreadLocals, but I'm not sure what the point of that would be, like what use case would it allow? In general, and even the Java doc page says so, you'll put the ThreadLocal in a static variable, i.e., private static final ThreadLocal<Integer> threadId. At that point, it's equal to (def ^:dynamic threadId).

Anyways, you can also do that in Clojure, you can pass around the Var such as:

(defn printVar [v]
  (println @v))

(with-local-vars [*threadId* 0]
  (.start (Thread. (bound-fn [] (var-set *threadId* 1) (printVar *threadId*))))
  (.start (Thread. (bound-fn [] (var-set *threadId* 2) (printVar *threadId*)))))

I have a function printVar that takes a Var and prints it. Then I create a new Var called *threadId* and I set it to two different values, one for each Thread. I pass the Var to my Var printing function, and as you can see if you run this, it works like it would in Java if you did the same with ThreadLocal, printing different values based on the thread.

I hope this helps.

Ernesto Garcia

Mar 9, 2017, 12:41:32 PM
to Clojure
Hi Didier,

Thanks for your response, it helps for continuing to learn about the language.

As dynamic vars are implemented by using ThreadLocals, ThreadLocal is in this case a more primitive construct than dynamic vars, so I find it ok to use it if one just needs the ThreadLocal aspect, and not the dynamic scoping.

As a side topic, when looking into the Var class implementation, I have seen that some fields are volatile, in particular the root value of the var. Does that mean that every time we access a var's root value, we are implicitly crossing a memory barrier, that is, a cross-processor synchronization primitive? Doesn't this hit performance? Or is access to vars not considered to happen frequently?

Ernesto

Alex Miller

Mar 9, 2017, 1:05:50 PM
to Clojure
Regarding your last question: yes, using volatiles has a performance impact, but it is essential to both accomplish the var API and be thread-safe.

It is also possible to compile your code with AOT and direct linking. In this case var lookups are replaced with a direct invocation (no var lookup).

Ernesto Garcia

Mar 9, 2017, 3:16:40 PM
to Clojure
Thank you Alex!

Didier

Mar 10, 2017, 1:05:19 PM
to Clojure
Absolutely, Clojure embraces its host platform. Always feel free to use the Java features when they work best for your use case.

But just to clarify, Java's ThreadLocal is an implementation of dynamic scoping. The scope is determined not by the source code, but by the runtime circumstances, in this case, the running Thread. While it's not the classic kind of dynamic scoping, it is still dynamic and not lexical.

It differs from Clojure's dynamic scope in that Clojure's scope is determined by the runtime stack plus the current thread.

Clojure relies on ThreadLocal for the per-thread scoping implementation, and it uses its own implementation for the runtime stack based scoping.

This means that if you want to re-enter a thread after having executed in its context before, and still have access to its previously bound ThreadLocal, you can only do it with a ThreadLocal. With Clojure's dynamic Vars, I don't think this is possible, since Clojure restores the previous binding when you exit from the thread. Now, be careful with this, I've seen lots of bugs in Java due to people not realizing that the binding is permanently attached to the thread, unless explicitly unbound, especially with the use of thread pools.
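
The thread-pool pitfall can be sketched as follows, assuming a single-thread executor so that both tasks are guaranteed to run on the same thread (the names *ctx*, tl, run-on and observed are hypothetical):

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(def ^:dynamic *ctx* :root)
(def ^ThreadLocal tl (ThreadLocal.))

(defn run-on
  "Hypothetical helper: submits f to the pool and blocks for its result."
  [^ExecutorService pool f]
  (.get (.submit pool ^Callable f)))

(def observed
  (let [pool (Executors/newSingleThreadExecutor)]
    (try
      ;; Task 1 sets both kinds of state on the pool's only thread.
      (run-on pool (fn []
                     (.set tl :task-1)
                     (binding [*ctx* :task-1] *ctx*)))
      ;; Task 2 runs later on that same thread.
      (run-on pool (fn []
                     {:thread-local (.get tl)
                      :dynamic-var  *ctx*}))
      (finally
        (.shutdown pool)))))

observed ; => {:thread-local :task-1, :dynamic-var :root}
```

The ThreadLocal value set by the first task is still attached to the thread when the second task runs, while the dynamic binding was popped when its binding block exited.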

So I'd consider Clojure dynamic Vars safer to use in most cases, and still would recommend you use them instead of ThreadLocal most of the time. Unless what you want to do is attach state to the thread, dynamic Vars will probably be better and easier.

Just my 2 cents.

Ernesto Garcia

Mar 20, 2017, 5:51:31 PM
to Clojure
Thanks for your response Didier.

On Friday, March 10, 2017 at 7:05:19 PM UTC+1, Didier wrote:

But just to clarify, Java's ThreadLocal is an implementation of dynamic scoping.


This is how I see it:

A ThreadLocal is an object instance. It ensures a different object for each different thread. It implements automatic thread-confinement. In particular, a ThreadLocal may be referenced by different vars. You may move its reference around the program and it will refer to the same object by the same thread.

Scoping refers to the ability to refer to a var. Dynamic scoping lets you use a var that should be declared by one of the functions up in the call stack. It implements an implicitly injected parameter to a function. A function that uses a dynamic var will see a different var depending on the last function in the call stack that defined it.

Didier

Mar 21, 2017, 8:46:08 AM
to Clojure
Right, except each thread gets its own binding. So it's not necessarily that you'll get the value of the last binding up the call stack. This will only be true if you are in the same thread also.

I'm not sure if we agree on the rest, but explain it differently or not. ThreadLocal is an object, and so is a Clojure Var. When I say Clojure Var, I don't mean a variable, but an instance of the class clojure.lang.Var.

So you have two classes, ThreadLocal and Var. Each one gives you variable-like behaviour. You'll need a true variable to point to the instance of each: in Clojure you can attach a local symbol to it, or a global symbol, and the same goes for Java. In both, you'll probably want to store the instance through a globally accessible name, like with def in Clojure or a static in Java. You don't have to, but I don't see the use case for a local reference to the ThreadLocal or the Var.

So up to now, there's no difference between the two. Where the difference appears is in their specific features, when working with the instances themselves. The ThreadLocal scopes values by thread, and so does the Var. The only difference is that in Clojure, you cannot set and get as you like; you must do it through a binding block, which will set on entry and unset on exit. In Java, you have to unset manually, and most probably should, as you exit.
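
A small sketch of that restriction, using a made-up *count* var: inside a binding block the thread-local value can be changed with set!, while the same set! outside any binding block throws, because there is no thread binding to change.

```clojure
(def ^:dynamic *count* 0)

;; Inside a binding block, set! updates the current thread's binding.
(binding [*count* 0]
  (set! *count* (inc *count*))
  *count*) ; => 1

;; Outside any binding block there is only the root value, and set! refuses.
(def outside
  (try
    (set! *count* 5)
    (catch IllegalStateException _ :no-thread-binding)))

outside  ; => :no-thread-binding
*count*  ; => 0, the root value is untouched
```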

Then there's the details, like Vars have a default global scope value, while ThreadLocal has a default init method if you get before a set.

The access scope to the Var instance or the ThreadLocal instance is up to you, put it in a local or a global variable, neither prevents you from doing either. The scope of the values returned by Var and ThreadLocal are both Thread scope, with Clojure adding an extra binding scope to Var on top of it.

I think we're saying the same thing at this point. In practice, you can think of ThreadLocal as thread scoped variables, and Vars as binding block scoped variables per thread.

Ernesto Garcia

Mar 28, 2017, 5:44:58 PM
to Clojure
Right, except each thread gets its own binding. So it's not necessarily that you'll get the value of the last binding up the call stack. This will only be true if you are in the same thread also.

The last binding up in the call stack implies that you are in the same thread, but I think I know what you mean, which brings up the limitation of implementing dynamically scoped vars with ThreadLocal: It would be reasonable to expect that the bindings of dynamic vars propagate to all code inside the same function, even if executed by a different thread.
 

ThreadLocal is an object, and so is a Clojure Var.


A Clojure var is not only an object, it is a language construct. And when you make it dynamic, you can bind and re-bind it with different values as your functions are invoked within each other. This is something a ThreadLocal can't do, and it makes the dynamic var a different kind of beast, used for different purposes. A dynamic var emulates a local var that you don't need to pass as a parameter to functions down the stack.
 

In both, you'll probably want to store the instance through a globally accessible name, like with def in Clojure or a static in Java. You don't have to, but I don't see the use case for a local reference to the ThreadLocal or the Var.


Dynamic vars are required to be global in Clojure, because Clojure will check that your symbols have been defined, but they wouldn't need to.

ThreadLocals don't need to be global either, you can define them in the smaller scope where they are used.
 

Then there's the details, like Vars have a default global scope value, while ThreadLocal has a default init method if you get before a set.


This is not unimportant, and indicates that vars and ThreadLocals are meant for different purposes. A ThreadLocal will guarantee a new, different value for each thread. For Vars, you need to manually do that at thread creation, and it may be tricky for threads that you don't create, if possible.


Regression: The reason that I brought up this discussion is that I didn't understand why clojure.tools.logging uses a dynamic var for enforcing the use of a specific *logger-factory*. Does anybody have an explanation for that?

Thanks,
Ernesto

Didier

Mar 28, 2017, 11:59:13 PM
to Clojure
which brings up the limitation of implementing dynamically scoped vars with ThreadLocal: It would be reasonable to expect that the bindings of dynamic vars propagate to all code inside the same function, even if executed by a different thread

This is not a limitation, this was done on purpose. Traditional dynamic vars are not thread safe and don't play well inside threaded environments. Clojure purposely restricted the scope to single threads. You can propagate it explicitly by using (bound-fn). Dynamic vars in Clojure were designed as a concurrency primitive, not as an implicit parameter, though it is fine to use them for that if you know what you are doing. You can see that from Rich Hickey's Clojure slides: https://i.imgur.com/jx2vtPb.png

A Clojure var is not only an object, it is a language construct. And when you make it dynamic, you can bind and re-bind it with different values as your functions are invoked within each other. This is something a ThreadLocal can't do, and it makes the dynamic var a different kind of beast, used for different purposes. A dynamic var emulates a local var that you don't need to pass as a parameter to functions down the stack.

You don't need to make them dynamic to re-bind them. The symbol is immutably bound to the Var, so to allow re-binding of def in Clojure, all symbols are mapped to Vars instead of the value being defed, which are mutable data structures with thread safety features. The first thread safety feature is volatile access across threads of a root pointer with automatic deref. Volatile is needed to guarantee that as soon as you re-def, all access will see the new def and not the old. The second thread safety feature is thread isolated value overrides. This is useful when you don't need threads to cooperate on the same data, but you need all of them to have their own copy to work with. Each feature is optional. So if you only want thread isolation, you create a dynamic with no root. If you only want volatile root, you create a non dynamic var. If you want both, you can do that too.

So yes, it is not exactly like ThreadLocal, but I guess this is the biggest thing I disagree about, it's that for every use case Var can be used for, there is nothing wrong with using it, even if a ThreadLocal could also be used. In fact, I'd frown upon someone using ThreadLocal when Var would work.

Dynamic vars are required to be global in Clojure, because Clojure will check that your symbols have been defined, but they wouldn't need to.

Dynamic vars are not required to be global in Clojure, you can use (with-local-vars) if you only need them to be local. I know it is a bit complicated to refer to their instance directly, and there's no constructor for them, only indirect ones like def and (with-local-vars), but they can be used as a simple instance if you want. You can refer to the instance by using (var). You can pass this instance around. You can create local versions of them. You can even call into their java constructor if you want more control on them, at which point, you can fine tune when to push and pop on them. Now, I don't advise you use them through their Java interface, but you can if you want.
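
As a sketch of a purely local Var, here is a hypothetical sum-to function that uses with-local-vars as a mutable accumulator; var-get and var-set work on the Var instance directly:

```clojure
(defn sum-to
  "Hypothetical example: sums 0..n using a local Var as an accumulator."
  [n]
  (with-local-vars [acc 0]
    (dotimes [i (inc n)]
      (var-set acc (+ (var-get acc) i)))
    (var-get acc)))

(sum-to 4) ; => 10
```

The Var exists only for the duration of the with-local-vars block and is never interned in a namespace.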

This is not unimportant, and indicates that vars and ThreadLocals are meant for different purposes. A ThreadLocal will guarantee a new, different value for each thread. For Vars, you need to manually do that at thread creation, and it may be tricky for threads that you don't create, if possible.

This I agree with, and that's the only use case I can think of where ThreadLocal would be better. Slight correction though: ThreadLocals won't guarantee that, but can be made to, by overriding their init method. By default they return null otherwise. Vars and ThreadLocal are meant to be tools in my opinion, and they overlap quite a bit in functionality, and where they overlap, I see no reason to use ThreadLocal over Vars, especially when considering that in most cases, you'll put your ThreadLocal inside a Var when using it from Clojure.

Regression: The reason that I brought up this discussion is that I didn't understand why clojure.tools.logging uses a dynamic var for enforcing the use of a specific *logger-factory*. Does anybody have an explanation for that?

This lets you override the factory used to create the logger if you need to, within a particular thread and binding block. I'm not sure when that would come in handy for tools.logging, but I assume it sometimes does.

By the way, as I revisit your initial question:

Clojure encourages avoiding the use of vars as global thread-local storage, by restricting their use to dynamic binding only.

I can see now what you were asking. The "dynamic" part is the idea that the value is pushed and popped on entry to and exit from a binding block. I doubt Clojure has anything against global thread-local storage, but it wanted to provide a "dynamic" behavior that was also thread-safe. I think this is what you meant by saying they are designed for different purposes. In that sense, yes, you're 100% correct. Clojure would have implemented it with other things if ThreadLocal did not exist. But it would have implemented it to still behave as it does no matter what, so there's no accidental abstraction leak here from having used ThreadLocal.

Having said that, if you look at the details, you might realize that in most cases, ThreadLocal should have been designed the way Vars have, and it would have been a better construct. In my opinion, in all cases except when you want default initialization. The reason I say that is because it is not possible to re-use a thread in Java. So effectively, a Thread == A single run through a call stack. In that case, it would be way better if the ThreadLocal value for that thread would be automatically removed for you when exiting. The fact it is not can cause issues in certain scenarios. Now if using Thread Pools, the Threads are actually special Threads which contain a loop within them, so the "Call Stack" is infinite. The Threads in the pool are then assigned to pick up execution when some are available. So the Thread is not really re-used, but is simply never done being used. Once again, if you use a ThreadLocal in java, because that Thread is never done, the value for the ThreadLocal remains on it infinitely. This also causes issues most of the time, either it creates leaks, because references are not garbage collected, or it causes the next execution to not get a fresh value. Because of these details, I think Clojure did it better by restricting it to be dynamic.

P.S.: Thanks for all the healthy back and forth, it forced me to really refresh and solidify my understanding of Clojure Vars.