RFC: laziness-safe, semi-dynamic environment Var(Lite)

jon

Sep 29, 2009, 4:04:26 PM
to Clojure
Hi.. long post.. it's a Request For Comment :)

Clojure's "thread local binding facility" for Vars has always seemed like a
useful (but of course misusable) feature to have available in our toolbox.
However, it soon becomes apparent that Var bindings don't play nicely with
laziness, and since laziness can creep in all over the place (e.g. using
standard sequence functions, direct use of (lazy-seq ..), using (delay ..),
perhaps from any (fn ..) you create), that renders them of much less value:
almost too unsafe to tangle with, in my opinion.
To quote Rich, "there is a fundamental tension between laziness and dynamic
scope, combine with extreme caution."

The underlying problem is that when each (fn ..) gets created, it doesn't
capture the current dynamic environment so that it can subsequently be made
available when the fn is eventually invoked.
To quote Rich once more, "The overhead for capturing the dynamic context for
every lazy seq op would be extreme, and would effectively render dynamics
non-dynamic."

To help alleviate the problem somewhat, a (bound-fn ..) helper macro has been
created (https://www.assembla.com/spaces/clojure/tickets/170), but my guess is
that its use would be impractical/ugly/risky: it would need to be used "all
over the place", and forgetting to use it in any of those places could
introduce a bug.
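
A minimal sketch of the problem and of the two workarounds that need no language change. It assumes Clojure 1.3 or later, where a rebindable Var must be declared ^:dynamic (in 2009-era Clojure a plain def was enough). The names *other* and add-val are illustrative only.

```clojure
;; Assumes Clojure 1.3+; *other* and add-val are hypothetical names.
(def ^:dynamic *other* {:addval 1})

(defn add-val [x]
  (+ x (:addval *other*)))

;; The lazy seq leaves the binding form unrealized, so add-val
;; runs later and sees the root value of *other*.
(def escaped
  (binding [*other* {:addval 10}]
    (map add-val [1 3 5])))

;; Workaround 1: force realization while the binding is in effect.
(def forced
  (binding [*other* {:addval 10}]
    (doall (map add-val [1 3 5]))))

;; Workaround 2: bound-fn captures the thread bindings at creation
;; and re-establishes them on every call.
(def conveyed
  (binding [*other* {:addval 10}]
    (map (bound-fn [x] (add-val x)) [1 3 5])))

(println (vec escaped))   ; [2 4 6]
(println (vec forced))    ; [11 13 15]
(println (vec conveyed))  ; [11 13 15]
```

Both workarounds have to be applied at every site where a lazy value or closure escapes, which is the "all over the place" concern raised above.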

I've been thinking about an alternative to (bound-fn ..) and would
like your opinions on the following tweak to Clojure:

* Designate one "Var" (say clojure.core/*env*) as a special "environment" Var.
  It would be bound either to nil or to something non-nil (normally a map).
* Modify Clojure's implementation behind (fn ..) to do a lightweight version
  of what (bound-fn ..) does, i.e. on instantiation, capture the current value
  of *env*, and when invoked, wrap the execution in a bind/unbind of *env*
  with the captured value, but only if it is non-nil.
* Create a (with-env {...} ...) helper macro.
* Developers just need to make sure not to wrap (with-env ..) around code that
  /loads/ their software (only around code that /runs/ their software).
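
The steps above can be approximated in user code, without touching the compiler. This is only a sketch under stated assumptions: Clojure 1.3+ (hence ^:dynamic), *env* defined in the current namespace rather than clojure.core, and a hypothetical env-fn macro standing in for the patched (fn ..). The actual proposal changes fn itself, so every closure would behave like env-fn.

```clojure
;; Assumes Clojure 1.3+. env-fn is a hypothetical stand-in for the
;; patched (fn ..); the proposal itself changes fn, not user code.
(def ^:dynamic *env* nil)

(defmacro with-env
  "Runs body with *env* bound to env."
  [env & body]
  `(binding [*env* ~env] ~@body))

(defmacro env-fn
  "Like fn, but captures *env* at creation and, when the captured value
  is non-nil, rebinds it around every call."
  [args & body]
  `(let [captured# *env*
         f# (fn ~args ~@body)]
     (if (nil? captured#)
       f#                               ; nothing captured: plain fn, no overhead
       (fn [& xs#]
         (binding [*env* captured#]
           (apply f# xs#))))))

;; The lazy seq is realized after with-env has exited, yet each call
;; still sees the environment that was current when the fn was created.
(def result
  (with-env {:addval 10}
    (map (env-fn [x] (+ x (:addval *env*))) [1 3 5 7 9])))

(println (vec result)) ; [11 13 15 17 19]
```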

As a proof of concept, I implemented this in the simplest, most hackish way,
but it seems to work quite well. The main details follow:

- In RT.java:
    add a public static 'ENV' field (similar to IN, OUT, etc.) associated with
    clojure.core/*env*, with a root binding of nil.

- In both RestFn.java and AFn.java:
    add a new private 'env' field.
    set 'env' to the deref of RT.ENV in the constructor.
    rename *all* invoke() methods to invoke0().
    add a corresponding new invoke() method for each invoke0() in order to
    intercept execution.
  Example:
    -public Object invoke(Object arg1, Object arg2, Object arg3) throws Exception{
    -  return throwArity();
    -}
    ---
    +public Object invoke0(Object arg1, Object arg2, Object arg3) throws Exception{
    +  return throwArity();
    +}
    +public Object invoke(Object arg1, Object arg2, Object arg3) throws Exception{
    +  try {
    +    if (env != null)
    +      <...something to push 'env' value onto RT.ENV...>;
    +    return invoke0(arg1,arg2,arg3);
    +  }
    +  finally {
    +    if (env != null)
    +      <...something to pop 'env' value off RT.ENV...>;
    +  }
    +}

- In Compiler.java:
    make the following change so that (fn ..) objects override invoke0()
    instead of invoke().
    - Method m = new Method(isVariadic() ? "doInvoke" : "invoke",
    + Method m = new Method(isVariadic() ? "doInvoke" : "invoke0",

-------Example of it working-------
user=> (def *other* {:addval 1})
#'user/*other*
user=> (map #(+ % (:addval *other*)) [1 3 5 7 9])
(2 4 6 8 10) ;<---AS EXPECTED.
user=> (binding [*other* {:addval 10}]
         (map #(+ % (:addval *other*)) [1 3 5 7 9]))
(2 4 6 8 10) ;<---OOPS. BINDING DISAPPEARED.
user=> *env*
nil ;<---DEFAULTS TO nil
user=> (with-env {:addval 10}
         (map #(+ % (:addval *env*)) [1 3 5 7 9]))
(11 13 15 17 19) ;<---GREAT. BINDING WAS REMEMBERED.
-------------------------------------

Now what about the overhead? Based on a little initial testing, when using a
regular Var to implement our special *env*, the overhead appears to be
negligible when *env* is not utilized (i.e. left bound to nil), but it is
quite significant when *env* is bound.
Consuming this 30-million-entry lazy list:
  (time (last (map identity (range 30000000))))
  with *env* unbound = ~18 sec
  with *env* bound = ~65 sec

However, if we choose to make clojure.core/*env* refer not to a Var but to
something else (unfortunately Clojure is extremely inextensible in this
regard), we can instead invent and use a "lighter-weight Var", because
something simple is adequate.
I experimented by creating a VarLite class. It extends Var (I had to change
Var to be non-final), manages the pushing/popping of its value with a simple
stack (just for itself) with a bit of caching, and dispenses with Validators,
etc. This reduces the overhead dramatically:
  (time (last (map identity (range 30000000))))
  with *env* unbound = ~18 sec
  with *env* bound = ~22 sec

With a better integrated, better designed implementation, I'm
certain this could be improved further.
In that case, would this be a worthwhile enhancement to Clojure?
Seems like it could be a win-win situation, since it rescues
(semi-)dynamic bindings from the gnashing jaws of laziness
for those that want to use it, but shouldn't impact negatively
upon those that don't?
Or is there something fundamentally wrong with the idea?

Thanks for reading,
Jon

Rich Hickey

Sep 29, 2009, 5:31:40 PM
to clo...@googlegroups.com
On Tue, Sep 29, 2009 at 4:04 PM, jon <superu...@googlemail.com> wrote:
>
> Hi.. long post.. it's a Request For Comment :)
>
> Clojure's "thread local binding facility" for Vars has always seemed like
> a useful (but of course misusable) feature to have available in our
> toolbox..
> However it soon becomes apparent that Var bindings don't play nice with
> laziness - and since laziness can creep in all over the place
> (eg. using standard sequence functions, direct use of (lazy-seq ..),
> using (delay ..), perhaps from any (fn ..) you create)
> that renders them of much less value.. almost too unsafe to tangle with
> in my opinion.
> To quote Rich, "there is a fundamental tension between laziness and dynamic
> scope, combine with extreme caution."
>
> The underlying problem is that when each (fn ..) gets created it doesn't
> capture the current dynamic environment so that it can subsequently be made
> available when it is eventually invoked.

This is not a 'problem', this is what dynamic means.

> To quote Rich once more, "The overhead for capturing the dynamic context
> for every lazy seq op would be extreme, and would effectively render
> dynamics non-dynamic."
>
> To help alleviate the problem somewhat, a (bound-fn ..) helper macro has
> been created (https://www.assembla.com/spaces/clojure/tickets/170)
> but my guess is that its use would be impractical/ugly/risky..
> it would need to be used "all over the place" and forgetting to use it
> in any of those places could introduce a bug.
>

I don't think so. There are people who are sending off jobs to agents
that they know will use the dynamic environment, for which bound-fn
will work perfectly. And most code need never consider it. If you need
it all over the place, you have too much use of dynamic vars and
laziness + side-effects. The person who needs to think about this is
the person using send/future etc. with context-sensitive work. If there
were to be generic capturing points, it might be macros wrapping
those.
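
A small sketch of that targeted use: the code that hands work to another thread is the one place that wraps it in bound-fn. It assumes Clojure 1.3+ for ^:dynamic, and *ctx* and run-in-thread are hypothetical names. A raw java.lang.Thread is used on purpose, because later Clojure versions convey bindings to future and send automatically, which would hide the difference.

```clojure
;; Assumes Clojure 1.3+; *ctx* and run-in-thread are hypothetical names.
(def ^:dynamic *ctx* :root)

(defn run-in-thread
  "Calls f on a fresh raw Thread and returns its result."
  [f]
  (let [p (promise)
        t (Thread. ^Runnable (fn [] (deliver p (f))))]
    (.start t)
    (.join t)
    @p))

(def seen
  (binding [*ctx* :request-42]
    [(run-in-thread (fn [] *ctx*))          ; plain fn: sees the root value
     (run-in-thread (bound-fn [] *ctx*))])) ; bound-fn: sees the caller's binding

(println seen) ; [:root :request-42]
```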

> I've been thinking about an alternative to (bound-fn ..) and would
> like your opinions on the following tweak to Clojure:
>

> ...Implementation details elided...

> With a better integrated, better designed implementation, I'm
> certain this could be improved further.
> In that case, would this be a worthwhile enhancement to Clojure?
> Seems like it could be a win-win situation, since it rescues
> (semi-)dynamic bindings from the gnashing jaws of laziness
> for those that want to use it, but shouldn't impact negatively
> upon those that don't?
> Or is there something fundamentally wrong with the idea?
>

Before leaping to implementation/performance issues, I think it is
important to think about the semantics of this - what does it mean? I
think you will get a lot of confusion, given:

(defn foo []
  (send-off-something-that-uses-env (fn [] ... (use-env))))

(defn bar []
  (establish-env env
    (foo)))

If fns 'capture' these environments when created, and re-establish
them when called, then foo itself will have captured the environment
at *its* definition/creation point, and will re-establish that, thus
the environment setup by bar will not be conveyed through foo to
something-that-uses-env - *but*, if you substituted the body of foo
for its call, it would. That's bad.

Rich

Laurent PETIT

Sep 30, 2009, 3:46:32 AM
to clo...@googlegroups.com
Hello Rich,

It's been a long time since I last hijacked a thread, so let me get back to this bad habit, just this once :)

While the examples provided by jon may not be the best ones, as you pointed out below, I feel there still is a problem: dynamic scopes and threads.

I don't know how it can be correctly addressed (or whether it can, theoretically and then practically; has it been addressed in other languages?), but here is how I would state the problem:

When reusing libraries made by others, one could rely on dynamic bindings for some functions (a classic case may be having a *db* dynamic var, though that is arguable).

But without knowing much detail about the internals of the libraries used, one cannot know whether a library sometimes uses parallel computing to speed things up. In those cases, one may get weird "bugs" because some bindings have been "reinitialized" to the root binding by switching to technical threads.

Where should the responsibility be placed? Should the user of the library, when in doubt, place bound-fn calls everywhere in his code to protect it? Should the library author use bound-fn before dispatching to other threads (with the problem that the library author may not know which dynamic vars are relevant)? ...

Thanks in advance for even more insightful comments on this problem,

--
Laurent




Meikel Brandmeyer

Sep 30, 2009, 4:39:33 AM
to Clojure
Hi Laurent,

On Sep 30, 9:46 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:

> Where should the responsibility be placed? Should the user of the library,
> when in doubt, place bound-fn calls everywhere in his code to protect it?
> Should the library author use bound-fn before dispatching to other threads
> (with the problem that the library author may not know which dynamic vars
> are relevant)? ...

I think the responsibility should be placed with the one creating
another thread. There are several scenarios:
* in the library:
  * if only library code is involved, the author knows (hopefully)
    whether the dynamic environment must be saved or not
  * if a user callback is involved, require it to be pure (i.e.
    depending only on the arguments), or
  * use bound-fn to be safe if non-pure functions are allowed.
* in the user code:
  * the user of the library should know when library functions are
    non-pure and hence when bound-fn is necessary for a new thread.

Dynamic Vars have a high correlation with side-effects. So making such
things sufficiently ugly (but not too ugly) helps to make you aware of
side-effects and keep them apart from the (hopefully existing)
functional core. There are already examples where only
non-side-effecting functions are allowed: in transactions, as validators, ...

Does this make sense?

Sincerely
Meikel

Laurent PETIT

Sep 30, 2009, 5:02:54 AM
to clo...@googlegroups.com
Hi Meikel!

2009/9/30 Meikel Brandmeyer <m...@kotka.de>

> Hi Laurent,
>
> On Sep 30, 9:46 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:
>
> > Where should the responsibility be placed? Should the user of the library,
> > when in doubt, place bound-fn calls everywhere in his code to protect it?
> > Should the library author use bound-fn before dispatching to other threads
> > (with the problem that the library author may not know which dynamic vars
> > are relevant)? ...
>
> I think the responsibility should be placed with the one creating
> another thread. There are several scenarios:
> * in the library:
>   * if only library code is involved, the author knows (hopefully)
>     whether the dynamic environment must be saved or not
>   * if a user callback is involved, require it to be pure (i.e.
>     depending only on the arguments), or
>   * use bound-fn to be safe if non-pure functions are allowed.

This one is, indeed, really the only one which annoys me.

> * in the user code:
>   * the user of the library should know when library functions are
>     non-pure and hence when bound-fn is necessary for a new thread.
>
> Dynamic Vars have a high correlation with side-effects. So making such
> things sufficiently ugly (but not too ugly) helps to make you aware of
> side-effects and keep them apart from the (hopefully existing)
> functional core. There are already examples where only
> non-side-effecting functions are allowed: in transactions, as validators, ...
>
> Does this make sense?

Certainly. But it really makes me wonder, at the end of the mental process: in the context of multithreading and laziness, what is the niche where dynamic vars can still be used safely in real code?




jon

Oct 1, 2009, 12:03:04 PM
to Clojure
On Sep 29, 10:31 pm, Rich Hickey <richhic...@gmail.com> wrote:
> On Tue, Sep 29, 2009 at 4:04 PM, jon <superuser...@googlemail.com> wrote:
>
> Before leaping to implementation/performance issues, I think it is
> important to think about the semantics of this - what does it mean? I
> think you will get a lot of confusion, given:
>
> (defn foo []
> (send-off-something-that-uses-env (fn [] ... (use-env))))
>
> (defn bar []
> (establish-env env
> (foo)))
>
> If fns 'capture' these environments when created, and re-establish
> them when called, then foo itself will have captured the environment
> at *its* definition/creation point, and will re-establish that, thus
> the environment setup by bar will not be conveyed through foo to
> something-that-uses-env - *but*, if you substituted the body of foo
> for its call, it would. That's bad.

Hi Rich,

(Note: when I say 'environment' below, I'm referring to one specially
designated 'environment Var', not the whole set of dynamically bound
Vars we are used to.)

I'm not sure whether you are taking into account the fact that, under the
semantics of my proposal, each (fn ..) will only capture (and
later rebind) the environment if it is non-nil at the point the fn is
instantiated (in the Java sense). The environment's root binding is
nil, and all the user's code should be loaded in this state so that
the (defn ..)s themselves don't capture anything, i.e. no capturing
happens until the user code explicitly 'switches it on' with a
(with-env ..), and then only newly instantiated (fn ..)s will capture the
current environment.

For those kinds of environment data that don't require the fully
dynamic behavior of the regular Vars, wouldn't this be a more intuitive
default behavior (i.e. that all code "kicked off" under a given
environment should be evaluated under that environment, even if that
happens later on) than the current situation, in which any
eagerly evaluated code sees the current environment and any
delayed-evaluation code sees 'whatever it happens to be at the
time', which is harder to control?
For the (presumably rare) case that some library code (which may be
called under a non-nil environment) needs to create a (fn ..utilizing
*env*..) which is to be passed out and executed under some
yet-to-be-determined environment, it could simply be wrapped like this:
  (with-env nil (fn ..utilizing *env*..))

-----
The example you gave doesn't behave badly (i.e. differently) when
substituting the body of foo for its call, as demonstrated below.

user=> (defn send-off-something-that-uses-env [f] (+ 100 (f)))
#'user/send-off-something-that-uses-env
user=> (defn foo [] (send-off-something-that-uses-env (fn [] (+ 10 *env*))))
#'user/foo
user=> (defn bar [] (with-env 1 (foo)))
#'user/bar
user=> (bar)
111
user=> (defn bar [] (with-env 1 (send-off-something-that-uses-env (fn [] (+ 10 *env*)))))
#'user/bar
user=> (bar)
111

Could you elaborate a bit more on the bad behavior you see..?
Thanks,
Jon

Rich Hickey

Oct 1, 2009, 1:08:55 PM
to clo...@googlegroups.com

It simply doesn't compose or flow. Making the nil env special (e.g.
non-replacing) just moves the problem into higher-order functions that
use the construct:

(defn needs-x []
  (use-env-x))

(defn needs-y []
  (use-env-y))

(defn foo []
  (with-env x (fn [f] (needs-x) (f))))

(let [f (foo)]
  (with-env y (f needs-y)))

needs-y isn't going to get it.
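
The example above can be run against a user-level model of the proposal. This is a sketch under stated assumptions, not the actual patch: Clojure 1.3+, *env* holding a map, and a hypothetical env-fn macro playing the role of the capturing (fn ..). The maps {:x 1} and {:y 2} stand in for x and y.

```clojure
;; Assumes Clojure 1.3+. with-env and env-fn are hypothetical user-level
;; stand-ins for the proposed behaviour, not the actual patch.
(def ^:dynamic *env* nil)

(defmacro with-env [env & body]
  `(binding [*env* ~env] ~@body))

(defmacro env-fn [args & body]
  `(let [captured# *env*
         f# (fn ~args ~@body)]
     (if (nil? captured#)
       f#
       (fn [& xs#] (binding [*env* captured#] (apply f# xs#))))))

;; Defined at load time with *env* nil, so they capture nothing.
(defn needs-x [] (:x *env*))
(defn needs-y [] (:y *env*))

(defn foo []
  (with-env {:x 1}
    (env-fn [f] [(needs-x) (f)])))

;; f re-establishes {:x 1} on entry, replacing the caller's {:y 2},
;; so needs-y runs under an environment that has no :y.
(def outcome
  (let [f (foo)]
    (with-env {:y 2}
      (f needs-y))))

(println outcome) ; [1 nil]
```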

Rich

Meikel Brandmeyer

Oct 2, 2009, 12:23:15 PM
to clo...@googlegroups.com
Hi,

On 01.10.2009 at 19:08, Rich Hickey wrote:

> It simply doesn't compose or flow. Making the nil env special (e.g.
> non-replacing) just moves the problem into higher-order functions that
> use the construct:
>
> (defn needs-x []
> (use-env-x))
>
> (defn needs-y []
> (use-env-y))
>
> (defn foo []
> (with-env x (fn [f] (needs-x) (f))))
>
> (let [f (foo)]
> (with-env y (f needs-y)))
>
> needs-y isn't going to get it.

This is "simply" solved by having a chain of environments. The
algorithm works like this:

* If the desired value is contained in the current map, use it.
  ("Younger" bindings override "older" ones.)
* If the desired value is not contained, go up one step in the chain of
  env maps and repeat.
* If there is no further step in the chain (i.e. we arrived at nil),
  use the root binding of the Var.

"Simply" with quotes, because I have no clue about the performance
impact of that. It should fix your example, though.
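
One possible reading of this scheme, sketched under several assumptions: Clojure 1.3+, the chain flattened into a single map with merge instead of linked maps, captured entries taking precedence over the caller's, and plain map keys instead of Vars (so the final "root binding" step is not modelled). with-env and env-fn are hypothetical names.

```clojure
;; Assumes Clojure 1.3+. Hypothetical user-level model; the chain of
;; env maps is flattened with merge rather than kept as linked maps.
(def ^:dynamic *env* nil)

(defmacro with-env [env & body]
  ;; Extend the current environment instead of replacing it.
  `(binding [*env* (merge *env* ~env)] ~@body))

(defmacro env-fn [args & body]
  `(let [captured# *env*
         f# (fn ~args ~@body)]
     (if (nil? captured#)
       f#
       (fn [& xs#]
         ;; Captured entries win; the caller's entries fill the gaps.
         (binding [*env* (merge *env* captured#)]
           (apply f# xs#))))))

(defn needs-x [] (:x *env*))
(defn needs-y [] (:y *env*))

(defn foo []
  (with-env {:x 1}
    (env-fn [f] [(needs-x) (f)])))

(def outcome
  (let [f (foo)]
    (with-env {:y 2}
      (f needs-y))))

(println outcome) ; [1 2]
```

Under this reading the quoted example sees both x and y, at the cost of a merge on every call of a capturing fn.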

Elk, a rather old, embeddable Scheme interpreter, constructs its
environment like that.

Sincerely
Meikel

Rich Hickey

Oct 2, 2009, 1:13:21 PM
to clo...@googlegroups.com

Yes, I know how linked environments work, and in Clojure you wouldn't
do that but would instead just assoc onto the incoming env map. Then
at least lookups will remain fast. But environment extension will
still be an overhead and, more importantly, will persist even if not
needed.

E.g. it still has scope issues, since when someone says (with-env x
...) they have an expectation of the dynamic extent of the binding
ending with the block, and it won't necessarily if any
closure/laziness escapes.

(defn bar []
  (fn [] (doesnt-use-big-thing)))

(defn needs-a-big-thing []
  (use-a-big-thing)
  (bar))

(defn foo []
  (with-env a-big-thing
    (needs-a-big-thing)))

;a-big-thing is kept around needlessly

Overall these requests just seem to be - I wish dynamic binding knew
what I needed and did it (only when I need it to), without any real
semantics.

Note that I am interested in a construct with useful semantics, but we
need to get some semantics before implementation details.

Rich

Timothy Pratley

Oct 4, 2009, 3:16:58 AM
to Clojure
I'm more interested in the compiler being able to detect obvious (to
it, not me) errors.

Example 1:
========
user=> (await1 (binding [*warn-on-reflection* true]
         (send (agent 0) #(if *warn-on-reflection* (inc %) (dec %)))))
#<Agent@238a47: -1>
user=> (await1 (send (agent 0) #(binding [*warn-on-reflection* true]
         (if *warn-on-reflection* (inc %) (dec %)))))
#<Agent@da18ac: 1>

It looks to be a detectable thing.


Example 2:
========

(binding [*warn-on-reflection* true]
  (map #(if *warn-on-reflection* (+ %) (- %)) [1 2 3 4 5]))
(-1 -2 -3 -4 -5)

(map #(let [*warn-on-reflection* true]
        (if *warn-on-reflection* (+ %) (- %)))
     [1 2 3 4 5])
(1 2 3 4 5)


It seems to me that there is a 'right' way and a 'wrong' way - so long
as I get an electric shock for doing it wrong, I'm not too concerned
about how it looks (which to me is fine anyway). Please correct me if
I'm missing something important here.

So to me the big question is - can the compiler detect things like
this? Where to start if I want to implement such a feature?

