Alyssa Kwan <alyssa...@gmail.com> writes:
> ns-unmap isn't typically used. But for durability, we can think of
> JVM shutdown/startup as unmapping everything and starting fresh.
> Therefore, expected behavior for ns-unmap should be the same as
> behavior for deserializing and loading an object after a new JVM
> startup.
I think the point that you're missing is that vars are just plain old
first class objects. This includes being managed by the garbage
collector so they'll continue to exist until all references to them are
released. You can pass them around, put them in vectors etc.
> Here's another issue I ran into:
>
> => (def a 1)
> #'user/a
>
> => (defn b [] a)
> #'user/b
When you compile this, a reference to the var #'user/a is embedded in
your b function. This reference is directly to the var object.
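A small sketch of what that embedding means in practice (the names here are made up for illustration, and this assumes default compilation of a plain non-dynamic var): because the function holds the var object itself, it sees whatever root value the var has at call time.

```clojure
;; Hypothetical names, mirroring the a/b example from the thread.
(def base 1)

;; The compiled fn holds a reference to the var #'base, not to the value 1.
(defn read-base [] base)

(def before (read-base))

;; Change the root value of the same var object.
(alter-var-root #'base (constantly 2))

;; The fn dereferences the var on each call, so it sees the new value.
(def after (read-base))
```

Here `before` is 1 and `after` is 2, even though `read-base` itself was never recompiled.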
> So far so good...
>
> => (ns-unmap *ns* 'a)
> nil
>
> => a
> java.lang.Exception: Unable to resolve symbol: a in this context
>
> => (b)
> 2
The var has been unmapped from the namespace, but it still exists
because the function b has a reference to it. Vars don't need to live
in a namespace. For example, the "with-local-vars" macro creates a
"local" var which doesn't belong to a namespace.
>
> => (binding [a 3]
> (b))
> java.lang.Exception: Unable to resolve var: a in this context
> So what's the expected behavior here? I would think that after a is
> unmapped, b should no longer work. Instead, it's hanging onto a with
> the last value.
The var still exists, b holds a reference to it.
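The same goes for vars that were never in a namespace to begin with. As a rough sketch of the with-local-vars case mentioned above (the name `counter` is just for illustration), the var is an ordinary object you can read, set and inspect:

```clojure
;; `counter` is a hypothetical name; with-local-vars binds it to a Var object.
(def result
  (with-local-vars [counter 0]
    ;; Read and update the var through the object itself.
    (var-set counter (inc (var-get counter)))
    {:value (var-get counter)        ; the current value of the local var
     :var?  (var? counter)           ; it really is a clojure.lang.Var
     :ns    (:ns (meta counter))}))  ; no namespace is attached to it
```

`result` comes out as `{:value 1, :var? true, :ns nil}`, so the var exists and works without any symbol in any namespace pointing at it.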
> And the binding form doesn't work anymore, so there's
> no way to dynamically bind over it. It's like a weird hybrid of
> lexical and dynamic binding...
The only reason this is erroring is that the symbol 'a can no longer be
mapped to your var.
Actually there is a way to dynamically bind over a var which has been
unmapped, all you need is a reference to it:
user> (def a 1)
#'user/a
user> (def my-a #'user/a)
#'user/my-a
user> (defn b [] a)
#'user/b
user> (ns-unmap *ns* 'a)
nil
user> (b)
1
user> (with-bindings* {my-a 30} #(b))
30
--------------------------------------------------------------
| function | value | package | alist | ....
--------------------------------------------------------------
You can place a function in either the function or value slots.
The (defun...) function places it in the symbol's function slot.
You can define a naked function using lambda thus:
(lambda () 5)
You can execute this function thus:
(funcall (lambda () 5)) ==> 5
Now you can assign this function to the symbol A in two ways:
(setf A (lambda () 'value-slot)) ; fill the value slot
(setf (symbol-function 'A) (lambda () 'function-slot)) ; fill the function slot
The first fills in the value-slot of A. You can call it with:
(funcall A) ==> value-slot
The second fills in the function-slot of A. You can call it with:
(A) ==> function-slot
So now you take a new symbol B with both a value-slot and a function-slot.
You want to "copy" something about the symbol A into B.
What is it you expect to copy?
(setf B A)
== copy the value slot of A into the value-slot of B
so that (funcall B) ==> value-slot
(setf B (symbol-function 'A))
== copy the function slot of A into the value-slot of B
so that (funcall B) ==> function-slot
(setf (symbol-function 'B) A)
== copy the value-slot of A into the function-slot of B
so that (B) ==> value-slot
(setf (symbol-function 'B) (symbol-function 'A))
== copy the function-slot of A into the function-slot of B
so that (B) ==> function-slot
Subsequent changes to A (either the function-slot or the value-slot)
will not affect the behavior of B. To do that you need to define B
in terms of A, such as:
(defun B () (A)) == call the function-slot of A
(defun B () (funcall A)) == call the value-slot of A
Which of the many behaviors were you expecting?
> I understand exactly why this situation exists. I just think the
> behavior is unexpected. When I create a function with a dynamic
> binding, I expect the function to keep a reference to the *name*, not
> the var that the name resolves to at compile/clinit time.
Oh, I see what you mean. I guess you're expecting something more like
Python's behaviour:
>>> x = 5
>>> def foo():
...     return x
...
>>> foo()
5
>>> del x
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in foo
NameError: global name 'x' is not defined
In the case of Python, globals are a mutable map of names directly to
values and are presumably looked up at runtime. Python doesn't have
Clojure's concept of "vars".
>>> globals()
{..., 'x': 5, 'foo': <function foo at 0xb775b7d4>, ...}
> I guess the question is: what do other people expect? Am I alone in
> thinking that this is unexpected and undesirable?
It makes sense to me. I have a mental picture of functions closing over
the (lexical) environment as it existed when the function was defined
and that includes the dynamic vars as they were named at that moment.
Similarly in Python you can do this:
def foo():
    return bar()
def bar():
    return 5
Whereas in Clojure you would need to declare bar before foo.
That may mean dynamic vars are not exactly the same thing as traditional
dynamic scoping, but I don't see anything obviously unexpected or
undesirable about it. In fact quite the opposite, it's an intentional
design choice. It's consistent with the Clojure philosophy of
identities being first class. One of the major themes that distinguishes
Clojure's programming model from traditional languages is that names,
identities and values are distinct concepts.
A single var can be mapped into different namespaces under different
names (for example using the :rename argument to "use"). So the same
identity (var) may be referred to by multiple names (symbols), but it's
the identity you are dynamically binding a value to, not the name.
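To make the name/identity split concrete, here is a minimal sketch using `refer` with :rename directly (the same filter that "use" passes along). The local name `shout` is invented for the example:

```clojure
;; Sketch only: `shout` is a made-up local name for clojure.string/upper-case.
(require 'clojure.string)

;; Map the existing var into the current namespace under a different symbol.
(refer 'clojure.string :only '[upper-case] :rename '{upper-case shout})

;; Both names lead to the very same var object.
(def same-var? (identical? (resolve 'shout) #'clojure.string/upper-case))

;; Calling through the renamed mapping reaches the same function.
(def shouted ((resolve 'shout) "hello"))
```

`same-var?` is true: there is one var, and two symbols that happen to map to it.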
How are you planning to persist a function? The clojure reader can't
read functions output with spit or println.
I can think of at least three ways to build a framework for persisting
functions but none run into problems with vars.
If the functions are expressible in terms of parameters and a finite
set of codes/algorithms that use the parameters together with the
function arguments, you can persist just the parameters; for example,
(defn make-my-fn [foo bar]
  [{:foo foo :bar bar} (fn [baz quux] (do-stuff-with foo bar baz quux))])
(defn call-my-fn [f baz quux]
  ((second f) baz quux))
(defn save-my-fn [f file]
  (spit file (first f)))
(defn load-my-fn [file]
  (let [parm-map (with-open ... blah blah ... file ... blah blah ...)]
    (make-my-fn (:foo parm-map) (:bar parm-map))))
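For what it's worth, here is one way the elided parts might be filled in. This is only a sketch under my own assumptions: the parameters are plain EDN data, `pr-str` and `clojure.edn/read-string` do the writing and reading, and the "scaler" function and all names are invented for the example.

```clojure
(require '[clojure.edn :as edn])

;; A "persistable fn" is a pair: the parameters (data) and the closure built from them.
(defn make-scaler [factor offset]
  [{:factor factor :offset offset}
   (fn [x] (+ (* factor x) offset))])

(defn call-scaler [f x]
  ((second f) x))

;; Only the parameter map is written out; the code lives in the program source.
(defn save-scaler [f file]
  (spit file (pr-str (first f))))

;; Loading reads the parameters back and rebuilds the closure from them.
(defn load-scaler [file]
  (let [{:keys [factor offset]} (edn/read-string (slurp file))]
    (make-scaler factor offset)))

;; Round trip through a temporary file.
(def tmp (java.io.File/createTempFile "scaler" ".edn"))
(.deleteOnExit tmp)
(save-scaler (make-scaler 3 1) tmp)
(def restored (load-scaler tmp))
```

`(call-scaler restored 5)` gives 16, the same as the original closure, and no var ever had to be serialized.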
If the functions are more varied and the code is different every
time/arbitrary, you need something more sophisticated.
Option 2 is to turn to the Dark Side and use eval:
(defn make-fn* [arg-vec & body]
  [[arg-vec body] (eval `(fn ~arg-vec ~@body))])
(defmacro make-fn [arg-vec & body]
  `(apply make-fn* '~arg-vec '~body))
(defn call-fn [f & args]
  (apply (second f) args))
(defn save-fn [f file]
  (spit file (first f)))
(defn load-fn [file]
  (let [[arg-vec body] (with-open ... blah blah ... file ... blah blah ...)]
    (apply make-fn* arg-vec body)))
This of course runs into the permgen-leak problem mentioned in another
thread recently.
A third option avoids evil eval and the permgen leak but won't run the
custom functions nearly as fast at runtime and is much, MUCH more work
to implement. It probably makes security easier, though (e.g. it avoids
exposing your app to the kind of macro viruses that MS Word is
vulnerable to).
That option is to implement an interpreter for a custom language whose
code is written and read. It may again use sexps (and that will save
you from having to write a parser -- Clojure's reader will do that job
handily for you); the structure will probably be similar to make-fn
etc. directly above, but with call-fn invoking the interpreter on the
function source and the "function" object being just the source, not a
vector of the source and a compiled function. You might also create a
compiled representation (e.g. bytecode) and interpret that instead of
sexps in call-fn; make-fn* would compile a sexp to bytecode and the
"function" object would be a vector of bytecode. The bytecode or
directly-interpreted source would also be what got saved and loaded
from disk (so load-fn would no longer go through make-fn*, and in the
directly-interpreted case, make-fn* and make-fn would no longer exist
but you'd probably want call-fn to become call-fn* and have a call-fn
macro that quotes the second argument for you).
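To give a feel for the directly-interpreted variant, here is a deliberately tiny sketch. Everything in it is an assumption of mine: the "language" has only numbers, parameters, `if`, and a whitelist of four operators, and the names `allowed-ops` and `interpret` are invented. The "function" object is just `[params body]` as data.

```clojure
(require '[clojure.edn :as edn])

;; Whitelist: the only operations the custom language can reach.
(def allowed-ops {'+ + '- - '* * '< <})

(defn interpret
  "Evaluates form against env, a map from parameter symbols to values."
  [form env]
  (cond
    (number? form) form
    (symbol? form) (if (contains? env form)
                     (get env form)
                     (throw (ex-info "Unbound symbol" {:symbol form})))
    (seq? form)
    (let [[op & args] form]
      (if (= op 'if)
        ;; Special form: only the chosen branch is interpreted.
        (let [[c then else] args]
          (if (interpret c env)
            (interpret then env)
            (interpret else env)))
        (if-let [f (allowed-ops op)]
          (apply f (map #(interpret % env) args))
          (throw (ex-info "Operation not allowed" {:op op})))))
    :else (throw (ex-info "Unsupported form" {:form form}))))

;; The "function" is plain source; calling it runs the interpreter.
(defn call-fn* [[params body] & args]
  (interpret body (zipmap params args)))

;; Saving and loading is just printing and reading the source.
(def source '[[a b] (if (< a b) (* a 2) (+ a b))])
(def reloaded (edn/read-string (pr-str source)))
(def answer (call-fn* reloaded 3 4))
```

`answer` is 6, and a form such as `(System/exit 0)` is rejected because its operator is not in the whitelist. That is the security upside; the cost is that every call walks the source tree.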
The var binding issue is not present in any of the three cases.
Closures would bind the appropriate vars where they appeared in your
project's source code and there'd be no mucking about with vars when
loading saved parameters to recreate a closure with particular
closed-over parameter values.
Eval would see whatever vars are visible in the environment in which
eval runs. Unfortunately, (doc eval) doesn't have much detail, and in
particular doesn't say what environment eval sees, though it doesn't
seem to include local variables in the eval's lexical surrounds:
user=> (let [x 1] (eval 'x))
#<CompilerException java.lang.Exception: Unable to resolve symbol: x
in this context (NO_SOURCE_FILE:10)>
It does however seem to see global variables in the *current*
namespace when eval is called:
user=> (def x 1)
#'user/x
user=> (eval 'x)
1
user=> (ns goober)
nil
goober=> (def x 2)
#'goober/x
goober=> (defn foo [] (eval 'x))
#'goober/foo
goober=> (foo)
2
goober=> (ns user)
nil
user=> x
1
user=> (goober/foo)
1
Note that foo returns 1 when called from within user and 2 when called
from within goober. This can be fixed with syntax quote:
goober=> (defn foo [] (eval `x))
#'goober/foo
goober=> (foo)
2
goober=> (ns user)
nil
user=> x
1
user=> (goober/foo)
2
or by putting (def goober-ns *ns*) in goober and putting a (binding
[*ns* goober-ns] ... ) around the call to eval to force the eval to
occur in the goober namespace.
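A sketch of that second fix, written so it can be loaded into whatever namespace you happen to be in (the names `secret`, `home-ns` and `lookup` are mine, and `elsewhere-demo` is a throwaway namespace created just to play the role of the caller):

```clojure
(def secret 2)

;; Capture the namespace this code is loaded in (the role of goober-ns above).
(def home-ns *ns*)

;; Force eval to resolve symbols in the defining namespace, not the caller's.
(defn lookup []
  (binding [*ns* home-ns]
    (eval 'secret)))

;; Simulate a call made while some other namespace is current.
(def from-elsewhere
  (binding [*ns* (create-ns 'elsewhere-demo)]
    (lookup)))
```

`from-elsewhere` is 2. Without the inner binding, the eval would try to resolve `secret` in `elsewhere-demo` and fail.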
Finally, the interpreter option allows full freedom: any of your
application's vars can be made visible or not within the interpreter,
under whatever names, and each one may be made read-only or
redefinable as you see fit. (If any are redefinable with the changes
needing to be visible outside the interpreter, then the interpreter
needs to have an instruction that results in a set! call.) Again,
though, the interpreter is considerably more work and will run the
custom functions slower than the other two options.
Alyssa Kwan <alyssa...@gmail.com> writes:
> For what I'm doing (making functions durable), it raises the
> question: If you persist a function that points to a var, restart the
> JVM, and deserialize/load the function from a data store, what should
> happen?
So you're doing something like this?
(def x 5)
(def myref (dref (fn [] x) :somekey store))
I guess the issue we're talking about at the moment is persisting var
references, not fns (which as Ken mentioned is its own kettle of fish)
so I'll simplify to just a dref containing a reference to a var:
(def x 5)
(def myref (dref #'x :somekey store))
> 1) In the loading thread, if a var exists with the same namespace and
> symbol, set the internal reference to that var.
> a) If the function is then passed to a different thread that has a
> different var with that same namespace and symbol, the function will
> still point to the one that was in the loader thread at the time the
> function was deserialized/loaded.
I don't think this situation is possible because Namespace.mappings
(which maps symbols to vars and classes) is *not* thread-local. For a
given namespace and symbol all threads will resolve the same var object.
It's the binding of the Var to a value that can be thread-local.
You shouldn't have to worry about it, as it happens in the dynamic
environment when the fn is called, not when it is loaded.
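Here is a quick way to convince yourself, as a sketch with made-up names. It uses a raw java.lang.Thread on purpose, since (unlike future) that does not carry the caller's bindings across:

```clojure
(def ^:dynamic *setting* :root)

;; Remember the namespace this code is loaded in, so another thread can look it up.
(def home-ns *ns*)

(def observed
  (binding [*setting* :rebound]
    (let [p (promise)
          t (Thread. ^Runnable
                     (fn []
                       (deliver p {;; Same symbol, same namespace: same var object.
                                   :same-var? (identical? (ns-resolve home-ns '*setting*)
                                                          #'*setting*)
                                   ;; The other thread has no binding, so it sees the root.
                                   :value *setting*})))]
      (.start t)
      ;; Deref blocks until the other thread has delivered.
      (assoc @p :caller-value *setting*))))
```

`observed` is `{:same-var? true, :value :root, :caller-value :rebound}`: one var for everybody, different values per thread.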
> 2) In the loading thread, if a var does not exist with that same
> namespace and symbol:
> a) Throw an exception saying that the var doesn't exist.
> b) Create the var with no namespace and symbol and no value. Wait
> until the function is called to throw an exception saying that the var
> is unbound.
Ow, this is making my head hurt. ;-) Consider this fairly common case:
;; sydney.clj
(ns sydney)
(def some-dref (dref nil :somekey store))
;; melbourne.clj
(ns melbourne
(:require sydney))
(def some-var 1)
(dosync
  (when (nil? @sydney/some-dref)
    (ref-set sydney/some-dref #'some-var)))
Suppose you execute melbourne.clj. What happens? On the first run:
1. sydney.clj is loaded, some-dref is initialized to nil.
2. melbourne.clj is loaded, the transaction is executed, some-dref is
now #'some-var, this gets saved to disk.
Then you restart the JVM and:
1. sydney.clj is loaded, some-dref's value is deserialized, but whoops,
#'some-var doesn't exist yet.
Okay, perhaps you can get around that by only deserializing when
first derefed, instead of at initialization like you currently do.
My preference would probably lean towards (2a). Anyone using drefs is
going to have to deal explicitly with the issue of objects that aren't
serializable (sockets, streams), so make them be very careful about what
they put in them.
> c) Create the var with no namespace and symbol but with the value
> that the var had in the persisting thread. There's no way to access
> the var to modify it.
> d) Create the var with no namespace and symbol but with the value
> that the var had in the persisting thread. It's accessible somehow
> (maybe the meta map), so the user can recover it and dynamically bind
> over it.
> e) Create the var with the namespace and symbol in the function,
> modifying the RT var-space of the loading thread. Don't initialize
> the var. This gives the user a chance to dynamically bind over it.
> If the function is called before binding the var to something, throw
> an exception saying that the var is unbound.
> f) Create the var with the namespace and symbol in the function,
> with the value that the var had in the persisting thread.
Persisting the value of the var quickly leads to a cascade of extra
stored data, because you not only have to persist it, but any vars it
references in turn.
Setting :static aside for the moment, suppose you persist this:
(fn [x] (empty? x))
Then you have to persist the value of #'empty? which is:
(fn [coll] (not (seq coll)))
So then you have to persist #'not and #'seq as well and so on.
> The normal case (1) is straightforward, and (1a) is what would happen
> without se/des anyways. (2) is tricky. (2f) is most robust, but
> really violates least surprise. I think Meikel's comments lean
> towards (2a).
>
> Also, the thread-local nature of vars raises the question of what
> should happen when deserializing the same function from different
> threads. If I create durable refs from different threads pointing to
> the same store/key combo, does the thread that gets there first
> control which var the deserialized function references? Or does each
> thread get it's own ref? If so, how do you reconcile that there are
> multiple refs being stored under the same store/key? That's
> untenable, but it seems totally arbitrary to simply say that the first
> thread that gets there determines which var the function is bound to
> for the duration of the JVM instance.
As I said above, it's the value of the var that is thread-local, not the
var itself. So for any of the options where you're not persisting the
value, it's not a problem.
If you do want to try to persist the value of the var, then you should
probably only worry about the root binding. The thread-local dynamic
bindings depend on the call stack of the accessing thread, so unless
you're planning on trying to persist threads somehow, thread-local values
shouldn't be preserved between JVM instances. So again, no problem.
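If it helps, the root value can be read separately from whatever is bound on the current thread. A minimal sketch (the var name is invented; `getRawRoot` is a method on clojure.lang.Var, reached here through Java interop):

```clojure
(def ^:dynamic *level* 1)

(def snapshot
  (binding [*level* 2]
    {:current *level*                    ; thread-local value inside the binding
     :root    (.getRawRoot #'*level*)})) ; root value, unaffected by the binding
```

`snapshot` is `{:current 2, :root 1}`, so a persistence layer that only looks at the root never sees the thread-local 2.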
> What do you think?
For the sake of practicality, I'd probably just be totally draconian
about it and make fns and vars not durable at all. Almost any way you
do it they'd make the program very brittle. They'd tie using the data
very tightly to the structure of the source code, meaning that if you
change the program (say rename a var) you're likely to not be able to
read the data any more.
Cheers,
Alex
> Perfect! That makes things so much easier! I assume that interning
> vars is synchronized then? This is the second big source code read
> FAIL in two days. Obviously I can't read. :)
They're stored in an AtomicReference (basically an atom).
> All persistence requires dealing with migrations and compatibility.
> That's not a reason not to persist. What are needed are good tools/
> idioms for dealing with it. ORM has succeeded on that front because
> the data is arbitrarily readable and writeable with standard tools, so
> the engineer can always manually modify stuff to migrate it. If a
> Clojure object data store were arbitrarily readable and writeable,
> then this is the first step to solving the problem. The second step
> is Ruby-style migrations.
Fair enough, I'm looking forward to seeing what you come up with. :-)
> Because it's not at compile time, I don't have access to the expr that
> generates the function. Stuart Halloway mentioned an invoke-time
> check for recompilation, which I assume requires the function to hang
> onto the expr and lexical environment which generates it. AFAICT,
> there is no such reference, and invoke-time lookups through vars are
> still being used; I'm probably not looking in the right place.
I can't find something like that either and I can't see why it would
exist. Perhaps he was talking about JIT recompilation, not source
recompilation?
Eww. Wouldn't a java.util.concurrent.ConcurrentHashMap give better
performance in highly parallel situations?
Then again, var interning isn't exactly the commonest of run-time
operations, except when the program running is a development REPL, and
that tends to be I/O bound, spending most of its time waiting for a
response from the 30- to 40-baud Homo sapiens at the other end of the
connection.
Still, it probably wouldn't *hurt*. Who knows what future applications
might need to intern vars in a tight loop for some purpose? One day we
might have a fairly fast AI on the other end of that REPL instead of a
human. Or something.
>> Because it's not at compile time, I don't have access to the expr that
>> generates the function. Stuart Halloway mentioned an invoke-time
>> check for recompilation, which I assume requires the function to hang
>> onto the expr and lexical environment which generates it. AFAICT,
>> there is no such reference, and invoke-time lookups through vars are
>> still being used; I'm probably not looking in the right place.
>
> I can't find something like that either and I can't see why it would
> exist. Perhaps he was talking about JIT recompilation, not source
> recompilation?
If so, Alyssa's concern is moot; JIT recompilation would just
translate the pre-existing bytecodes in a deterministic manner. If the
bytecodes work properly when invoked and the JIT compiler lacks bugs,
the JIT-compiled code will also work properly, lexical environment be
damned.
If the recompilation at issue involves regenerating bytecode from
s-expressions, on the other hand, it's a whole 'nother kettle of fish.