The code is also available in the load-tracking branch of my fork of clojure at github: http://github.com/scgilardi/clojure/tree/load-tracking .
Motivation:
Clojure's current load/require/use code pre-dates aot compilation. The "require" mechanism and the aot compiler each implement separate "load only if not already loaded" and "load compiled if it's available and fresh" policies. The policies are not unified, which has led to some problems. We have a couple of tickets related to :reload and :reload-all not working properly in all cases and one requesting we restore detection of cyclic load dependencies. That detection has been disabled since it broke in the presence of aot-compilation and gen-class.
The "what's already loaded" tracking currently used by "require" is primitive. It only skips loading if a lib has already been loaded. It leaves reloading as a (frequent!) problem for the developer. The new code is integrated with aot-compilation and determines how to load a lib based on whether or not it's already been loaded (compiled code can be loaded only once) and the timestamps and availability of the lib's source and compiled code resources.
Changes:
Here is a summary of the changes:
- We've needed a name for "the unit of clojure code associated with a source code (".clj") resource". I chose the name "unit" based on that description and with precedent in the C/C++ notion of "translation unit"
- A unit may or may not be a lib. It is a lib if it contains an ns form that creates a namespace of the same name.
- Clojure now tracks each unit's direct dependencies. If B loads while A's load is the first one pending, B is a direct dependency of A.
- Every top level load (load with no other load pending) checks the tree of dependencies rooted at the unit being loaded and ensures that what's loaded for the entire tree is up to date with what's in the resources in classpath.
- Every load is skipped if what's already loaded is up to date with what's in classpath, i.e., "require" is implied for all loads. (This can be overridden for a particular load using the new :force flag if, for example, the unit has side effects on load)
- Cyclic dependencies are detected and throw an exception with a helpful diagnostic message. Here's one produced by test code:
"Cyclic load dependency: load.cyclic3->[ load.cyclic4 ]->load.cyclic5->load.cyclic6->[ load.cyclic4 ]"
- (load-records) provides a dump of what Clojure is tracking about the state of loaded units
- Clojure warns when a code resource is stale (needs reload) but a source code resource for it is not available.
- Hooks are available for tools to take over reporting of :verbose output and/or the load warning
- The set of acceptable arguments for clojure.core/load has changed in a backward compatible way. It now accepts arguments just like require/use/compile with the addition that one can also specify units to load using paths in classpath (strings rather than symbols).
- clojure.core/compile now accepts arguments just like load with the exception of paths.
- Gen-class initializers now use clojure.core/load-impl which is exactly like load except that it's always considered "top level". This avoids confusion about whether a gen-class'd class initializer (which can only be loaded once) is a dependency of the lib that's loading when it loads. Logically it is not a dependency and using load-impl implements that.
- loading can be forced with a new ":force" flag indicating "load a unit even if it's not stale"
- A new :maybe flag is available to suppress the exception that's normally thrown if a unit's code is not available.
- changed RT.java to use this instead of its own "maybeLoadResourceScript" for loading user.clj
- Moved loading of all clojure.core dependencies into clojure.core.
- The legacy :reload and :reload-all flags are no longer necessary, but are still accepted as no-ops.
- clojure.core/refresh will refresh either a set of named units or all loaded units. Refreshing this way is a little more efficient than loading all the top-level libs individually.
- Even though the load code is defined within clojure.core, clojure.core is tracked as a unit like any other.
- another unrelated bug currently prevents reloading clojure.core, but in principle that could be done.
- Made clojure.lang.RT/compile public so it can be called via the updated clojure.core/compile
- Added tests for the new load that use a new test-stage and manipulate compiled and source resources to check expected outcomes.
- Moved the "load" code closer to the bottom of core.clj because it uses features that were previously defined below it.
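To illustrate the cyclic-dependency check from the list above, here is a minimal sketch. It is not the code from the patch. It assumes the loader keeps a vector of the units whose loads are currently pending; check-cycle and its arguments are hypothetical names chosen for this example.

```clojure
(require 'clojure.string)

(defn check-cycle
  "Throws if unit is already among the pending loads (a vector, outermost
  load first). Hypothetical helper: it only illustrates the idea."
  [pending unit]
  (when (some #{unit} pending)
    (let [chain (map #(if (= % unit) (str "[ " % " ]") (str %))
                     (conj pending unit))]
      (throw (ex-info (str "Cyclic load dependency: "
                           (clojure.string/join "->" chain))
                      {:unit unit :pending pending})))))
```

Calling it with the pending loads load.cyclic3, load.cyclic4, load.cyclic5, load.cyclic6 and the unit load.cyclic4 throws an exception whose message has the same shape as the diagnostic quoted above.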
Non-changes:
This patch does not change the clojure.core/refer behavior, or the arguments of (:require) and (:use). We've discussed desired changes for the dependency clauses in "ns" and I'd like to work on those next.
Impact on Contrib:
All of contrib loads and tests fine without changes with the new load code. The only exception is "clojure/contrib/pprint/cl-format.clj" whose name (arguably) should always have been cl_format.clj (underscore instead of hyphen). That convention is now enforced. Until contrib is updated, you can work around the problem by symbolic linking cl_format.clj to cl-format.clj.
I'd appreciate you giving this updated load code a try and helping me polish and refine it with the goal of making it acceptable as a part of clojure.core.
--Steve
Thanks for your response!
On Nov 25, 2009, at 6:49 AM, Laurent PETIT wrote:
> Seems like a tremendous work, Stephen !
Thanks very much!
> Overall, all this seems very cool, and, I guess, would also help IDEs
> do more incremental reloading of an entire project, which is really,
> really cool, to have at clojure level, rather than reimplemented at
> each IDE level !!
>
> Without looking at the code, but reacting to your email :
>
> First, a general question concerning the context: is this mostly a
> "single-brain" work on which you would now like others to react upon,
> or most of the "specs" parts have already been debated on #irc and I
> didn't see it in the logs ?
The specs have not been debated on #irc. I've consulted with a few people about whether what I had in mind seemed reasonable, but this is the first exposure these ideas have had to a wider audience. I welcome your questions and feedback and suggestions and those of everyone else with an interest in it.
The new load code is my attempt at addressing the deficiencies I see in the old load code including the tickets noted and the fact that :reload and :reload-all were [1] not working reliably, and [2] (in my opinion) put too much of the burden of knowing the "state" of resources on classpath vs. what's loaded in memory on the developer when that's something that Clojure can track well.
The general concept of using the mod dates on resources to determine what to load is inspired by the loader code Rich added to load the results of aot compilation.
> Then, a general remark: this stuff is important enough that it
> deserves its own page on the wiki, even at this stage.
I'll look at doing that, thanks for the suggestion.
> And now the questions / remarks in your email:
>
> 2009/11/24 Stephen C. Gilardi <scgi...@gmail.com>:
>> I've uploaded "load-tracking.diff" which includes changes and tests for an update to clojure's load/require/use code. This fixes tickets 8, 42, and 113, though currently I've only updated ticket 8.
>>
>> The code is also available in the load-tracking branch of my fork of clojure at github: http://github.com/scgilardi/clojure/tree/load-tracking .
>>
>> Motivation:
>>
>> Clojure's current load/require/use code pre-dates aot compilation. The "require" mechanism and the aot compiler each implement separate "load only if not already loaded" and "load compiled if its available and fresh" policies. The policies are not unified which led to some problems. We have a couple of tickets related to :reload and :reload-all not working properly in all cases and one requesting we restore detection of cyclic load dependencies. That detection has been disabled since it broke in the presence of aot-compilation and gen-class.
>>
>> The "what's already loaded" tracking currently used by "require" is primitive. It only skips loading if a lib has already been loaded. It leaves reloading as a (frequent!) problem for the developer. The new code is integrated with aot-compilation and determines how to load a lib based on whether or not it's already been loaded (compiled code can be loaded only once) and the timestamps and availability of the lib's source and compiled code resources.
>
> With this work may be the opportunity to also cut down unnecessary
> complexity. I mean getting rid of this "lib/libspec" terminology,
> which, apart from trying to abstract away from the concept of
> namespace, does not add anything since there's a 1-1 relationship
> between the 2.
There is not a one to one relationship between libs and namespaces: they are different objects with different behaviors. As an example, one can create and use namespaces at the repl with no resources involved. As another, the "user" namespace exists whether or not there is a user.clj resource in classpath. A lib has exactly one namespace, but a namespace may or may not have a lib.
> Why not just s/lib/ns or s/lib/namespace and s/libspec/nsspec everywhere ?
> This would remove one concept that, as far as I'm concerned, does not
> bring anything to the table, and since by definition there's a 1-1
> relationship between lib and ns, does not have great promises to bring
> anything in the future, too.
Early on in designing Clojure's current loading mechanism, that suggestion came up and got (I think properly) a veto from Rich.
Libs and namespaces are different and pretending they're the same, while possibly removing some terms from our lexicon, does not make discussing the concepts they embody any clearer.
>> Changes:
>>
>> Here is a summary of the changes:
>>
>> - We've needed a name for "the unit of clojure code associated with a source code (".clj") resource". I chose the name "unit" based on that description and with precedent in the C/C++ notion of "translation unit"
>
> Why not just call this as is currently stated in the documentation for
> 'load : a resource (or a file) ? I'm not sure it's worth introducing a
> new term/concept just for that. But I'll let the specialists have the
> final word on that :)
The unit is:
- the code: definitions and expressions
- whether a unit is loaded from its .clj file or from the aot-compiled representation of its .clj file, it's still the same unit
- a unit is invariant under (correct) compilation. When compiling, the form of the code changes from UTF8 representing Clojure objects to .class files containing bytecode, but the essence of the code remains unchanged.
Resources are:
- files on disk, or
- binary streams in jar files, or
- (in principle) any other binary streams accessible by a classloader.
A unit has (in the general case) many resources associated with it:
- at most one "<unit-name>.clj" resource, and
- at most one "<unit-name>__init.class" resource, and
- any number of ".class" resources containing the compiled form of the unit's definitions.
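As a small sketch of how a unit name could map to the first two resource names above: this assumes the usual convention that dots in the name become path separators and hyphens become underscores. unit-resources is a hypothetical helper for illustration, not a function from the patch.

```clojure
(require '[clojure.string :as string])

(defn unit-resources
  "Returns the classpath-relative names of the source and init-class
  resources for a unit name (symbol or string). Hypothetical helper."
  [unit]
  (let [path (-> (str unit)
                 (string/replace "-" "_")   ; hyphens become underscores
                 (string/replace "." "/"))] ; dots become path separators
    {:source     (str path ".clj")
     :init-class (str path "__init.class")}))
```

Under that assumption, the unit clojure.contrib.pprint.cl-format maps to clojure/contrib/pprint/cl_format.clj, which is the underscore spelling the new load code enforces.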
The reason to introduce unit is that not every group of "clojure code associated with a single .clj file" defines a namespace. We considered enforcing that restriction early on, but that ends up making writing Clojure code more restrictive and burdensome than it needs to be. Consider the unit clojure.contrib.pprint.cl-format . It's perfectly reasonable for it to define names in the clojure.contrib.pprint namespace while still being in a separate unit due to its own size and internal cohesiveness. We could restrict the lib writer to having *all* the definitions for a namespace in one resource, but if we don't make that restriction (and I think we should not) then we must allow non-lib units.
> I'm asking because I think every new concept introduced in the
> terminology should bring heavy stuff with it, or it's just another
> thing to think about :-(
I simplified the model as much as I thought was possible without losing important distinctions. Let's come up with the best terminology that gives a name to every important concept, but does not contain any extra names.
>> - A unit may or may not be a lib. It is a lib if it contains an ns form that creates a namespace of the same name.
>>
>> - Clojure now tracks each unit's direct dependencies. If B loads while A's load is the first one pending, B is a direct dependency of A.
>>
>> - Every top level load (load with no other load pending) checks the tree of dependencies rooted at the unit being loaded and ensures that what's loaded for the entire tree is up to date with what's in the resources in classpath.
>
> See, in the above sentence, just replacing unit with resource would be
> enough, and I guess, also enough for the level of abstraction. No need
> to try to abstract away from the concept of resource, I think.
I addressed this above.
>> - Every load is skipped if what's already loaded is up to date with what's in classpath, i.e., "require" is implied for all loads. (This can be overridden for a particular load using the new :force flag if, for example, the unit has side effects on load)
>>
>> - Cyclic dependencies are detected and throw an exception with a helpful diagnostic message. Here's one produced by test code:
>>
>> "Cyclic load dependency: load.cyclic3->[ load.cyclic4 ]->load.cyclic5->load.cyclic6->[ load.cyclic4 ]"
>>
>> - (load-records) provides a dump of what Clojure is tracking about the state of loaded units
>>
>> - Clojure warns when a code resource is stale (needs reload) but a source code resource for it is not available.
>>
>> - Hooks are available for tools to take over reporting of :verbose output and/or the load warning
>>
>> - The set of acceptable arguments for cloure.core/load has changed in a backward compatible way. It now accepts arguments just like require/use/compile with the addition that one can also specify units to load using paths in classpath (strings rather than symbols).
>
> Above again I would not be shocked if you replace unit with resource.
>
>>
>> - clojure.core/compile now accepts arguments just like load with the exception of paths.
>>
>> - Gen-class initializers now use clojure.core/load-impl which is exactly like load except that it's always considered "top level". This avoids confusion about whether a gen-class'd class initializer (which can only be loaded once) is a dependency of the lib that's loading when it loads. Logically it is not a dependency and using load-impl implements that.
>>
>> - loading can be forced with a new ":force" flag indicating "load a unit even if it's not stale"
>>
>> - A new :maybe flag is available to suppress the exception that's normally thrown if a unit's code is not available.
>> - changed RT.java to use this instead of its own "maybeLoadResourceScript" for loading user.clj
>> - Moved loading of all clojure.core dependencies into clojure.core.
>>
>> - The legacy :reload and :reload-all flags are no longer necessary, but are still accepted as no-ops.
>
> Would it make sense to make :reload / :reload-all have the same
> behaviour as the :force flag, then ?
The behavior of :reload-all is not currently available in the new package. In the past, :reload and :reload-all were the only way to freshen stale code in memory. Now refreshing after every top level load is the default. I think :force should be used only rarely, as I think code we load should be exclusively definitions and not contain side effects that execute as the code is loaded or reloaded.
This does raise the question "what should I use if the file I want to load is all about side effects: some kind of script I want to execute?". I've argued we should standardize on a notion of providing some kind of (core) "run" or "run-script" command that would:
- load (require) a lib definition containing only definitions, and
- jump to a well-known entry point (much like "main" for Java or C).
I think that would be a great way to do scripts while keeping loading free of side effects other than the essential side effect of making code available to run.
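A rough sketch of what such a command might look like, using only functions that already exist in clojure.core. The name run-script is an assumption, and so is passing the entry point as an argument; the proposal is for a single well-known entry point name, but it is a parameter here so the sketch can be tried against any lib.

```clojure
(defn run-script
  "Requires lib, then calls the var named entry in it with args.
  Hypothetical sketch: neither the name nor the signature is settled."
  [lib entry & args]
  (require lib)
  (if-let [f (ns-resolve lib entry)]
    (apply f args)
    (throw (ex-info (str "No entry point " entry " in " lib)
                    {:lib lib :entry entry}))))
```

For example, (run-script 'clojure.string 'upper-case "hello") loads clojure.string if necessary and then calls its upper-case function.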
>> - clojure.core/refresh will refresh either a set of named units or all loaded units. Refreshing this way is a little more efficient than loading all the top-level libs individually.
>
> I'm confused by the above sentence. In the beginning you write about
> units, and in the end you write about libs. And you write "loading
> libs", but to me "loading" is about resources (or units if you like),
> and "requiring/using" is about namespaces (or libs if you like).
My thought there was that non-lib units exist to cooperate in the implementation of libs. In the common case top-level loads will be libs. That's not necessarily true though and I should have said "top level units" there.
I do think the name "load" is not quite right for something that may or may not load. "require" is better for that. We could make "load" always :force and leave "require" as "load if necessary". Any opinions on that?
> And also, does "refresh" mean that they will be reloaded ? That is if
> version n-1 of a resource (sorry, unit :)) defined symbols a, b and c,
> and version n defines just a, b (no more c), what will be the state
> after the refresh ?
They will be reloaded if their resources are newer than what's in memory. This doesn't change Clojure's behavior regarding old definitions in namespaces. In the example you give, c will still exist. Only definitions that are new or changed will be "fresh" after the refresh. It's still possible for other code to be referring to things that are no longer defined by the unit. That's a kind of "staleness" that this patch does not address.
I think it's worth discussing whether we should clean up that form of staleness as well. Upon encountering an "ns" form while loading, we could save the current ns-map for the namespace (if any), note all the definitions made while the lib's root unit loads, and then if there are any definitions in the saved ns-map that were not made again while loading, they could be ns-unmapped. This would preserve the integrity of all prior references to vars that *changed*, but not allow new resolution of vars that were *deleted*.
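Here is a minimal sketch of that cleanup idea, under stated assumptions: the set of names defined during the reload is passed in explicitly (a real loader would have to record it while loading), and unmap-deleted! is a hypothetical name. It uses only ns-interns and ns-unmap from clojure.core.

```clojure
(defn unmap-deleted!
  "Unmaps from ns every var that was interned before the reload (the map
  `before`, as returned by ns-interns) and is not in the set `redefined`."
  [ns before redefined]
  (doseq [sym (keys before)
          :when (not (contains? redefined sym))]
    (ns-unmap ns sym)))

;; Version n-1 of a unit defines a, b and c.
(def demo-ns (create-ns 'demo.stale))
(intern demo-ns 'a 1)
(intern demo-ns 'b 2)
(intern demo-ns 'c 3)

;; Snapshot taken when the "ns" form is encountered during the reload.
(def before (ns-interns demo-ns))

;; Version n defines only a and b.
(intern demo-ns 'a 10)
(intern demo-ns 'b 20)

(unmap-deleted! demo-ns before #{'a 'b})
```

After this runs, demo.stale/c no longer resolves, while a and b keep their existing vars with the new values.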
> Will I see c from version n-1, or no more c ?
You will see c from version n-1.
> If c is still visible, how different then is clojure.core/refresh from
> clojure.core.load with the :force option ?
Refresh does not :force. It checks dependency trees for staleness and reloads any units whose resources are out of date relative to what's in memory. Refresh of a single lib is the same as load of a single lib. Refresh of a group of libs is identical in effect, but a little more efficient than loading each of them individually (The efficiency comes from checking the staleness of the set of libs all together rather than each one individually). Refresh without arguments efficiently refreshes all loaded code.
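The staleness check can be pictured with a small sketch. The data layout here is an assumption made for illustration and is not what (load-records) returns: each unit has a :loaded-at time, a :modified-at time for its newest resource, and a set of direct :deps. A unit counts as stale if its resource is newer than what was loaded or if anything in its dependency tree is stale. The sketch assumes the dependency graph has no cycles.

```clojure
(defn stale?
  "True if unit, or any unit in its dependency tree, has a resource newer
  than what was loaded. `records` is a hypothetical map of unit name to
  {:loaded-at t, :modified-at t, :deps #{...}}."
  [records unit]
  (let [{:keys [loaded-at modified-at deps]} (get records unit)]
    (boolean
     (or (nil? loaded-at)                   ; never loaded
         (> modified-at loaded-at)          ; resource changed since load
         (some #(stale? records %) deps))))) ; something it depends on is stale

(def records
  {'bar {:loaded-at 100 :modified-at 150 :deps #{}}
   'baz {:loaded-at 100 :modified-at 90  :deps #{'bar}}
   'qux {:loaded-at 100 :modified-at 90  :deps #{}}})
```

With these made-up records, bar is stale because its resource changed, baz is stale because it depends on bar, and qux is up to date.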
>> - Even though the load code is defined within clojure.core, clojure.core is tracked as a unit like any other.
>
> Now I'm totally confused. Because you use the term "unit" for a
> namespace (clojure.core), which is defined through several files (and
> you defined a unit as the clojure code associated with the content of
> a clj file).
In addition to being the name of a namespace, the symbol clojure.core is also the name of a unit. That unit is also the root unit for a lib. The source resource for that unit is found at clojure/core.clj within classpath.
The symbol clojure.core names:
[1] a namespace
[2] a lib
[3] a unit (the root unit for the lib)
[4] a group of units (all the units for the lib)
All libs have a root unit whose name is the same as the namespace name. Libs may or may not also contain other units that don't define a namespace of their own, but instead help to define the lib's namespace.
There is perhaps an opportunity for simplification here. Remove item [4] from the above list and make this definition:
A unit is the body of clojure code associated with a ".clj" resource.
A lib is a unit that contains an "ns" form defining a namespace with the same name as the unit.
The fact that a lib may pull in other non-lib units to complete the definition of the namespace is perhaps not an important concept for us to track.
>> - another unrelated bug currently prevents reloading clojure.core, but in principle that could be done.
>>
>> - Made clojure.lang.RT/compile public so it can be called via the updated clojure.core/compile
>>
>> - Added tests for the new load that use a new test-stage and manipulate compiled and source resources to check expected outcomes.
>>
>> - Moved the "load" code closer to the bottom of core.clj because it uses features that were previously defined below it.
>>
>> Non-changes:
>>
>> This patch does not change the clojure.core/refer behavior, or aguments of (:require) and (:use). We've discussed desired changes for the dependency clauses in "ns" and I'd like to work on those next.
>>
>> Impact on Contrib:
>>
>> All of contrib loads and tests fine without changes with the new load code. The only exception is "clojure/contrib/pprint/cl-format.clj" whose name (arguably) should always have been cl_format.clj (underscore instead of hyphen). That convention is now enforced. Until contrib is updated, you can work around the problem by symbolic linking cl_format.clj to cl-format.clj.
>>
>> I'd appreciate you giving this updated load code a try and helping me polish and refine it with the goal of making it acceptable as a part of clojure.core.
With Tom's permission, I've changed the name of cl-format.clj to cl_format.clj.
Thanks again for your questions and comments!
--Steve
> Hi,
>
> On Nov 26, 8:38 am, "Stephen C. Gilardi" <squee...@mac.com> wrote:
>
>>> Seems like a tremendous work, Stephen !
>>
>> Thanks very much!
>
> Yes. Great work! The first require/use/load framework was already
> awesome. And I have no doubt, that you will iron out its issues with
> new proposal!
Thanks for the vote of confidence!
>> The behavior of reload-all is not currently available in the new package. In the past :reload and reload-all were the only way to freshen stale code in memory. Now refreshing after every top level load is the default. I think :force should be used only rarely as I think code we load should be exclusively definitions and not contain side effects that execute as the code is loaded or reloaded.
>
> This is a difficult point! There are subtle side effects, which happen
> while loading a unit.
>
> Clojure 1.1.0-alpha-SNAPSHOT
> user=> (load-file "foo.clj")
> #'foo.bar/xxx
> user=> (foo.bar/xxx "Hello")
> java.lang.IllegalArgumentException: No method in multimethod 'xxx' for
> dispatch value: class java.lang.String (NO_SOURCE_FILE:0)
> user=> (load-file "frob.clj")
> #<MultiFn clojure.lang.MultiFn@9d6065>
> user=> (foo.bar/xxx "Hello")
> Hello
> nil
> user=> (load-file "foo.clj")
> #'foo.bar/xxx
> user=> (foo.bar/xxx "Hello")
> java.lang.IllegalArgumentException: No method in multimethod 'xxx' for
> dispatch value: class java.lang.String (NO_SOURCE_FILE:0)
>
> So refresh might break your code! It must load *all* depending code.
> Even if its source didn't change!
The new load does load all dependent code that may be stale even if the source for a particular unit didn't change. It checks the entire tree of dependencies, not just a particular unit. Just to be clear, the load-file function you're using in your example is low-level code that was not affected by the changes I made. It's doing things "the old way". Can you please post the source code for all the .clj files that participate in your example? To try them with the new code, the calls would look like this:
(load 'foo)
(foo.bar/xxx "Hello")
(load 'frob)
(foo.bar/xxx "Hello")
(load 'foo)
(foo.bar/xxx "Hello")
(String arguments to load are also supported, but symbols are canonical.)
>> Refresh does not :force. It checks dependency trees for staleness and reloads any units whose resources are out of date relative to what's in memory. Refresh of a single lib is the same as load of a single lib. Refresh of a group of libs is identical in effect, but a little more efficient than loading each of them individually (The efficiency comes from checking the staleness of the set of libs all together rather than each one individually). Refresh without arguments efficiently refreshes all loaded code.
>
> As showed above, I think this is not enough. refresh needs to reload
> all the dependent libs in case a lib is out-of-date and hence
> reloaded.
I believe this is already implemented in what I posted. I look forward to seeing whether or not your example works properly with it.
> Thanks to you for doing this awesome work!
:-)
--Steve
> So refresh might break your code! It must load *all* depending code.
> Even if its source didn't change!
I'd still like to get your example files, but I see your point now about side-effects (the side effect of a defmulti deleting its old method table). With the new code I believe this can only happen when using low-level calls like load-file or by using :force.
I do have a fix in mind: loading a unit with :force should mark it as stale and then refresh all loaded units. This will reload both the :force'd unit and any that depend on it.
It's also feasible to fix up the calls like load-file to ensure that all loads are tracked and there is no low-level bypass. The low-level code would only load core.clj. I'll look at that.
Nice catch.
Thanks,
--Steve
On 26.11.2009 at 17:44, Stephen C. Gilardi wrote:
> On Nov 26, 2009, at 4:50 AM, Meikel Brandmeyer wrote:
>
>> So refresh might break your code! It must load *all* depending code.
>> Even if its source didn't change!
>
> I'd still like to get your example files, but I see your point now
> about side-effects (the side effect of a defmulti deleting its old
> method table). With the new code I believe this can only happen when
> using low-level calls like load-file or by using :force.
The example files were pretty simple. One was:
(ns bar)
(defmulti xxx type)
The other was:
(ns baz
  (:require bar))
(defmethod bar/xxx String [x] (println x))
I just wanted to demonstrate the issue. One would normally not
conceive of defmethod as a side-effect. I was caught by this before.
From your description of refresh it was not clear to me how this
situation is handled. So I thought I'd point it out now. If the new
code handles that correctly then we are fine anyway. If not, now is
the time to think about this issue.
I tested this against your github branch with the above files, and a
named refresh will break things:
user=> (require 'baz)
nil
user=> (bar/xxx "Hello")
Hello
nil
; Changed bar.clj here...
user=> (refresh 'bar)
nil
user=> (bar/xxx "foo")
java.lang.IllegalArgumentException: No method in multimethod 'xxx' for
dispatch value: class java.lang.String (NO_SOURCE_FILE:0)
user=>
We have to track which namespaces depend on the refreshed one. So we
basically need a map {bar #{baz} baz #{}}. Then the system knows: if I
reload bar, I also have to reload baz, even if its source is
up-to-date.
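A minimal sketch of using such a map, assuming it maps each unit to the set of units that depend on it directly. units-to-reload is a hypothetical name; it collects the named unit plus everything that depends on it, directly or indirectly.

```clojure
(def dependents '{bar #{baz} baz #{}})

(defn units-to-reload
  "Returns the set containing unit and every unit that depends on it,
  directly or transitively, according to the dependents map."
  [dependents unit]
  (loop [pending [unit]
         seen    #{}]
    (if-let [u (first pending)]
      (if (seen u)
        (recur (rest pending) seen)
        (recur (into (vec (rest pending)) (get dependents u))
               (conj seen u)))
      seen)))
```

With the map above, (units-to-reload dependents 'bar) yields a set containing both bar and baz, while (units-to-reload dependents 'baz) yields only baz.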
The full refresh will not have this issue, because it also loads a
unit when any of its dependencies was reloaded. But this check is not
done in the case of a named refresh.
And indeed:
user=> (require 'baz)
nil
user=> (bar/xxx "Hello")
Hello
nil
; Changed bar.clj here...
user=> (refresh)
nil
user=> (bar/xxx "Hello")
Hello
nil
Hope this helps.
Sincerely
Meikel