I am still concerned about reference semantics in Julia. There is (obviously ;) a strong bias towards references on assignments and function calls among Julia's developers. With all the respect this deserves, I want to list some potential problems this may introduce.
For me, reference semantics is an artefact of the movement of numerical applications towards general purpose languages. At first, nobody really wanted reference semantics, but languages like C and Java simply did not offer the option. Mathematical DSLs (e.g. Fortran, Matlab) usually provide value semantics (if the underlying GPL allows it; see NumPy, where it does not). I think it is even fair to say that a reasonable number of potential users of a numerical DSL do not even know about the concept of pointers in C. If one goal of good language design is to give convenience _and_ performance, what's left is: value semantics with lazy copies on write.
IMO, the advantages of a copy on write system outweigh its disadvantages (more complexity, but it can be hidden from the user). I understand that every developer/designer likes to keep things simple. But for me, forcing reference semantics for numerical arrays keeps things simpler only for the language designer. Reference semantics violates an important rule in software development: decoupling. The user is faced with much higher complexity. As algorithms grow, so does the bug potential. I like simple things too, but this value/reference semantics thing is just not simple.
One statement came up in this group several times: "copy on write introduces hidden performance costs and kills performance on surprise"
Could someone please clarify why this is so? It is important, because performance is the only argument left against value semantics AFAICS. Copy on write - if done right - copies only in those cases where a copy is really needed. These are exactly the situations where the user should have used copy() anyway. So why would copy on write use more memory / be slower than reference semantics with explicit copies? The "surprise" - if I get its meaning correctly - arises if mutating an array leads to a hidden copy of its storage. But, again, this happens only when an explicit copy() would otherwise have been necessary. Forgetting explicit things (like "copy()") seems to introduce more potential for surprises.
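To make the claim about "copies only where a copy is really needed" concrete, here is a minimal sketch of a copy on write array. It is written in Python only because its lists share the reference semantics discussed here; `CowArray`, `_Storage`, `assign()` and the `copies_made` counter are hypothetical names made up for this illustration and are not taken from Julia or from any existing library. It also ignores what happens when a handle is destroyed, which a real implementation would have to track.

```python
class _Storage:
    """Shared buffer plus the number of handles currently pointing at it."""

    def __init__(self, data):
        self.data = data
        self.owners = 1


class CowArray:
    """Hypothetical value-semantics array: assignment shares, first write copies."""

    copies_made = 0  # class-wide counter, for illustration only

    def __init__(self, values):
        self._s = _Storage(list(values))

    def assign(self):
        """Stand-in for `b = a` under value semantics: share storage, copy nothing."""
        other = CowArray.__new__(CowArray)
        other._s = self._s
        self._s.owners += 1
        return other

    def __getitem__(self, i):
        return self._s.data[i]

    def __setitem__(self, i, value):
        if self._s.owners > 1:
            # Another handle still sees this buffer: detach before writing.
            self._s.owners -= 1
            self._s = _Storage(list(self._s.data))
            CowArray.copies_made += 1
        self._s.data[i] = value


a = CowArray([1.0, 2.0, 3.0])
b = a.assign()   # no copy yet
first = b[0]     # reads never copy
b[0] = 99.0      # first write to shared storage: one copy
b[1] = 98.0      # b owns its buffer now: no further copy
a[2] = 97.0      # a is the sole owner of the original buffer: no copy
```

In this sketch the single copy happens at `b[0] = 99.0`, which is the same place where a user working with reference semantics would have had to write `copy()` to keep `a` intact.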
There are other arguments against reference semantics:
* Functions can alter their arguments by surprise. This has been discussed here already, but IMO not to a conclusion. Marking mutating functions with ! is not feasible: i) It is hard to enforce this convention, and in reality (just like in Julia's current code base) it will not be applied consistently. ii) New problems arise for functions with multiple arguments. Which argument is the one that is potentially mutated? Which one is used for in-place operations? iii) Nested functions. When writing a function, the user is faced with a new complexity: she must take care which functions to use, those with ! or those without. Now, from a user's point of view, it is not only important _what_ a function does, but also _how_. (The last point is a minor problem, I guess.)
* The cell issue: Given cells which are capable of storing arrays of arrays, it is hard to think of any case where storing references inside cells would be needed. If one stores an array in a cell, she wants it to be safe and to be sure that external modifications to the source array will not alter the copy inside the cell. So storing into cells should always make a copy. The same is true for fetching an array from a cell. Problems arise if cells are stored into cells. Without copy on write, one will end up with cascading copies - which _really_ kills performance.
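Both points above can be reproduced in a few lines. The sketch below again uses Python lists as a stand-in for arrays with reference semantics; `normalize` is a hypothetical helper invented for the example, and the one-element list `cell` plays the role of a cell.

```python
def normalize(v):
    # Hypothetical helper: works in place, and nothing in its name says so.
    total = sum(v)
    for i in range(len(v)):
        v[i] = v[i] / total
    return v


data = [1.0, 1.0, 2.0]
result = normalize(data)
# The caller's array was changed by the call; result and data are the same object.
caller_sees = list(data)

# The cell issue: a container stores a reference, not a snapshot.
source = [1, 2, 3]
cell = [source]              # stores a reference to source
safe_cell = [list(source)]   # explicit copy at store time
source[0] = 42               # visible through cell, not through safe_cell
```

With value semantics, `data` would be unchanged after the call and `cell` would behave like `safe_cell` without the explicit copy.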
Regarding slices: I personally don't care whether a slice produces a copy or not. Cases where creating _and working with_ a "reference slice" is efficient are limited (see: strided storage in NumPy). So I would rather have a copy coming out of subarrays than have to keep in mind potential side effects on write. Having built a system with such reference slices ourselves, we turned away from it and have since concentrated on improving the memory management. The higher cost of most copies can be kept very small if the target memory comes from a pool (and hence still stays in the cache). This holds for a general copy on write mechanism as well.
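The side effect on write that comes with a "reference slice" can be shown with Python's standard library, used here purely as an illustration and not as a model of Julia's subarrays: slicing an `array.array` returns a copy, while slicing a `memoryview` of it shares the buffer.

```python
from array import array

a = array("d", [0.0, 1.0, 2.0, 3.0])

copy_slice = a[1:3]              # array slicing returns a new array
view_slice = memoryview(a)[1:3]  # memoryview slicing shares a's buffer

copy_slice[0] = -1.0             # does not touch a
view_slice[1] = 20.0             # writes through to a[2]
```

After these two writes, `a[1]` is still `1.0` but `a[2]` has become `20.0`, so only the copy is decoupled from its source.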
I apologize if this topic is being pushed too much here. But I still think it deserves a reasoned, conscious decision. In the end, both schemes are usable of course (not sure about cells though). And I promise, once the Julia staff declares a design lock on the issue, I will keep silent once and for all .. :)