Feature request: Type invariants

138 views
Skip to first unread message

Øyvind Grotmol

unread,
Mar 7, 2012, 4:27:02 AM3/7/12
to julia-dev
Hi,

Congratulations on the launch of a beautiful new language!

In the Julia Manual page for Constructors, there is an example as
follows:

type OrderedPair
x::Real
y::Real

OrderedPair(x,y) = x > y ? error("out of order") : new(x,y)
end

How about introducing type invariants as a language feature?
I think it would be better to write this type along the lines of:

type OrderedPair
x::Real
y::Real

invariant: x <= y
end

The most important reason is that now the invariant can be checked
after each time the fields are updated. Positive side effects are that
the intent is clearer from the code, and that the default constructor
can be used, and that if multiple inner constructors are provided,
then the invariant would apply to all of them.

One question is of course when the invariant should be checked (in
addition to right after the constructor). It cannot be checked after
every individual field update, because sometimes the invariant will
only hold after a group of several fields have been updated. One
natural way of addressing that could be to use tuple assignments in
such cases, like "pair.x, pair.y = (new_x_value, new_y_value)", and
only check the invariant after all the fields have gotten their new
value. In Eiffel this is solved by checking the invariants at the end
of every public function call on the object - in Julia there are no
function calls "on" the object, but possibly an alternative way could
be to check the invariants at the end of every function call that
modifies an object's fields.

I hope you would consider this feature suggestion!

Best regards,
Øyvind Grotmol

Stefan Karpinski

unread,
Mar 7, 2012, 6:24:58 PM3/7/12
to juli...@googlegroups.com
Thank you! I'm glad you appreciate the design.

I actually proposed invariants like this at an early stage of Julia's development. However, it became clear after a while that this style of programming was against the spirit of the language. Specifically, one of the principles of Julia is that there is no magic: the code you write gets executed; what isn't written doesn't. It's a bit like C that way. Having what is essentially assertion code running automatically every time you change an object is probably too much overhead and certainly too much magic.

That being said, I'll let the cat out of the bag a bit here and say that Jeff and I have been tossing around the idea of making all composites immutable. In order to have mutable fields you would actually need to make them references — a reference would be a special type that holds a single value and the value it points to can be changed. This is similar to a construct by the same name in Clojure. (Arrays would remain mutable, however; mutability of arrays is far too central to efficient linear algebra to be done away with.) An implication of composite immutability is that every object is guaranteed to be as it was produced by an inner construtor, making the current soft guarantee a hard one. For example, it would be impossible to have a Rational number that was not in lowest terms, since the construtor guarantees that property and there would be no way to alter a Rational value after construction.

kem

unread,
Mar 7, 2012, 7:19:14 PM3/7/12
to juli...@googlegroups.com
Do you mind saying a bit about your thinking--in particular, reasons why you might not want immutable composites? I can't give any specific examples, but I am generally wary about the idea of immutable composites, for reasons that are vaguely similar to why you wouldn't want arrays to be immutable. It seems you might want to leave some flexibility for unforseen use cases.

I'm having a blast learning Julia, by the way. It's a lot of fun and a breath of fresh air.

Jeff Bezanson

unread,
Mar 7, 2012, 7:24:42 PM3/7/12
to juli...@googlegroups.com
It's not that it will be impossible to have a modifiable field, just
that immutability will be the default. Mutable fields will have to be
specially marked, and will have a bit of extra overhead. But in
exchange, what we feel are the most common cases can be much more
efficient. This will also let us make the memory layout for composites
and arrays of composites the same as C structs and arrays of structs.

Stefan Karpinski

unread,
Mar 7, 2012, 7:29:25 PM3/7/12
to juli...@googlegroups.com
This will also let us make the memory layout for composites and arrays of composites the same as C structs and arrays of structs.

This *the* prime motivation: compact storage of arrays of composite values. This is really huge for big data stuff and something that no languages but C/C++ manage. You simply cannot have an inline array of objects in Java, for example: you can do it for ints, but not for Integers, and also not for some more complex class.

As a side-effect, it may be really good for concurrency, but honestly that's actually not our prime motivation (unlike Clojure where that seems to be the main driver for wanting all data structures to be immutable).

Konrad Hinsen

unread,
Mar 8, 2012, 3:35:26 AM3/8/12
to juli...@googlegroups.com
Stefan Karpinski writes:

> This will also let us make the memory layout for composites and
> arrays of composites the same as C structs and arrays of
> structs.
>
> This *the* prime motivation: compact storage of arrays of composite
> values. This is really huge for big data stuff and something that
> no languages but C/C++ manage. You

That would be very nice indeed. It makes array operations a lot more
useful to be able to have efficient arrays of composites.

> As a side-effect, it may be really good for concurrency, but
> honestly that's actually not our prime motivation (unlike Clojure
> where that seems to be the main driver for wanting all data
> structures to be immutable).

In Clojure the main motivation is indeed concurrency, but immutable
data structures also make for more robust code in non-parallel
applications once they reach a certain size. I wonder if anyone has
ever done statistics on the causes of hard-to-find bugs. In my
personal experience, whenever it took me more than a day to track down
a bug, the cause was almost always some piece of data that was
modified in an unexpected way.

Konrad.

Øyvind Grotmol

unread,
Mar 8, 2012, 7:00:11 AM3/8/12
to julia-dev
On 8 Mar, 01:29, Stefan Karpinski <ste...@karpinski.org> wrote:
> > This will also let us make the memory layout for composites and arrays of
> > composites the same as C structs and arrays of structs.
>
> This *the* prime motivation: compact storage of arrays of composite values.
> This is really huge for big data stuff and something that no languages but
> C/C++ manage. You simply cannot have an inline array of objects in Java,
> for example: you can do it for ints, but not for Integers, and also not for
> some more complex class.

Mostly for my own curiosity, could you outline the technicalities of
how the compact arrays of immutable types would work?

Let's call a type "closed" if it is not abstract and all its fields
(if any) are declared to be of a closed type. Then it is known exactly
how large the whole data structure is, and like C structs, an array of
a closed type can be efficiently packed inline in memory. As someone
who doesn't know the inner workings of Julia, I would have thought
this would be the way to implement tight packing. As a user, I would
be very happy with such a feature - whenever I have a lot of data to
store, I would declare all the fields to be of a non-abstract type,
and get the same performance as with C structs.

-Øyvind

Stefan Karpinski

unread,
Mar 8, 2012, 12:41:14 PM3/8/12
to juli...@googlegroups.com
On Thu, Mar 8, 2012 at 3:35 AM, Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
 > As a side-effect, it may be really good for concurrency, but
 > honestly that's actually not our prime motivation (unlike Clojure
 > where that seems to be the main driver for wanting all data
 > structures to be immutable).

In Clojure the main motivation is indeed concurrency, but immutable
data structures also make for more robust code in non-parallel
applications once they reach a certain size. I wonder if anyone has
ever done statistics on the causes of hard-to-find bugs.  In my
personal experience, whenever it took me more than a day to track down
a bug, the cause was almost always some piece of data that was
modified in an unexpected way.

Yes, this is very true. That's part of why strings are immutable in Julia. Also, there's an important psychological distinction between things we think of as mutable versus immutable: values that are *defined* by their values should be immutable, whereas values that *contain* other values but have an independent identity can be mutable. Numbers are a prototypical example of being defined by their values — if you change the imaginary part of a complex number, you have a different number, not the same number altered (it sounds weird to even talk about "altering" a number). Arrays are the prototypical containers: change the contents of an array all you want and it still has an independent identity (psychologically).

The reason I believe strings are so dangerous as mutable values in high-level languages is that we think of them as being defined by their value: the string "foo" and "fox" are simply different values, regardless of whether the former was altered into the latter using the same memory. However, as a legacy from the implementation and the fact that in C they are just arrays of bytes, a low-level view of them is as containers. In C that's natural and fairly unsurprising because you're used to thinking about the actual memory buffer that represents a string. In a higher level language, it's really, really surprising when a string changes value because you passed it to some other function. Some of the nastiest bugs I've ever had to find were due to this.

Stefan Karpinski

unread,
Mar 8, 2012, 1:00:04 PM3/8/12
to juli...@googlegroups.com
On Thu, Mar 8, 2012 at 7:00 AM, Øyvind Grotmol <oyv...@grotmol.net> wrote:
 
Mostly for my own curiosity, could you outline the technicalities of
how the compact arrays of immutable types would work?

Let's call a type "closed" if it is not abstract and all its fields
(if any) are declared to be of a closed type. Then it is known exactly
how large the whole data structure is, and like C structs, an array of
a closed type can be efficiently packed inline in memory. As someone
who doesn't know the inner workings of Julia, I would have thought
this would be the way to implement tight packing. As a user, I would
be very happy with such a feature - whenever I have a lot of data to
store, I would declare all the fields to be of a non-abstract type,
and get the same performance as with C structs.

If you have an array of type Array{T} where T is a concrete type like Rational{Int}, with immutable composites, it would be possible to just store all the array data like a C struct {long,long} array — or effectively just a big array of longs. The real question is why immutability is necessary for this. Here's the example that hopefully demonstrates the issue:

julia> v = [1//2]
[1//2]

julia> q = v[1]
1//2

julia> v[1].den = 3
3

julia> v
[1//3]

julia> q
1//3

These semantics can't be supported without having v be a vector of pointers to heap-allocated objects: v[1] and q refer to the *same* rational object. That prevents compact C-like storage. This is why languages like C# have "value types". We're basically removing this semantic restriction by making the line `v[1].den = 3` illegal: you can't change the denominator of a rational number after it's created; instead you have to make a new rational number and assign it into the array:

julia> v = [1//2]
[1//2]

julia> q = v[1]
1//2

julia> v[1] = 1//3
[1//3] # see issue #544

julia> v
[1//3]

julia> q
1//2

This way, there's no question that v[1] and q should somehow be the same object.

Konrad Hinsen

unread,
Mar 8, 2012, 3:00:49 PM3/8/12
to juli...@googlegroups.com
Stefan Karpinski writes:

> Yes, this is very true. That's part of why strings are immutable in
> Julia. Also, there's an important psychological distinction between
> things we think of as mutable versus immutable: values that are
> *defined* by their values should be immutable, whereas values that
> *contain* other values but have an independent identity can be
> mutable.

Right. Nevertheless, immutable container types are of interest as
well, implemented by what the functional programming community calls
"persisten data structures": there are no operations to change those
data structures, but there are operations that produce modified
versions that share most of the memory storage with the original
version, which continues to exist.

The reason for having immutable containers is again robustness, in
particular, but not exclusively, with concurrency. I find myself
agreeing more and more with Rich Hickey, the inventor of Clojure, who
claims that "immutability is the right default". We do need mutable
containers, and in particular mutable arrays, for doing many things
efficiently, but it's usually best to start with immutable containers
and switch to mutable when required for efficiency.

> about the actual memory buffer that represents a string. In a
> higher level language, it's really, really surprising when a string
> changes value because you passed it to some other function. Some
> of the nastiest bugs I've ever had to find were due to this.

Me too!

Konrad.

Stefan Karpinski

unread,
Mar 8, 2012, 3:32:40 PM3/8/12
to juli...@googlegroups.com
Building a persistent collections library in Julia is certainly a cool project. The basic built-in arrays have to be mutable, however. But I think Rich Hickey is quite right about immutability being the right default. Hence our plan for composites.

Konrad Hinsen

unread,
Mar 9, 2012, 2:11:06 AM3/9/12
to juli...@googlegroups.com
Stefan Karpinski writes:

> Building a persistent collections library in Julia is certainly a
> cool project.

Once there are immutable composites, I don't see why this couldn't be
done.

> The basic built-in arrays have to be mutable, however.

Indeed. No one has yet come up with a good immutable substitute for arrays.

Konrad.

kem

unread,
Mar 9, 2012, 3:22:01 PM3/9/12
to juli...@googlegroups.com
Thanks for the example. After reading this and the other posts, I understand why immutability by default is a good idea. I'm not sure why this didn't occur to me before--I think I was thinking about the underlying memory-reference issues separately from other issues.

Jeff mentioned something about allowing optional mutability through special marking -- have you thought about the details about how this might be implemented?

Jeff Bezanson

unread,
Mar 9, 2012, 3:30:38 PM3/9/12
to juli...@googlegroups.com
Yes; we would use the "assignable cell" model. "Mutable" fields would
actually point to a "value cell" which is the true mutable object
(much like a 1-element array). You wouldn't have to deal with the cell
directly; it would happen automatically when reading and writing the
field. This allows the containing object to be freely copied around,
since all copies will refer to the same cells, so changing a cell
makes all the copies appear to change.

Guillermo

unread,
Mar 9, 2012, 11:34:31 PM3/9/12
to juli...@googlegroups.com
I've copied my post because my intention was to publish it here, if not inconvenient.

I must say that my review is not for who write here, but some approaches that I see often. Julia is a clear example that is not the case. I expose my example to show that it is not easy to take a decision because it is difficult to know all the use cases for that language. Maybe that's why less purist languages ​​ultimately are adopted by more people.

Congratulations for this new programming language. It's very clear and easy to learn. I like it!


But... be careful with a purist functional (or object oriented or...) approach like:

  • Immutable data is the right thing or you are wrong.
  • Magic abstractions (you loss the idea about what you code really does)
  • Only arrays, vectors or matrix need to be immutable.
  • Functions are good, objects are bad.
  • Think as a language designer, not as a language user.

An example, I'm working since some time with data analysis and my code is full of mutable composites like this:


template<typename ValueType>

class variance {

public:

   typedef ValueType value_type;

   

   variance()

   : _mean(0), _var(0) {}

   variance(const variance& v)

   : _mean(v._mean), _var(v._var), _count(v._count) {}

   

   void operator()(value_type x) {

      _count();

      value_type delta = x-_mean;

      _mean += delta/_count;

      value_type delta2 = delta*(x-_mean);

      _var += (delta2-_var)/_count;

   }

   

   operator    value_type() const { return _var; }

   value_type  sum() const { return _mean*_count; }

   value_type  mean() const { return _mean; }

   std::size_t count() const { return _count; }

   

private:

   value_type _mean, _var;

   counter    _count;

};


The are hundreds of function objects (accumulators), composed in other composites and in a long chain that compiler joins in a big expression to calculate each sample of the data in one loop (for each thread). So yes, I'm coding in a functional approach. But, however there are many (light) objects and 90% have a mutable state. But this is not all the answer because 90% of the time parts of the program access those composites as immutable values. The member function operator() modify its values but all the rest can not. All references are marked as const, for example:


   const variance& operator[](size_type i) const {  // const functon retreive a const variance from an array

      return _items[i];                             // array is mutable for not const member function of its owner

   }   


It's not a good idea to destroy and construct each piece of all the superstructure in each sample iteration. And you need the state because you want to know the full snapshot incrementally. There aren't immutable composites but there are one modifier function and many accesses that trait data as immutable.


C++ is impure, magnet of the details, neither functional nor object oriented and with a limited type abstraction... but maybe for those reasons I've found that is the best to get my work done and... very fast! Be careful with fall in love with a particular paradigm or programming hype.


"There are more things in earth and heaven, Horatio, that are dreamt of in your philosophy."


Regards,

Guillermo Ruiz Troyano.

Reply all
Reply to author
Forward
0 new messages