immutable composite types on the way

566 views
Skip to first unread message

Jeff Bezanson

unread,
Feb 28, 2013, 3:34:15 PM2/28/13
to juli...@googlegroups.com
There has been a bit of discussion already about my jb/immutable
branch, so it is time to give a more complete explanation, and also
fair warning, as some internals have been rearranged quite a bit.
Despite the chaos, this is a surprisingly non-breaking change, unless
you write code that deals directly with the likes of AbstractKind and
CompositeKind.

The following is a draft of a blog post about the changes. The stuff
for metaprogrammers is at the end.

---

Efficient Aggregates in Julia

We recently introduced an exciting feature that has been in planning
for some time: immutable aggregate types. In fact, we have been
planning to do this for so long that this feature is the subject of
our issue #13 on GitHub, out of 2421 total issues so far.

Essentially, this feature drastically reduces the overhead of
user-defined types that represent small number-like values, or that
wrap a small number of other objects. Consider an RGB pixel type:

immutable Pixel
r::Uint8
g::Uint8
b::Uint8
end

Instances of this type can now be packed efficiently into arrays,
using exactly 3 bytes per object. In all other respects, these objects
continue to act like normal first-class objects. To see how we might
use this, here is a function that converts an RGB image in standard
24-bit framebuffer format to grayscale:

function rgb2gray!(img::Array{Pixel})
for i=1:length(img)
p = img[i]
v = uint8(0.30*p.r + 0.59*p.g + 0.11*p.b)
img[i] = Pixel(v,v,v)
end
end

This code will run blazing fast, performing no memory allocation. We
have not done thorough benchmarking, but this is in fact likely to be
the fastest way to write this function in Julia from now on.

The key to this behavior is the new `immutable` keyword, which means
instances of the type cannot be modified. At first this sounds like a
mere restriction --- how come I'm not allowed to modify one? --- but
what it really means is that the object is identified with its
contents, rather than its memory address. A mutable object has
"behavior"; it changes
over time, and there may be many references to the object, all of
which can observe those changes. An immutable object, on the other
hand, has only a value, and no time-varying behavior. Its location
does not matter. It is "just some bits".

Julia has always had some immutable values, in the form of bits types,
which are used to represent fixed-bit-width numbers. It is highly
intuitive that numbers are immutable. If x equals 2, you might later
change the value of x, but it is understood that the value of 2 itself
does not change.
The `immutable` keyword generalizes this idea to structured data types
with named fields. Julia variables and containers, including arrays,
are all still mutable. While a `Pixel` object itself can't change, a
new `Pixel` can be written over an old one within an array, since the
array is mutable.

Let's take a look at the benefits of this feature.

1. The compiler and GC have a lot of freedom to move and copy these
objects around. This flexibility can be used to store data more
efficiently, for example keeping the real and imaginary parts of a
complex number in separate registers, or keeping only one part in a
register.

2. Immutable objects are easy to reason about. Some languages, such as
C++ and C#, provide "value types", which have many of the benefits of
immutable objects. However, their behavior can be confusing. Consider
code like the following:

item = lookup(collection, index)
modify!(item)

The question here is whether we have modified the same `item` that is
in the collection, or if we have modified a local copy. In Julia there
are only two possibilities: either `item` is mutable, in which case we
modified the one and only copy of it, or it is immutable, in which
case modifying it is not allowed.

3. No-overhead data abstractions become possible. It is often useful
to define a new type that simply wraps a single value, and modifies
its behavior in some way. Our favorite modular integer example type
fits this description:

immutable ModInt{n} <: Integer
k::Int
ModInt(k) = new(mod(k,n))
end

Since a given `ModInt` doesn't need to exist at a particular address,
it can be passed to functions, stored in arrays, and so on, as
efficiently as a single `Int`, with no wrapping overhead. But, in
Julia, the overhead will not *always* be zero. The `ModInt` type
information will "follow the data around"
at compile time to the extent possible, but heap-allocated wrappers
will be added as needed at run time. Typically these wrappers will be
short-lived; if the final destination of a `ModInt` is in a `ModInt`
array, for example, the wrapper can be discarded when the value is
assigned. But if the value is
only used locally inside a function, there will most likely be no
wrappers at all.

4. Abstractions are fully enforced. If a custom constructor is written
for an immutable type, then all instances will be created by it. Since
the constructed objects are never modified, the invariants provided by
the constructor cannot be violated. At this time, uninitialized arrays
are an exception to this rule. New arrays of "plain data" immutable
types have
unspecified contents, so it is possible to obtain an invalid value
from one. This is usually harmless in practice, since arrays must be
initialized anyway, and are often created through functions like
zeros() that do so.

5. We can automatically type-specialize fields. Since field values at
construction time are final, their types are too, so we learn
everything about the type of an immutable object when it is
constructed.

There are many potential optimizations here, and we have not
implemented all of them yet. But having this feature in place provides
another lever to help us improve performance over time.

For now though, we at least have a much simpler implementation of
complex numbers, and will be able to take advantage of efficient
rational matrices and other similar niceties.


Addendum: Under the hood

For purposes of calling C and writing reflective code, it helps to
know a bit about how immutable types are implemented. Before this
change, we had types AbstractKind, BitsKind, and CompositeKind, for
separating which types are abstract, which are represented by
immutable bit strings, and which are mutable aggregates. It was
sometimes convenient that the type system reflected these differences,
but also a bit unwarranted since all these types participate in the
same hierarchy and follow the same subtyping rules.

Now, the type landscape is both simpler and more complex. The three
Kinds have been merged into a single kind called DataType. The type of
every value in Julia is now either a DataType, or else a tuple type.
(UnionTypes still exist, but of course are always abstract.) To find
out the details
of a DataType's physical representation, you must query its
properties. DataTypes have three boolean properties `abstract`,
`mutable`, and `pointerfree`, and an integer property `size`. The
CompositeKind properties `names` and `types` are still there to
describe fields.

The `abstract` property indicates that the type was declared with the
`abstract` keyword and has no direct instances. `mutable` indicates,
for concrete types, whether instances are mutable. `pointerfree` means
that instances contain "just data" and no references to other Julia
values. `size` gives the size of an instance in bytes.

What used to be BitsKinds are now DataTypes that are immutable,
concrete, have no fields, and have non-zero size. The former
CompositeKinds are mutable and concrete, and either have fields or are
zero size if they have zero fields. Clearly, new combinations are now
possible. We have
already mentioned immutable types with fields. We could have the
equivalent of mutable BitsKinds, but this combination is not exposed
in the language, since it is easily emulated using mutable fields.
Another new combination is abstract types with fields, which would
allow you to declare that all subtypes of some abstract type should
have certain fields. That one is
definitely useful, and we plan to provide syntax for it.

Typically, the only time you need to worry about these things
is when calling native code, when you want to know whether some array
or struct has C-compatible data layout. This is handled by the type
predicate isbits(T).

Tom Short

unread,
Feb 28, 2013, 3:55:30 PM2/28/13
to juli...@googlegroups.com
Great work, Jeff, and great write-up. I learned a lot in the process.

Stefan Karpinski

unread,
Feb 28, 2013, 4:03:34 PM2/28/13
to Julia Dev
Bravo. This has been a long time coming. I want to point out that the aggregate code-size of this change is about -80 lines. Jeff added a huge amount of great functionality and *shrunk* the size of Julia's C code while doing it.

Toivo Henningsson

unread,
Feb 28, 2013, 4:24:15 PM2/28/13
to juli...@googlegroups.com
Very nice work Jeff! I've been waiting for this for a long time. (There is actually an @immutable macro in PatternDispatch.jl that tries to emulate identity by value)

A few questions:
* Won't abstract types with fields be kinda problematic for multiple inheritance?
* If I create an immutable type with a field of mutable type, that will be an immutable reference to a mutable object, right?
* What are the requirements for pointer-free? Only concrete, immutable, pointer-free fields?

Stefan Karpinski

unread,
Feb 28, 2013, 4:29:33 PM2/28/13
to Julia Dev

On Thu, Feb 28, 2013 at 4:24 PM, Toivo Henningsson <toiv...@gmail.com> wrote:
Won't abstract types with fields be kinda problematic for multiple inheritance?

We can burn that bridge in many different ways when we get there. One option would be the C++ thunk trick, but another option would be insisting on compatibility of inherited fields from all ancestors – in particular, as long as only one ancestor specifies any fields, everything will be fine.

Stefan Karpinski

unread,
Feb 28, 2013, 4:30:48 PM2/28/13
to Julia Dev
On Thu, Feb 28, 2013 at 4:24 PM, Toivo Henningsson <toiv...@gmail.com> wrote:
If I create an immutable type with a field of mutable type, that will be an immutable reference to a mutable object, right?

Yes, that's exactly right. It's a bit like const references to mutable arrays – you can change the contents (and even the size) of the array, but you cannot change what array is pointed to, and in particular, you cannot change the type of the array that is pointed to.

Jeff Bezanson

unread,
Feb 28, 2013, 4:56:04 PM2/28/13
to juli...@googlegroups.com
In the current implementation only structs of what used to be
bitskinds are pointer-free, but this will probably evolve to what you
suggest (any concrete, immutable, pointer-free values, with bitskinds
being the base case).

On Thu, Feb 28, 2013 at 4:24 PM, Toivo Henningsson <toiv...@gmail.com> wrote:

Tim Holy

unread,
Feb 28, 2013, 5:25:13 PM2/28/13
to juli...@googlegroups.com
Really, really cool. Thanks for the great writeup.

--Tim

Viral Shah

unread,
Feb 28, 2013, 11:41:23 PM2/28/13
to juli...@googlegroups.com
Great stuff Jeff, and great writeup. This goes a long way in establishing julia as a language that seriously cares about performance. Also, my favourite features are those that reduce code size!

I am hoping that abstract types with fields should make it a lot easier to reorganize the array type hierarchy.

-viral

Toivo Henningsson

unread,
Mar 1, 2013, 12:52:43 AM3/1/13
to juli...@googlegroups.com
Fair enough. I also think that abstract types with fields will come in handy, so I agree that it would have been a pity to dismiss them altogether.

Tim Holy

unread,
Mar 1, 2013, 8:08:01 AM3/1/13
to juli...@googlegroups.com
With your Pixel type, consider the following operation:

function lessgreen!(img::Array{Pixel})
for i = 1:length(img)
p = img[i]
img[i] = Pixel(p.r,p.g/2,p.b)
end
img
end

Will that involve copying 3 values per pixel, or will it simply tweak the
green channel (given sufficient compiler smarts)? If it does involve 3 copies,
then would an alternative like the following be possible?

function lessgreen!(img::Array{Pixel})
A = reinterpret(Uint, img, tuple(3,size(img)...))
A[2:3:end] /= 2
img
end

--Tim

Diego Javier Zea

unread,
Mar 1, 2013, 8:19:10 AM3/1/13
to juli...@googlegroups.com
Hi!!!

Looks like a great change for Julia, I learned a lot with the text too.
I have two questions, thinking on my case or the ModInt example ( I have now something like bitstype 8 AminoAcid <: Integer ).
Having a bitstype, what are the benefits I can gain passing to use an Immutable type? And what I can lose in the change?
It's more clear to me the example of Pixel. Except for the constructor, I don't see the gain of wrapping a Integer value.

Thanks

WANG

unread,
Mar 1, 2013, 9:19:35 AM3/1/13
to juli...@googlegroups.com
I like the immutable type!
I just hate its name : to long for input.

Would there be a shorter name for it?

immut, imt,
fixed, const, ...

Viral Shah

unread,
Mar 1, 2013, 9:21:39 AM3/1/13
to juli...@googlegroups.com
It is not long at all!

const is already a keyword, and in general, it is not a good idea to use up short names as keywords - so that people can use them in their own code.

-viral

Jeff Bezanson

unread,
Mar 1, 2013, 11:22:31 AM3/1/13
to juli...@googlegroups.com
My guess is the difference if any wouldn't be worth messing with
reinterpret. But I haven't compared yet.

Stefan Karpinski

unread,
Mar 1, 2013, 11:27:40 AM3/1/13
to juli...@googlegroups.com
This is where LLVM's optimization can kick in and hopefully figure out that the load and store back to the same location with the same value is a noop and can therefore be skipped. Fortunately, LLVM is generally quite good at this sort of thing.

Tim Holy

unread,
Mar 1, 2013, 11:38:24 AM3/1/13
to juli...@googlegroups.com
On Friday, March 01, 2013 11:27:40 AM Stefan Karpinski wrote:
> This is where LLVM's optimization can kick in and hopefully figure out that
> the load and store back to the same location with the same value is a noop
> and can therefore be skipped. Fortunately, LLVM is generally quite good at
> this sort of thing.

Great. And it sounds like the second will not generate an error, which is also
great.

--Tim

Steven G. Johnson

unread,
Mar 5, 2013, 4:41:48 PM3/5/13
to juli...@googlegroups.com
If there is a C API that requires me to pass a pointer to a structure, will I now be able to do it via an array whose elements are an immutable type with fields matching the C struct?

Jeff Bezanson

unread,
Mar 5, 2013, 4:44:25 PM3/5/13
to juli...@googlegroups.com
Yes. I think since Jameson's change it will also work for a single
struct not in an array (immutable or not).

Jeff Bezanson

unread,
Mar 5, 2013, 4:59:36 PM3/5/13
to juli...@googlegroups.com
Also, I am going to merge the branch today. Please bear with us in
case things get very ugly.

Steven G. Johnson

unread,
Mar 5, 2013, 5:09:58 PM3/5/13
to juli...@googlegroups.com


On Tuesday, March 5, 2013 4:44:25 PM UTC-5, Jeff Bezanson wrote:
Yes. I think since Jameson's change it will also work for a single
struct not in an array (immutable or not).

How do you pass a pointer to the struct if it is not an array?  Will & work?  What if the C code modifies the struct?

Jeff Bezanson

unread,
Mar 5, 2013, 5:27:03 PM3/5/13
to juli...@googlegroups.com
You can use Ptr{Struct} as an argument type, and & the argument. If
the C code modifies the struct I believe it will be modified in place.
That should probably be fixed for immutable structs. We can change the
semantics so that when &x is passed, the C code can modify a copy, and
then x is assigned to that copy after the call.

Jameson Nash

unread,
Mar 5, 2013, 5:34:30 PM3/5/13
to juli...@googlegroups.com
I'm adding this to the manual now, but good questions. I'll try my best to make sure I answer them clearly.

If there is a C API that requires me to pass a pointer to a structure, will I now be able to do it via an array whose elements are an immutable type with fields matching the C struct?
No, you want a constant pointer to an array (of non-constants). Not a pointer to an array of constants. A type is a better description than an Array for this relationship.

You can use Ptr{Struct} as an argument type, and & the argument. If
the C code modifies the struct I believe it will be modified in place.
That should probably be fixed for immutable structs. We can change the
semantics so that when &x is passed, the C code can modify a copy, and
then x is assigned to that copy after the call.
Yep, I was expecting to need to do a few hours of work after Jeff's immutable patch was merged to fix support for some of this. But in general, C should be able to modify the variable's contents (currently it has limitations), regardless of whether it is immutable type (so pointers to integer variable types works like C, etc)

Final important note: Arrays != *void Sometimes we can make them act close, but in general, Array{T} in Julia is very different from T[] in C

Jeff Bezanson

unread,
Mar 5, 2013, 5:49:03 PM3/5/13
to juli...@googlegroups.com
> If there is a C API that requires me to pass a pointer to a structure, will
> I now be able to do it via an array whose elements are an immutable type
> with fields matching the C struct?
>
> No, you want a constant pointer to an array (of non-constants). Not a
> pointer to an array of constants. A type is a better description than an
> Array for this relationship.
>

I don't understand this. Passing an Array{ComplexPair{Float64}} as a
Ptr{ComplexPair{Float64}} to C now works perfectly (Complex128 is also
now an alias for ComplexPair{Float64}).
In Julia you can't have a ComplexPair object that is backed by an
element of an array, so ComplexPairs are all constant no matter what
you do with arrays of them.

Jameson Nash

unread,
Mar 5, 2013, 6:13:36 PM3/5/13
to juli...@googlegroups.com
Yes, but that is a constant type'd array of non-constant values, which I intended to say should work (i guess I was thinking half in C previously and forgot the word type). However, I interpreted Steven question as asking whether `{1, 2.0, 3.0f1}` (e.g. a heterogeneous array) is equivalent functionally to
type X
a::Int
b::Float64
c::Float32
end
which is not and will never be true (AFAIK)

Steven G. Johnson

unread,
Mar 5, 2013, 7:56:09 PM3/5/13
to juli...@googlegroups.com


On Tuesday, March 5, 2013 6:13:36 PM UTC-5, Jameson wrote:
Yes, but that is a constant type'd array of non-constant values, which I intended to say should work (i guess I was thinking half in C previously and forgot the word type). However, I interpreted Steven question as asking whether `{1, 2.0, 3.0f1}` (e.g. a heterogeneous array) is equivalent functionally to
type X
  a::Int
  b::Float64
  c::Float32
end
which is not and will never be true (AFAIK)

No, I understand that. I was asking whether, if the C routine expects a Ptr{X} if I have to pass an An{X} or if I could do &x::X, and if the latter would work if the C routine alters the contents of the struct.

A slightly more technical question that comes up in the Python interface.  When I register a new top-level function with Python, I have to pass Python a pointer to a PyMethodDef struct, and this pointer must remain valid for the lifetime of the Python interpreter (Python doesn't make a copy, nor does it alter the contents of this struct).  So, I want to just have a global variable const def::PyMethodDef of an immutable PyMethodDef type in Julia, and pass a pointer to this to Python.  If I pass &def to a ccall, will it pass a pointer to the actual def, or a pointer to some kind of copy (as I guess "&" did in Julia 0.1) that will disappear once ccall completes?  

I gather from the above discussion that &def will pass an actual pointer to def, but I'm not sure if this is documented behavior or guaranteed in future versions of Julia?

If not, my understanding is that I should just define def to be length-1 Vector{PyMethodDef}, in which case an actual pointer to def's contents, not a copy, will be passed when I ccall with def as an argument.

--SGJ

Jameson Nash

unread,
Mar 5, 2013, 8:19:19 PM3/5/13
to juli...@googlegroups.com
&def will pass the actual object, provided that def is mutable (and that ccall's call to convert didn't create a new object). This behavior was true in 0.1 and will remain true. If def is immutable, the compiler is free to make additional copies and I'm not yet exactly sure what that will look like. In the future, the goal is to do something similar with immutable types so that you can get values back that way too.

My point with the array comment is that Array{PyMethodDef} is only a *PyMethodDef if PyMethodDef is immutable and in the context of a ccall argument. Otherwise it is a jl_value_t*. Inside a type it is always an opaque type to c (a jl_array_t*, to be specific). The old recommendation (before you could pass &type to ccall or if you didn't directly need the contents) was to declare a vector of plain bytes of the appropriate length (or use Base.c_malloc).

Steven G. Johnson

unread,
Mar 5, 2013, 8:45:11 PM3/5/13
to juli...@googlegroups.com


On Tuesday, March 5, 2013 8:19:19 PM UTC-5, Jameson wrote:
&def will pass the actual object, provided that def is mutable (and that ccall's call to convert didn't create a new object). This behavior was true in 0.1 and will remain true. If def is immutable, the compiler is free to make additional copies and I'm not yet exactly sure what that will look like. In the future, the goal is to do something similar with immutable types so that you can get values back that way too.

My point with the array comment is that Array{PyMethodDef} is only a *PyMethodDef if PyMethodDef is immutable and in the context of a ccall argument. Otherwise it is a jl_value_t*. Inside a type it is always an opaque type to c (a jl_array_t*, to be specific). The old recommendation (before you could pass &type to ccall or if you didn't directly need the contents) was to declare a vector of plain bytes of the appropriate length (or use Base.c_malloc).

PyMethodDef will definitely be an immutable composite type; I understand why that is required in order for it to be laid out in memory like a C struct.  And it sounds like I have to use an Array, rather than &, to pass the actual pointer to my (immutable composite type) data.  That's fine, no big deal.

Jameson Nash

unread,
Mar 5, 2013, 9:04:51 PM3/5/13
to juli...@googlegroups.com
Just curious, but why does it need to be immutable? Mutable and immutable types are laid out the same in memory. The difference is that arrays of immutable types will be stored inline, whereas arrays of mutable objects will be stored as pointers. So the Array wrapper would be needed to force the allocation of fixed semi-mutable memory for the immutable type (or possibly something like https://github.com/JuliaLang/julia/issues/2322). But you can also use a mutable type in the first place. C probably shouldn't be changing the contents of a immutable value either (expect perhaps during an initialization step). 

Steven G. Johnson

unread,
Mar 5, 2013, 9:09:53 PM3/5/13
to juli...@googlegroups.com


On Tuesday, March 5, 2013 9:04:51 PM UTC-5, Jameson wrote:
Just curious, but why does it need to be immutable? Mutable and immutable types are laid out the same in memory. The difference is that arrays of immutable types will be stored inline, whereas arrays of mutable objects will be stored as pointers. So the Array wrapper would be needed to force the allocation of fixed semi-mutable memory for the immutable type (or possibly something like https://github.com/JuliaLang/julia/issues/2322). But you can also use a mutable type in the first place. C probably shouldn't be changing the contents of a immutable value either (expect perhaps during an initialization step). 

Good to know that mutable types are laid out the same way, and passing a pointer with & works.

Kevin Squire

unread,
Mar 9, 2013, 9:36:55 AM3/9/13
to juli...@googlegroups.com


On Thursday, February 28, 2013 12:34:15 PM UTC-8, Jeff Bezanson wrote:
(*snip*)

zero size if they have zero fields. Clearly, new combinations are now
possible. We have
already mentioned immutable types with fields. We could have the
equivalent of mutable BitsKinds, but this combination is not exposed
in the language, since it is easily emulated using mutable fields.
Another new combination is abstract types with fields, which would
allow you to declare that all subtypes of some abstract type should
have certain fields. That one is
definitely useful, and we plan to provide syntax for it.

If you haven't made a decision yet, can I suggest the keyword trait (borrowed from Scala) for these?  Or do you plan to overload/change the syntax for abstract?  

Multiple inheritance and `traits` give a lot of flexibiility, especially for things like Collections (and also give the possibility of things becoming really convoluted...).

Cheers!
   Kevin

Stefan Karpinski

unread,
Mar 9, 2013, 10:11:09 AM3/9/13
to Julia Dev
I'm not sure how Scala is using the word "trait" (I suspect correctly, since Martin Odersky et al. are very knowledgable about programming language theory), but that term already has a very specific meaning in object-oriented programming research – and our abstract types are essentially traits. Using a more intuitive word that the theory term seems ok to me (both Scala and Julia use Any and None in place of Top and Bottom), but then also using the theory term in a slightly non-standard way seems really dangerous.

Kevin Squire

unread,
Mar 9, 2013, 12:15:57 PM3/9/13
to juli...@googlegroups.com
Sorry, I was misremembering my scala.  There, traits are member variables and functions collected together, and which can't be instantiated on their own, but which can be added to other classes as a mixin.  Doesn't really apply here (especially since we don't have multiple inheritance yet.)

Cheers!

Kevin

Jeff Bezanson

unread,
Mar 9, 2013, 6:07:27 PM3/9/13
to juli...@googlegroups.com
Yes, that is very similar to what our abstract types are. You can
already use an abstract type to "bring in" a bunch of methods, but not
yet fields. The keyword "trait" would solve the syntactic problem of
writing fields inside an abstract type, but is a bit misleading since
it wouldn't actually be a different thing from an abstract type. Then
again, we already have the type / immutable distinction.
Reply all
Reply to author
Forward
0 new messages