Array/Cell - a useful distinction, or not?

694 views
Skip to first unread message

Oliver Woodford

unread,
Apr 29, 2014, 5:38:33 AM4/29/14
to julia...@googlegroups.com
A habitual MATLAB user, I've been trying out Julia over the last two weeks to see if it might be a suitable replacement. I want something that is as fast to develop using, but has much faster runtime. Perhaps I'll write about my general thoughts in another post. However, in this thread I want to address one linguistic thing I found confusing.

Ignoring subarrays, dense/sparse arrays, there are two main types of array in Julia. I will call them homogenous and heterogenous. Homogenous arrays are declared as having all elements be the same type: e.g. array{Float64}. They are efficient to store in memory, as the elements are simply laid out consecutively in memory. Heterogenous arrays have an abstract element type, e.g. array{Real}. The way Julia interprets this is that every element must be a concrete subtype of Real, but that they don't have to be the same type. Each element can therefore be a different type, with different storage requirements, so these arrays contain a pointer to each element, which is then stored somewhere else - this carries a massive overhead. In MATLAB these arrays would be termed an array and a cell array respectively, so there is a clear distinction. What I found confusing with Julia is that the distinction is less clear.

This confusion was highlighted in a stackoverflow question, which I'll outline it again, now:

f(x::Real) = x is equivalent to f{T<:Real}(x::T) = x, but f(x::Array{Real}) = x is different from f{T<:Real}(x::Array{T}) = x.

The second form for input arrays, requiring static parameters, is needed to declare that the array is homogenous, not heterogenous. This seems a funny way of doing things to me because:
1. The homogeneity/heterogeneity of the array is a characteristic of the array only, and not of the function
2. The static parameter T is not required anywhere else, and the Julia style guide explicitly counsels against the use of such parameters, where they are unnecessary.
3. To declare a function which can take homogenous or heterogenous arrays, I believe you'd have to do something like  f{T<:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally bizarre (due to point 1).

What I would advocate instead is two types of array, one homogenous, one heterogenous. Array for homogenous and Cell for heterogenous would work. It would do away with the need for static parameters in this case, and also, in my view, make people far more aware of when they are using the different types of array. I suspect many beginners are oblivious to the distinction, currently.

In the stackoverflow question, someone suggested two points against this:
1. Having an array whose elements are all guaranteed to be some subtype of Real is not particularly useful without specifying which subtype since without that information almost no structural information is being provided to the compiler (e.g. about memory layout, etc.)
Well, firstly I disagree; there is a lot of structural information being supplied - to read each element, the compiler knows that it just needs to compute an offset, rather than compute an offset, read a pointer and then read another memory location. However, I don't think this is exploited (though it could be) because the function will be recompiled from scratch for each element type. Secondly, this isn't about helping the compiler, it's about making the language more consistent and sensible - helping the user.
2. You almost always pass homogenous arrays of a concrete type as arguments anyway and the compiler is able to specialize on that.
Firstly, homogenous arrays that you pass in always have a concrete type. Secondly, you don't always know what that type will be. It might be Float64 or Unit8, etc.

I haven't yet heard a convincing counterargument to it making more sense to distinguish homogenous and heterogenous arrays by the array type rather than by static function parameter.

Let the discussion begin...

Ivar Nesje

unread,
Apr 29, 2014, 10:48:52 AM4/29/14
to julia...@googlegroups.com
Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion.

f{T<:Real}(a::Array{T})
Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4])

I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again).

Jason Merrill

unread,
Apr 29, 2014, 11:10:37 AM4/29/14
to julia...@googlegroups.com
Do you find yourself dealing with heterogeneous arrays often? It might be useful to post some examples where this comes up to the list.

Julia compiles specialized versions of functions for the concrete types they are called with at run time, so you should usually think of the types in function signatures as documentation/assumption checks rather than a performance optimization.

This is in contrast to explicit types used in type declarations or inside function bodies, which can have important performance effects.

Oliver Woodford

unread,
Apr 29, 2014, 11:23:45 AM4/29/14
to julia...@googlegroups.com
On Tuesday, April 29, 2014 4:10:37 PM UTC+1, Jason Merrill wrote:
Do you find yourself dealing with heterogeneous arrays often? It might be useful to post some examples where this comes up to the list.

In my Matlab code I use cell arrays (heterogenous) a great deal, and of course normal arrays (homogeneous) too.
 

Julia compiles specialized versions of functions for the concrete types they are called with at run time, so you should usually think of the types in function signatures as documentation/assumption checks rather than a performance optimization.

Yes, that's exactly what I'm using types in function signatures for - documentation and error checking.
 

This is in contrast to explicit types used in type declarations or inside function bodies, which can have important performance effects.

Yes, understood.


Oliver Woodford

unread,
Apr 29, 2014, 11:28:28 AM4/29/14
to julia...@googlegroups.com

On Tuesday, April 29, 2014 3:48:52 PM UTC+1, Ivar Nesje wrote:
Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion.

f{T<:Real}(a::Array{T})
Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4])
 
Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing.


I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again).

If this has already been discussed then by all means post a link to it. No point repeating things!

John Myles White

unread,
Apr 29, 2014, 11:38:59 AM4/29/14
to julia...@googlegroups.com
On Apr 29, 2014, at 8:28 AM, Oliver Woodford <oliver....@gmail.com> wrote:
 
Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing.

This is indeed how parametric types work in Julia.

Array{T} is a family of types, one for each value of T. Defining f{T}(a::Array{T}) allows you to generically define an infinite number of functions -- you get a unique function for each value of T. 

In contrast, Array{Real} is a single type -- specifically, the type generated by the parametric type Array{T} when its parameter T = Real. There is no variable T, so f(a::Array{Real}) defines exactly one function -- and that function accepts exactly one type of input.

Array{Any} is also a single type.

 -- John

Patrick O'Leary

unread,
Apr 29, 2014, 11:41:52 AM4/29/14
to julia...@googlegroups.com
It might be easier to understand if we start by stripping away the constraints and the function syntax and look just at the array type.

Array{Real} is a concrete type. It's an array whose elements are all subtypes of the abstract type Real.

Array{T} is a family of types. It describes a whole bunch of types, including Array{Real}, but also including Array{Float64}, Array{All} and Array{Bob}. T can itself be either abstract--a heterogenous array--or concrete.

The constrained type variable {T<:Real}(::Array{T}) still describes a family of types, but that family is smaller. Array{All} and Array{Bob} are out.

Hope this helps!

Stefan Karpinski

unread,
Apr 29, 2014, 11:41:43 AM4/29/14
to Julia Users
On Tue, Apr 29, 2014 at 11:38 AM, John Myles White <johnmyl...@gmail.com> wrote:
Array{Any} is also a single type.

Not to nit-pick, but there's a second type parameter, so it's actually still abstract. But Array{Any,1} is a single type with no subtypes, which I think is what you meant.

Oliver Woodford

unread,
Apr 29, 2014, 11:51:18 AM4/29/14
to julia...@googlegroups.com
On Tuesday, April 29, 2014 4:38:59 PM UTC+1, John Myles White wrote:
On Apr 29, 2014, at 8:28 AM, Oliver Woodford <oliver....@gmail.com> wrote:
 
Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing.

This is indeed how parametric types work in Julia.

Array{T} is a family of types, one for each value of T. Defining f{T}(a::Array{T}) allows you to generically define an infinite number of functions -- you get a unique function for each value of T. 

Yes, understood.
 

In contrast, Array{Real} is a single type -- specifically, the type generated by the parametric type Array{T} when its parameter T = Real. There is no variable T, so f(a::Array{Real}) defines exactly one function -- and that function accepts exactly one type of input.

Yes, understood.
 

Array{Any} is also a single type.

 -- John


However, you haven't addressed the point I make in my post. I will post it again, stated more clearly.


 

Stefan Karpinski

unread,
Apr 29, 2014, 11:51:52 AM4/29/14
to Julia Users
The real root issue here is parametric variance. Julia has opted for parametric invariance  – i.e. that P{A} <: P{B} only if A = B – which is what James mentions in his StackOverflow answer. The wikipedia page does a decent job of summarizing the issue:

A programming language designer will consider variance when devising typing rules for e.g. arrays, inheritance, and generic datatypes. By making type constructors covariant or contravariant instead of invariant, more programs will be accepted as well-typed. On the other hand, programmers often find contravariance unintuitive, and accurately tracking variance to avoid runtime type errors can lead to complex typing rules. In order to keep the type system simple and allow useful programs, a language may treat a type constructor as invariant even if it would be safe to consider it variant, or treat it as covariant even when that can violate type safety.

Julia has chosen the simple approach of invariance everywhere. This simplifies the subtyping rules, but there is no free lunch – variance issues still do crop up. But at least Julia's rules are simple if occasionally a bit inconvenient. Basically, this is the situation:
  • covariance: P{A} <: P{B} ⟺ A <: B – correct for read, incorrect for write.
  • contravariance: P{A} <: P{B} ⟺ B <: A – correct for write, incorrect for read.
  • invariance: P{A} <: P{B} ⟺ A = B – correct for both read and write.
The main purpose of variance in a programming language is to allow definitions to apply to more things. In particular, since the number of types that a parametric type may encompass is unbounded, you don't want to have to write an infinite number of method definitions to define some functionality for P{A} for all A <: B where B is some unbounded abstract type. Instead of accomplishing this with covariance or contravariance, Julia accomplishes it with more powerful dispatch – specifically, method signatures with quantified type parameters, such as:

frob{A<:B}(x::P{A})

This is effectively a covariant signature, applying to P{A} for any A <: B. It seems like you would be happier if parametric types were covariant and you were allowed to write this as:

frob(x::P{B})

where the method implicitly also applies to x::P{A} for every A <: B. This would certainly be less of an eyesore. There are a couple of important points to be made, however:
  1. In the longer, more explicit form, it is clear that B may not be the actual parameter type of x; the parameter type of x is some A which is a subtype of B.
  2. If the shorter form were shorthand for the longer form without explicit A, there would be no way to express that x has an exact parameter type of B.
The explicit distinction between A – the actual parameter type of x – and B, which is simply an upper bound on A, becomes clearer in situations like this:

frob{A<:B}(x::P{A}, y::A)
frob{A<:B}(x::P{A}, y::B)

The first definition only applies to x and y where y is of the actual parameter type of x and thus could be used as is by x. The second definition applies when y is of type B, regardless of x's actual parameter type, which means that it might not be usable as is by x.

We could actually make parametric types covariant without losing the ability to express this distinction; it would just mean that the second definition could be written as:

frob(x::P{B}, y::B)

Currently, this means that x must have exact parameter type B; if types were covariant then it would mean that x's parameter type could be any A <: B. Although it may seem questionable that y not be of the true parameter type of x, allowing this wiggle room and then doing conversion at run-time as necessary is actually the standard way to do things in Julia. Conversion is often done implicitly by assignment to a type location such as a field of x or an array slot. The fact that this is quite common is why covariance might actually be ok. The main thing we would lose with parametric covariance is the ability to talk about arrays that are of *exactly* element type B where B is some abstract type. But I'm just not sure how useful that actually is. Those types are generally discouraged in any case and it's rare to want to give them special behaviors that wouldn't also apply to parametric types with concrete parameters. So perhaps we ought to consider making parametric types in Julia covariant.

Matt Bauman

unread,
Apr 29, 2014, 11:53:05 AM4/29/14
to julia...@googlegroups.com
I use cell arrays very often in Matlab, too, but I've found that I often don't really need to even worry about the distinction in julia.  Square brackets will constrain the types as much as possible, and if it's not possible, Any[] == {}.

Moreover, most of what I used cell arrays for in Matlab are completely obviated in Julia — Cell arrays of strings (strings are first class) and passing/parsing/splatting varargs (keyword arguments are wonderful and any Julian collection can be splatted).

Oliver Woodford

unread,
Apr 29, 2014, 11:54:36 AM4/29/14
to julia...@googlegroups.com
On Tuesday, April 29, 2014 4:41:52 PM UTC+1, Patrick O'Leary wrote:
It might be easier to understand if we start by stripping away the constraints and the function syntax and look just at the array type.

Array{Real} is a concrete type. It's an array whose elements are all subtypes of the abstract type Real.

Array{T} is a family of types. It describes a whole bunch of types, including Array{Real}, but also including Array{Float64}, Array{All} and Array{Bob}. T can itself be either abstract--a heterogenous array--or concrete.

The constrained type variable {T<:Real}(::Array{T}) still describes a family of types, but that family is smaller. Array{All} and Array{Bob} are out.

Hope this helps!

Thank you, Patrick. We have now covered how types for arrays currently do work (which I mostly understood), and corrected my mistake in point 3 of my original post. We haven't yet started to address the real point of my post yet, though. I'd be interested to hear more on that.

Oliver Woodford

unread,
Apr 29, 2014, 12:01:50 PM4/29/14
to julia...@googlegroups.com


On Tuesday, April 29, 2014 4:53:05 PM UTC+1, Matt Bauman wrote:
I use cell arrays very often in Matlab, too, but I've found that I often don't really need to even worry about the distinction in julia.  Square brackets will constrain the types as much as possible, and if it's not possible, Any[] == {}.

Moreover, most of what I used cell arrays for in Matlab are completely obviated in Julia — Cell arrays of strings (strings are first class) and passing/parsing/splatting varargs (keyword arguments are wonderful and any Julian collection can be splatted).


Yes, I'd been thinking I would use heterogeneous arrays less in Julia, for the reasons you mention.

 

Oliver Woodford

unread,
Apr 29, 2014, 12:13:02 PM4/29/14
to julia...@googlegroups.com
Quick question, folks:

Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)?

If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?

If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}).

Stefan, thank you very much for your answer. I will get back to you on it.

Jacob Quinn

unread,
Apr 29, 2014, 12:13:08 PM4/29/14
to julia...@googlegroups.com
In my experience, I can think of a single time when having an array of a specific abstract type was useful (how multiple Period arithmetic is handled: https://github.com/karbarcca/Dates.jl/blob/master/src/periods.jl#L88).

Almost always, I'm concentrating on making sure Arrays I work with are of a specific, concrete type to ensure performance and memory benefits. If I really realize a few different types might make it in an Array, I usually just go straight to a {} (Any Array) and move on.

-Jacob

Jason Merrill

unread,
Apr 29, 2014, 12:14:01 PM4/29/14
to julia...@googlegroups.com
Suppose the types that you want existed. Let's call them ConcreteArray, and Cell.

ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride.

Can you give some examples of functions that would use these in their type signature?

Stefan Karpinski

unread,
Apr 29, 2014, 12:20:31 PM4/29/14
to Julia Users
Thinking about it some more, the big problem with covariance is that it forces the confusion of two distinct meanings of P{B} for abstract B:
  1. The concrete instance of type P with actual parameter type B.
  2. The abstract type including all instances P{A} where A <: B.
If P{B} meant the latter, then we would either need a new way to express the former or we would be forced to identify the two. Unlike static languages, the concrete interpretation is not just a hallucination of the compiler – it exists at run-time – so we can't just have no name for it – there needs to be some concrete answer when you write typeof(x). Identifying the two meanings is very un-Julian as it would allow a concrete type to have subtypes, which would be a massive and to my mind distasteful change.

Oliver Woodford

unread,
Apr 29, 2014, 12:23:44 PM4/29/14
to julia...@googlegroups.com, quinn....@gmail.com
On Tuesday, April 29, 2014 5:13:08 PM UTC+1, Jacob Quinn wrote:
In my experience, I can think of a single time when having an array of a specific abstract type was useful (how multiple Period arithmetic is handled: https://github.com/karbarcca/Dates.jl/blob/master/src/periods.jl#L88).

Almost always, I'm concentrating on making sure Arrays I work with are of a specific, concrete type to ensure performance and memory benefits. If I really realize a few different types might make it in an Array, I usually just go straight to a {} (Any Array) and move on.

-Jacob


You will get performance and memory benefits if your arrays are homogeneous as opposed to heterogeneous. You don't always know what element type they'll have though. Sometimes you want Float64 for accuracy, sometimes Float32 for speed and lower memory use. To handle that you need a static function parameter, currently.

Oliver Woodford

unread,
Apr 29, 2014, 12:29:56 PM4/29/14
to julia...@googlegroups.com
Sure. sum(x::ConcreteArray{Number}). Because currently if I write sum(x::Array{Real}) and try to pass in an Array{Float64} I get an error.

Note that most arrays are actually homogeneous (as Jacob and Matt both stated), so if you want to make sure of that you need to use static parameters. I'm suggesting a system that doesn't require static parameters for the usual case.

Stefan Karpinski

unread,
Apr 29, 2014, 12:39:28 PM4/29/14
to Julia Users
On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford <oliver....@gmail.com> wrote:
Quick question, folks:

Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)?

If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?

Yes and yes.
 
If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}).

This is not a minor change. What you're asking for is having both invariant and covariant parametric types in the language – Array would be covariant and Cell would be invariant. If you're going to have invariant and covariant type parameters, you'll want contravariant type parameters too. At that point, you need to have a way of annotating the variance of a type's parameters – and worse still, you have to explain all this to everyone. At that point, you have Scala and all the complications and questions that come with that. I don't think that's the right path for a language that's meant to be used by people who are scientists rather than professional programmers (honestly, I don't really think it's a good idea for professional programmers either). Making all parametric types covariant is a more reasonable possibility, but as I pointed out, it has its own significant issues.

Matt Bauman

unread,
Apr 29, 2014, 1:11:05 PM4/29/14
to julia...@googlegroups.com
Perhaps the Julian curly-brace array syntax makes Any[] arrays seem special (especially to Matlab converts), when they're really not all that different.  Yes, users need to learn that Array{Real} is not a supertype of Array{Int}.  But that's a common theme throughout Julia's type system, and really needs to be learned as soon as learning about parametric types.  I think the manual does a good job here — Parametric Types.  Is there a way that it could be written more clearly?

Jason Merrill

unread,
Apr 29, 2014, 1:37:52 PM4/29/14
to julia...@googlegroups.com

Take a look at the actual implementations of sum in reduce.jl:

https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L179
https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L275

Julia often has method definitions that are significantly more generic than sum(x::ConcreteArray{Number}), which I think is really nice. The definitions above are fast for Array{Float64}, but the very same definitions also work for e.g. an array of matrices, or anything else with + defined (zero(T) might have to be defined too if the array is empty or has only 1 element).

I don't see much advantage of having an implementation like sum(x::ConcreteArray{Number}) if someone later comes along and defines sum(x::AbstractArray) with essentially the same body.

Tomas Lycken

unread,
Apr 29, 2014, 2:43:03 PM4/29/14
to julia...@googlegroups.com
Yet another point of view - incidentally the one that convinced me that Julia has got this right, while most others don't - is the fact that in the grand scheme of things, JIT is cheap while working with types that might have subtypes is really painful if you don't want to compromise with performance. Julia does an absolutely beautiful job with a function definition such as

function sum(x)
    # implementation goes here
end

with no arguments defined at all. Mostly, you don't need to define your argument types, since the compiler will generate a strongly typed function for you whenever you need it. For whatever type you need it. If, for some reason, you actually do have an instance of `Array{Real}`, containing both floats, ints and rationals, the above method will (probably) Just Work(TM) anyway, since most operations for real numbers are implemented to make them work with each other. And if you can write an implementation of `sum` that works more or less optimally on `Array{Float64}`, chances are the compiler can too - and it will, as soon as you need one, without you doing anything more than asking nicely (and you don't even have to realize that you're asking!). The same user code will be used to compile both methods, and for the Array{Float64} case it will be as fast as it can be, while for the more general method it will not be slower than it would be otherwise.

And if you want to be able to say to the compiler that "this is going to be an array of numbers - I don't know what kind of numbers yet, but they'll all be the same type", well, then that's already included in the language. You only need the type parameter on the function to say f{T<:Number}(x::Array{T}) and you're done.

My point is that the problems that types like Array{Real} solve in other languages - e.g. being able to sum a collection of numbers without knowing what type of numbers they are - is solved instead by JIT-compiling and multiple dispatch, just as efficiently (if not better). The syntax - and sometimes the mind-set of the programmer - needs to be a little different, but there are no problems (that I can see) that can't be solved just as efficiently, albeit in another way.

OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T<:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...)

But yeah, other than that, I can't really think of a problem that isn't just as solve-able in Julia as in any other language. One just has to embrace the slightly different way of doing things, that is a result of a slightly unusual (in a good way!) type system.

// T

Tim Holy

unread,
Apr 29, 2014, 3:07:08 PM4/29/14
to julia...@googlegroups.com
On Tuesday, April 29, 2014 11:43:03 AM Tomas Lycken wrote:
> Say I have a variable x, and I want to make sure that it's an array
> of real numbers that are all of the same type, but I don't care which one.
> Can I say x::Array{T<:Real} as a type assertion?

function f{T<:Real}(x::Array{T})
...
end

does this.

--Tim

Tomas Lycken

unread,
Apr 29, 2014, 3:20:09 PM4/29/14
to julia...@googlegroups.com
Yes, when declaring function arguments, which I believe fall under case 2. in this list. Is it possible for a usage of `::` that falls under category 1?

// T

Jacob Quinn

unread,
Apr 29, 2014, 3:24:42 PM4/29/14
to julia...@googlegroups.com
Excellent points Tomas. I think this would be particularly helpful for those coming from Java. Perhaps some of your comments could be worked into the manual or FAQ somewhere.

-Jacob

Milan Bouchet-Valat

unread,
Apr 29, 2014, 4:24:33 PM4/29/14
to julia...@googlegroups.com
Le mardi 29 avril 2014 à 11:51 -0400, Stefan Karpinski a écrit : [...]
The explicit distinction between A – the actual parameter type of x – and B, which is simply an upper bound on A, becomes clearer in situations like this:


frob{A<:B}(x::P{A}, y::A)
frob{A<:B}(x::P{A}, y::B)


The first definition only applies to x and y where y is of the actual parameter type of x and thus could be used as is by x. The second definition applies when y is of type B, regardless of x's actual parameter type, which means that it might not be usable as is by x.


We could actually make parametric types covariant without losing the ability to express this distinction; it would just mean that the second definition could be written as:


frob(x::P{B}, y::B)


Currently, this means that x must have exact parameter type B; if types were covariant then it would mean that x's parameter type could be any A <: B. Although it may seem questionable that y not be of the true parameter type of x, allowing this wiggle room and then doing conversion at run-time as necessary is actually the standard way to do things in Julia. Conversion is often done implicitly by assignment to a type location such as a field of x or an array slot. The fact that this is quite common is why covariance might actually be ok. The main thing we would lose with parametric covariance is the ability to talk about arrays that are of *exactly* element type B where B is some abstract type. But I'm just not sure how useful that actually is. Those types are generally discouraged in any case and it's rare to want to give them special behaviors that wouldn't also apply to parametric types with concrete parameters. So perhaps we ought to consider making parametric types in Julia covariant.
I am very sympathetic to such as change, but one reason why making e.g. frob(x::Array{B, 1}, y::B) be equivalent to frob{T<:B}(x::Array{T, 1}, y::B), B being abstract, would be a problem, is that the function code couldn't rely on the assumption that the x vector can hold any value of type B. Indeed, x could be either an Array{B, 1} or an Array{T, 1} with T any concrete type as long as T<:B.

Cf. https://github.com/JuliaLang/julia/issues/6248#issuecomment-38435635 (and the following discussion)

In practice, I really don't think this would matter in many occasions. So maybe it would be a good change, if there was a syntax to specify that the function really expects an Array{B, 1}. Or maybe the latter isn't even useful as the function can always call convert() if it's really needed (can we find use cases for it).

I believe this detail is one of the (quite rare) frustrating points when one starts coding in Julia. So it would be worth investigating for Julia's coming success.

Oliver Woodford

unread,
Apr 30, 2014, 8:52:16 AM4/30/14
to julia...@googlegroups.com
Really? In an Array{Real}, are the elements all Real, or concrete subtypes of Real, such as Float64, Uint8, etc.? I don't see how they can be just Real. Therefore, given that Array{Real} meets the type assertion, and it's elements can be of different types, the function declaration doesn't enforce the constraint on arrays that Tomas was after, i.e. that input arrays be homogeneous.

As I understand it, if you want to do that you need:
function f(x::Union(Array{Float64},Array{Float32},etc.))
...
end

Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous?

Oliver Woodford

unread,
Apr 30, 2014, 9:02:37 AM4/30/14
to julia...@googlegroups.com


On Tuesday, April 29, 2014 7:43:03 PM UTC+1, Tomas Lycken wrote:
OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T<:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...)

No, you can't do x::Array{T<:Real} or function f{T<:Real}(x::Array{T}) and have that enforce homogeneous arrays.

This is exactly the scenario I'm in. Yes (to address an earlier point), if I didn't bother with the type assertion, Julia would do the optimal thing in each case. However, my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally doing that. This is why I started this thread! Secondly, I don't think that should be done using a static parameter, either (not that it is).

I agree with your other points on how nice Julia is, but I want to tackle the one narly point!

Patrick O'Leary

unread,
Apr 30, 2014, 9:09:45 AM4/30/14
to julia...@googlegroups.com
On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote:
Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous?

The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it?

Patrick

Oliver Woodford

unread,
Apr 30, 2014, 9:16:40 AM4/30/14
to julia...@googlegroups.com
When my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally and obliviously doing that. Now, I could convert heterogeneous arrays to homogeneous ones within the function, but the Julia style guide very sensibly counsels against that.

Patrick O'Leary

unread,
Apr 30, 2014, 9:31:43 AM4/30/14
to julia...@googlegroups.com

It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check:

reduce((==), [typeof(el) for el in a])

Ivar Nesje

unread,
Apr 30, 2014, 9:32:45 AM4/30/14
to julia...@googlegroups.com
You can do something like
function f{T<:Real}(x::Array{T,1})
    @assert isleaftype(T)
    # function implementation
end

Patrick O'Leary

unread,
Apr 30, 2014, 9:40:33 AM4/30/14
to julia...@googlegroups.com
And today I learned about isleaftype()! This is a better implementation than what I posted.

Ivar Nesje

unread,
Apr 30, 2014, 9:53:02 AM4/30/14
to julia...@googlegroups.com
@Patric. It's not just a better implementation. It actually works, because your check only check that the actual types are equal, but not if the type of the container is specific. Number[1,2,34,54] would pass your test, but the array will still be pointers to boxed values.

Oliver Woodford

unread,
Apr 30, 2014, 9:57:39 AM4/30/14
to julia...@googlegroups.com
Stefan

Firstly, thank you for taking the time to write such a lengthy response. I think it comes closest to addressing the particular question I had. Sadly I'm an Engineer, not a Computer Scientist, so it's taking me a while to get my head round. I've put a few comments inline.

On Tuesday, April 29, 2014 4:51:52 PM UTC+1, Stefan Karpinski wrote:
The real root issue here is parametric variance. Julia has opted for parametric invariance  – i.e. that P{A} <: P{B} only if A = B – which is what James mentions in his StackOverflow answer. The wikipedia page does a decent job of summarizing the issue:

A programming language designer will consider variance when devising typing rules for e.g. arrays, inheritance, and generic datatypes. By making type constructors covariant or contravariant instead of invariant, more programs will be accepted as well-typed. On the other hand, programmers often find contravariance unintuitive, and accurately tracking variance to avoid runtime type errors can lead to complex typing rules. In order to keep the type system simple and allow useful programs, a language may treat a type constructor as invariant even if it would be safe to consider it variant, or treat it as covariant even when that can violate type safety.

Julia has chosen the simple approach of invariance everywhere. This simplifies the subtyping rules, but there is no free lunch – variance issues still do crop up. But at least Julia's rules are simple if occasionally a bit inconvenient. Basically, this is the situation:
  • covariance: P{A} <: P{B} ⟺ A <: B – correct for read, incorrect for write.
  • contravariance: P{A} <: P{B} ⟺ B <: A – correct for write, incorrect for read.
  • invariance: P{A} <: P{B} ⟺ A = B – correct for both read and write.
I'm not sure what you mean by "correct for read, incorrect for write" etc.
 
The main purpose of variance in a programming language is to allow definitions to apply to more things. In particular, since the number of types that a parametric type may encompass is unbounded, you don't want to have to write an infinite number of method definitions to define some functionality for P{A} for all A <: B where B is some unbounded abstract type. Instead of accomplishing this with covariance or contravariance, Julia accomplishes it with more powerful dispatch – specifically, method signatures with quantified type parameters, such as:

frob{A<:B}(x::P{A})

This is effectively a covariant signature, applying to P{A} for any A <: B. It seems like you would be happier if parametric types were covariant and you were allowed to write this as:

frob(x::P{B})

where the method implicitly also applies to x::P{A} for every A <: B. This would certainly be less of an eyesore.

Yes, I guess I am saying this, but only for a particular type, which would instantiate homogeneous arrays. I will call this Array, and the type for heterogeneous arrays I will call Cell. 
 
There are a couple of important points to be made, however:
  1. In the longer, more explicit form, it is clear that B may not be the actual parameter type of x; the parameter type of x is some A which is a subtype of B.
With homogeneous arrays it is clear that the input array must have concrete type, even if the declaration uses an abstract type, therefore the short form still makes sense. This is like x::Real: we don't have to use frob{T<:Real}(x::T).
 
  1. If the shorter form were shorthand for the longer form without explicit A, there would be no way to express that x has an exact parameter type of B.
 With homogenous arrays, x::Array{Real} would not be a valid concrete type, so there is no ambiguity.
The explicit distinction between A – the actual parameter type of x – and B, which is simply an upper bound on A, becomes clearer in situations like this:

frob{A<:B}(x::P{A}, y::A)
frob{A<:B}(x::P{A}, y::B)

The first definition only applies to x and y where y is of the actual parameter type of x and thus could be used as is by x. The second definition applies when y is of type B, regardless of x's actual parameter type, which means that it might not be usable as is by x.

We could actually make parametric types covariant without losing the ability to express this distinction; it would just mean that the second definition could be written as:

frob(x::P{B}, y::B)

Currently, this means that x must have exact parameter type B; if types were covariant then it would mean that x's parameter type could be any A <: B. Although it may seem questionable that y not be of the true parameter type of x, allowing this wiggle room and then doing conversion at run-time as necessary is actually the standard way to do things in Julia. Conversion is often done implicitly by assignment to a type location such as a field of x or an array slot. The fact that this is quite common is why covariance might actually be ok. The main thing we would lose with parametric covariance is the ability to talk about arrays that are of *exactly* element type B where B is some abstract type. But I'm just not sure how useful that actually is. Those types are generally discouraged in any case and it's rare to want to give them special behaviors that wouldn't also apply to parametric types with concrete parameters. So perhaps we ought to consider making parametric types in Julia covariant.

As I said before, with homogeneous arrays, Array{Real} would be an abstract type, just like Real, and not a concrete type. If you want to talk about heterogeneous arrays that can contain any kind of Real, then I'm suggesting the use of Cell{Real}.
 

Thinking about it some more, the big problem with covariance is that it forces the confusion of two distinct meanings of P{B} for abstract B:
  1. The concrete instance of type P with actual parameter type B.
  1. The abstract type including all instances P{A} where A <: B.
If P{B} meant the latter, then we would either need a new way to express the former or we would be forced to identify the two. Unlike static languages, the concrete interpretation is not just a hallucination of the compiler – it exists at run-time – so we can't just have no name for it – there needs to be some concrete answer when you write typeof(x). Identifying the two meanings is very un-Julian as it would allow a concrete type to have subtypes, which would be a massive and to my mind distasteful change.

I am suggesting having two types, Array and Cell. With Array, which would use covariance, you wouldn't be able to have a concrete instance with an abstract type, so no confusion there. Cell would use invariance, so you would be able to distinguish between the two in the current way, and we would know that the array would always be heterogeneous (even if all the elements have the same type).
 

On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford <oliver....@gmail.com> wrote:
Quick question, folks:
Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)?
If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?
Yes and yes.
 
If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}). 
 
This is not a minor change. What you're asking for is having both invariant and covariant parametric types in the language – Array would be covariant and Cell would be invariant. If you're going to have invariant and covariant type parameters, you'll want contravariant type parameters too. At that point, you need to have a way of annotating the variance of a type's parameters – and worse still, you have to explain all this to everyone. At that point, you have Scala and all thecomplications and questions that come with that. I don't think that's the right path for a language that's meant to be used by people who are scientists rather than professional programmers (honestly, I don't really think it's a good idea for professional programmers either). Making all parametric types covariant is a more reasonable possibility, but as I pointed out, it has its own significant issues.

Yes. Sorry, I've been answering in order! I don't know if I'll want contravariant parameters. I just want one (very important) type which behaves differently to others, because this is useful and will be intuitive to people.

Currently, if I want to enforce input arrays to be homogeneous, but allow them to accept any Number, then I need to write the following function declaration: frob(x::Union(Array{Unit8},Array{Int8},...a very long list of other types). Is there no shorter form for this? Surely there is a simpler way.

Oliver Woodford

unread,
Apr 30, 2014, 10:08:15 AM4/30/14
to julia...@googlegroups.com


On Wednesday, April 30, 2014 2:31:43 PM UTC+1, Patrick O'Leary wrote:

It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check:

reduce((==), [typeof(el) for el in a])

I feel that the difference between homogeneous and heterogeneous arrays is a very important distinction, and not some odd thing that you might only rarely care about. The distinction has a massive impact on speed. The point of Julia is to be a fast dynamic language. Hiding the distinction under the carpet seems contrary to one of the aims of Julia.

Ivar's suggestion of:
@assert isleaftype(T)
is nice, but it doesn't feel quite right to me.

Tim Holy

unread,
Apr 30, 2014, 10:20:23 AM4/30/14
to julia...@googlegroups.com
You can add this as the first line of your function:
assert_leaftype(T)
where
assert_leaftype(T) = isleaftype(T) || error("Must be a concrete type")

To your users, this is at least as useful as
ERROR: no method myfunction(Array{Real,1})
which is what it would be if you relied on method dispatch to generate the
error (which is what I think you're asking for).

--Tim

Oliver Woodford

unread,
Apr 30, 2014, 10:37:05 AM4/30/14
to julia...@googlegroups.com
On Wednesday, April 30, 2014 3:20:23 PM UTC+1, Tim Holy wrote:
You can add this as the first line of your function:
   assert_leaftype(T)
where
    assert_leaftype(T) = isleaftype(T) || error("Must be a concrete type")

To your users, this is at least as useful as
    ERROR: no method myfunction(Array{Real,1})
which is what it would be if you relied on method dispatch to generate the
error (which is what I think you're asking for).

--Tim

It's OK, but it still requires me to have a static parameter. Is there a O(1) time solution which avoids the need for static parameters. Something like:

function frob(x::Array)
isleaftype(x) || error("Homogeneous array required") 

though I know this won't work.

Stefan Karpinski

unread,
Apr 30, 2014, 10:39:04 AM4/30/14
to Julia Users
Fundamentally I don't think you should be in the business of forcing people who call your code to use arrays with concrete element types. If they really want to use an array with an abstract element type – and they may have perfectly good reasons to – then that's their business. Does your code break if it is applied to an array with an abstract element type? If so, you must be doing something rather strange. We've gone to great effort to make sure that even though the underlying implementation of concrete versus abstract arrays are different, the semantics are completely identical – there's no reason for code to work with one but not the other.

Jacob Quinn

unread,
Apr 30, 2014, 10:46:54 AM4/30/14
to julia...@googlegroups.com
function frob(x::Array)
isleaftype(eltype(x)) || error("Homogeneous array required')?

Though, IMO, this is all a non-issue in my experience. Just specifying frob{T<:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy. 

I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little).

Cheers,

-Jacob

Patrick O'Leary

unread,
Apr 30, 2014, 10:56:17 AM4/30/14
to julia...@googlegroups.com
This is true. I agree in general that having the tighter container is desirable but hey, if you are only interested in the homogeneity property (which is what I was testing) and not in the storage there ya go.

Stefan Karpinski

unread,
Apr 30, 2014, 10:57:28 AM4/30/14
to Julia Users
On Wed, Apr 30, 2014 at 9:57 AM, Oliver Woodford <oliver....@gmail.com> wrote:
Stefan

Firstly, thank you for taking the time to write such a lengthy response. I think it comes closest to addressing the particular question I had. Sadly I'm an Engineer, not a Computer Scientist, so it's taking me a while to get my head round. I've put a few comments inline.

No problem. I do think one of the benefits that Julia provides is bringing some of the computer science goodness that languages like Haskell and ML have to the computational sciences in a practical way, so it's perfectly understandable that there are many here who aren't big language nerds :-)
 
On Tuesday, April 29, 2014 4:51:52 PM UTC+1, Stefan Karpinski wrote:

Julia has chosen the simple approach of invariance everywhere. This simplifies the subtyping rules, but there is no free lunch – variance issues still do crop up. But at least Julia's rules are simple if occasionally a bit inconvenient. Basically, this is the situation:
  • covariance: P{A} <: P{B} ⟺ A <: B – correct for read, incorrect for write.
  • contravariance: P{A} <: P{B} ⟺ B <: A – correct for write, incorrect for read.
  • invariance: P{A} <: P{B} ⟺ A = B – correct for both read and write.
I'm not sure what you mean by "correct for read, incorrect for write" etc.

If you ask for a Basket{Fruit} and I (covariantly) give you a Basket{Apple}, it's fine if you only take things out of the basket – you will always get a Fruit since Apples are Fruit. But if you try to put an Orange, which is also a Fruit, into the basket, then there's a problem because an Orange is not an Apple. On the other hand, if you ask for a Basket{Fruit} and I (contravariantly) give you a Basket{Food}, it's fine if you put things into the basket – since anything you put in will be Fruit and all Fruit is Food (let's say), it will be ok. If you start taking things out of the basket, expecting Fruit, however, you may get a Sausage or a Cheese, which is not what you were expecting.

frob{A<:B}(x::P{A})

This is effectively a covariant signature, applying to P{A} for any A <: B. It seems like you would be happier if parametric types were covariant and you were allowed to write this as:

frob(x::P{B})

where the method implicitly also applies to x::P{A} for every A <: B. This would certainly be less of an eyesore.

Yes, I guess I am saying this, but only for a particular type, which would instantiate homogeneous arrays. I will call this Array, and the type for heterogeneous arrays I will call Cell.

Types don't live in isolation – you can't just make one parametric type covariant while all others are invariant. You have to add covariant types to the type system and make everything still work together. It's not impossible, but it's a very complex, interconnected system.

As to the rest of what you write (omitted for brevity), it seems like what you really want is the ability to declare parametric types whose type parameters can only be concrete. That's an interesting notion, but I'm not sure it could (or should) be made to fit into the rest of the type system coherently. I'll have to think about it a bit more. I'm also not sure *why* you would want to do this. As I mentioned in another email, we've gone to great lengths to make sure that there's no semantic difference between what you call a heterogeneous array and a homogeneous array.

Tim Holy

unread,
Apr 30, 2014, 10:59:56 AM4/30/14
to julia...@googlegroups.com
What's wrong with parameters? Are you making life harder for yourself than you
need to? :-)

If you're worried about the (very slight) overhead (isleaftype calls out to C,
and therefore can't be inlined), then you could use a Union where you specify
all the types you allow. Or define the following function:

fastleaftype{T}(::Type{T}) = error("Must be a concrete type")

for x in names(Core)
T = eval(x)
if isleaftype(T) && T <: Number
@eval begin
fastleaftype(::Type{$T}) = nothing
end
end
end

Then:

function myfunction(x::Array)
fastleaftype(eltype(x))
x .+ 1
end

Now check this out:
julia> code_typed(myfunction, (Matrix{Float64},))
1-element Array{Any,1}:
:($(Expr(:lambda, {:x}, {{},{{:x,Array{Float64,2},0}},{}}, :(begin # none,
line 2:
nothing # line 3:
return x::Array{Float64,2} .+ 1::Array{Float64,2}
end::Array{Float64,2}))))

julia> code_typed(myfunction, (Matrix{Real},))
1-element Array{Any,1}:
:($(Expr(:lambda, {:x}, {{},{{:x,Array{Real,2},0}},{}}, :(begin # none, line
2:
throw($(Expr(:new, :(top(getfield)
(Base,:ErrorException)::Type{ErrorException}), "Must be a concrete
type"))::ErrorException)::None # line 3:
return x::Array{Real,2} .+ 1::Array{Real,2}
end::Array{Real,2}))))


The compiler does the check for you, and this adds nothing whatsoever to your
runtime.

--Tim

Stefan Karpinski

unread,
Apr 30, 2014, 11:01:22 AM4/30/14
to Julia Users
I really think the only code that should be too worried about whether something is a leaf type or not is the compiler...

Oliver Woodford

unread,
Apr 30, 2014, 11:30:55 AM4/30/14
to julia...@googlegroups.com, quinn....@gmail.com
On Wednesday, April 30, 2014 3:46:54 PM UTC+1, Jacob Quinn wrote:
function frob(x::Array)
isleaftype(eltype(x)) || error("Homogeneous array required')?

Though, IMO, this is all a non-issue in my experience. Just specifying frob{T<:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy.  
I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little). 

Cheers,

-Jacob
 

I have started developing my own non-trivial code base, which is what prompted me to start this thread. I also have a technical background.

You don't care whether someone uses homogeneous or heterogeneous arrays. Maybe you think it's their problem if they make things slow. I want to make them aware of it. On that we differ.

What I find absurd is that, rather than accept that different position, based on the explanation I have given (to make things fast), or make a point which invalidates my argument, you attempt to explain away this need by the fact that I am new to the language and that I'm simply not used to it.

You have taken a very concrete point that I make, and attempted to dismiss it with the suggestion that I simply don't know what I'm talking about. Not at all constructive.
 

Oliver Woodford

unread,
Apr 30, 2014, 11:37:15 AM4/30/14
to julia...@googlegroups.com
On Wednesday, April 30, 2014 4:01:22 PM UTC+1, Stefan Karpinski wrote:
I really think the only code that should be too worried about whether something is a leaf type or not is the compiler...


Are you saying:
a) we should use another way to distinguish between homogeneous and heterogeneous input arrays,
or
b) that only the compiler should care whether input arrays are homogeneous and heterogeneous input arrays.

If b) then:
1) doesn't it make a big difference to the speed of some functions, e.g. sum()?
and
2) if so, is it unreasonable to encourage the user to use homogeneous arrays in those situations?

Tim Holy

unread,
Apr 30, 2014, 11:45:07 AM4/30/14
to julia...@googlegroups.com
I think Stefan was merely saying that my illustration of how to inline away
the type-check was overkill :).

--Tim

Stefan Karpinski

unread,
Apr 30, 2014, 11:44:53 AM4/30/14
to Julia Users, Jacob Quinn
I think this suggestion was based on Jacob's personal experience – which others seem to have also had – that after a bit of time letting the language wash over you Julia, things that seemed problematic or disconcerting early on have a way of not seeming important anymore. That might or might not happen for you with this particular issue.

Personally, I don't think library code should be forcing arrays to have concrete element types. Is the library going to raise an error? If so, you're needlessly forcing the user to either not use the library or modify it to excise the pointless check. That seems bad. Is it going to print a warning but let you continue? That's going to be annoying, although at least I guess the user definitely knows they have a array with abstract element type. But is this really your place to warn them? This seems to treat the user like a child – shouldn't a library assume that the user is doing things for a reason and not by mistake? Since the main concern with heterogeneous arrays is performance, I feel that should be discovered via benchmarking or other performance analysis tools, not because some library is feeling avuncular.

Jacob Quinn

unread,
Apr 30, 2014, 11:51:13 AM4/30/14
to julia...@googlegroups.com
I was just trying to share some of my own experience with Julia from having used it for the last two years, not be dismissive or condescending. That this hasn't been a show-stopper for some 300+ packages now, IMO, is a valid point against making a somewhat disruptive change to the type system.

By stating that I come from a non-technical background, I was also trying to qualify that you may feel free to dismiss my argument because I may well be talking nonsense :). 

-Jacob

Stefan Karpinski

unread,
Apr 30, 2014, 11:58:32 AM4/30/14
to Julia Users
On Wed, Apr 30, 2014 at 11:37 AM, Oliver Woodford <oliver....@gmail.com> wrote:
On Wednesday, April 30, 2014 4:01:22 PM UTC+1, Stefan Karpinski wrote:
I really think the only code that should be too worried about whether something is a leaf type or not is the compiler...


Are you saying:
a) we should use another way to distinguish between homogeneous and heterogeneous input arrays,
or
b) that only the compiler should care whether input arrays are homogeneous and heterogeneous input arrays.

b)
 
If b) then:
1) doesn't it make a big difference to the speed of some functions, e.g. sum()?
and
2) if so, is it unreasonable to encourage the user to use homogeneous arrays in those situations?

It can make a big difference, but I don't think that libraries or the language should refuse perfectly reasonable functionality because it might be a bit slower. There are lots of situations where you don't care that much about performance and flexibility is more important. Doing computations with BigInts and BigFloats is dog slow compared to using native types but sometimes the increased precision is more important than speed. Should libraries refuse to work with BigInts and BigFloats because they're known to be slow? Even if the same code would work just fine with BitInts as Ints? That's very un-Julian.

The Julian approach is informed consent. You can use heterogeneous arrays and they will work, but they might be slow. If your code is too slow, consider changing it so that it uses homogeneous arrays instead. If knowing that this is the problem is hard, we should – and do – provide tools to help discover such things. The TypeCheck package is a great example of such a tool, giving many of the benefits of the static type checking without forcing static type checking on the user everywhere. TypeCheck can warn the user about things that they are allowed to do, but which might cause performance problems, like changing the type of a variable over the course of a loop or having functions whose return type depends on the values of its inputs, not just their type.

Oliver Woodford

unread,
Apr 30, 2014, 12:34:26 PM4/30/14
to julia...@googlegroups.com, Jacob Quinn
On Wednesday, April 30, 2014 4:44:53 PM UTC+1, Stefan Karpinski wrote:
I think this suggestion was based on Jacob's personal experience – which others seem to have also had – that after a bit of time letting the language wash over you Julia, things that seemed problematic or disconcerting early on have a way of not seeming important anymore. That might or might not happen for you with this particular issue.

I understand, but I think it helps to counter arguments that have been made before resorting to more prosaic explanations. After all, we don't all have the same experiences, but we can all agree on whether homogeneous arrays are more efficient. IMO it didn't help the discussion and detracted from from the valid points I made (which, granted, annoyed me). Hence my rebuttal.


Personally, I don't think library code should be forcing arrays to have concrete element types. Is the library going to raise an error? If so, you're needlessly forcing the user to either not use the library or modify it to excise the pointless check. That seems bad. Is it going to print a warning but let you continue? That's going to be annoying, although at least I guess the user definitely knows they have a array with abstract element type. But is this really your place to warn them? This seems to treat the user like a child – shouldn't a library assume that the user is doing things for a reason and not by mistake? Since the main concern with heterogeneous arrays is performance, I feel that should be discovered via benchmarking or other performance analysis tools, not because some library is feeling avuncular.

The Julian approach. Catchy. I can certainly understand your point of view on this.

However, why shouldn't it be the library's place to tell it's users how to make it run faster? The argument of treating the user like a child is an exaggeration, of course. You're treating them like someone who doesn't appreciate the efficiency savings of homogeneous arrays over heterogeneous ones. For the people that applies to, I suspect they'd appreciate the tip. I suppose the only real argument is that some advanced users may have a real need to pass in a heterogeneous array, and the library wouldn't let them. But then you're weighing up that risk against helping less advanced users (and even advanced users who've accidentally allowed an array to be heterogeneous) use it efficiently. It's not a clear cut decision; it's very subjective.  Yes, benchmarking is another useful tool that help users improve performance, but it's not an argument as to why libraries shouldn't. So I think this is a situation where we can agree to disagree. An avuncular library - catchy too!

The TypeCheck package sounds good. Does it check for homogeneity of arrays? Even so, for a library that would be impossible to catch when the heterogeneous array was be being passed in through the API, so then you're again relying on users to run that on their particular program. Will they?

Oliver Woodford

unread,
Apr 30, 2014, 1:00:22 PM4/30/14
to julia...@googlegroups.com, quinn....@gmail.com
On Wednesday, April 30, 2014 4:51:13 PM UTC+1, Jacob Quinn wrote:
I was just trying to share some of my own experience with Julia from having used it for the last two years, not be dismissive or condescending. That this hasn't been a show-stopper for some 300+ packages now, IMO, is a valid point against making a somewhat disruptive change to the type system.

OK, well from the first two lines it seemed like you were contesting the issue that detecting homo/heterogeneity could ever be useful, not about my suggested change to the language. I had given a good reason for the former already, which you didn't address.

Also, the statement "frob{T<:Real}(x::Vector{T}) gets you exactly what you want" isn't true; in my original post I specifically said I wanted to distinguish homogeneous and heterogeneous arrays, and as was pointed out, this doesn't achieve that.

"I was just trying to..". No, you were also trying to explain away my question on the grounds of my experience, rather than address it directly. But a true fact is true, regardless of who says it, and we should give weight to the argument rather than who is saying it. The latter just gives more power to those who already have it, and corrupts society. Please consider this point, because I stand by what I said - I don't think your response was constructive. It attacked me and not my argument, and that is not a healthy way to evaluate ideas.
 

By stating that I come from a non-technical background, I was also trying to qualify that you may feel free to dismiss my argument because I may well be talking nonsense :). 

Duly dismissed :).
 

Jacob Quinn

unread,
Apr 30, 2014, 1:47:43 PM4/30/14
to julia...@googlegroups.com
I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful.

-Jacob

Oliver Woodford

unread,
Apr 30, 2014, 4:02:43 PM4/30/14
to julia...@googlegroups.com, quinn....@gmail.com
On Wednesday, April 30, 2014 6:47:43 PM UTC+1, Jacob Quinn wrote:
I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful.

-Jacob

I don't think it was a personal attack - I'm sorry I used the word because you have missed the point I was making as a result. I am not after an apology. I don't want you to stop providing your valuable experience with anyone. Your testimony could be told to every person about to learn Julia - that would be great. However you shared it when we were halfway through a debate about a particular point, that we could perfectly well discuss on its own merits, drawing attention away from that and on to my experience, which at that point seems irrelevant. The point I made in my last post, and that I make again now, is that that approach to debate doesn't do us (me, you, this group, society) any favours - it's akin to focussing on personality not policy in politics, and what does that get us? IMO it simply wasn't the right moment for you to share that experience. All I would like is an understanding and acknowledgement of that point. And if you don't get it or disagree, not to worry. Please do continue to contribute in whatever way you feel is appropriate - and if I have had a positive impact in that respect, great. This will be my last message on the subject (trying to make the world a better place though sometimes it doesn't feel like it), so feel free to have the last word.

Stefan Karpinski

unread,
Apr 30, 2014, 4:13:50 PM4/30/14
to Julia Users, Jacob Quinn
Back to the original subject, because I think there's some value in summarizing it.

1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T <: Real.

2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type.

Oliver Woodford

unread,
May 1, 2014, 4:59:08 AM5/1/14
to julia...@googlegroups.com, Jacob Quinn
On Wednesday, April 30, 2014 9:13:50 PM UTC+1, Stefan Karpinski wrote:
Back to the original subject, because I think there's some value in summarizing it.

1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T <: Real. 

2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type.

Well, that would certainly be a solution. You could have a type modifier, say Concrete(Real), which returns the union of all concrete types which are subtypes of Real (or whatever). Then you could use:
frob{T<:Concrete(Real)}(x:Array{T})
to assert that input arrays are homogeneous.

What I would really like is to be able to write:
frob(x:Array{Concrete(Real)})
instead, because the static parameter seems as redundant to a beginner as it does in frob{T<:Real}(x::T). However, I understand that the two are different in the case of arrays/parameterized types, as the latter also encompasses some heterogeneous array types, and that that won't change unless the variance of Julia is changed. I'm not in a position to advocate such a thing.
 

Oliver Woodford

unread,
May 1, 2014, 6:28:57 AM5/1/14
to julia...@googlegroups.com, Jacob Quinn
In fact, under the current type system frob(x:Array{Concrete(Real)}) is exactly equivalent to frob(x:Array{Real}). Julia could have a different type system, where "Concrete()" implies covariance, such that frob(x:Array{Concrete(Real)}) enforces arrays to be homogeneous (i.e. frob(x:Array{Concrete(Real)}) and frob(x:Array{Real}) are different), without losing the ability to describe any current array types. Just throwing it out there. Possibly more confusing than enabling.

Oliver Woodford

unread,
May 1, 2014, 8:57:40 AM5/1/14
to julia...@googlegroups.com
On Wednesday, April 30, 2014 3:57:28 PM UTC+1, Stefan Karpinski wrote:
If you ask for a Basket{Fruit} and I (covariantly) give you a Basket{Apple}, it's fine if you only take things out of the basket – you will always get a Fruit since Apples are Fruit. But if you try to put an Orange, which is also a Fruit, into the basket, then there's a problem because an Orange is not an Apple. On the other hand, if you ask for a Basket{Fruit} and I (contravariantly) give you a Basket{Food}, it's fine if you put things into the basket – since anything you put in will be Fruit and all Fruit is Food (let's say), it will be ok. If you start taking things out of the basket, expecting Fruit, however, you may get a Sausage or a Cheese, which is not what you were expecting.

This is a fantastic example. Thank you! You should try to get it into the wikipedia article on variance.

Simon Kornblith

unread,
May 1, 2014, 11:20:31 AM5/1/14
to julia...@googlegroups.com
It seems to me that the massive difference in performance between homogeneous and heterogeneous arrays is at least in part a characteristic of the implementation and not the language. We currently store heterogeneous arrays as arrays of boxed pointers and perform function calls on values taken out of them using jl_apply_generic, but I think this could be made more efficient. For arrays of types that are sufficiently small or sufficiently close in size, e.g. Array{Union(Float32,Float64)}, we could store type information and value in the array instead of resorting to boxing, and we could create code that branches based on type unions instead of resorting to jl_apply_generic. Then (some?) heterogeneous arrays could be nearly as fast as homogeneous arrays, modulo branch prediction for the type. This is basically how we store DataArrays, which could just be Array{Union(T,NA)} if the compiler did this. This is easier for arrays of unions of concrete types than it is for arrays of abstract types, since for abstract types someone could subtype the element type of the array after the array has been constructed and both storage and code need to be able to accommodate that. But I'm not sure it's right to structure the language based on the performance characteristics of the current implementation unless we think they cannot be improved.

Simon

Andreas Noack Jensen

unread,
May 2, 2014, 9:06:15 AM5/2/14
to julia...@googlegroups.com
Does your code break if it is applied to an array with an abstract element type? If so, you must be doing something rather strange.

Stefan, I think this is stated too strongly. There have been many questions and issues related to arrays with abstract element type. For functions returning arrays it is not so intuitive to get the promotion right and there are still problems in base, e.g. this funny one

julia> sum(Real[1.0 1.0],1)
1x2 Array{Int64,2}:
 1  1



--
Med venlig hilsen

Andreas Noack Jensen

Stefan Karpinski

unread,
May 2, 2014, 11:12:26 AM5/2/14
to Julia Users
Well, yes, I should have put a rather big asterisk on that:

* Unless your code explicitly depends on the element type of that or derived arrays.

So, yes, it's not that simple.
Reply all
Reply to author
Forward
0 new messages