Julia compiles specialized versions of functions for the concrete types they are called with at run time, so you should usually think of the types in function signatures as documentation/assumption checks rather than a performance optimization.
This is in contrast to explicit types used in type declarations or inside function bodies, which can have important performance effects.
Do you find yourself dealing with heterogeneous arrays often? It might be useful to post some examples where this comes up to the list.
Julia compiles specialized versions of functions for the concrete types they are called with at run time, so you should usually think of the types in function signatures as documentation/assumption checks rather than a performance optimization.
This is in contrast to explicit types used in type declarations or inside function bodies, which can have important performance effects.
Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion.f{T<:Real}(a::Array{T})Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4])
I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again).
Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing.
Array{Any} is also a single type.
On Apr 29, 2014, at 8:28 AM, Oliver Woodford <oliver....@gmail.com> wrote:Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing.This is indeed how parametric types work in Julia.Array{T} is a family of types, one for each value of T. Defining f{T}(a::Array{T}) allows you to generically define an infinite number of functions -- you get a unique function for each value of T.
In contrast, Array{Real} is a single type -- specifically, the type generated by the parametric type Array{T} when its parameter T = Real. There is no variable T, so f(a::Array{Real}) defines exactly one function -- and that function accepts exactly one type of input.
Array{Any} is also a single type.-- John
A programming language designer will consider variance when devising typing rules for e.g. arrays, inheritance, and generic datatypes. By making type constructors covariant or contravariant instead of invariant, more programs will be accepted as well-typed. On the other hand, programmers often find contravariance unintuitive, and accurately tracking variance to avoid runtime type errors can lead to complex typing rules. In order to keep the type system simple and allow useful programs, a language may treat a type constructor as invariant even if it would be safe to consider it variant, or treat it as covariant even when that can violate type safety.
frob{A<:B}(x::P{A})
frob(x::P{B})
frob{A<:B}(x::P{A}, y::A)frob{A<:B}(x::P{A}, y::B)
frob(x::P{B}, y::B)
It might be easier to understand if we start by stripping away the constraints and the function syntax and look just at the array type.
Array{Real} is a concrete type. It's an array whose elements are all subtypes of the abstract type Real.
Array{T} is a family of types. It describes a whole bunch of types, including Array{Real}, but also including Array{Float64}, Array{All} and Array{Bob}. T can itself be either abstract--a heterogenous array--or concrete.
The constrained type variable {T<:Real}(::Array{T}) still describes a family of types, but that family is smaller. Array{All} and Array{Bob} are out.
Hope this helps!
I use cell arrays very often in Matlab, too, but I've found that I often don't really need to even worry about the distinction in julia. Square brackets will constrain the types as much as possible, and if it's not possible, Any[] == {}.Moreover, most of what I used cell arrays for in Matlab are completely obviated in Julia — Cell arrays of strings (strings are first class) and passing/parsing/splatting varargs (keyword arguments are wonderful and any Julian collection can be splatted).
ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride.
Can you give some examples of functions that would use these in their type signature?
In my experience, I can think of a single time when having an array of a specific abstract type was useful (how multiple Period arithmetic is handled: https://github.com/karbarcca/Dates.jl/blob/master/src/periods.jl#L88).Almost always, I'm concentrating on making sure Arrays I work with are of a specific, concrete type to ensure performance and memory benefits. If I really realize a few different types might make it in an Array, I usually just go straight to a {} (Any Array) and move on.-Jacob
Quick question, folks:Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)?If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?
If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}).
The explicit distinction between A – the actual parameter type of x – and B, which is simply an upper bound on A, becomes clearer in situations like this:
frob{A<:B}(x::P{A}, y::A)
frob{A<:B}(x::P{A}, y::B)
The first definition only applies to x and y where y is of the actual parameter type of x and thus could be used as is by x. The second definition applies when y is of type B, regardless of x's actual parameter type, which means that it might not be usable as is by x.
We could actually make parametric types covariant without losing the ability to express this distinction; it would just mean that the second definition could be written as:
frob(x::P{B}, y::B)
Currently, this means that x must have exact parameter type B; if types were covariant then it would mean that x's parameter type could be any A <: B. Although it may seem questionable that y not be of the true parameter type of x, allowing this wiggle room and then doing conversion at run-time as necessary is actually the standard way to do things in Julia. Conversion is often done implicitly by assignment to a type location such as a field of x or an array slot. The fact that this is quite common is why covariance might actually be ok. The main thing we would lose with parametric covariance is the ability to talk about arrays that are of *exactly* element type B where B is some abstract type. But I'm just not sure how useful that actually is. Those types are generally discouraged in any case and it's rare to want to give them special behaviors that wouldn't also apply to parametric types with concrete parameters. So perhaps we ought to consider making parametric types in Julia covariant.
OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T<:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...)
Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous?
The real root issue here is parametric variance. Julia has opted for parametric invariance – i.e. that P{A} <: P{B} only if A = B – which is what James mentions in his StackOverflow answer. The wikipedia page does a decent job of summarizing the issue:A programming language designer will consider variance when devising typing rules for e.g. arrays, inheritance, and generic datatypes. By making type constructors covariant or contravariant instead of invariant, more programs will be accepted as well-typed. On the other hand, programmers often find contravariance unintuitive, and accurately tracking variance to avoid runtime type errors can lead to complex typing rules. In order to keep the type system simple and allow useful programs, a language may treat a type constructor as invariant even if it would be safe to consider it variant, or treat it as covariant even when that can violate type safety.Julia has chosen the simple approach of invariance everywhere. This simplifies the subtyping rules, but there is no free lunch – variance issues still do crop up. But at least Julia's rules are simple if occasionally a bit inconvenient. Basically, this is the situation:
- covariance: P{A} <: P{B} ⟺ A <: B – correct for read, incorrect for write.
- contravariance: P{A} <: P{B} ⟺ B <: A – correct for write, incorrect for read.
- invariance: P{A} <: P{B} ⟺ A = B – correct for both read and write.
The main purpose of variance in a programming language is to allow definitions to apply to more things. In particular, since the number of types that a parametric type may encompass is unbounded, you don't want to have to write an infinite number of method definitions to define some functionality for P{A} for all A <: B where B is some unbounded abstract type. Instead of accomplishing this with covariance or contravariance, Julia accomplishes it with more powerful dispatch – specifically, method signatures with quantified type parameters, such as:frob{A<:B}(x::P{A})This is effectively a covariant signature, applying to P{A} for any A <: B. It seems like you would be happier if parametric types were covariant and you were allowed to write this as:frob(x::P{B})where the method implicitly also applies to x::P{A} for every A <: B. This would certainly be less of an eyesore.
There are a couple of important points to be made, however:
- In the longer, more explicit form, it is clear that B may not be the actual parameter type of x; the parameter type of x is some A which is a subtype of B.
- If the shorter form were shorthand for the longer form without explicit A, there would be no way to express that x has an exact parameter type of B.
The explicit distinction between A – the actual parameter type of x – and B, which is simply an upper bound on A, becomes clearer in situations like this:frob{A<:B}(x::P{A}, y::A)frob{A<:B}(x::P{A}, y::B)The first definition only applies to x and y where y is of the actual parameter type of x and thus could be used as is by x. The second definition applies when y is of type B, regardless of x's actual parameter type, which means that it might not be usable as is by x.We could actually make parametric types covariant without losing the ability to express this distinction; it would just mean that the second definition could be written as:frob(x::P{B}, y::B)Currently, this means that x must have exact parameter type B; if types were covariant then it would mean that x's parameter type could be any A <: B. Although it may seem questionable that y not be of the true parameter type of x, allowing this wiggle room and then doing conversion at run-time as necessary is actually the standard way to do things in Julia. Conversion is often done implicitly by assignment to a type location such as a field of x or an array slot. The fact that this is quite common is why covariance might actually be ok. The main thing we would lose with parametric covariance is the ability to talk about arrays that are of *exactly* element type B where B is some abstract type. But I'm just not sure how useful that actually is. Those types are generally discouraged in any case and it's rare to want to give them special behaviors that wouldn't also apply to parametric types with concrete parameters. So perhaps we ought to consider making parametric types in Julia covariant.
Thinking about it some more, the big problem with covariance is that it forces the confusion of two distinct meanings of P{B} for abstract B:
- The concrete instance of type P with actual parameter type B.
If P{B} meant the latter, then we would either need a new way to express the former or we would be forced to identify the two. Unlike static languages, the concrete interpretation is not just a hallucination of the compiler – it exists at run-time – so we can't just have no name for it – there needs to be some concrete answer when you write typeof(x). Identifying the two meanings is very un-Julian as it would allow a concrete type to have subtypes, which would be a massive and to my mind distasteful change.
- The abstract type including all instances P{A} where A <: B.
On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford <oliver....@gmail.com> wrote:
Quick question, folks:
Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)?
If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?
Yes and yes.
If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}).
This is not a minor change. What you're asking for is having both invariant and covariant parametric types in the language – Array would be covariant and Cell would be invariant. If you're going to have invariant and covariant type parameters, you'll want contravariant type parameters too. At that point, you need to have a way of annotating the variance of a type's parameters – and worse still, you have to explain all this to everyone. At that point, you have Scala and all thecomplications and questions that come with that. I don't think that's the right path for a language that's meant to be used by people who are scientists rather than professional programmers (honestly, I don't really think it's a good idea for professional programmers either). Making all parametric types covariant is a more reasonable possibility, but as I pointed out, it has its own significant issues.
It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check:
reduce((==), [typeof(el) for el in a])
You can add this as the first line of your function:
assert_leaftype(T)
where
assert_leaftype(T) = isleaftype(T) || error("Must be a concrete type")
To your users, this is at least as useful as
ERROR: no method myfunction(Array{Real,1})
which is what it would be if you relied on method dispatch to generate the
error (which is what I think you're asking for).
--Tim
StefanFirstly, thank you for taking the time to write such a lengthy response. I think it comes closest to addressing the particular question I had. Sadly I'm an Engineer, not a Computer Scientist, so it's taking me a while to get my head round. I've put a few comments inline.
On Tuesday, April 29, 2014 4:51:52 PM UTC+1, Stefan Karpinski wrote:Julia has chosen the simple approach of invariance everywhere. This simplifies the subtyping rules, but there is no free lunch – variance issues still do crop up. But at least Julia's rules are simple if occasionally a bit inconvenient. Basically, this is the situation:
- covariance: P{A} <: P{B} ⟺ A <: B – correct for read, incorrect for write.
- contravariance: P{A} <: P{B} ⟺ B <: A – correct for write, incorrect for read.
- invariance: P{A} <: P{B} ⟺ A = B – correct for both read and write.
I'm not sure what you mean by "correct for read, incorrect for write" etc.
frob{A<:B}(x::P{A})This is effectively a covariant signature, applying to P{A} for any A <: B. It seems like you would be happier if parametric types were covariant and you were allowed to write this as:frob(x::P{B})where the method implicitly also applies to x::P{A} for every A <: B. This would certainly be less of an eyesore.Yes, I guess I am saying this, but only for a particular type, which would instantiate homogeneous arrays. I will call this Array, and the type for heterogeneous arrays I will call Cell.
function frob(x::Array)isleaftype(eltype(x)) || error("Homogeneous array required')?Though, IMO, this is all a non-issue in my experience. Just specifying frob{T<:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy.
I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little).
Cheers,-Jacob
I really think the only code that should be too worried about whether something is a leaf type or not is the compiler...
On Wednesday, April 30, 2014 4:01:22 PM UTC+1, Stefan Karpinski wrote:I really think the only code that should be too worried about whether something is a leaf type or not is the compiler...Are you saying:a) we should use another way to distinguish between homogeneous and heterogeneous input arrays,orb) that only the compiler should care whether input arrays are homogeneous and heterogeneous input arrays.
If b) then:1) doesn't it make a big difference to the speed of some functions, e.g. sum()?and2) if so, is it unreasonable to encourage the user to use homogeneous arrays in those situations?
I think this suggestion was based on Jacob's personal experience – which others seem to have also had – that after a bit of time letting the language wash over you Julia, things that seemed problematic or disconcerting early on have a way of not seeming important anymore. That might or might not happen for you with this particular issue.
Personally, I don't think library code should be forcing arrays to have concrete element types. Is the library going to raise an error? If so, you're needlessly forcing the user to either not use the library or modify it to excise the pointless check. That seems bad. Is it going to print a warning but let you continue? That's going to be annoying, although at least I guess the user definitely knows they have a array with abstract element type. But is this really your place to warn them? This seems to treat the user like a child – shouldn't a library assume that the user is doing things for a reason and not by mistake? Since the main concern with heterogeneous arrays is performance, I feel that should be discovered via benchmarking or other performance analysis tools, not because some library is feeling avuncular.
I was just trying to share some of my own experience with Julia from having used it for the last two years, not be dismissive or condescending. That this hasn't been a show-stopper for some 300+ packages now, IMO, is a valid point against making a somewhat disruptive change to the type system.
By stating that I come from a non-technical background, I was also trying to qualify that you may feel free to dismiss my argument because I may well be talking nonsense :).
I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful.-Jacob
Back to the original subject, because I think there's some value in summarizing it.1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T <: Real.
2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type.
If you ask for a Basket{Fruit} and I (covariantly) give you a Basket{Apple}, it's fine if you only take things out of the basket – you will always get a Fruit since Apples are Fruit. But if you try to put an Orange, which is also a Fruit, into the basket, then there's a problem because an Orange is not an Apple. On the other hand, if you ask for a Basket{Fruit} and I (contravariantly) give you a Basket{Food}, it's fine if you put things into the basket – since anything you put in will be Fruit and all Fruit is Food (let's say), it will be ok. If you start taking things out of the basket, expecting Fruit, however, you may get a Sausage or a Cheese, which is not what you were expecting.
Does your code break if it is applied to an array with an abstract element type? If so, you must be doing something rather strange.