A question of Style: Iterators into regular Arrays


Gabriel Gellner

unread,
Oct 21, 2015, 3:11:44 AM10/21/15
to julia-users

I find the way that you need to use `linspace` and `range` objects a bit jarring when I want to write vectorized code, or when I want to pass an array to a function that requires an Array. I get how nice the iterators are when writing loops, and that you can use `collect(iter)` to get an array (and that it is possible to write polymorphic code that takes LinSpace types and uses them like Arrays … but this hurts my small brain). But I find that I often want to write code that uses an actual array, and having to use `collect` all the time seems like a serious wart for an otherwise stunning language for science. (https://github.com/JuliaLang/julia/issues/9637 gives, I think, the evolution of making these iterators.)

 

For example recently the following code was posted/refined on this mailing list:

 

function Jakes_Flat( fd, Ts, Ns, t0 = 0, E0 = 1, phi_N = 0 )
# Inputs:
#
# Outputs:
  N0  = 8;                  # As suggested by Jakes
  N   = 4*N0+2;             # An accurate approximation
  wd  = 2*pi*fd;            # Maximum Doppler frequency
  t   = t0 + [0:Ns-1;]*Ts;
  tf  = t[end] + Ts;
  coswt = [ sqrt(2)*cos(wd*t'); 2*cos(wd*cos(2*pi/N*[1:N0;])*t') ]
  temp = zeros(1,N0+1)
  temp[1,2:end] = pi/(N0+1)*[1:N0;]'
  temp[1,1] = phi_N
  h = E0/sqrt(2*N0+1)*exp(im*temp ) * coswt
  return h, tf;
end

 

From <https://groups.google.com/forum/#!topic/julia-users/_lIVpV0e_WI>

 

Notice all the horrible [<blah>;] notations to make these arrays … and it seems like the devs want to get rid of this notation as well (which they should; it is way too subtle in my opinion). So imagine the above code with `collect` statements instead. Is this the way people work? I find that `collect` statements in mathematical expressions really break me out of the abstraction (that I am just writing math).

 

I get that this could be written as an explicit loop, and this would likely make it faster as well (man I love looping in Julia). That being said, in this case I don't find the vectorized version a performance issue; rather, I prefer how this reads, as it feels closer to the math to me.

 

So my question: what is the Julian way of making explicit arrays using either `range (:)` or `linspace`? Is it to pollute everything with `collect`? Would it be worth having versions of linspace that return an actual array? (something like alinspace or whatnot)


Thanks for any tips, comments etc

Blake Johnson

unread,
Oct 21, 2015, 7:47:10 AM10/21/15
to julia-users
It doesn't directly answer your question, but one thought is to not force many of these ranges to become vectors. For instance, the line
t = t0 + [0:Ns-1;]*Ts;

could also have been written
t = t0 + (0:Ns-1)*Ts;

and that would still be valid input to functions like sin, cos, exp, etc...
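For instance (a quick sketch, with made-up values for Ts and Ns, using the 0.4-era vectorized functions under discussion):

Ts = 0.1; Ns = 5
t = (0:Ns-1) * Ts   # a FloatRange 0.0:0.1:0.4 -- no array is allocated
y = sin(t)          # vectorized sin accepts the range and returns a plain Array{Float64,1}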

Mauro

unread,
Oct 21, 2015, 7:57:34 AM10/21/15
to julia...@googlegroups.com
I'd argue that this should work:

julia> rand(4,4)*(1:4)
ERROR: MethodError: `A_mul_B!` has no method matching A_mul_B!(::Array{Float64,1}, ::Array{Float64,2}, ::UnitRange{Int64})

i.e. ranges should be equivalent to column vectors. But others more
knowledgeable on the linear algebra code may have a different opinion.
If you don't get more feedback here then please file an issue.

Not sure whether there should also be a row-range, instead of:

julia> (1:4)'
1x4 Array{Int64,2}:
1 2 3 4
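(A workaround sketch until that is fixed: materialize the range first, e.g.

rand(4,4) * collect(1:4)

which allocates the small Vector{Int} but then dispatches to the ordinary dense matrix-vector product.)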

Christoph Ortner

unread,
Oct 21, 2015, 8:14:36 AM10/21/15
to julia-users
Nice to see this is coming up again. (hopefully many  more times). There seems to be a small group of people (me included) who are really bothered by this. Here is the link to the last discussion I remember:

Jonathan Malmaud

unread,
Oct 21, 2015, 8:59:21 AM10/21/15
to julia-users
Gabriel, I rewrote your code to not ever explicitly convert ranges to arrays and it still works fine. Maybe I'm not quite understanding the issue?

function Jakes_Flat( fd, Ts, Ns, t0 = 0, E0 = 1, phi_N = 0 )
# Inputs:
#
# Outputs:
  N0  = 8;                  # As suggested by Jakes
  N   = 4*N0+2;             # An accurate approximation
  wd  = 2*pi*fd;            # Maximum Doppler frequency
  t   = t0 + (0:Ns-1)*Ts;
  tf  = t[end] + Ts;
  coswt = [ sqrt(2)*cos(wd*t'); 2*cos(wd*cos(2*pi/N*(1:N0))*t') ]
  temp = zeros(1,N0+1)
  temp[1,2:end] = pi/(N0+1)*(1:N0)'
  temp[1,1] = phi_N
  h = E0/sqrt(2*N0+1)*exp(im*temp ) * coswt
  return h, tf;
end
On Wednesday, October 21, 2015 at 3:11:44 AM UTC-4, Gabriel Gellner wrote:

Gabriel Gellner

unread,
Oct 21, 2015, 11:38:39 AM10/21/15
to julia-users
No, that is a good point. Often you can use an iterator where an explicit array would also work. The issue, I guess, is that this puts the burden on the developer to always write generic code: wherever you would want to accept an Array you also need to accept an iterator like LinSpace.

Maybe this is easier to do than I currently understand? If not, for regular scientists like myself I find that this would force me to make my functions far more complex than I generally would in practice. I would often just write a signature like func(x::Vector) when I want my code to accept a Vector, but if I pass a LinSpace object to this I get a type error. I like this type of coding as it fits my model of what I want to do. The contortions I would need to go through for something as basic as passing a linear array of floating point numbers to a simple function that wants a Vector seem like a wart to me. Am I missing something simple?

For the builtins clearly this is not the case which is nice, but it only masks the issue of how this would be used by regular users in my opinion.

Spencer Russell

unread,
Oct 21, 2015, 11:50:32 AM10/21/15
to julia...@googlegroups.com
On Wed, Oct 21, 2015, at 11:38 AM, Gabriel Gellner wrote:
No, that is a good point. Often you can use an iterator where an explicit array would also work. The issue, I guess, is that this puts the burden on the developer to always write generic code: wherever you would want to accept an Array you also need to accept an iterator like LinSpace.
 
Maybe this is easier to do than I currently understand?
 
In general you can just make your function accept an AbstractVector (or more generally AbstractArray) and things will just work. If there are places where Ranges, LinSpaces, etc. aren't behaving as proper arrays then I think those are generally good candidates for Issues (or even better, PRs).
 
As a related aside:
For efficiency you'll still want to use a concrete Vector field if you're defining your own types, but luckily there seems to be a convert method defined so you can do:
 
type MyArr{T}
    x::Vector{T}
end
 
a = MyArr{Int64}(1:4)
 
And it will auto-convert the range to an array for storage in your type.
 
-s

Tom Breloff

unread,
Oct 21, 2015, 11:50:43 AM10/21/15
to julia-users
Actually it is that easy... you just have to get in the habit of doing it:


julia> function mysum(arr::Array)
           s = zero(eltype(arr))
           for a in arr
               s += a
           end
           s
       end
mysum (generic function with 1 method)

julia> mysum(1:10)
ERROR: MethodError: `mysum` has no method matching mysum(::UnitRange{Int64})

julia> mysum(collect(1:10))
55

julia> function mysum_generic(arr::AbstractArray)
           s = zero(eltype(arr))
           for a in arr
               s += a
           end
           s
       end
mysum_generic (generic function with 1 method)

julia> mysum_generic(1:10)
55

julia> mysum_generic(collect(1:10))
55


For something like this, you never need to allocate space for the array.

Jonathan Malmaud

unread,
Oct 21, 2015, 11:57:08 AM10/21/15
to julia-users
Just to add to Spencer's answer: Is there a particular reason to have your function arguments have type annotations at all in the function definition? You could just write 

function f(x)
  y= x[3:5] # or whatever
  z = length(x)
end

and now someone could call f with any kind of object that supports indexing and "length" and it will work. This is "duck-typing", if you're familiar with that term, and is the dominant paradigm in Julia precisely since it makes generic programming easier.

Steven G. Johnson

unread,
Oct 21, 2015, 11:59:47 AM10/21/15
to julia-users


On Wednesday, October 21, 2015 at 3:11:44 AM UTC-4, Gabriel Gellner wrote:

I find the way that you need to use `linspace` and `range` objects a bit jarring when I want to write vectorized code, or when I want to pass an array to a function that requires an Array. I get how nice the iterators are when writing loops, and that you can use `collect(iter)` to get an array (and that it is possible to write polymorphic code that takes LinSpace types and uses them like Arrays … but this hurts my small brain). But I find that I often want to write code that uses an actual array, and having to use `collect` all the time seems like a serious wart for an otherwise stunning language for science.


Ranges are actual arrays; they are subtypes of AbstractVector.    Any vectorized operation in the standard library that works for Array but not for Range is probably a bug, because it won't work for other array types either.

Steven G. Johnson

unread,
Oct 21, 2015, 12:05:06 PM10/21/15
to julia-users


On Wednesday, October 21, 2015 at 11:57:08 AM UTC-4, Jonathan Malmaud wrote:
Just to add to Spencer's answer: Is there a particular reason to have your function arguments have type annotations at all in the function definition? You could just write 

function f(x)
  y= x[3:5] # or whatever
  z = length(x)
end

and now someone could call f with any kind of object that supports indexing and "length" and it will work. This is "duck-typing", if you're familiar with that term, and is the dominant paradigm in Julia precisely since it makes generic programming easier.

I agree that duck typing is often a good practice, but there are three good reasons to declare argument types:

* Correctness: the code might work but give unexpected results if you pass the wrong types.   e.g. fib(n) = n < 2 ? one(n) : fib(n-1)+fib(n-2) is a function that computes the Fibonacci numbers only for integers — it gives an answer for floating-point n, but the answer is probably not what you want.

* Clarity: sometimes it is a useful hint to the caller if you indicate the expected type.  

* Dispatch: you want to do different things for different types, so you use the argument types as a filter to indicate which methods should work when.

However, in all cases the trick is to declare the widest applicable argument type.   e.g. use fib(n::Integer), not fib(n::Int), so that any integer type will be accepted, rather than the concrete type Int.

In the case of functions accepting vectors, you should almost always declare the type as AbstractVector or AbstractArray, not Vector or Array.   That will let you handle any array-like type.   In particular, ranges are a subtype of AbstractVector.
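A tiny sketch of the fib example above, to make the dispatch/correctness point concrete (the values are just illustrative):

fib(n::Integer) = n < 2 ? one(n) : fib(n-1) + fib(n-2)

fib(10)     # 89 -- works for Int, Int32, BigInt, or any other Integer subtype
fib(10.0)   # MethodError -- rejected up front instead of silently returning a dubious result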

Steven G. Johnson

unread,
Oct 21, 2015, 12:08:09 PM10/21/15
to julia-users
On Wednesday, October 21, 2015 at 11:50:32 AM UTC-4, Spencer Russell wrote: 
For efficiency you'll still want to use a concrete Vector field if you're defining your own types, but luckily there seems to be a convert method defined so you can do:
 
type MyArr{T}
    x::Vector{T}
end

Instead, you can do

type MyArr{V<:AbstractVector}
    x::V
end

and it will allow any vector type (including ranges) but will still be fast (i.e. every instance of MyArr will have a concrete type for V).
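A quick usage sketch of that pattern (MyArr is just the example name from above):

a = MyArr(1:4)            # MyArr{UnitRange{Int64}} -- the range is stored as-is, no copy
b = MyArr(collect(1:4))   # MyArr{Array{Int64,1}}   -- a dense vector, if that is what you want
a.x[3]                    # 3; indexing works the same for either field type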

Jonathan Malmaud

unread,
Oct 21, 2015, 12:14:48 PM10/21/15
to julia-users
All true. 

One tip: if you want to find out what abstract type is a parent to all the concrete types you're interested in, you can use the "typejoin" function. 

julia> typejoin(Array, LinSpace) 
AbstractArray{T,N}

julia> typejoin(Vector,LinSpace)
AbstractArray{T,1}

Spencer Russell

unread,
Oct 21, 2015, 12:16:49 PM10/21/15
to julia...@googlegroups.com
Ah yes, even better! Thanks.
 
-s

DNF

unread,
Oct 21, 2015, 1:03:05 PM10/21/15
to julia-users
There is no need for doing collect or [  ;] most of the time. Whenever you use a range as input to a vectorized function, like sin, it returns a vector as expected.

This code should do the same as the one you posted:

function jakes_flat(fd, Ts, Ns, t0 = 0.0, E0 = 1.0, phi_N = 0.0)
    N0 = 8
    N = 4N0 + 2
    wd = 2π * fd
    t = t0 + (0:Ns-1) * Ts
    tf = t[end] + Ts
    coswt = [√2cos(wd*t)' ; 2cos(wd*cos(2π*(1:N0)/N)*t')]'
    temp = [phi_N; π*(1:N0)/(N0+1)]
    h = E0/√(2N0+1) * coswt * exp(im*temp)
    return (h, tf)
end


DNF

unread,
Oct 21, 2015, 1:03:05 PM10/21/15
to julia-users
The general advice would be: don't make explicit arrays. Treat the ranges as arrays, and explicit arrays will be made for you automatically when you use them in expressions.

In some cases you may have to use collect(), but that would be the exception rather than the rule, and would probably raise a pretty obvious error anyway.

DNF

unread,
Oct 21, 2015, 1:12:47 PM10/21/15
to julia-users
Bleh. I submitted my answer many hours ago, but since it was my first post ever, it had to go through an approval process. So now I look like some johnny-come-lately with my code :/

Mauro: This looks a lot like a bug to me. For example:
    julia> ones(4)' * (1:4)
    1-element Array{Float64,1}:
    10.0
But
    julia> x = ones(4)'; x * (1:4)
    ERROR: MethodError: `A_mul_B!` has no method matching 
    A_mul_B!(::Array{Float64,1}, ::Array{Float64,2}, ::UnitRange{Int64})

They should clearly give the same result. There is something seriously wrong with how ones(1,4)*(1:4) is dispatched; it even tries to call the mutating 3-arg version, A_mul_B! (!!)

If you feel strongly that you need a concrete Vector you can define
linvec(xs...) = collect(linspace(xs...))
and use that instead.
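e.g. linvec(0, 1, 5) would then give a plain 5-element Array{Float64,1} ([0.0, 0.25, 0.5, 0.75, 1.0]).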

On the other hand, I do wish that Julia had kept the name 'linrange' and then defined linspace to be the vector version.

Gabriel Gellner

unread,
Oct 21, 2015, 2:38:56 PM10/21/15
to julia-users
Wow! Thanks everyone for all the advice!!! Super helpful. I now see that it is super easy to deal with the LinSpace objects. 

That being said, I guess I get scared when the docs tell me to use concrete types for performance ;) Most of the code I write for myself works with Float64 arrays a la Matlab. I am comfortable with duck typing and whatnot, but one of the things that drew me to Julia (vs, let's say, Mathematica) is how in general the type system feels easy to reason about. All the layers of indirection can scare me, as they make it harder for me to understand the kinds of performance problems I might face. I like requiring concrete array types in my own code as it often catches bugs (say I used an int literal and this accidentally created an int array when I really wanted a float, potentially leading to horrible performance cascades as all my generic code starts doing the wrong thing...).

I guess really what this comes down to is the point made by DNF that changing linspace to an iterator, when that name means an array in Matlab and Python, is not the path of least surprise. It feels to me like a strange inconsistency: we could just as easily have other common functions, like `rand`, return an AbstractArray-like type, but that would suck, as it does in this case. I guess it all comes down to wishing that, like DNF suggested, linspace kept its proper meaning and the iterator version got a new name. I guess my way forward will likely be to define my own linarray() function; sadly this will make my code a bit confusing to my Matlab friends ;)

Again thanks though. I really have learned a crazy bunch about the generic type system.

Gabriel Gellner

unread,
Oct 21, 2015, 2:52:20 PM10/21/15
to julia-users
Oh man, thanks for this link. Makes me feel better that I am not alone in feeling this pain. This is really the first *wart* I have felt in the language decisions in Julia. Forcing iterators as the default for such a common array-generation function feels so needless. Calling any code that asks for a concrete array instead of an AbstractArray also feels dangerous... especially when, to a first approximation, almost all the builtin basic array-generation functions return concrete arrays.

Jonathan Malmaud

unread,
Oct 21, 2015, 3:09:29 PM10/21/15
to julia...@googlegroups.com
It’s still hard for me to understand what the value of returning an array by default would be.

Getting a structured LinSpace object enables things like having the REPL print it in a special way, optimizing arithmetic operations on it (so that adding a scalar to a LinSpace is O(1) instead of O(N) in the length of the space), inspecting its properties by accessing its fields, etc. And on top of that, it can be used transparently in virtually all expressions where a concrete array can be used. It’s not like Python, where iterators are generally going to be much slower and clunkier to work with than a reified Numpy array.
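(To see the same kind of optimization with the plain ranges, where it is easy to check -- 0.4-era syntax, where scalar arithmetic on ranges is defined:

r = 1:10
r + 5    # 6:15   -- still a range; one addition, not ten
r * 2    # 2:2:20 -- a StepRange; no 10-element array is allocated

The LinSpace type aims for the same O(1) behavior.)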

The only downside really is if your arguments are explicitly and unnecessarily typed to expect an Array, which is not a great habit to get into no matter what linspace returns.

Not trying to be argumentative or dismissive here- just trying to understand. I would think that if one of your motivations for getting into Julia is the rich type system compared to Matlab, you’d be happy that Julia isn’t forced to discard semantic information from operations like linspace as a result of only raw numeric vectors being first-class constructs in the language (as in Matlab and Numpy).

Gabriel Gellner

unread,
Oct 21, 2015, 3:19:30 PM10/21/15
to julia-users
Continuing to think about all the ideas presented in this thread. It seems that the general advice is that almost all functions should, at first pass, take "Abstract" or untyped (duck typed) arguments. If this is the case, why is Abstract not the default meaning for Array? Is this just a historical issue? It feels like the language design is sort of fighting this advice; instead we could have had Array mean AbstractArray and something like ConcreteArray mean the concrete type, so that the incentive/most natural way to add types points at the generic one. Similar for Vector, Matrix etc.

I guess I find this idea that full genericity is the correct way to do things to be a bit at odds with how the language coaxes you to do things (and the general discussion of performance in Julia). Is this a more recent feeling? Did Julia start out being more about concrete types and template like generic types? This would explain the linspace vs logspace and all other basic array creating functions (ones, zeros, rand etc) and the default names for many types vs the "Abstract" prefixed ones.

Thanks for all the insight.

DNF

unread,
Oct 21, 2015, 3:28:10 PM10/21/15
to julia-users
I don't really agree that it is a wart. Because the Ranges have such nice behaviours, I think it is reasonable to let them be the default. People should be encouraged to use and get familiar with these very elegant data structures. If (0:N) returned Array{Int, 1}, then that is what everyone would use. I am just a bit confused as to why 'linrange' was changed to 'linspace'.

I think perhaps the naming of the types is contributing a bit to the confusion. If what you want is an array of floats, then it seems reasonable to declare Array{Float64,1}, because what can be more natural-sounding? But you probably don't care whether it's a DenseArray, SparseArray, SubArray, Range or whatever. It's just that 'Array' sounds incredibly general. It's a bit odd to me that Array is a subtype of DenseArray, and not the other way around.

'AbstractArray' on the other hand, sounds kind of obscure and spooky. I think the mental hurdle would be lower if AbstractArray was called 'Array'. (I also find that 'String' sounds more general than 'AbstractString'...)

As for the performance of abstract types, I am a bit on thin ice, but I think what you want to avoid is to have an Array with mixed concrete element types. There is no problem with writing your functions for any subtype of an abstract array type as such, as long as the instances of that type have elements of a single concrete type.
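A small sketch of that distinction (names are made up; 0.4-era syntax):

x = Float64[1.0, 2.0, 3.0]   # concrete element type: elements stored unboxed, fast in hot loops
y = Real[1.0, 2.0, 3.0]      # abstract element type: every element is boxed, slow

# an abstract *argument* type costs nothing -- the method gets specialized per concrete call:
sumsq(v::AbstractVector) = sum(abs2, v)
sumsq(x)      # specialized for Vector{Float64}
sumsq(1:10)   # specialized for UnitRange{Int64}, returns 385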

Jonathan Malmaud

unread,
Oct 21, 2015, 3:29:11 PM10/21/15
to julia...@googlegroups.com
On Oct 21, 2015, at 3:19 PM, Gabriel Gellner <gabriel...@gmail.com> wrote:

Continuing to think about all the ideas presented in this thread. It seems that the general advice is that almost all functions should, at first pass, take "Abstract" or untyped (duck typed) arguments. If this is the case, why is Abstract not the default meaning for Array? Is this just a historical issue? It feels like the language design is sort of fighting this advice; instead we could have had Array mean AbstractArray and something like ConcreteArray mean the concrete type, so that the incentive/most natural way to add types points at the generic one. Similar for Vector, Matrix etc.
Yes, that’s a historical thing. I too wish the names had been Array and ConcreteArray. Possibly (fingers crossed) by the time of 1.0 something like that will happen. 


I guess I find this idea that full genericity is the correct way to do things to be a bit at odds with how the language coaxes you to do things (and the general discussion of performance in Julia). Is this a more recent feeling? Did Julia start out being more about concrete types and template like generic types? This would explain the linspace vs logspace and all other basic array creating functions (ones, zeros, rand etc) and the default names for many types vs the "Abstract" prefixed ones.
On the contrary - if you ever have some time, Jeff (one of the Julia creators)’s thesis (https://github.com/JeffBezanson/phdthesis/blob/master/main.pdf) is surprisingly accessible and explains how Julia was about generic programming from the beginning.

But the notion specifically that the functions inherited from Matlab should return special types instead of arrays which retain some semantic knowledge of the operation that created them is recent and made possible by advances in the initial Julia compiler (especially the introduction of immutable types) that allow for use of lightweight types with zero performance or memory overhead.

DNF

unread,
Oct 21, 2015, 3:32:23 PM10/21/15
to julia-users
I see that we are thinking the same way here :) I understand that there has been a push toward renaming abstract types AbstractXXX. Unless all abstract types are going to get the 'Abstract' prefix, I don't quite understand this.

Gabriel Gellner

unread,
Oct 21, 2015, 3:38:07 PM10/21/15
to julia-users
I have no issue with the LinSpace object, I simply do not see why it is the special case for this kind of object (I imagine the choice was made since it was seen to be used mainly for looping vs being used for array creation like the similar functions logspace, zeros, ones, etc). If the iterator version is so good, I don't see why all Vectors are not returned as this type, for all the reasons you mention. In the current state, where only linspace returns this kind of special polymorphic type, it simply breaks any feeling of consistency in the language. I do something like x = zeros(10) and I get an array, great; the next line I do y = linspace(0, 5, 10) and I get a new-fangled iterator object. They work the same, but how do I get an iterator version of zeros? It is a pedantic point, but so is special-casing this super common function to mean something that is not consistent with all the other languages that use it. Which would be fine if, when I did something like sin(linspace(0, 5, 100)), I got back an iterator, but I don't. This abstraction does not percolate through other functions, further giving the feeling of a needless special case in the language writ large. These objects simply get converted to concrete types by many common functions, and when this happens varies from function to function, with little semantic reasoning. My feeling is that if what people say is true, then why doesn't the default numeric array have the iterator semantics? As it is, linspace is made special for no reason other than the assumption that it will be used for looping.

People want to argue that it is more elegant and takes advantage of what makes Julia powerful, which I get, but then why not go all in on this? Mathematica does this: everything is basically a List, with nice behavior for vectors, matrices, etc. I have no issue with this kind of elegance, but it is rough when the abstraction is inconsistent (the type is not truly preserved in most functions). As people have mentioned, clearly logspace should be a special object as well ... but when does this stop? I dislike that this feels so arbitrary ... and as a result it is jarring when it turns up. The fact that it is polymorphic is only one kind of consistency.

Jonathan Malmaud

unread,
Oct 21, 2015, 3:40:04 PM10/21/15
to julia...@googlegroups.com
It was motivated by consistency with strings. Initially there was Array <: AbstractArray, but ASCIIString <: String. So to be consistent between those, you have the choice of renaming things to get either

1) ConcreteArray <: Array and ASCIIString <: String or
2) Array <: AbstractArray and ASCIIString <: AbstractString

Since 1) would be a really difficult change to make because of all the existing code, 2) was chosen. 

DNF

unread,
Oct 21, 2015, 3:51:00 PM10/21/15
to julia-users
I understand. I'm just wondering why consistency between those two cases is particularly important, when there are lots of other parts of the type hierarchy where supertypes do not have the Abstract prefix. Is that the goal, that all abstract types should have an 'Abstract' prefix to their names?

DNF

unread,
Oct 21, 2015, 3:55:35 PM10/21/15
to julia-users
The reason why not all arrays can be iterators is that in general arrays can not be 'compressed' like that. A linear range can be compressed to: a start value, an increment, and a length, making it incredibly lightweight. Doing this for sin() is not that easy. Doing it for rand() is simply Impossible.
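That compression is only a few lines to write down yourself; a hypothetical stripped-down version (0.4-era syntax, names made up):

immutable MyLinRange <: AbstractVector{Float64}
    start::Float64
    step::Float64
    len::Int
end
Base.size(r::MyLinRange) = (r.len,)
Base.getindex(r::MyLinRange, i::Int) = r.start + (i - 1) * r.step

r = MyLinRange(0.0, 0.25, 5)
r[3]          # 0.5, computed on demand from three stored numbers
collect(r)    # only now are the five values actually materialized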

Jonathan Malmaud

unread,
Oct 21, 2015, 3:56:09 PM10/21/15
to julia...@googlegroups.com
Well, those need the ‘Abstract’ prefix to distinguish them from their concrete counterpart. There was already “Array” in the language, so what are you going to call the type that is array-like but not literally an array?

But take another common case - there is the abstract type “IO", whose subtypes include IOStream and IOBuffer. There isn’t a need for “AbstractIO” because there is no “ConcreteIO” to disambiguate it from - there isn’t a concept of what a “ConcreteIO” would even be.

DNF

unread,
Oct 21, 2015, 4:03:27 PM10/21/15
to julia-users
I was thinking specifically of AbstractString, not AbstractArray. I just don't quite get why it had to be made consistent with Array/AbstractArray, when String was a nice general name, and it would be consistent with the IO/IOStream/IOBuffer example that you mention.

Gabriel Gellner

unread,
Oct 21, 2015, 4:07:23 PM10/21/15
to julia-users
That doesn't feel like a reason that they can't be iterators, rather that they might be slow ;) a la python. My point is not about speed but the consistency of the language. Are there many cases in Julia where there is a special type like this because it is convenient/elegant to implement? This feels like a recipe for madness, my guess is that this would be crazy rare.

People wondered why people might mind that we get a LinSpace object vs an Array. For me it is this strange feeling that I am getting a special case that doesn't feel well motivated other than there is a nice way to implement it (and that people, again, assumed that it would largely be used for looping). If not all things can be made consistently iterators when they are vector-like then why not have a special function that returns this special type (like your aforementioned linrange)? The fact that I lose my iterator when I use certain functions but not others is a way that this polymorphism that everyone is describing doesn't feel as nice to me, since it will not compose in cases where it likely should, outside of implementation details.

Stefan Karpinski

unread,
Oct 21, 2015, 4:12:34 PM10/21/15
to Julia Users
The types Vector, Matrix and Tensor were abstract originally. The Array type was the concrete implementation of Tensor. Later on we introduced DenseVector and DenseMatrix as aliases for Array{T,1} and Array{T,2}. This arrangement ended up being kind of confusing (by which I mean that Viral kept using the wrong one), so we eventually just changed the names so that it did what he was expecting.

[That first commit is pretty trippy – that is some ancient proto-Julia right there.]

Fundamentally, I think there's a tension between dispatch, where you want to use the most abstract type you can, and locations (element and field types of containers and types), where you want to use the most concrete type you can. AbstractArray does seem too long to type for dispatch, but having Array be an abstract type seems like a terrible gotcha for locations.

One possible naming scheme could be to follow the example of Int and Integer and have Vec, Mat, Arr be the concrete types and Vector, Matrix and Array be the abstract types. I'm really not sure this would be worth the trouble at this point or if we really want the AbstractArray names to be any shorter.

Back when I made the above rename and introduced AbstractArray, et al., part of the reasoning was that if you're going to be writing generic code, you should probably stop and think a little about how generic your code really is and what abstractions it applies to. Is it applicable to any kind of abstract vector? Or is it actually applicable to anything iterable? Or should it really only be used with arrays that support fast linear indexing? I think that line of reasoning is as valid as ever: writing generic code is not easy and you have to think a bit about it. That burden seems acceptable for the library writer, but in that case the burden of typing the prefix "Abstract" here and there seems acceptable as well.

Stefan Karpinski

unread,
Oct 21, 2015, 4:16:41 PM10/21/15
to Julia Users
On Wed, Oct 21, 2015 at 4:07 PM, Gabriel Gellner <gabriel...@gmail.com> wrote:
That doesn't feel like a reason that they can't be iterators, rather that they might be slow ;) a la python. My point is not about speed but the consistency of the language.

How do you propose making arbitrary arrays into iterators? 

DNF

unread,
Oct 21, 2015, 4:20:56 PM10/21/15
to julia-users
I tried to find this previously, but failed until now: https://github.com/JuliaLang/julia/pull/8872

That's the pull request for the String -> AbstractString renaming. Even though I may not completely agree, this explains a lot about the thinking behind the renaming.

Spencer Russell

unread,
Oct 21, 2015, 4:22:20 PM10/21/15
to julia...@googlegroups.com
On Wed, Oct 21, 2015, at 04:07 PM, Gabriel Gellner wrote:
That doesn't feel like a reason that they can't be iterators, rather that they might be slow ;) a la python. My point is not about speed but the consistency of the language. Are there many cases in Julia where there is a special type like this because it is convenient/elegant to implement? This feels like a recipe for madness, my guess is that this would be crazy rare.
 
People wondered why people might mind that we get a LinSpace object vs an Array. For me it is this strange feeling that I am getting a special case that doesn't feel well motivated other than there is a nice way to implement it (and that people, again, assumed that it would largely be used for looping). If not all things can be made consistently iterators when they are vector-like then why not have a special function that returns this special type (like your aforementioned linrange)? The fact that I lose my iterator when I use certain functions but not others is a way that this polymorphism that everyone is describing doesn't feel as nice to me, since it will not compose in cases where it likely should, outside of implementation details.
 
Can you clarify with an example of when you lose the iterator? IMO that would be an example of breaking the AbstractArray contract and worthy of fixing.
 
I see your point that there are other functions that return array-like objects that could also be implemented without fully constructing the array, e.g. `zeros(20)` could return a `Zeros{20}` object that acts like a length-20 array full of zeros. As far as I know there's no compelling design reason that's a bad idea, it's just that nobody has done it.
 
The nice thing is that as long as people don't over-specify their type annotations that change can be done under-the-hood and it shouldn't break code. From a pedagogical standpoint I see how these sorts of things can add confusion and cause new folks to question their understanding, and more consistency reduces that conceptual friction.
 
-s

DNF

unread,
Oct 21, 2015, 4:23:02 PM10/21/15
to julia-users
I actually tend to think that's a pretty strong reason.

Jonathan Malmaud

unread,
Oct 21, 2015, 4:24:17 PM10/21/15
to julia...@googlegroups.com
You're making good points for sure - logspace and linspace are inconsistent wrt  return types.

But I'm just having trouble seeing how it impacts you as a user of the language; it's essentially an implementation detail that allows for some optimizations when performing arithmetic on array-like values known to have a certain exploitable structure (eg, uniform element spacing). 

Your own code will virtually never make explicit references to Vector or LinSpace types (except perhaps in specifying the types of fields in new types you define). If tomorrow logspace was changed to return a special type or linspace to return a plain array, your code would remain identical if function arguments were typed as AbstractVector instead of Vector.

So I can see how it could bother someone's aesthetic knowing that under the hood these inconsistencies exist (it bothers me a bit). And I can see how if you're counting on certain optimizations happening, like scalar multiplication on spaces being O(1), it would be frustrating that it will depend on if the space was generated by linspace or logspace. But is that really leading to 'madness'? 

There are other examples of these light-weight wrapper types being used, especially in the linear algebra routines. The matrix factorization functions, like qrfact, return special types, for one thing.

Gabriel Gellner

unread,
Oct 21, 2015, 5:13:48 PM10/21/15
to julia-users
First of all, thanks everyone for dealing with me! I find this discussion really interesting ... though I hope it is not rehashing too much basic stuff; I am trying my best to get up to speed with the Julian way. I am new to Julia, but have programmed a lot in other languages ... hence my random feelings.

Re: turning an arbitrary array into an iterator -> I guess you could simply bind the element generator, a la Python's __iter__, so for sin(linspace(0, 0.5, 10)) I would simply have to store the sin function and then have it call sin(getitem(<stored LinSpace or other iterable>)) for each getitem call -- that is, calls to functions that take iterables would store chained iterators. At the lamest level you could simply store the full array and return the __iter__-like interface. You get none of the benefits of memory/space savings, but you preserve the "interface". I guess in general it would be some weird kind of lazy evaluation over a whole bunch of indexable types. 

But this is a distraction, and clearly stupid for performance reasons. What I really meant is that having the implementation detail of a convenient/compact linear-indexing object leak out of what many users would believe to be an array creation function feels bad to me. What is worse is that sometimes it is there (some functions might never convert out of the LinSpace, as it preserves the representation) but other functions will return an Array instead.

Some of this feeling would of course be helped by the special printing behavior that people have mentioned is in the works for later Julia versions. If, when I have a LinSpace object, it looks and feels exactly like an Array (currently we at best have the latter), then it might not come to my attention. My issue is that it currently feels like an implementation detail/optimization bleeding into user space. When I use linspace it can be strange vs just using logspace, zeros, etc. I guess it feels magical currently. Also it is a bug magnet for using other, less idiomatic, libraries. Yes, good Julia code, I now get, should work on AbstractArrays, but my experience is that that is not always the case. I have used ODE.jl at times where the deep levels of type generality they go for cause error messages that are very difficult for the end user to understand, as the calls to convert etc. are chained ... LinSpace objects definitely cropped up in this, and that is what led me to post. This is not likely a fault with ODE.jl; rather, as a new user sending the wrong information in the wrong way, I was presented with a real drop in my understanding of what was happening.

Maybe all this is just transitional and soon LinSpace objects will always work like Arrays in all code I might use as an end user. Currently, as a new user, I have not had this experience. I have noticed that LinSpaces were returned, and had to learn what they were and at times run `collect` to make them into what I wanted. I have not felt this abstraction bleed yet in other areas of Julia.

Stefan Karpinski

unread,
Oct 21, 2015, 5:20:25 PM10/21/15
to Julia Users
The whole notion of always using a single dense array type is simply a non-starter. Do we just scrap the whole sparse array thing? There goes half of the stuff you might want to do in linear algebra or machine learning. Special matrix types like Diagonal or UpperTriangular, etc.? Toss those out too. Various types of ranges are just a different form of compactly representing arrays with special structure. How come no one is complaining that 1:n is not a dense vector?
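(For instance, a quick sketch with the 0.4-era special matrix types:

D = Diagonal([1.0, 2.0, 3.0])   # stores only the three diagonal entries
D * [1.0, 1.0, 1.0]             # [1.0, 2.0, 3.0], computed without ever building a 3x3 matrix
full(D)                         # expand to a dense Array only if you genuinely need one

Same idea as a range: special structure, compact representation, array semantics.)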

Stefan Karpinski

unread,
Oct 21, 2015, 5:25:17 PM10/21/15
to Julia Users
On Wed, Oct 21, 2015 at 5:13 PM, Gabriel Gellner <gabriel...@gmail.com> wrote:
Maybe all this is just transitional and soon LinSpace objects will always work like Arrays in all code I might use as an end user. Currently, as a new user, I have not had this experience. I have noticed that LinSpaces were returned, and had to learn what they were and at times run `collect` to make them into what I wanted. I have not felt this abstraction bleed yet in other areas of Julia.

I think this is exactly what's happening. The LinSpace type is hitting code that should be abstracted in a lot of places but hasn't yet been. You'd get the exact same issues if you'd passed a FloatRange object into those functions. So the LinSpace type is really doing us all a favor by forcing more Julia libraries to the right level of abstraction.

Gabriel Gellner

unread,
Oct 21, 2015, 6:32:05 PM10/21/15
to julia-users
Okay, so I am starting to see the light. I see that if LinSpace becomes fully replaceable for an Array it is fine.

Final Questions:

* I can't use LinSpace in matrix mult A * b::LinSpace, is this simply a Bug/Missing Feature? Or intentional? In general if basic builtin functions that operate on Array{Float64} types don't accept LinSpace objects is this something to be fixed? Or do we make special assumptions about the meaningfulness of such operations?
* LinSpace objects seem much slower when used in things like element multiplication A .* b::LinSpace is much much Slower than A .* b::Array, is this to be expected (the cost of the extra abstraction lack of required denseness) or simply a side effect of lack of optimization?
* In general if I make my code accept AbstractArray{Float64} in general should I expect a performance penalty when calling the function with Array{Float64} parameters?

thanks

Tim Holy

unread,
Oct 21, 2015, 7:29:28 PM10/21/15
to julia...@googlegroups.com
On Wednesday, October 21, 2015 03:32:04 PM Gabriel Gellner wrote:
> * I can't use LinSpace in matrix mult A * b::LinSpace, is this simply a
> Bug/Missing Feature?

Yes

> * LinSpace objects seem much slower when used in things like element
> multiplication A .* b::LinSpace is much much Slower than A .* b::Array, is
> this to be expected (the cost of the extra abstraction lack of required
> denseness) or simply a side effect of lack of optimization?

It's because we can't yet eliminate bounds-checks with @inbounds on anything
except Arrays.

> * In general if I make my code accept AbstractArray{Float64} in general
> should I expect a performance penalty when calling the function with
> Array{Float64} parameters?

None whatsoever. The only place that's a bad idea is in declarations of type fields, see http://docs.julialang.org/en/latest/manual/performance-tips/#avoid-fields-with-abstract-type.

--Tim

Gabriel Gellner

unread,
Oct 21, 2015, 7:33:54 PM10/21/15
to julia-users
Isn't the situation actually similar to functions like `eye`, which returns a dense array (which doesn't feel intuitive for what an eye or diagonal matrix actually is), whereas we have to call the special version speye to get a sparse array? For most users linspace is like `eye`, by default suggesting the dense version (i.e. from reasoning about other languages), whereas the other names I have heard proposed would be like speye, i.e. linrange. What we currently lack is top-level functions that let us choose which version we want, like what we have for sparse vs dense arrays. The current situation actually exposes less of the type diversity you mention. It feels like, following the dense vs sparse array stuff, we would want a similar symmetry for LinSpace vs Array creation.

Gabriel Gellner

unread,
Oct 21, 2015, 7:34:34 PM10/21/15
to julia-users
Sweetness. Thank you. AbstractArray it is for me!

Christoph Ortner

unread,
Oct 22, 2015, 3:11:05 AM10/22/15
to julia-users
So here is a reason for keeping linspace a vector:

julia> t = linspace(0, 1, 1_000_000);
julia> s = collect(t);
julia> @time for n = 1:10; exp(t); end
  0.209307 seconds (20 allocations: 76.295 MB, 3.42% gc time)
julia> @time for n = 1:10; exp(s); end
  0.054603 seconds (20 allocations: 76.295 MB, 17.66% gc time)
julia> @time for n = 1:10; AppleAccelerate.exp(s); end
  0.016640 seconds (40 allocations: 76.295 MB, 31.64% gc time)
julia> @time for n = 1:10; AppleAccelerate.exp!(s,s); end
  0.005702 seconds

Now the natural response will be to say that most of the time I won't care so much about performance, and when I do, then I can go optimise. But in truth the same can be said about keeping linspace abstract just because it saves memory. (obviously it is not faster!) 

I think the argument for abstract types is very strong, but (in my personal view) not at the expense of expected behaviour.

Christoph


Christoph Ortner

unread,
Oct 22, 2015, 3:12:22 AM10/22/15
to julia-users
P.S.: It occurred to me that I should also have tried this:

@time for n = 1:10; AppleAccelerate.exp(collect(s)); end
  0.041035 seconds (60 allocations: 152.589 MB, 22.94% gc time)

Andras Niedermayer

unread,
Oct 22, 2015, 5:24:50 AM10/22/15
to julia-users
You're making a good point about an Array being sometimes faster than a LinSpace. But a LinSpace gets you a factor N improvement in terms of memory efficiency for a size N range, an Array only gets you a constant factor improvement in speed (the factor 15 being admittedly relatively large in this example).

Memory efficiency typically matters more for usability in an exploratory interactive session: if my Julia session needs 5 GB RAM, a factor 3 increase of memory will crash my computer. If my code runs for 10 seconds in an interactive session, 30 seconds is mildly annoying, but not a deal breaker. (Obviously, you can construct different examples with memory/time where this is different. But my point is that inconvenience changes discontinuously in memory usage.)

Milan Bouchet-Valat

unread,
Oct 22, 2015, 6:14:17 AM10/22/15
to julia...@googlegroups.com
A related discussion is about a special Ones type representing an array
of 1, which would allow efficient generic implementations of
(non-)weighted statistical functions:
https://github.com/JuliaStats/StatsBase.jl/issues/135

But regarding zeros(), there might not be any compelling use case to
return a special type. Anyway, if arrays are changed to initialize to
zero [1], that function could go away entirely.


Regards


1: https://github.com/JuliaLang/julia/issues/9147

Gabriel Gellner

unread,
Oct 22, 2015, 9:44:28 AM10/22/15
to julia-users
A related discussion is about a special Ones type representing an array
of 1, which would allow efficient generic implementations of
(non-)weighted statistical functions:
https://github.com/JuliaStats/StatsBase.jl/issues/135

But regarding zeros(), there might not be any compelling use case to
return a special type. Anyway, if arrays are changed to initialize to
zero [1], that function could go away entirely

lol, never thought of this kind of special case. You could simply have a "SameVector" object that just stores the value and the length. * + - ^ would be easy to define and the space/(# of operations) savings could be massive ;). With this you would have no need to special-case zeros(n) either; just have it return a SameVector with value 0. We could even generalize it to a "BlockVector" for sequences of same values, storing start and stop locations.
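Such a thing really is only a few lines (a hypothetical sketch, 0.4-era syntax):

immutable SameVector{T} <: AbstractVector{T}
    value::T
    len::Int
end
Base.size(v::SameVector) = (v.len,)
Base.getindex(v::SameVector, i::Int) = v.value

z = SameVector(0.0, 1_000_000)   # "zeros(10^6)" stored as one value plus a length
sum(z)                           # 0.0, and no million-element array is ever allocated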

Tom Breloff

unread,
Oct 22, 2015, 9:56:39 AM10/22/15
to julia-users

Tim Holy

unread,
Oct 22, 2015, 10:18:12 AM10/22/15
to julia...@googlegroups.com
I may have said this earlier, but we just need
https://github.com/JuliaLang/julia/issues/7799.

--Tim

Christoph Ortner

unread,
Oct 22, 2015, 11:15:17 AM10/22/15
to julia-users


On Thursday, 22 October 2015 10:24:50 UTC+1, Andras Niedermayer wrote:
You're making a good point about an Array being sometimes faster than a LinSpace. But a LinSpace gets you a factor N improvement in terms of memory efficiency for a size N range, an Array only gets you a constant factor improvement in speed (the factor 15 being admittedly relatively large in this example).

Memory efficiency typically matters more for usability in an exploratory interactive session: if my Julia session needs 5 GB RAM, a factor 3 increase of memory will crash my computer. If my code runs for 10 seconds in an interactive session, 30 seconds is mildly annoying, but not a deal breaker. (Obviously, you can construct different examples with memory/time where this is different. But my point is that inconvenience changes discontinuously in memory usage.)

In my domain (PDEs and related), memory usage is hardly ever an issue for one-dimensional arrays (as in I cannot for the life of me think of a case where it is). I don't know about other domains of course.

Christoph

Christoph Ortner

unread,
Oct 22, 2015, 11:17:05 AM10/22/15
to julia-users
Hi Tim,

Are you saying that if one removed bounds-checking from the LinSpace iterator, my code below would be much faster? 
(a) Why would there even have to be a bounds check in such an iterator?
(b) That still won't let me use AppleAccelerate, which I suspect requires a plain old array.

Christoph

Andras Niedermayer

unread,
Oct 25, 2015, 9:17:31 AM10/25/15
to julia-users
Memory usage can be an issue if you're simply iterating over a long range, e.g.

julia> function pi_half_montecarlo(n)
         mysum = 0
         for x in linrange(0,1,n)
           y = rand()
           if x*x + y*y < 1
             mysum += 1
           end
         end
         return 4.0*mysum/n
       end
pi_half_montecarlo (generic function with 1 method)

julia> pi_half_montecarlo(10); @time pi_half_montecarlo(1000_000_000)/pi
elapsed time: 17.980499945 seconds (112 bytes allocated)
1.0000009913475585



This would use 8 GB of RAM if one were to write linspace in Julia 0.3 (or collect(linspace(...)) in Julia 0.4). Iterating over FloatRanges/LinSpaces may be less common than iterating over UnitRanges, but it's still not that far fetched.

The iterator vs array issue was considered important enough to break compatibility for "range" between Python 2 and Python 3.

Christoph Ortner

unread,
Oct 25, 2015, 10:05:03 AM10/25/15
to julia-users

very nice example - thank you - but I notice that you use linrange :).

Thanks, Christoph

Gabriel Gellner

unread,
Oct 25, 2015, 1:15:28 PM10/25/15
to julia-users


On Sunday, 25 October 2015 07:05:03 UTC-7, Christoph Ortner wrote:

very nice example - thank you - but I notice that you use linrange :).

Thanks, Christoph

Man, I wish they had called it linrange ;) with type LinRange. Still, I am at peace with the new way it works. Hopefully it will get faster in the future, and with the upcoming printing changes I think it will become a minor headache in corner cases at worst (in which case using collect will be no problem). I don't think the Python range/xrange change is a good comparison, as that is a function that was almost always used for looping in that language, so of course the iterator is the best idea; linspace is often used for array creation in languages like Matlab.

Gabriel 

Stefan Karpinski

unread,
Oct 25, 2015, 2:06:14 PM10/25/15
to julia...@googlegroups.com
This is still an option but I'm yet to be convinced that we want to have that many things exported: linrange, LinRange, and linspace which just does collect on linrange? Seems like one too many.

Gabriel Gellner

unread,
Oct 25, 2015, 2:58:06 PM10/25/15
to julia-users

On Sunday, 25 October 2015 11:06:14 UTC-7, Stefan Karpinski wrote:
This is still an option but I'm yet to be convinced that we want to have that many things exported: linrange, LinRange, and linspace which just does collect on linrange? Seems like one too many. 
 

Personally I think that `linspace` should not be exported (i.e. it should be deprecated). Just have `linrange`, which returns the current `LinRange` (updated name ...), and have the user use collect(linrange) when needed. I just don't like the name linspace anymore, as it makes me think of the function in Matlab/NumPy, which caused my original discontent. (I even tried to make a quick package for this, but sadly linrange is a deprecated name so I abandoned the attempt.) Really the name linspace feels like Matlab baggage; it is not a name like rand or ones, which are a perfect match to the idea regardless of the implementation. Instead it is a strange name for returning an equally spaced, dense, float array. linrange is really more descriptive in my mind.

I also prefer having the nice naming consistency with UnitRange, FloatRange. Makes me feel like these are a suite of types (fittingly, linspace is defined in the file "ranges.jl" ;).

If this is done I think logspace -> logrange as well even though it returns an array at the moment.
Similarly, if a special name is really wanted for collect(linrange) I think it should be alinrange; you would then want aunitrange, afloatrange -- this to me would be similar to the speye-like exports for getting different sparse types. That being said, I currently wouldn't vote for this, as I have seen the light on AbstractArray and the current linrange behavior. Outside of being a little slower, at the moment, in vectorized cases, I like having this and it seems to work fine for what I use it for most often (parameters for time-series-like functions or ODE solvers).

But this is pure bike shedding aesthetics. I have used R and Matlab enough to learn to deal with strange inconsistencies like this.

Scott Jones

unread,
Oct 26, 2015, 3:26:08 PM10/26/15
to julia-users
>One possible naming scheme could be to follow the example of Int and Integer and have Vec, Mat, Arr be the concrete types and Vector, Matrix and Array be the abstract types. I'm really not sure this would be worth the trouble at this point or if we really want the AbstractArray names to be any shorter.

That sounds like a quite good idea, which, if carried out completely, could eliminate some inconsistencies in the naming of abstract vs. concrete types, that have been causing people grief.

So:
Abstract            Concrete
Signed (Integer)    Int*
Unsigned (Integer)  UInt*
Float               Flt*
Decimal             Dec*
Array               Arr
Vector              Vec
Matrix              Mat
String              Str (maybe Str{Binary}, Str{ASCII}, Str{UTF8}, Str{UTF16}, Str{UTF32})

Tomas Lycken

unread,
Oct 27, 2015, 3:29:21 AM10/27/15
to julia-users

Better yet, since we already have both AbstractVector, AbstractMatrix, AbstractArray, AbstractString, AbstractFloat and a couple of others (try typing Abstract<tab> in the REPL…) it might be time to rename Integer to AbstractInteger. I have a hard time thinking the confusion between Int and Integer would be reduced just because we also had Arr and Array et al - rather, we’d have several pairs of types where it isn’t entirely obvious that one is abstract and one is concrete.

Renaming Integer to AbstractInteger would probably cause massive breakage, though, so it’d have to be done with care. The difference between Int, Int32/Int64 and Integer is well documented (see e.g. here and here), but it seems to me that people stumble on this often enough that a naming change might be well motivated anyway.
// T

DNF

unread,
Oct 27, 2015, 3:44:58 AM10/27/15
to julia-users
I think that the point of the idea was to get rid of the Abstract prefix. It can be a bit confusing that some, but not all, abstract types have the Abstract prefix, and as seen in this thread it has led to some misunderstandings.


On Tuesday, October 27, 2015 at 8:29:21 AM UTC+1, Tomas Lycken wrote:

Better yet, since we already have both AbstractVector, AbstractMatrix, AbstractArray, AbstractString, AbstractFloat and a couple of others (try typing Abstract<tab> in the REPL…) it might be time to rename Integer to AbstractInteger. I have a hard time thinking the confusion between Int and Integer would be reduced just because we also had Arr and Array et al - rather, we’d have several.
