Julia 0.5 discussions: ReshapedArrays and indexing performance


Tim Holy

Sep 14, 2015, 10:00:50 PM9/14/15
to julia-dev
Since there seems to be some interest in discussing strategy for 0.5 (and not
endlessly hijacking the thread on the debugger), I'll start a new thread. I've
been pretty involved in reworking julia's AbstractArray infrastructure in the
past, and I intend to keep participating, but I'm also pretty excited about
the possibilities right now in graphics. Since there's only so much time in
the day, I'll throw out my biggest concern and see whether folks have any good
ideas and/or whether someone wants to pick up a ball and run with it.

In making the transition to defaulting to array views (SubArrays and their
future cousins), I think the insufficiently-recognized elephant in the room is
`reshape`. There seems to be quite a lot of consensus that `reshape` should
return a view rather than a copy. I've demonstrated in the past that `reshape`
does not compose in a closed fashion with `sub`, and come up with a plan for a
2nd view type currently called a ReshapedArray
(https://github.com/JuliaLang/julia/pull/10507).

The kicker is performance. Because indexing a reshaped array often involves a
call to `div`, and `div` is dirt-slow, an alternative is highly desirable.
There are clever ways around its slowness
(http://www.flounder.com/multiplicative_inverse.htm) and we (Simon Kornblith,
with small extensions by me) implemented these algorithms. Unfortunately, in
preliminary tests (see #10507) they are still not nearly as fast as I'd like.
I am not sure how much of this is limitations in strategy, my implementation,
poor (improvable) codegen, or just "that's how it is." It would be interesting
to benchmark against implementations like http://libdivide.com/ and see how
we're doing.
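For concreteness, the multiplicative-inverse trick behind that link can be sketched in a few lines (Python here purely for illustration; the actual implementation in #10507 is Julia). Precompute m = floor(2^64/d) + 1 once per divisor d, after which every division by d becomes one multiply and one shift:

```python
def magic_multiplier(d):
    # Precompute once per divisor d. For 32-bit unsigned dividends,
    # m = floor(2^64 / d) + 1 guarantees (n * m) >> 64 == n // d
    # for all 0 <= n < 2^32.
    assert 1 <= d < 2**32
    return (1 << 64) // d + 1

def fast_div(n, m):
    # One multiply and one shift instead of a hardware divide.
    return (n * m) >> 64

# Spot-check against true division.
for d in (3, 7, 10, 640, 2**32 - 1):
    m = magic_multiplier(d)
    for n in (0, 1, d - 1, d, 123456789, 2**32 - 1):
        assert fast_div(n, m) == n // d
```

The open question for reshaped arrays is whether the per-divisor precomputation can be amortized well enough, since the divisors are the array dimensions and are often only known at view-construction time.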

In my opinion, this highly-focused issue is likely to result in the most
difficult decisions we have to make for Arraymageddon. If somebody wants to dive
into this and see how much can be satisfactorily resolved, it would be a big
contribution.

Best,
--Tim

Matt Bauman

Sep 15, 2015, 11:44:36 AM9/15/15
to julia-dev
I'll begin looking into this in depth next week, but others are very welcome to dig in, too. I've been intending to create an "Array Nirvana 2" umbrella issue to track all the things we want to accomplish for Arrays during 0.5, but haven't had a chance to do it yet.

Here's a quick smattering of topics off the top of my head:

* User-extensible @inbounds
* ReshapedArrays (requires @inbounds, better perf - although if we get relatively close we could keep the current Array implementation and only use these guys as a fallback)
* Slices as views (requires ReshapedArrays, @inbounds)

* Drop scalar dimensions (could happen independently, relatively simple implementation-wise, unknown how breaking it will prove to be)

* NTuple/Vararg #11242
* Allow arbitrary elements through non-scalar indexing, like #12567

It'd also be cool to do some explorations into the Buffer type: #12447, but ideally that shouldn't be breaking and could happen at any point… I think.

Tim Holy

Sep 15, 2015, 4:43:55 PM9/15/15
to juli...@googlegroups.com
Yep, that's a good list. Really, I think from a technical perspective the two
big items are @inbounds and ReshapedArrays. I guess views of BitArrays could
be another stumbling block. The rest is mostly dealing with the resulting
chaos :-).

--Tim

Steven G. Johnson

Sep 21, 2015, 12:46:56 PM9/21/15
to julia-dev
On Monday, September 14, 2015 at 10:00:50 PM UTC-4, Tim Holy wrote:
The kicker is performance. Because indexing a reshaped array often involves a
call to `div`, and `div` is dirt-slow, an alternative is highly desirable.

Presumably this is not a problem for any array type that has fast linear indexing?  You don't need div to index into a reshaped Array, for example. 

Matt Bauman

Sep 21, 2015, 12:57:57 PM9/21/15
to julia-dev
The trouble is the interaction with views.  SubArrays require indexing at full dimensionality, and we'll soon have tons of them.
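To make the div concrete: indexing a reshaped view with a linear index means recovering cartesian indices for the parent at full dimensionality, which costs one divrem per dimension. A rough Python sketch of the column-major conversion (the function name is mine, not Julia's internals):

```python
def ind2sub(dims, i):
    # Convert a 0-based linear index into 0-based column-major
    # cartesian indices -- one divmod per dimension.
    sub = []
    for d in dims:
        i, r = divmod(i, d)
        sub.append(r)
    return tuple(sub)

# Linear index 7 in a 2x3x4 array is cartesian (1, 0, 1): 7 = 1 + 0*2 + 1*6
assert ind2sub((2, 3, 4), 7) == (1, 0, 1)
```

A plain Array can skip this and pass the linear index straight through, but a SubArray parent cannot.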

Simon Kornblith

Sep 26, 2015, 9:48:10 PM9/26/15
to julia-dev
The ideal approach would be some kind of optimization pass that can transform loops to eliminate the division. For the simplest case, you want to be able to transform:

for i = a:b
    i2, i1 = divrem(i-1, m)
    i1, i2 = i1+1, i2+1
    f(i1, i2)
end

into:

l2, l1 = divrem(a-1, m)
u2, u1 = divrem(b-1, m)
l1, l2, u1, u2 = l1+1, l2+1, u1+1, u2+1
for i2 = l2:u2, i1 = (i2 == l2 ? l1 : 1):(i2 == u2 ? u1 : m)
    f(i1, i2)
end

It may be possible to speed up fast integer division, but I don't think it will ever be able to reach this level of performance. Additionally, the per-iteration cost of fast integer division scales with the number of dimensions, whereas an appropriately transformed loop only has an add.
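For anyone who wants to convince themselves the transformation is sound, here is a quick Python translation (divmod playing the role of divrem) that checks both loops visit the same (i1, i2) pairs in the same order:

```python
def naive(a, b, m):
    # Direct translation of the original loop: one divrem per iteration.
    out = []
    for i in range(a, b + 1):
        i2, i1 = divmod(i - 1, m)
        out.append((i1 + 1, i2 + 1))
    return out

def transformed(a, b, m):
    # Division hoisted out of the loop: only the range endpoints need divrem;
    # the loop body itself is just increments.
    l2, l1 = divmod(a - 1, m)
    u2, u1 = divmod(b - 1, m)
    l1, l2, u1, u2 = l1 + 1, l2 + 1, u1 + 1, u2 + 1
    out = []
    for i2 in range(l2, u2 + 1):
        lo = l1 if i2 == l2 else 1
        hi = u1 if i2 == u2 else m
        for i1 in range(lo, hi + 1):
            out.append((i1, i2))
    return out

assert naive(3, 17, 5) == transformed(3, 17, 5)
```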

Simon



Tobias Grosser

Sep 27, 2015, 2:53:15 AM9/27/15
to juli...@googlegroups.com
Dear all,

this might be an interesting moment to give some status update on Polly
and its capabilities of optimizing Julia (like) code:

1) We just gained a first version of run-time bounds-check elimination in Polly

https://github.com/llvm-mirror/polly/commit/41ce0a26932208b1d74ef7c6783f45da37f16e0e

This bounds-check elimination drops bounds checks of the kind added by the
sanitizer option -fsanitize=vla-bound.

For Julia, we still require a small patch to LLVM's post-dominator tree. More
importantly, Julia formulates its bounds checks in a way that we cannot drop them.
Specifically, Julia does not check each individual dimension, but only checks whether
an access is inside the memory region allocated for an array. As the size of the
region is a polynomial expression in the array-size parameters, we cannot reason
about Julia's bounds checks with our linear models, and consequently cannot drop
them. I wonder if it would not make more sense to check each dimension
individually?

2) Delinearization

www.grosser.es/#pub-optimistic-delinerization

Polly is now capable of reasoning about array accesses as they are currently generated
by Julia (sometimes still requiring some loop-invariant code motion, which we
currently automate) and can give some nice speed-ups for common dense linear-algebra
kernels and stencils. The point to make here is that for Polly it is important to
know the shape of the array at compile time. Reshaped arrays will make it
harder for Polly to reason about them. They will break our array-recovery code (which
we might be able to fix), but to my understanding they will also make it impossible to
detect the stride-one dimension of the array, which will make auto-vectorization or
any other kind of compiler optimization hard.

It seems you are currently mostly concerned about the slowness of div, but I think it
is of equal importance to understand what other optimizations are hindered by this
reshape-as-view-by-default approach. Does your approach hinder vectorization?

Best,
Tobias

Andy Ferris

Sep 27, 2015, 10:47:45 AM9/27/15
to julia-dev
I'd like to add an new item to the wish list for "arraymageddon", and this seemed like the best place to post it. The basic idea is a simple (and perhaps built-in) way of doing multilinear algebra (tensor contractions, einsum, or whatever you wish to call it).

In github.com/andyferris/Tensors.jl I've implemented a naming scheme for the indices of arrays that is convenient for multilinear algebra, tensor contractions, index permutations, etc. Though there are different ways of doing this (via macros, Jutho's TensorOperations.jl package, etc.), here the names of the indices are included in the type (as a Tuple{} of integers or symbols, as the user prefers) and assigned via getindex(::Type{Tuple{...}}) (and setindex!, etc.). The idea was to (a) remove the run-time overhead associated with figuring out which indices match and (b) have persistent types that remember what they are. For (a), generated functions are used, so compile time might be a bit slower. At the moment the syntax is a bit annoying and thus helper macros are useful, but... there is hope:

If we use {...} as shorthand for Tuple{...} in Julia 0.5, we get nice tensor indexing via [{ ... }]. A couple of examples:

    A[{3,2,1}] = B[{1,2,3}]   # A = permutedims(B,[3,2,1])

or 

    A[{:left,:middle}] * B[{:middle,:right}]    # matrix multiplication, returning a tensor with indices {:left,:right}

which is relatively obvious, pretty terse, quite flexible and convenient for higher-order tensors, while letting the user give sensible, self-documenting names to the indices if they want.
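To spell out the intended semantics (a throwaway Python sketch, not how Tensors.jl is actually implemented): labels shared between the two factors are contracted, i.e. summed over, while all other labels remain free:

```python
from itertools import product

def contract(A, a_labels, B, b_labels, dims):
    # A and B map index tuples (one entry per label) to values; any label
    # appearing in both factors is contracted (summed over), the rest are free.
    shared = [l for l in a_labels if l in b_labels]
    free = [l for l in a_labels + b_labels if l not in shared]
    out = {}
    for idx in product(*(range(dims[l]) for l in free)):
        env = dict(zip(free, idx))
        total = 0
        for sidx in product(*(range(dims[l]) for l in shared)):
            env.update(zip(shared, sidx))
            total += A[tuple(env[l] for l in a_labels)] * \
                     B[tuple(env[l] for l in b_labels)]
        out[idx] = total
    return out

# A[{:left,:middle}] * B[{:middle,:right}] is matrix multiplication;
# contracting against the identity matrix gives A back:
dims = {"left": 2, "middle": 2, "right": 2}
A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
I = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
assert contract(A, ("left", "middle"), I, ("middle", "right"), dims) == A
```

The point of putting the labels in the type rather than in values like this is that the shared/free bookkeeping happens at compile time via generated functions.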

Anyway, what do people think? Is this of broad interest? Does the way I've implemented this seem OK? 

On a general note, I strongly support using {} for Tuple{} in Julia 0.5 as it is a very convenient way of passing *compile-time* information to functions, which is something I can see myself doing more and more (and more) in Julia. Combined with generated functions, it's way more powerful than I've seen in any other programming language.

A couple of related questions ('cause I don't know where to post them).

(1) Is there much difference between Val{...} and Tuple types of a single index? If not, we could use {} for both (i.e. remove Val); and if so, use {X} for Val{X} and {X,} for Tuple{X}, in analogy with tuple instances. Then people could pass {true} and {false} to functions instead of Val{true} or Val{false}.
(2) Is there going to be some fix for A += B allocating extra memory? For my use cases, this is *clearly* Julia's biggest current problem. Do we need a mutating + operator like +! (that way we can remember to only use it with mutable types)? Approaches in other languages (I'm thinking of C++) include allowing the meaning of =, +=, etc. to be overloaded, and/or having a distinction between lvalues and rvalues (which helps with delayed evaluation at a line-by-line level, which I feel is the version of delayed evaluation least surprising to the programmer).


Matt Bauman

Sep 27, 2015, 11:20:14 AM9/27/15
to julia-dev
I think that this is a great use of packages.  I think it's very valuable to allow some syntax in base to be "free" for different disciplines to use however it best suits them.  Beyond the Tuple{} to {} syntax change (which is on the table already), I'm not sure what more you need in base to better support your package.

The (ab)use of Tuples is an interesting way to lift values into the type domain, but be aware that there was already one attempt to remove it: https://github.com/JuliaLang/julia/commit/85f45974a581ab9af955bac600b90d9ab00f093b

Andy Ferris

Sep 27, 2015, 4:00:10 PM9/27/15
to julia-dev
Thanks Matt - you're right that for now Tensors.jl should remain a package (unless people think that kind of notation is just better for general consumption than the current permutedims() in Base, or something).

In addition to {} being Tuple{}, I'd also like operations for tensor sum and tensor product in Base (https://github.com/JuliaLang/julia/issues/13333).

Regarding that attempted removal you mentioned - I'm not sure what will happen if we can't create Tuples with Ints. How could we create Array{T,N} where N is an Integer? And if we allow Ints, why not anything isbits() (plus Symbol)? I can also do Val{(...)} - perhaps that's more "correct". I'm finding the ability to pass compile-time information to generated functions the most powerful feature I've seen in a programming language. I'm sure it makes compilation a bit slower, but wow, it's so much easier than C++ template metaprogramming, and we can't be removing that now.

Matt Bauman

Sep 27, 2015, 9:19:49 PM9/27/15
to julia-dev
On Sunday, September 27, 2015 at 4:00:10 PM UTC-4, Andy Ferris wrote:
Regarding that attempted removal you mentioned - I'm not sure what will happen if we can't create Tuples with Ints. How could we create Array{T,N} where N is an Integer? And if we allow Ints, why not anything isbits() (plus Symbol)? I can also do Val{(...)} - perhaps that's more "correct".

General type parameters are separate from Tuple{} parameters.  Tuple{} types are intended to describe the types of tuple values.  So from a first-principles approach, they shouldn't allow non-Types, since those couldn't possibly describe a tuple value.  That said, there is no other data structure in the language that supports a variable number of arbitrary type parameters, so they are uniquely useful.

The ability to store bitstypes in general type parameters is definitely not going anywhere.

Andy Ferris

Sep 28, 2015, 1:45:07 AM9/28/15
to julia-dev
Of course - this makes perfect sense. So the sensible way of passing data to functions is a value tuple, Val{(...)}.

If that is so, then perhaps this is a better usage of the {} operator: {...} = Val{(...)}; or at least {T,} for Tuple{T} (just as (x,) is a tuple instance) and {T} for Val{T}, thus {(...)} = Val{(...)}. (The point being to create a super-convenient way of passing compile-time arguments to functions, like f(..., Val{true}) but shorter: just {true}, without the Val.)

Sisyphuss

Sep 28, 2015, 2:23:25 AM9/28/15
to julia-dev
I don't understand why reshape is a problem here.
In R, an array is nothing more than a data vector plus a dim attribute; reshape just modifies the dim attribute.

Sisyphuss

Sep 28, 2015, 3:37:50 AM9/28/15
to julia-dev
I understand now why reshape can be a problem: reshape invalidates the previous ArrayView.

I have a possible solution: let ArrayView include the dim information (an ArrayView can only be defined when a dimension exists);
reshape creates an ArrayView with full indices;
an Array then becomes an ArrayView, and the original Array becomes a DataVector.

Tim Holy

Sep 28, 2015, 5:43:12 AM9/28/15
to juli...@googlegroups.com
It's not quite that simple: see the analysis in
https://github.com/JuliaLang/julia/issues/9874#issuecomment-75979041,
especially the part about how sub and reshape compose with each other.

--Tim

Sisyphuss

Sep 28, 2015, 9:50:32 AM9/28/15
to julia-dev
You mean there are cases where the result could not be represented as a view of the original data vector?

Simon Kornblith

Sep 28, 2015, 1:07:54 PM9/28/15
to julia-dev
Hi Tobias,

I suspect this is an artifact of linear indexing. In Julia, if you have a 2x2x2 array X, then the expressions X[7] and X[1,4] are both legal and equivalent to X[1,2,2]. However, when the number of indices provided equals the number of dimensions of the array, I believe all of the checks could just be simple checks on the individual dimensions; it looks like the check for the last dimension is currently more complicated than necessary.
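A sketch of the difference between the two styles of check (hypothetical helper names; Python purely for illustration, not Julia's actual code): the per-dimension form is a conjunction of affine comparisons that a polyhedral model can reason about, while the aggregate form compares a computed linear offset against the total allocated length:

```python
def checkbounds_per_dim(dims, idxs):
    # Affine per-dimension checks (1-based): easy for Polly to model.
    return all(1 <= i <= d for i, d in zip(idxs, dims))

def checkbounds_linear(dims, idxs):
    # Aggregate check: the linear offset (a polynomial in the sizes) must
    # fall inside the allocated region. This also legalizes partial linear
    # indexing such as X[1,4] on a 2x2x2 array.
    offset, stride = 0, 1
    for i, d in zip(idxs, dims):
        offset += (i - 1) * stride
        stride *= d
    length = 1
    for d in dims:
        length *= d
    return 0 <= offset < length

# X[1,4] on a 2x2x2 array passes the aggregate check but not the
# per-dimension one:
assert checkbounds_linear((2, 2, 2), (1, 4))
assert not checkbounds_per_dim((2, 2, 2), (1, 4))
```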

2) Delinearization

www.grosser.es/#pub-optimistic-delinerization

Polly is now capable of reasoning about array accesses as they are currently generated
by Julia (sometimes still requiring some loop-invariant code motion, which we
currently automate) and can give some nice speed-ups for common dense linear-algebra
kernels and stencils. The point to make here is that for Polly it is important to
know the shape of the array at compile time. Reshaped arrays will make it
harder for Polly to reason about them. They will break our array-recovery code (which
we might be able to fix), but to my understanding they will also make it impossible to
detect the stride-one dimension of the array, which will make auto-vectorization or
any other kind of compiler optimization hard.

It seems you are currently mostly concerned about the slowness of div, but I think it
is of equal importance to understand what other optimizations are hindered by this
reshape-as-view-by-default approach. Does your approach hinder vectorization?

In cases where we can determine, based on the types involved, that an array can be reshaped as a view of contiguous memory, I think we will be able to express that in the type of the reshaped array, so LLVM/Polly would only get code where the stride has to be loaded from memory if this is not the case.

Simon

Tim Holy

Sep 28, 2015, 2:23:28 PM9/28/15
to juli...@googlegroups.com
Meaning that, if you want to present the result as a view of the original
vector, you need two different types of views (SubArray and ReshapedArray).
There's no good way to combine both into a single type. Since you can stuff one
inside the other, it should all work out, but there may be more layers than
we'd like.

--Tim

Tobias Grosser

Sep 28, 2015, 3:26:23 PM9/28/15
to juli...@googlegroups.com
Hi Simon.
Ah, thanks for explaining. Yes, per-dimension checks, where possible, would be
helpful here.
This sounds as if we can get the best of both worlds. So you plan to specialize
code for particular layouts/shapes whenever possible? That would indeed make it
a lot easier to analyze.

Best,
Tobias
