SubArray memory footprint


Sebastian Good

Mar 25, 2015, 2:18:08 PM
to julia...@googlegroups.com
I was surprised by two things in the SubArray implementation

1) They are big! About 175 bytes for a simple view into a 1D array, by my naive measurement.[*]
2) They are not flat. That is, they seem to get heap allocated and have indirections in them.

I'm guessing this is because SubArrays aren't immutable, and tuples aren't always inlined into immutables either, but I'm really grasping at straws.

I'm walking through a very large memory mapped structure and generating hundreds of thousands of subarrays to look at various windows of it. I was hoping that by using views I would reduce memory usage as compared with creating copies of those windows. Indeed I am, but by a lot less than I thought I would be. 

In other words: SubArrays are surprisingly expensive because they necessitate several memory allocations apiece.

From the work that's gone into SubArrays I'm guessing that isn't meant to be. They are so carefully specialized that I would expect them to behave roughly like a (largish) struct in common use.

Is this a misconception? Do I need to take more care about how I parameterize the container I put them in to take advantage?

[*]
julia> const b = [1:5;]

julia> function f()
           for i in 1:1_000_000 sub(b, 1:2) end
       end

julia> @time f()
elapsed time: 0.071933306 seconds (175 MB allocated, 9.21% gc time in 8 pauses with 0 full sweep)

Kevin Squire

Mar 25, 2015, 3:07:22 PM
to julia...@googlegroups.com
Others are more qualified to answer the specific question about SubArrays, but you might check out the ArrayViews package.  For your test, it allocates a little under half the memory and is a little over twice as fast (after warmup):

julia> const b = [1:5;];

julia> function f()
         for i in 1:1_000_000 sub(b, 1:2) end
       end
f (generic function with 1 method)

julia> using ArrayViews

julia> function f2()
         for i in 1:1_000_000 view(b, 1:2) end
       end
f2 (generic function with 1 method)

julia> @time f()  # after warmup
elapsed time: 0.048006869 seconds (137 MB allocated, 6.80% gc time in 6 pauses with 0 full sweep)

julia> @time f2()  # after warmup
elapsed time: 0.018902176 seconds (61 MB allocated, 6.60% gc time in 2 pauses with 0 full sweep)


Cheers,
   Kevin

Tim Holy

Mar 25, 2015, 3:30:02 PM
to julia...@googlegroups.com
SubArrays are immutable on 0.4. But tuples aren't inlined, which is going to
force allocation.

Assuming you're using 0.3, there's a second problem: the code in the
constructor is not type-stable, and that makes construction slow and memory-
hungry. Compare the following on 0.3 and 0.4:

julia> A = rand(2,10^4);

julia> function myfun(A)
           s = 0.0
           for j = 1:size(A,2)
               S = slice(A, :, j)
               s += sum(S)
           end
           s
       end
myfun (generic function with 1 method)


On 0.3:
# warmup call
julia> @time myfun(A)
elapsed time: 0.145141435 seconds (11277536 bytes allocated)

# the real call
julia> @time myfun(A)
elapsed time: 0.034556106 seconds (7866896 bytes allocated)


On 0.4:
julia> @time myfun(A)
elapsed time: 0.190744146 seconds (7 MB allocated)

julia> @time myfun(A)
elapsed time: 0.000697173 seconds (1 MB allocated)



So you can see it's about 50x faster and about 8-fold more memory efficient on
0.4. Once Jeff finishes his tuple overhaul, the allocation on 0.4 could
potentially drop to 0.

--Tim

Sebastian Good

Mar 25, 2015, 3:47:59 PM
to Tim Holy, julia...@googlegroups.com
That helps a bit; I am indeed working on v0.4. A zero-allocation SubArray would be a phenomenal achievement. I guess it’s at that point that getindex with ranges will return SubArrays, i.e. mutable views, instead of copies? Is that still targeted for v0.4?

Tim Holy

Mar 26, 2015, 5:13:26 PM
to Sebastian Good, julia...@googlegroups.com
It's better to keep these conversations on the list (CCed).

I can't tell from what you've shown whether you're talking about creation
performance or usage performance. For most cases, usage (indexing) performance
should be nearly at the level of Arrays, if you believe the benchmarks
(https://github.com/JuliaLang/julia/pull/8501#issuecomment-63232821,
https://github.com/JuliaLang/julia/pull/9329,
https://github.com/JuliaLang/julia/pull/10507). Are you sure you're using a
recent 0.4 checkout? You don't have a type-stability problem, do you?
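One way to check for that is `@code_warntype`. A minimal sketch, using later syntax (`view` where this thread says `sub`/`slice`; the function name is just illustrative):

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL in later Julia

# Sum columns through views; if the loop body were type-unstable,
# @code_warntype would flag variables as `::Any` (shown in red at the REPL).
function colsum(A)
    s = 0.0
    for j in 1:size(A, 2)
        s += sum(view(A, :, j))   # `slice(A, :, j)` in 0.3/0.4 syntax
    end
    s
end

@code_warntype colsum(rand(2, 10))  # `::Any` entries signal instability
```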

--Tim

On Thursday, March 26, 2015 03:44:13 PM you wrote:
> Now that I’ve used a few in anger, they also seem slow in v0.4, like 50x
> slower than regular arrays. Is this down to bounds checking?
>
> My case is simple, I have data that looks like
>
> ..header..|..data[n]..|..header..|..data[n]..|...
>
> So all I have to do to expose the data as a 2D array is
> sub(header_len+1:header_len+n, 1:num_records) I’d think it would specialize
> very fast since there is still contiguous access in the first dimension,
> and a constant stride.
>
> And it’s good enough to get working, but I was a bit surprised. Any
> suggestions?
>
> On March 25, 2015 at 4:48:41 PM, Sebastian Good
> (seba...@palladiumconsulting.com) wrote:
>
> I think I’d have to know a lot more about Julia to have a reasonable
> contribution in those murky waters!
>
> On March 25, 2015 at 4:07:19 PM, Tim Holy (tim....@gmail.com) wrote:
>
> On Wednesday, March 25, 2015 03:47:55 PM you wrote:
> > I guess it’s at that point that getindex
> > with ranges will return SubArrays, i.e. mutable views, instead of copies?
> > Is that still targeted for v0.4?
>
> In my opinion, that depends mostly on whether we settle on a solution to
> https://github.com/JuliaLang/julia/pull/8227 in time. Help wanted :-).
>
> --Tim

Sebastian Good

Mar 26, 2015, 10:48:37 PM
to Tim Holy, julia...@googlegroups.com
Sorry, didn’t realize the conversation had wandered off-list. Thanks for correcting that.  I’ll check the type-stability and return with some benchmarks. I’ve got a Julia build as of yesterday showing about a 5x slowdown relative to native array access, which surprised me a little bit.

julia> const a = rand(10_000, 10_000);

(after a few repetitions…)

julia> @time mean(a)
elapsed time: 0.050039199 seconds (184 bytes allocated)
0.5000072164856915

julia> @time mean(sub(a, 1:10_000, 1:10_000));
elapsed time: 2.349825138 seconds (584 bytes allocated)

Here we’re about 5x slower. In my larger case we’re about 50x slower, but I will check there for type instability.

Tim Holy

Mar 27, 2015, 1:25:00 AM
to Sebastian Good, julia...@googlegroups.com
Nice catch! It seems clear that not all algorithms have been updated to use
fast iteration, and this is one of them. In contrast, compare
s = sub(a, 1:10_000, 1:10_000)
@time sum(s, (1,2))
which you'll see is just as fast as for a plain Array.

It all comes down to the iteration paradigm; it's not possible to make linear
indexing fast for all array types, so for SubArrays one needs to use Cartesian
iteration. There are probably more algorithms in base that need to be modified.
Can you file an issue with any you notice?
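A sketch of what "fast iteration" looks like, in later syntax (`view` where 0.4 says `sub`; `mysum` is just an illustrative name): an algorithm written with `eachindex` gets linear indices for an Array but Cartesian indices for a general SubArray, so one loop is efficient for both.

```julia
# Generic sum: `eachindex` picks the fastest index type for the array.
function mysum(A)
    s = zero(eltype(A))
    for I in eachindex(A)   # integers for Array, Cartesian indices for views
        s += A[I]
    end
    s
end

a = reshape(1.0:24.0, 4, 6)
v = view(a, 2:3, 1:6)   # a strided, non-contiguous window into `a`
mysum(v)                # same result as sum(v), without integer division
```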

Best,
--Tim

Sebastian Good

Mar 27, 2015, 8:21:10 AM
to Tim Holy, julia...@googlegroups.com
Forgive my ignorance, but what is Cartesian indexing?
--
Sebastian Good


Matt Bauman

Mar 27, 2015, 9:20:02 AM
to julia...@googlegroups.com, tim....@gmail.com
On Friday, March 27, 2015 at 8:21:10 AM UTC-4, Sebastian Good wrote:
Forgive my ignorance, but what is Cartesian indexing?

There are two ways to iterate over all elements of an array: Linear indexing and Cartesian indexing. For example, given a 2x3 matrix, linear indexing would use just one index from 1:6, whereas Cartesian indexing specifies indices for both dimensions: (1,1), (1,2), (2,1), ...


If an array isn't stored contiguously in memory, linear indexing requires converting each linear index to Cartesian indices, which is very expensive (it requires integer division, which is surprisingly slow). The new `eachindex` method in 0.4 returns an iterator that goes over all the Cartesian indices very quickly.
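For example, in later syntax (`view` where 0.4 says `sub`):

```julia
# A 2x3 matrix, stored column-major:
A = [1 3 5;
     2 4 6]

# Linear indexing: one index running 1:6 through storage order.
# Cartesian indexing: one index per dimension.
A[4] == A[2, 2]   # true: the fourth stored element is row 2, column 2

# `eachindex` picks the cheapest index type; for a non-contiguous view
# it yields Cartesian indices instead of doing integer division:
v = view(A, 1:2, 2:3)
s = sum(v[I] for I in eachindex(v))
```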

Sebastian Good

Mar 27, 2015, 10:58:46 AM
to julia...@googlegroups.com, tim....@gmail.com
Ah, excellent, that makes sense. In practice this is how we'll be doing it anyway. 
--
Sebastian Good


Sebastian Good

Mar 27, 2015, 6:56:34 PM
to Tim Holy, Julia Users
Will do!

On Friday, March 27, 2015, Tim Holy <tim....@gmail.com> wrote:
Thanks for chiming in, Matt.

I should have added that there are _some_ SubArrays that do have efficient
linear indexing: sub(A, :, 3:5) and sub(A, 2, :), for a matrix A, are two
examples. (The latter in particular is one of the advantages of 0.4's
SubArrays over ArrayViews.) But in general it's not efficient.
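These cases can be checked directly; a sketch using current names (`view` for `sub`, and `IndexStyle` for what 0.4 called `Base.linearindexing`):

```julia
A = rand(4, 6)

# Which views get fast linear indexing:
IndexStyle(view(A, :, 3:5))          # IndexLinear(): contiguous column block
IndexStyle(view(A, 2, :))            # IndexLinear(): one row, constant stride
IndexStyle(view(A, 1:2, [1, 3, 5]))  # IndexCartesian(): the general case
```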

But Sebastian, please do file those issues. It's hard to keep track of what
needs updating, and issues are vastly preferable to posts to julia-users. For
instance, it's already gone clean out of my head what function was slow with
SubArrays :-).

--Tim


--
Sebastian Good


Sebastian Good

Mar 28, 2015, 7:10:54 AM
to julia...@googlegroups.com, tim....@gmail.com
Random thought: if tuples can avoid boxing, why not return tuples from the iterator protocol?
--
Sebastian Good


Tim Holy

Mar 28, 2015, 12:38:18 PM
to Sebastian Good, julia...@googlegroups.com
Right now getindex(::AbstractArray, ::Tuple) isn't defined (though there's a
proposal to define it as one of several options to solve a tricky problem, see
https://github.com/JuliaLang/julia/pull/10525#issuecomment-84597488).

More importantly, tuples don't have +, -, min, and max defined for them, and
it's not obvious they should. But all that is supported by CartesianIndex.
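A quick sketch of what CartesianIndex supports (current syntax; the values are just illustrative):

```julia
# CartesianIndex supports the arithmetic that plain tuples lack:
I = CartesianIndex(2, 3)
J = CartesianIndex(1, 1)

I + J        # CartesianIndex(3, 4)
I - J        # CartesianIndex(1, 2)
min(I, J)    # elementwise min: CartesianIndex(1, 1)

# and it indexes arrays directly:
A = reshape(1:12, 3, 4)
A[I] == A[2, 3]   # true
```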

--Tim

Sebastian Good

Mar 29, 2015, 11:12:57 AM
to Tim Holy, julia...@googlegroups.com
Makes sense. I was just thinking naively of the start and next functions using tuples, since they would know the proper dimensions, etc., but I bet this is what CartesianIndex does, and now I should look at it :-)
--
Sebastian Good

