Dynamically growing array?

2,875 views
Skip to first unread message

Sheehan Olver

unread,
Dec 27, 2013, 8:45:15 PM12/27/13
to julia...@googlegroups.com
What's the "best" way of constructing an array that can grow adaptively?  For example, it has fixed m rows but the number of columns grows as an algorithm proceeds.  Unfortunately,

resize!

doesn't work for 2d arrays.  It does work for Array{Array{Float64,1},1}, but not sure that's optimal.





Ivar Nesje

unread,
Dec 28, 2013, 7:43:06 AM12/28/13
to julia...@googlegroups.com
Which way is BEST depends on the size of the array and if performance or maintainability/simplicity is most important. If performance is important you have to do testing on your specific use case to see which solution does the right trade of.

I think the most readable way is to collect the columns in an Array{Array{Float64,1},1} and then do hcat(a...) to get the array. Maybe we could define full{T<:Number}(a::Array{Array{T,1},1}) to julia Base to do that concatenation in a more efficient way. 

The best performance is to know or calculate the array size beforehand, and write all numbers directly into that. If the calculation of the number of columns is expensive you can append every column to the end of a 1d array, and then use reshape to get the right shape.

Ivar

Tim Holy

unread,
Dec 28, 2013, 10:08:13 AM12/28/13
to julia...@googlegroups.com
Holding columns in separate entries is a great way. However, if you need to do
linear algebra on the matrix at intermediate stages during its growth, then
you'll have a lot of needless copying occurring while you convert the column-
storage into a matrix.

In such circumstances, there's a sneaky workaround:

reshape1(a::Vector, dims::Dims) = pointer_to_array(pointer(a), dims)

a = zeros(3)
c = ones(3)
append!(a, c)
A = reshape1(a, (3, div(length(a),3)))
c += 1
append!(a, c)
A = reshape1(a, (3, div(length(a),3)))

Using pointer_to_array circumvents the ordinary protections built into resize!
There's still allocation occurring (it has to build a new Array "wrapper" on
each iteration), but it avoids copying any data, and for large amounts of data
this is a big win.

Even better would be to generalize resize! to support the final dimension of
any array. I seem to remember Stefan had a reason why this might be
problematic, but I confess I forget what it is.

--Tim

Stefan Karpinski

unread,
Dec 28, 2013, 12:25:50 PM12/28/13
to julia...@googlegroups.com
The issue was bounds check elimination, which is already a problem for 1d arrays. Currently it's very hard to eliminate them because arrays can get resized out from under you at any point.

Tim Holy

unread,
Dec 28, 2013, 3:32:50 PM12/28/13
to julia...@googlegroups.com
On Saturday, December 28, 2013 12:25:50 PM Stefan Karpinski wrote:
> The issue was bounds check elimination, which is already a problem for 1d
> arrays. Currently it's very hard to eliminate them because arrays can get
> resized out from under you at any point.

Thanks for the reminder.

--Tim

Toivo Henningsson

unread,
Dec 28, 2013, 3:35:50 PM12/28/13
to julia...@googlegroups.com
So what happens if you use Tim's sneaky workaround and resize the 1d array? I suppose that the pointer is no longer valid...

Stefan Karpinski

unread,
Dec 28, 2013, 3:36:18 PM12/28/13
to Julia Users
If you make a mistake, segfault.

Sheehan Olver

unread,
Dec 28, 2013, 6:46:32 PM12/28/13
to julia...@googlegroups.com
Thanks for the suggestions.  It seems best to do the simplest first, and then optimize later if memory management is taking a significant cost.   So I think I’ll stick with reshape! and Array{Array{Float64,1},1}.

Deniz Yuret

unread,
Jun 5, 2015, 3:19:59 PM6/5/15
to julia...@googlegroups.com
I had the same problem and stumbled on this post.  Any kernel / support vector based machine learning algorithm needs this.

Here is one more suggestion that might be useful in some circumstances: 

1. Create a large 2D array X of size (a,b).  
2. Keep track of the actual columns that are filled (c).  
3. Use subarray(X, :, 1:c) for actual operations.
4. Increment c and use copy! for appending additional columns.
5. Grow b as necessary.

This is basically implementing a 2D dynamic array and can be wrapped up in a new type for clean code.
Advantages: no sneaky pointer manipulations, no reshape, no unnecessary allocations.
Disadvantages: not easy to make the code pretty.

best,
deniz

Eric Forgy

unread,
Oct 9, 2015, 11:21:34 AM10/9/15
to julia-users
I find myself in the same position. What is the current wisdom on this subject? Any news tricks available with 0.4?

In my case, the number of rows is fairly small (<100), but the number of columns can grow quite large (pushing the limits of a laptop).

Thanks.

Sheehan Olver

unread,
Oct 9, 2015, 12:06:21 PM10/9/15
to julia...@googlegroups.com
I just stuck to using an array with an extra parameter “datalength” that tells which entries of the array are turned on.  When the array would overrun, I double and copy.
Reply all
Reply to author
Forward
0 new messages