Sparse matrix memory preallocation?

393 views
Skip to first unread message

Gabriel Goh

unread,
Jan 31, 2016, 5:07:56 PM1/31/16
to julia-users
Generating a sparse matrix from scratch seems to be quite memory intensive. and slow. Say I wish to create a large block diagonal matrix with 2x2 block entries.

Doing it naively is quite slow

function f(k)
  M = spzeros(2*k,2*k)
  for i = 1:k
    D = (i-1)*2 + 1:i*2
    M[D,D] = randn(2,2)
  end
  return M
end

julia> @time f(10000)
2.534277 seconds (239.26 k allocations: 3.013 GB, 15.58% gc time)

Is there a way to speed this up by preallocating the memory somehow?

Kristoffer Carlsson

unread,
Feb 1, 2016, 3:25:22 AM2/1/16
to julia-users
Create the vectors I J V which holds the nonzero rows, columns and values respectively and then call sparse(I, J, V).

Kristoffer Carlsson

unread,
Feb 1, 2016, 4:46:27 AM2/1/16
to julia-users
At computer now.

Something like this:

function f(k)
    I, J, V = Int[], Int[], Float64[]
    for i = 1:k
        idxs = (i-1)*2 + 1:i*2
        for i in idxs, j in idxs
            push!(I, i)
            push!(J, j)
            push!(V, rand())
        end
    end
    return sparse(I,J,V)
end

@time f(10000)
0.001932 seconds (71 allocations: 4.986 MB)

alan souza

unread,
Feb 1, 2016, 7:34:12 AM2/1/16
to julia-users
You could try to use the triplet form (tree vectors containing the row/col indexes and the value of the entry) and call the function sparse.
In this way you can preallocate in advance these three vectors.
However  doing in this way is more cumbersome because you need to have a good estimate of the number of entries and to explicitly calculate the index for all entries.

Kristoffer Carlsson

unread,
Feb 1, 2016, 9:42:33 AM2/1/16
to julia-users
> However  doing in this way is more cumbersome because you need to have a good estimate of the number of entries

Not true. The difference between pre allocating the arrays and just pushing into them is not that large due to how julia arrays work (constant ammortized time etc).

Erik Schnetter

unread,
Feb 1, 2016, 9:56:56 AM2/1/16
to julia...@googlegroups.com
Compare this function:

```Julia
function f2(k)
M = spzeros(2*k,2*k)
for i = 1:k
j1 = (i-1)*2+1
j2 = i*2
M[j1,j1] = rand()
M[j2,j1] = rand()
M[j1,j2] = rand()
M[j2,j2] = rand()
end
return M
end
```
which is much faster. It seems your original code has two performance
issues that are unrelated to sparse matrix memory allocation:
(1) `randn` allocates a new matrix every time
(2) Something about indexing sparse matrices with ranges seems slow (I
don't know why)

If you want to continue to use `randn`, then you can use `randn!`
instead, and preallocate the small matrix outside the loop.

-erik
--
Erik Schnetter <schn...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

Stefan Karpinski

unread,
Feb 1, 2016, 12:02:34 PM2/1/16
to Julia Users
randn() also has a scalar form.

Gabriel Goh

unread,
Feb 1, 2016, 9:20:09 PM2/1/16
to julia-users
excellent point your function was indeed about 4x faster, but Kristophers was about 100x :O so i'll stick to his

Gabriel Goh

unread,
Feb 1, 2016, 9:20:57 PM2/1/16
to julia-users
Thanks! This triplet solution was a miracle

Stefan Karpinski

unread,
Feb 2, 2016, 10:15:12 AM2/2/16
to Julia Users
This is a classic pattern for sparse matrices – constructing them by assignment is very inefficient, so it's much better to construct the I, J, V vectors and then call sparse. It's often more efficient to use findnz to get I, J, V, do some operations on these vectors, and call sparse to construct a new matrix, than it is to directly operate on the sparse matrix.
Reply all
Reply to author
Forward
0 new messages