Matrix Extraction Efficiency Problem

192 views
Skip to first unread message

Todd Leo

unread,
Nov 28, 2014, 4:45:12 AM11/28/14
to julia...@googlegroups.com
Hi Fellows,

Say I have a 1000 x 1000 matrix, and I'm going to do some calculation in a nested for-loop, with each pair of rows/cols in the matrix. But I suffered a heavy performance penalty in row/col extraction. Here's my minimum reproducible example, which I hope explains itself.

A = rand(0.:0.01:1.,1000,1000)

function test(x)
    for i in 1:1000, j in 1:1000
        x[:,i]
        x[:,j]
    end
end

test(A) # warm up
gc()
@time test(A)
## elapsed time: 13.28547939 seconds (16208000080 bytes allocated, 72.42% gc time)

 It takes 13 seconds, only extracting the rows/cols for the sake of further calculations. I'm wondering if anything I could do to improve the performance.Thanks in advance.

--
BEST REGARDS,
Todd Leo

Milan Bouchet-Valat

unread,
Nov 28, 2014, 5:00:07 AM11/28/14
to julia...@googlegroups.com
This is because extracting a row/column creates a copy. Depending on
what calculation you want to do on them, you can:
- use arrays views (which will become the default when extracting slices
in 0.4): https://github.com/JuliaLang/ArrayViews.jl
- manually write loops to go over the row and column so that you only
extract one individual element of the matrix at a time


Regards

SLiZn Liu

unread,
Nov 28, 2014, 5:21:46 AM11/28/14
to julia...@googlegroups.com
I'm doing row-wise/col-wise calculation, isn't it inevitable to create row/col copies after iteratively extract single elements? I will consider to take a shot on option 1, ArrayViews if this single-element-extraction comes to a dead end. Thanks, Milan!

Milan Bouchet-Valat

unread,
Nov 28, 2014, 7:49:51 AM11/28/14
to julia...@googlegroups.com
Le vendredi 28 novembre 2014 à 10:21 +0000, SLiZn Liu a écrit :
> I'm doing row-wise/col-wise calculation, isn't it inevitable to create
> row/col copies after iteratively extract single elements?
No, I don't think so, though sometimes you'll want to extract a full
row/column to pass it to a standard function instead of writing all of
the computations by hand. That's where array views are very useful.

But can you give more details about the calculation you need to do?


Regards

SLiZn Liu

unread,
Nov 30, 2014, 9:50:30 PM11/30/14
to julia...@googlegroups.com
I have a n by m dense matrix, and each row is a vector representing variating flows like stock price, and I'd like to find out the two vectors which have the highest similarity using cor(). 
Hence, a nested for-loop was utilized to calculate the similarity between each pair, and fill the similarity into an n by n adjacency matrix. 

Milan Bouchet-Valat

unread,
Dec 1, 2014, 7:55:15 AM12/1/14
to julia...@googlegroups.com
Le lundi 01 décembre 2014 à 02:50 +0000, SLiZn Liu a écrit :
> I have a n by m dense matrix, and each row is a vector
> representing variating flows like stock price, and I'd like to find
> out the two vectors which have the highest similarity using cor().
> Hence, a nested for-loop was utilized to calculate the similarity
> between each pair, and fill the similarity into an n by n adjacency
> matrix.
In that case you can simply use the Distances.jl package like this:
pairwise(CorrDist(), x)

(Though this will compute correlations between columns, not rows. And
the distance is 1 - correlation.)


If you look at the code it uses by calling
edit(pairwise!)

you'll see that it relies on array views to avoid creating copies of the
columns.


Regards

Todd Leo

unread,
Dec 1, 2014, 9:25:45 PM12/1/14
to julia...@googlegroups.com
Big thanks for your tips! 

(Though this will compute correlations between columns, not rows. And 
the distance is 1 - correlation.) 
 Since matrix transpose is easy to obtain, there's no obstacle to adapt to CorrDist() .

I've also tried ArrayViews on my previous applications, except ArrayViews does not support sparse matrices. Will check on Linda Hua's source code and try to implement views on sparse matrices soon.

--
REGARDS,
Todd Leo

Tim Holy

unread,
Dec 1, 2014, 9:34:17 PM12/1/14
to julia...@googlegroups.com
On Monday, December 01, 2014 06:25:45 PM Todd Leo wrote:
>> I've also tried ArrayViews on my previous applications, except ArrayViews
> does not support sparse matrices. Will check on Linda Hua's source code and
> try to implement views on sparse matrices soon.

A potentially easier path: SubArrays support views of sparse matrices. It's
also worth noting that in julia 0.4, SubArrays have been improved and have
performance similar to (or better than) ArrayViews.

--Tim

Kevin Squire

unread,
Dec 1, 2014, 10:06:57 PM12/1/14
to julia...@googlegroups.com
Just FYI, "Linda Hua" is actually "Dahua Lin".  :-)

Cheers,
   Kevin 

Todd Leo

unread,
Dec 1, 2014, 10:25:57 PM12/1/14
to julia...@googlegroups.com
You are right, shame on me... :D

Todd Leo

unread,
Dec 1, 2014, 10:26:44 PM12/1/14
to julia...@googlegroups.com
Thanks Kevin, will try SubArrays first.
Reply all
Reply to author
Forward
0 new messages