large matrix multiplication

Tom Carroll
Nov 25, 2024, 10:41:59 AM
to MAGMA User
Hello,

I'm looking for some feedback to see if I'm thinking about this the right way.
  • I have two large matrices (NxN with N = 125k) that I need to multiply. Each matrix takes up about 120 GB. 
  • I have four Nvidia A100s available, each with 40 GB of memory, for 160 GB in total.
  • To get AB, I need a total of 360 GB on the CPU side (A, B, and the result).
  • If I cut each matrix into three 40 GB parts, then I need 120 GB on the GPU side.
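
The memory figures above can be checked with a short calculation. This is only a rough sketch: it assumes double precision (8 bytes per entry, in line with the `d` in magma_dgemm) and decimal gigabytes, and the helper names are made up for illustration.

```python
def matrix_bytes(rows, cols, bytes_per_entry=8):
    """Storage for a dense rows x cols matrix (8 bytes assumes double precision)."""
    return rows * cols * bytes_per_entry


def split_sizes(n, parts):
    """Split n rows (or columns) into `parts` nearly equal block sizes."""
    base, extra = divmod(n, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


N = 125_000
GB = 10**9  # decimal gigabytes (an assumption; GiB would give smaller numbers)

full = matrix_bytes(N, N)                   # one full N x N matrix
blocks = split_sizes(N, 3)                  # N is not divisible by 3
panel = matrix_bytes(blocks[0], N)          # largest N/3 x N panel of A (or N x N/3 of B)
tile = matrix_bytes(blocks[0], blocks[0])   # largest N/3 x N/3 tile of the result

print(f"full matrix : {full / GB:.1f} GB")
print(f"A, B and AB : {3 * full / GB:.1f} GB on the host")
print(f"block sizes : {blocks}")
print(f"one panel   : {panel / GB:.1f} GB")
print(f"one product : {(2 * panel + tile) / GB:.1f} GB (A panel + B panel + result tile)")
```

Under these assumptions a single panel is close to the nominal capacity of one 40 GB card, which leaves little or no headroom, and one block product (two panels plus its result tile) needs more than one card can hold.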
To compute AB, I can split A into three row blocks of size N/3 x N and B into three column blocks of size N x N/3.

 [A1] [B1 B2 B3]   [A1B1  A1B2  A1B3]
 [A2]            = [A2B1  A2B2  A2B3]
 [A3]              [A3B1  A3B2  A3B3]

Thus, I can do 9 matrix multiplications and collect the results on the CPU.
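
The nine-product scheme can be sanity-checked on a small matrix before any GPU code is written. The following is a minimal pure-Python sketch, not MAGMA code; the function names are hypothetical, and the inner `matmul` call stands in for whatever gemm routine would run on the GPUs.

```python
def matmul(a, b):
    """Plain triple-loop product of two dense matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def block_bounds(n, parts):
    """Return (start, stop) index pairs that cut range(n) into `parts` blocks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def blocked_matmul(a, b, parts=3):
    """Compute A @ B as parts x parts independent products A_i @ B_j."""
    n_rows, n_cols = len(a), len(b[0])
    c = [[0] * n_cols for _ in range(n_rows)]
    for r0, r1 in block_bounds(n_rows, parts):
        a_i = a[r0:r1]                           # row block A_i
        for c0, c1 in block_bounds(n_cols, parts):
            b_j = [row[c0:c1] for row in b]      # column block B_j
            tile = matmul(a_i, b_j)              # stand-in for one gemm call
            for i, tile_row in enumerate(tile):  # copy the tile back into C
                c[r0 + i][c0:c1] = tile_row
    return c


# Small check: 7 is deliberately not divisible by 3, like N = 125k.
n = 7
A = [[i + 2 * j for j in range(n)] for i in range(n)]
B = [[i * j - 3 for j in range(n)] for i in range(n)]
assert blocked_matmul(A, B) == matmul(A, B)
```

Each tile depends only on one row block of A and one column block of B, so the nine products are independent and can run in any order or on different devices.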

To code this up, can I use something like magma_dsetmatrix_1D_col_bcyclic to copy each piece of A and B over to the GPUs and then use magma_dgemm to multiply?
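
For context, a 1D column block-cyclic layout, as the name of that routine suggests, deals column blocks of width nb out to the GPUs round-robin. The sketch below only illustrates that index mapping in plain Python. It does not call MAGMA, the function names are hypothetical, and the block width of 2 and the ten columns are toy values chosen for illustration.

```python
def col_bcyclic_owner(n_cols, nb, ngpu):
    """List (start, stop, gpu) for each column block of width nb, dealt round-robin."""
    owners = []
    for block, start in enumerate(range(0, n_cols, nb)):
        stop = min(start + nb, n_cols)   # the last block may be narrower than nb
        owners.append((start, stop, block % ngpu))
    return owners


def cols_per_gpu(n_cols, nb, ngpu):
    """Count how many columns each GPU ends up holding under that mapping."""
    counts = [0] * ngpu
    for start, stop, gpu in col_bcyclic_owner(n_cols, nb, ngpu):
        counts[gpu] += stop - start
    return counts


# Toy example: 10 columns, block width 2, 4 GPUs.
for start, stop, gpu in col_bcyclic_owner(10, 2, 4):
    print(f"columns [{start}, {stop}) -> GPU {gpu}")
print("columns per GPU:", cols_per_gpu(10, 2, 4))
```

Under such a layout every GPU holds interleaved column slices of the matrix instead of one contiguous panel, which is worth keeping in mind when deciding how the pieces of A and B should be placed.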

Thanks for the help!

Cheers,
tom