Hello,
I'm looking for some feedback to see if I'm thinking about this the right way.
- I have two large matrices (N×N with N = 125,000) that I need to multiply. In double precision, each matrix takes up about 120 GB.
- I have four Nvidia A100s available, each with 40 GB of memory, so 160 GB of GPU memory combined.
- To get C = AB, I need to hold A, B, and C in host memory, so about 360 GB on the CPU side.
- If I cut each matrix into three ~40 GB panels, then at any one time the GPUs only need one panel of A, one panel of B, and one result block, just under 100 GB, which fits in the combined 160 GB (quick sizing check below).
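For reference, here's the quick arithmetic behind those numbers, as a little sanity-check program (assuming 8-byte doubles and GB = 10^9 bytes):

#include <stdio.h>

int main(void)
{
    long long N = 125000;                                 /* matrix dimension        */
    double GB = 1e9;
    double full  = (double)N * N * 8 / GB;                /* one full N x N matrix   */
    double panel = (double)(N / 3) * N * 8 / GB;          /* one 1/3 panel of A or B */
    double block = (double)(N / 3) * (N / 3) * 8 / GB;    /* one result block AiBj   */

    printf("full matrix : %.1f GB\n", full);              /* ~125.0 GB */
    printf("panel       : %.1f GB\n", panel);             /* ~ 41.7 GB */
    printf("result block: %.1f GB\n", block);             /* ~ 13.9 GB */
    printf("in flight   : %.1f GB\n", 2 * panel + block); /* Ai + Bj + AiBj, ~97 GB */
    return 0;
}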
To compute AB, I can break A into three N/3 × N row panels and B into three N × N/3 column panels:
[A1]                [A1B1 A1B2 A1B3]
[A2] [B1 B2 B3]  =  [A2B1 A2B2 A2B3]
[A3]                [A3B1 A3B2 A3B3]
Thus, I can do 9 smaller matrix multiplications, one per block AiBj, and collect the results on the CPU.
To code this up, can I use something like magma_dsetmatrix_1D_col_bcyclic to copy each piece of A and B over to the GPUs and then use magma_dgemm to multiply?
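To make the question concrete, below is a toy version of the loop I have in mind. It uses plain magma_dsetmatrix / magma_dgetmatrix per tile and one magma_dgemm per tile (magma_dgemm works on a single device at a time), rather than magma_dsetmatrix_1D_col_bcyclic; whether the block-cyclic distribution is the better way to stage the pieces is exactly what I'm unsure about. The toy N, the hardcoded four GPUs, and the round-robin tile-to-GPU assignment are just my own assumptions, not anything MAGMA prescribes:

#include <stdio.h>
#include <stdlib.h>
#include "magma_v2.h"

int main(void)
{
    magma_init();

    /* Toy sizes so this actually runs; the real problem would use N = 125000
       and a finer split (see note below). Assumes the four GPUs from above. */
    const magma_int_t N = 3000, T = 3, ngpu = 4;
    const magma_int_t nb = N / T;                 /* panel width = block size */

    /* Host matrices, column-major, leading dimension N. */
    double *A = malloc((size_t)N * N * sizeof(double));
    double *B = malloc((size_t)N * N * sizeof(double));
    double *C = malloc((size_t)N * N * sizeof(double));
    for (size_t k = 0; k < (size_t)N * N; ++k) { A[k] = 1.0; B[k] = 2.0; }

    /* One queue and one set of tile buffers per GPU. */
    magma_queue_t queue[4];
    magmaDouble_ptr dA[4], dB[4], dC[4];
    for (magma_int_t d = 0; d < ngpu; ++d) {
        magma_setdevice(d);
        magma_queue_create(d, &queue[d]);
        magma_dmalloc(&dA[d], (size_t)nb * N);    /* Ai  : nb x N  */
        magma_dmalloc(&dB[d], (size_t)N * nb);    /* Bj  : N  x nb */
        magma_dmalloc(&dC[d], (size_t)nb * nb);   /* AiBj: nb x nb */
    }

    /* The 9 tile products AiBj, round-robined over the GPUs. The copy
       routines used here are blocking, so reusing each GPU's buffers for
       its next tile is safe. */
    for (magma_int_t bi = 0; bi < T; ++bi) {
        for (magma_int_t bj = 0; bj < T; ++bj) {
            magma_int_t d = (bi * T + bj) % ngpu;
            magma_setdevice(d);

            /* Row panel Ai (rows bi*nb ...) and column panel Bj (cols bj*nb ...). */
            magma_dsetmatrix(nb, N, A + (size_t)bi * nb, N, dA[d], nb, queue[d]);
            magma_dsetmatrix(N, nb, B + (size_t)bj * nb * N, N, dB[d], N, queue[d]);

            /* AiBj = Ai (nb x N) * Bj (N x nb). */
            magma_dgemm(MagmaNoTrans, MagmaNoTrans, nb, nb, N,
                        1.0, dA[d], nb, dB[d], N, 0.0, dC[d], nb, queue[d]);

            /* Copy the finished block back into its slot in C. */
            magma_dgetmatrix(nb, nb, dC[d], nb,
                             C + (size_t)bi * nb + (size_t)bj * nb * N, N, queue[d]);
        }
    }

    for (magma_int_t d = 0; d < ngpu; ++d) {
        magma_setdevice(d);
        magma_queue_sync(queue[d]);
        magma_free(dA[d]); magma_free(dB[d]); magma_free(dC[d]);
        magma_queue_destroy(queue[d]);
    }

    printf("C[0] = %g (expect %g)\n", C[0], 2.0 * N);  /* 1.0 * 2.0 summed over N */

    free(A); free(B); free(C);
    magma_finalize();
    return 0;
}

One thing I notice writing it out: at the real N = 125,000, a 3-way split means roughly 42 + 42 + 14 = 97 GB on one card per tile product, which doesn't fit in 40 GB. So I'd either need a finer split (T = 8 gives roughly 16 + 16 + 2 = 34 GB per GPU, if my arithmetic is right), or the panels themselves would have to be spread across the GPUs, which is where the block-cyclic routine might come in.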
Thanks for the help!
Cheers,
tom