Does it make sense to use pinned memory in Native mode?
Which type of memory should be used in Native mode, and which in Hybrid mode?
When should we use Native mode, and when Hybrid?
What about the no_pivoting version? Is only Hybrid mode available there?
On Nov 24, 2020, at 10:34 AM, aran nokan <noka...@gmail.com> wrote:
Awesome, thanks!
magma_dgetrf is a hybrid routine. It assumes that the input matrix is in the CPU memory. Pinned memory is recommended for this routine, because it allows faster data copies across the CPU-GPU interconnect.
So what is the difference between magma_dgetrf and magma_dgetrf_gpu? As I understand it, magma_dgetrf starts with the matrix allocated in CPU memory and then allocates another copy on the GPU, while magma_dgetrf_gpu starts with the matrix in GPU memory and then allocates a CPU copy. Am I wrong?
No. Native routines accept GPU pointers only.
Actually, I allocated pinned memory and the input was accepted, but the execution time was higher. Also, if I use pinned memory for hybrid mode, the time is equal. What is the problem here (why is the pinned memory accepted)?
That really depends on your system configuration and the size of your matrix. Hybrid routines perform best when you have a high-end CPU with optimized LAPACK software (e.g. a recent Intel CPU with MKL). Native routines are independent of the CPU. They don’t use it for any computational workload. Native routines can perform better than hybrid routines on small matrices, regardless of the system configuration.
Can I ask what “small” means here? Is it related to the GPU? For P100 and Volta, which dimension counts as small? Around 20k, Native seems to be better.
Another graph below shows that Native always seems to be better. This is because the CPU is different (IBM POWER8), so we cannot use MKL. Due to the lack of an optimized factorization on the CPU, the hybrid routine suffers from a big performance drop. The intersection point is beyond 40k.
Last question. Is working with MAGMA hard or am I a dummy?!
If you are new to MAGMA, give it more time and you will like it :-)
Ahmad
On Tue, Nov 24, 2020 at 5:59 PM Ahmad Abdelfattah <ah...@icl.utk.edu> wrote:
Hi Aran,
- magma_dgetrf is a hybrid routine. It assumes that the input matrix is in the CPU memory. Pinned memory is recommended for this routine, because it allows faster data copies across the CPU-GPU interconnect.
- magma_dgetrf_gpu is a hybrid routine. It assumes that the input matrix is in the GPU memory. Internally, it allocates pinned memory workspaces on the CPU. As a user, you just need to allocate the matrix in the GPU memory.
- magma_dgetrf_native uses the GPU only for performing the factorization. It assumes that the input matrix is in the GPU memory. You cannot pass pinned memory pointers to this routine.
- All three routines assume that the pivot vector is in the CPU memory.
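The memory requirements of the three routines can be summarized in a small lookup table. This is only a minimal sketch in plain C, not MAGMA code: the type names, field names, and the `find_variant` helper are hypothetical and exist purely for illustration. Only the routine names and their memory requirements are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where the input matrix is expected to live.
 * These labels are illustrative only; they are not MAGMA types. */
typedef enum {
    MEM_CPU_PINNED_PREFERRED, /* CPU memory, preferably pinned */
    MEM_GPU                   /* GPU memory only */
} matrix_mem_t;

/* One row of the summary table (hypothetical struct). */
typedef struct {
    const char  *routine;      /* routine name as given in the thread */
    int          uses_cpu;     /* 1 = hybrid (CPU + GPU), 0 = native (GPU only) */
    matrix_mem_t matrix_mem;   /* where the input matrix must be */
    int          pivot_on_cpu; /* 1 = pivot vector lives in CPU memory */
} lu_variant_t;

static const lu_variant_t LU_VARIANTS[] = {
    { "magma_dgetrf",        1, MEM_CPU_PINNED_PREFERRED, 1 },
    { "magma_dgetrf_gpu",    1, MEM_GPU,                  1 },
    { "magma_dgetrf_native", 0, MEM_GPU,                  1 },
};

/* Look up a routine by name; returns NULL if it is not in the table. */
const lu_variant_t *find_variant(const char *name)
{
    assert(name != NULL);
    size_t count = sizeof LU_VARIANTS / sizeof LU_VARIANTS[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(LU_VARIANTS[i].routine, name) == 0) {
            return &LU_VARIANTS[i];
        }
    }
    return NULL;
}
```

For example, `find_variant("magma_dgetrf_native")->uses_cpu` is 0, which reflects that the native routine performs the factorization on the GPU only, while its `pivot_on_cpu` field is still 1.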
So, coming to your questions.
Does it make sense to use pinned memory in Native mode?
No. Native routines accept GPU pointers only.
Which type of memory should be used in Native mode, and which in Hybrid mode?
Native -> GPU memory
Hybrid without the “_gpu” suffix -> CPU memory, preferably pinned
Hybrid with the “_gpu” suffix -> GPU memory
When should we use Native mode, and when Hybrid?
That really depends on your system configuration and the size of your matrix. Hybrid routines perform best when you have a high-end CPU with optimized LAPACK software (e.g. a recent Intel CPU with MKL). Native routines are independent of the CPU. They don’t use it for any computational workload. Native routines can perform better than hybrid routines on small matrices, regardless of the system configuration.
What about the no_pivoting version? Is only Hybrid mode available there?
There is currently no native mode for non-pivoting LU.
Ahmad