Some MAGMA routines are hybrid, meaning they split the work between the CPU and the GPU. (Others are native, GPU-only, which is often faster than hybrid on today's GPUs.) For getrf, MAGMA factors the panel on the CPU and does the trailing matrix updates on the GPU. MAGMA itself does nothing to parallelize the CPU part; it relies on the underlying LAPACK and BLAS library (e.g., Intel MKL or OpenBLAS) for that. Because the panel is not very wide (nb = 32), the LAPACK library may decide there is not enough work to justify multiple threads.
You could experiment with larger block sizes in magma_get_zgetrf_nb(), which is in magma/control/get_nb.cpp.
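As a rough illustration of why a 32-column panel leaves the CPU little to parallelize, the sketch below estimates what share of the LU flops falls in the panel factorizations for a few block sizes. It uses the standard leading-order flop count for LU of an m-by-nb panel, m*nb^2 - nb^3/3, and 2*n^3/3 for the whole factorization. The function names and n = 10000 are made up for illustration and are not part of MAGMA, and a flop share says nothing definite about run time.

```python
def panel_flops(m, nb):
    """Leading-order flop count for LU of an m-by-nb panel (m >= nb)."""
    return m * nb**2 - nb**3 / 3.0


def panel_fraction(n, nb):
    """Share of the total LU flops of an n-by-n matrix spent in the panels."""
    panel = 0.0
    for j in range(0, n, nb):
        jb = min(nb, n - j)          # width of this panel (last one may be narrower)
        panel += panel_flops(n - j, jb)
    total = 2.0 * n**3 / 3.0         # leading-order cost of the whole factorization
    return panel / total


if __name__ == "__main__":
    n = 10000  # illustrative matrix size, an assumption
    for nb in (32, 64, 128, 256):
        share = 100 * panel_fraction(n, nb)
        print(f"nb = {nb:3d}: panel share of flops ~ {share:.2f}%")
```

The share grows roughly in proportion to nb, so a wider panel hands the CPU library more work per call, at the cost of moving work away from the GPU. Where the best trade-off lies depends on the hardware, so it has to be measured.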
For many LAPACK and BLAS libraries, $OMP_NUM_THREADS controls the number of threads that are used.
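Before timing anything, it can help to check what thread settings the process actually sees. The small script below is only a sketch: besides OMP_NUM_THREADS it also prints MKL_NUM_THREADS and OPENBLAS_NUM_THREADS, which are my additions here (Intel MKL and OpenBLAS read them as library-specific settings). Which variable takes effect depends on the library and how it was built.

```python
import os

# Thread-count variables commonly read by OpenMP runtimes, Intel MKL, and OpenBLAS.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def thread_settings(env=None):
    """Return {variable: value or None} for the thread-count variables."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in THREAD_VARS}


if __name__ == "__main__":
    for name, value in thread_settings().items():
        print(f"{name} = {value if value is not None else '(unset)'}")
    print(f"logical CPUs on this machine: {os.cpu_count()}")
```

Run it in the same shell, with the same environment, that you use to launch the MAGMA program. If everything is unset, the library chooses its own default thread count.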
Mark