This could be a late additional project for LinBox/FFLAS-FFPACK.
-- Brice
== GPU acceleration for dense/sparse matrix multiplication on finite fields ==
Fast exact dense linear algebra over finite fields is the core of the C++ library [[http://linalg.org/projects/fflas-ffpack|FFLAS-FFPACK]] [1]. The algorithms therein rely crucially on the efficiency of matrix-matrix and matrix-vector multiplication, and the numerical (sparse) BLAS are the building blocks underlying these operations.
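The key reason numerical BLAS can serve exact computation over \(F_p\) is that, for small enough \(n\) and \(p\), every intermediate dot product is an integer below \(2^{53}\) and is therefore represented exactly in a double; one modular reduction at the end recovers the exact result. A minimal C++ sketch of this delayed-reduction idea (this is not FFLAS-FFPACK's actual API, just an illustration of the technique):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply two n x n matrices over F_p (entries in [0, p-1]) using
// plain double arithmetic, then reduce mod p.  Valid as long as
// n*(p-1)^2 < 2^53, so each dot product is an exact integer in a double.
std::vector<double> matmul_mod_p(const std::vector<double>& A,
                                 const std::vector<double>& B,
                                 std::size_t n, double p) {
    assert(n * (p - 1) * (p - 1) < 9007199254740992.0);  // 2^53
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];  // exact
    for (double& c : C)
        c = std::fmod(c, p);  // single delayed modular reduction
    return C;
}
```

In the library the triple loop is replaced by a tuned BLAS `dgemm` call (or, in this project, a cuBLAS/cuSPARSE call), which is exactly why fast numerical kernels translate directly into fast exact kernels.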
Recently, a lot of effort has been put into refactoring the dense code and introducing sparse matrix formats and operations. On the one hand, there is now a clean and efficient implementation of both the sequential and the shared-memory matmul routines. On the other hand, GPU acceleration (OpenCL) for computations over \(F_p\) was introduced in [[http://linalg.org|LinBox]] [3]. The goal here is to make use of the fast numerical GPU BLAS libraries ([[http://docs.nvidia.com/cuda/cublas/|cuBLAS]], [[http://docs.nvidia.com/cuda/cusparse/|cuSPARSE]]). It would also be desirable to port or implement OpenCL fallback routines in FFLAS-FFPACK.
A first project would consist in using these libraries for the dense/sparse matrix multiplication operations and writing an \(fmod\) (modular reduction) operation for the GPU in CUDA/OpenCL. A slightly more challenging first project would consist in moving the matrix multiplication OpenCL code from LinBox to FFLAS-FFPACK.
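The \(fmod\) operation mentioned above is embarrassingly parallel: after a numerical cuBLAS/cuSPARSE multiplication, each entry of the result is reduced independently, so a GPU kernel simply assigns one thread per entry. The following CPU-side C++ sketch (an assumption about what the kernel body would do, not existing project code) shows the per-entry logic, including the normalization needed because `std::fmod` keeps the sign of its argument:

```cpp
#include <cmath>
#include <cstddef>

// Elementwise reduction a GPU fmod kernel would perform after a
// numerical multiplication.  Each iteration is independent; on the
// GPU, i would be the global thread index instead of a loop counter.
// std::fmod keeps the sign of data[i], so negative values (which occur
// when a balanced representation of F_p is used) are shifted into [0, p).
void reduce_mod_p(double* data, std::size_t n, double p) {
    for (std::size_t i = 0; i < n; ++i) {
        double r = std::fmod(data[i], p);
        data[i] = (r < 0.0) ? r + p : r;
    }
}
```

Porting this loop body to a CUDA `__global__` kernel or an OpenCL work-item is mechanical, which is why the \(fmod\) task is a good entry point before tackling the LinBox OpenCL matmul code.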
Depending on how this project goes and on the goals of the student, it would also be interesting to add a GPU-offloading mechanism to the existing multi-threaded code.
'''skills/prerequisites''': C/C++, basic linear algebra routines, CUDA or OpenCL
'''mentors:''' Brice Boyer, B. David Saunders