Hi everyone,
I would like to contribute to OpenBLAS and learn to write compute micro kernels.
I have studied computational engineering with three semesters in HPC,
Matrix HPC, computer architecture, and parallel computing. What I still
lack, in order to make full use of this knowledge, is the ability to
write such kernels.
How do developers learn the skills required to write assembly micro kernels?
I would like to be able to understand and tune the assembly
generated from AVX-enhanced C++ and tweak it for a given architecture.
I would be grateful for your advice on how to learn this, and for
references to reading material that you recommend. I am also looking for
tutoring in micro kernel coding, so if anyone would like to offer their
services, please get in touch!
Kind regards,
Martin