Hi everyone,
I would like to contribute and learn how to write compute micro kernels.
I have studied computational engineering, with three semesters covering HPC, Matrix HPC, computer architecture, and parallel computing. To make full use of this knowledge and to contribute to OpenBLAS, I still need to learn how to write compute micro kernels.
How do the developers learn the skills required to write the assembly micro kernels?
I would like to be able to comprehend and tune the assembly generated from AVX-enhanced C++ and tweak it for a given architecture.
I would be grateful for your advice on how to learn this, and for references to reading material that you recommend. I am also looking for tutoring in micro kernel coding, so if anyone would like to offer their services, please get in touch!
Kind regards,
Martin