I need to make a minor change to the implementation of the OpenBLAS cblas_ssbmv routine so that I can use it in my own program. To do that, I found sbmv_k.c, the kernel that cblas_ssbmv invokes. That kernel calls two other routines, axpyu_k and dotu_k. I can trace their arguments through the loop, but I cannot figure out exactly what they do. According to the documentation, they are simply a dot product and y = a*x + y, but sbmv_k.c passes additional parameters that I do not recognize.
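For reference, the documented semantics of the two operations can be written in plain C. This is only a sketch of the textbook strided dot product and axpy. The names ref_dot and ref_axpy are hypothetical, positive strides are assumed, and the extra arguments that sbmv_k.c passes to the OpenBLAS kernels are not modeled here.

```c
#include <assert.h>

/* Reference dot product (hypothetical helper, not an OpenBLAS function):
 * returns the sum over i of x[i*incx] * y[i*incy]. */
static float ref_dot(int n, const float *x, int incx,
                     const float *y, int incy)
{
    assert(incx > 0 && incy > 0);   /* assumption: positive strides only */
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}

/* Reference axpy (hypothetical helper, not an OpenBLAS function):
 * performs y[i*incy] += alpha * x[i*incx] for i = 0 .. n-1. */
static void ref_axpy(int n, float alpha, const float *x, int incx,
                     float *y, int incy)
{
    assert(incx > 0 && incy > 0);   /* assumption: positive strides only */
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

Calling these with the same n, pointers, and strides that appear at a given loop iteration in the kernel is one way to check, step by step, which slice of the band each call is meant to touch.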
For symmetric matrices with bandwidth k=2 (sizes 5x5 and 10x10, X filled with 1s and Y initially 0), the unmodified OpenBLAS cblas_ssbmv routine gives me wrong values for some elements of the resulting Y.
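One way to pin down which elements of Y differ is to compare against a plain-C reference. The sketch below assumes the standard BLAS band layout for a column-major, upper-triangle symmetric band matrix, where A(i,j) with max(0, j-k) <= i <= j is stored at a[(k + i - j) + j*lda] and lda >= k+1. The function names ref_ssbmv_upper and pack_upper_band are hypothetical helpers, not part of OpenBLAS.

```c
#include <assert.h>

/* Copy the upper band of a dense column-major symmetric n x n matrix
 * into band storage: A(i,j) goes to band[(k + i - j) + j*lda]. */
static void pack_upper_band(int n, int k, const float *dense,
                            float *band, int lda)
{
    assert(lda >= k + 1);
    for (int j = 0; j < n; j++) {
        int i0 = (j - k > 0) ? j - k : 0;   /* first stored row of column j */
        for (int i = i0; i <= j; i++)
            band[(k + i - j) + j * lda] = dense[i + j * n];
    }
}

/* Reference y = alpha*A*x + beta*y for a symmetric band matrix A with
 * k super-diagonals, upper triangle stored as described above.
 * Unit strides are assumed for x and y. */
static void ref_ssbmv_upper(int n, int k, float alpha, const float *a,
                            int lda, const float *x, float beta, float *y)
{
    assert(lda >= k + 1);
    for (int i = 0; i < n; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];

    for (int j = 0; j < n; j++) {
        int i0 = (j - k > 0) ? j - k : 0;
        for (int i = i0; i <= j; i++) {
            float aij = a[(k + i - j) + j * lda];
            y[i] += alpha * aij * x[j];       /* stored upper element */
            if (i != j)
                y[j] += alpha * aij * x[i];   /* mirrored lower element */
        }
    }
}
```

For a 5x5 matrix with k=2 whose in-band entries are all 1, X all ones, alpha=1 and beta=0, this reference yields Y = (3, 4, 5, 4, 3), the row sums of the band. Under the same layout assumption, the corresponding library call would be cblas_ssbmv(CblasColMajor, CblasUpper, n, k, alpha, a, lda, x, 1, beta, y, 1), so a mismatch against the reference points either at how A was packed or at the order/uplo arguments.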
I would like to learn what I am doing wrong. To start with, which arguments need to be passed to cblas_ssbmv, and does the row/column order argument apply only to the array A? After that, I would like to understand the execution steps of axpyu_k and dotu_k.
Thanks in advance.