Why do dgemm_itcopy and sgemm_itcopy give different results for the same matrix?


Shan Kang

Oct 22, 2019, 4:40:54 AM
to OpenBLAS-dev
Hi,
I wrote a very simple test app. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>
#include <lapacke.h>

#define SIZE1 8
#define SIZE2 8

double src[SIZE1*SIZE2];
double dst[SIZE1*SIZE2];

/* Print all SIZE1*SIZE2 elements of a on a single line. */
void my_print_array(double a[])
{
        for (int i = 0; i < SIZE1*SIZE2; i++)
                printf("%.0f ", a[i]);
        printf("\n");
}

int main(int argc, char **argv)
{
        for(int i=0; i<SIZE2; i++)
            for(int j=0; j<SIZE1; j++)
                src[i*SIZE1+j] = i*10+j;
        printf("src:\n");
        my_print_array(src);
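        /* dgemm_itcopy is an internal OpenBLAS packing kernel, not part of
           the public CBLAS API, so cblas.h does not declare it. */
        dgemm_itcopy(SIZE1, SIZE2, src, 8, dst);
        printf("dst:\n");
        my_print_array(dst);
        return 0;
}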

The "sgemm_itcopy" test app is almost identical to the code above, except that it uses "float" instead of "double" and calls "sgemm_itcopy" instead of "dgemm_itcopy".

dgemm_itcopy results:
src:
0   1   2   3   4   5   6   7 
10 11 12 13 14 15 16 17 
20 21 22 23 24 25 26 27 
30 31 32 33 34 35 36 37 
40 41 42 43 44 45 46 47 
50 51 52 53 54 55 56 57 
60 61 62 63 64 65 66 67 
70 71 72 73 74 75 76 77
dst:
0   1   2   3   10 11 12 13 
20 21 22 23 30 31 32 33 
40 41 42 43 50 51 52 53 
60 61 62 63 70 71 72 73 
4   5   6   7   14 15 16 17 
24 25 26 27 34 35 36 37 
44 45 46 47 54 55 56 57 
64 65 66 67 74 75 76 77
It looks reasonable.

sgemm_itcopy results:
src:
0   1   2   3   4   5   6   7 
10 11 12 13 14 15 16 17 
20 21 22 23 24 25 26 27 
30 31 32 33 34 35 36 37 
40 41 42 43 44 45 46 47 
50 51 52 53 54 55 56 57 
60 61 62 63 64 65 66 67 
70 71 72 73 74 75 76 77
dst:
0   1   2   3   4   5   6   7 
10 11 12 13 14 15 16 17 
20 21 22 23 24 25 26 27 
30 31 32 33 34 35 36 37 
40 41 42 43 44 45 46 47 
50 51 52 53 54 55 56 57 
60 61 62 63 64 65 66 67 
70 71 72 73 74 75 76 77
Here the src and dst matrices are identical, unlike the dgemm_itcopy result.
I wonder whether this is expected behavior or a bug.
I am using the latest dev-branch code. dgemm_itcopy is implemented in "kernel/generic/gemm_tcopy_4.c" and sgemm_itcopy in "kernel/x86_64/sgemm_tcopy_16_skylakex.c".

