Anton compared ATLAS, ESSL (the IBM BLAS) and OpenBLAS using 16-byte-aligned
arguments. For the functions you have implemented, OpenBLAS beats
ATLAS on many tests and mostly matches ESSL. Nice work!
He does see 2-3x slower performance on sgemm, strmm, cgemm and ctrmm,
possibly because of the use of lxsspx/stxsspx.
Also, it helps to allocate the static buffers with 16-byte alignment.
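
As a rough illustration of the alignment point, here is a minimal C11 sketch of a static buffer forced onto a 16-byte boundary. The names scratch, get_scratch, is_aligned, BUF_ALIGN and BUF_FLOATS are hypothetical and are not taken from the OpenBLAS sources; the buffer size is arbitrary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_ALIGN  16    /* assumed: 16-byte boundary, as in the message above */
#define BUF_FLOATS 1024  /* hypothetical buffer size, chosen for illustration */

/* Static scratch buffer placed on a 16-byte boundary via C11 alignas. */
static alignas(BUF_ALIGN) float scratch[BUF_FLOATS];

/* Returns 1 if p is aligned to `align` bytes; align must be a power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical accessor that hands the aligned buffer to a kernel. */
float *get_scratch(void)
{
    return scratch;
}
```

A caller could check is_aligned(get_scratch(), 16) once at start-up before passing the buffer to a vectorized kernel.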
--
You received this message because you are subscribed to the Google Groups "OpenBLAS-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openblas-dev...@googlegroups.com.
To post to this group, send email to openbl...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Runs |    M |    N |    K |  lda |  ldb | alpha | beta
   1 | 1000 |    1 |    1 |    1 |    1 |    -1 |    1
   2 |  128 |  169 | 1728 | 1728 |  169 |     1 |    0
   3 |  128 |  729 | 1200 | 1200 |  729 |     1 |    0
   4 |  192 |  169 | 1728 | 1728 |  169 |     1 |    0
   5 |  256 |  169 |    1 |    1 |  169 |     1 |    1
   6 |  256 |  729 |    1 |    1 |  729 |     1 |    1
   7 |  384 |  169 |    1 |    1 |  169 |     1 |    1
   8 |  384 |  169 | 2304 | 2304 |  169 |     1 |    0
   9 |   50 | 1000 |    1 |    1 | 1000 |     1 |    1
  10 |   50 | 1000 | 4096 | 4096 | 4096 |     1 |    0
  11 |   50 | 4096 |    1 |    1 | 4096 |     1 |    1
  12 |   50 | 4096 | 4096 | 4096 | 4096 |     1 |    0
  13 |   50 | 4096 | 9216 | 9216 | 9216 |     1 |    0
  14 |   96 | 3025 |    1 |    1 | 3025 |     1 |    1
  15 |   96 | 3025 |  363 |  363 | 3025 |     1 |    0
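
To make the role of each table column concrete, here is a naive reference GEMM in C that takes the same parameters (M, N, K, lda, ldb, alpha, beta). This is only a sketch under assumptions the message does not confirm: row-major storage, no transposition, and an output leading dimension ldc (not listed in the table) supplied by the caller. The function name ref_sgemm is hypothetical, and this is not how OpenBLAS, ATLAS or ESSL implement sgemm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Naive reference: C = alpha * A * B + beta * C.
 * Assumed layout: row-major, no transposition, so
 *   A is M x K with row stride lda >= K,
 *   B is K x N with row stride ldb >= N,
 *   C is M x N with row stride ldc >= N.
 */
void ref_sgemm(size_t M, size_t N, size_t K, float alpha,
               const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    assert(lda >= K && ldb >= N && ldc >= N);

    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)
                acc += A[i * lda + k] * B[k * ldb + j];

            /* When beta is zero, C is overwritten without being read. */
            if (beta == 0.0f)
                C[i * ldc + j] = alpha * acc;
            else
                C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
        }
    }
}
```

Under these assumptions, a case with K = 1 and beta = 1 is a rank-1 update of C, while the beta = 0 cases overwrite C with the plain product.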
