ARM performance comparison of ATLAS, OpenBLAS and BLIS


Jeff Hammond

Jan 13, 2014, 2:46:15 PM
to openbla...@googlegroups.com
Perhaps people are interested in the results I obtained on ARM (specifically https://developer.nvidia.com/content/kayla-platform) with three different implementations of the BLAS.  I attached the data for DGEMM but have the rest if anyone cares.

I compared the latest ARM builds of OpenBLAS as of today, the latest version of BLIS (https://code.google.com/p/blis/) as of today and ATLAS 3.11.22 (which took more than 5 days to build).  The performance of ATLAS 3.8.4 that the package manager installed was ~25% worse.

Jeff
kayla-dgemm.pdf

Zhang Xianyi

Jan 14, 2014, 3:48:09 AM
to Jeff Hammond, openbla...@googlegroups.com
Hi Jeff,

Thank you for the test. The Kayla also uses a Cortex-A9 CPU, the same CPU core as in our development platform (Samsung Exynos 4412).

I think Werner unrolled 4 in OpenBLAS, while ATLAS used 5.
Werner, any comments?

Xianyi


--
You received this message because you are subscribed to the Google Groups "OpenBLAS-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openblas-user...@googlegroups.com.
To post to this group, send email to openbla...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

José Luis García Pallero

Jan 14, 2014, 7:10:08 AM
to Jeff Hammond, openbla...@googlegroups.com
2014/1/13 Jeff Hammond <jeff.s...@gmail.com>:
Hello,

I've just discovered BLIS, and it is an interesting package. But I'm a
bit confused. Is it a native BLAS implementation, or does it internally use an
optimized BLAS such as OpenBLAS or ATLAS? Apparently it is a native
implementation, as the differences in your results show. In that
case, is it based on assembler, C...?

--
*****************************************
José Luis García Pallero
jgpa...@gmail.com
(o<
/ / \
V_/_
Use Debian GNU/Linux and enjoy!
*****************************************

Werner Saar

Jan 14, 2014, 7:24:03 AM
to openbla...@googlegroups.com
Hi,

here are the results of my performance tests for dgemm:

* 1 core: OpenBLAS 1.57 GFLOPS, ATLAS 1.59 GFLOPS
* 2 cores: OpenBLAS 3.14 GFLOPS, ATLAS 3.16 GFLOPS
* 3 cores: OpenBLAS 4.56 GFLOPS, ATLAS 4.60 GFLOPS
* 4 cores: OpenBLAS 5.82 GFLOPS, ATLAS 5.41 GFLOPS

In ATLAS the most important parts are also written in assembler,
and ATLAS unrolls 5 values while in OpenBLAS I only unroll 4 values.
So ATLAS needs fewer read operations relative to the number
of floating-point operations. If you run a benchmark with different
sizes from small to big, you will see that OpenBLAS always performs
very well, while ATLAS only has good performance with big problem sizes.
But the great advantage of OpenBLAS compared to ATLAS is that it's very
easy to build, and it's also easy to create a binary package that performs
well on a lot of platforms.
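
As a rough check of how the figures in the list above scale with core count, here is a minimal Python sketch. It assumes that the first, unlabeled figure on each line is the OpenBLAS result; the function name `scaling` is made up for illustration and is not part of any BLAS library.

```python
# DGEMM throughput in GFLOPS by core count, copied from the list above.
# Assumption: the unlabeled first figure on each line is OpenBLAS.
openblas = {1: 1.57, 2: 3.14, 3: 4.56, 4: 5.82}
atlas = {1: 1.59, 2: 3.16, 3: 4.60, 4: 5.41}


def scaling(gflops_by_cores):
    """Return {cores: (speedup, efficiency)} relative to the 1-core figure."""
    base = gflops_by_cores[1]
    return {
        cores: (gflops / base, gflops / (base * cores))
        for cores, gflops in gflops_by_cores.items()
    }


if __name__ == "__main__":
    for name, data in (("OpenBLAS", openblas), ("ATLAS", atlas)):
        for cores, (speedup, eff) in sorted(scaling(data).items()):
            print(f"{name:8s} {cores} cores: "
                  f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

On these numbers the two libraries are nearly identical up to 3 cores; at 4 cores the first column keeps roughly 93% parallel efficiency while ATLAS drops to roughly 85%.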

Best regards
Werner

Jeff Hammond

Jan 14, 2014, 7:42:30 AM
to Werner Saar, openbla...@googlegroups.com
Exactly. My ATLAS build took six days on this chip. BLIS took about a minute, probably mostly due to NFS-mount overhead.

BLIS is not yet optimized for DGEMM on A9. SGEMM for A15 would be a more fair comparison for them. I sent them the results as well and they are interested in improving their implementation. But I don't care about SGEMM or have an A15, so I didn't do that.

Thanks for the interesting comments on unrolling.

Jeff


"C. Bergström"

Jan 14, 2014, 7:58:42 AM
to Jeff Hammond, Werner Saar, openbla...@googlegroups.com
Sorry to hijack the thread a little, but what's the point of this? Is
this substantially better than, say, Atom + NVIDIA? I'd be really
interested to see (Cortex-5x) AArch64 + NVIDIA. Where is Project Denver?

José Luis García Pallero

Jan 14, 2014, 8:27:36 AM
to Jeff Hammond, openbla...@googlegroups.com
2014/1/13 Jeff Hammond <jeff.s...@gmail.com>:
Hello,

I've installed BLIS, apparently without errors, but I cannot find the
cblas.h file for the CBLAS interface. Does BLIS include a CBLAS interface?

Thanks

Jeff Hammond

Jan 14, 2014, 8:33:37 AM
to "C. Bergström", Werner Saar, openbla...@googlegroups.com
ARM is interesting to me for price, power and architectural diversity reasons.

I cannot comment on any vendor roadmaps right now. Sorry.

Jeff

Werner Saar

Jan 14, 2014, 8:58:01 AM
to Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
Hi,

My plan is to integrate MALI GPUs in the further development of
OpenBLAS for ARM32 as well as for AArch64,
because it's possible to use unified memory with these GPUs.
I don't want to integrate proprietary software like CUDA, and I don't
want to reinvent the wheel; some good
software like MAGMA or ViennaCL is already available.

We only need the appropriate hardware for further development.

Best regards
Werner

José Luis García Pallero

Jan 14, 2014, 9:05:18 AM
to Werner Saar, Jeff Hammond, C. Bergström, openbla...@googlegroups.com
2014/1/14 Werner Saar <wern...@googlemail.com>:
> Hi,
>
> My plans are to integrate MALI GPU's in the further development of OpenBLAS
> for ARM32 as well as for AArch64,
> because it's possible to use unified memory with these GPU's.

That sounds very good. Does MALI have double-precision capabilities?

Werner Saar

Jan 14, 2014, 9:12:26 AM
to José Luis García Pallero, Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
Hi,

Double precision is fully supported, without any restrictions.
The maximum expected performance for dgemm on a MALI T604 GPU
is between 50 and 72 GFLOPS; compared to 4 ARM cores, this is
at least a factor of 10.

Best regards
Werner

José Luis García Pallero

Jan 14, 2014, 9:29:21 AM
to Werner Saar, Jeff Hammond, C. Bergström, openbla...@googlegroups.com
That also sounds very good. And what about the performance in single
precision? Common NVIDIA and ATI cards in the low-to-mid range have an
SP/DP ratio of about 8, 12 or even 24, so their double-precision
capabilities are very poor. Does MALI also fall in this range? Anyway, 50
GFLOPS is a very good number compared with the low price of an ARM
chip.
Have you started the development, or is it only a long-term plan?

Thanks

Jeff Hammond

Jan 14, 2014, 9:34:01 AM
to Werner Saar, "C. Bergström", openbla...@googlegroups.com
>>> Sorry to hi-jack the thread a little, but what's the point of this? Is
>>> this substantially better than say Atom + NVIDIA? To me I'd be really
>>> interested to see (Cortex-5x) AArch64 + NVIDIA - Where is project Denver?
>
> My plans are to integrate MALI GPU's in the further development of OpenBLAS
> for ARM32 as well as for AArch64,
> because it's possible to use unified memory with these GPU's.
> I don't want to integrate proprietary software like Cuda and I don't want to
> reinvent the wheel, some good
> software like Magma or ViennaCL is already available.

People might be interested in
https://github.com/clMathLibraries/clBLAS. Formerly it was an AMD
product.

> We only need the appropriate hardware for the further development.

I might be able to help here in some cases. Exactly what parts do you need?

Jeff



--
Jeff Hammond
jeff.s...@gmail.com

Werner Saar

Jan 14, 2014, 9:39:27 AM
to José Luis García Pallero, Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
Hi,

At the moment, I don't have access to a system with a MALI GPU.
As soon as we have such a system, I will begin development. MALI
GPUs have 128-bit SIMD vector units, so the single-precision performance will
be about 2x the double-precision performance.
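
The 2x figure follows from how many elements fit in one vector register. A minimal sketch of that arithmetic, assuming throughput is proportional to the number of lanes per instruction and that single- and double-precision instructions issue at the same rate (the helper name `lanes` is made up for illustration):

```python
SIMD_WIDTH_BITS = 128  # vector width quoted in the message above


def lanes(element_bits, width_bits=SIMD_WIDTH_BITS):
    """Number of elements of the given size that fit in one vector register."""
    return width_bits // element_bits


sp_lanes = lanes(32)  # IEEE 754 single precision: 32 bits per element
dp_lanes = lanes(64)  # IEEE 754 double precision: 64 bits per element

# Assumption: same issue rate for both precisions, so the throughput
# ratio equals the lane ratio.
sp_to_dp_ratio = sp_lanes / dp_lanes

if __name__ == "__main__":
    print(f"SP lanes: {sp_lanes}, DP lanes: {dp_lanes}, "
          f"SP/DP ratio: {sp_to_dp_ratio:g}")
```

This only models register width; hardware that runs double precision at a reduced rate would show a larger SP/DP ratio than the lane count alone suggests.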

Best regards
Werner

José Luis García Pallero

Jan 14, 2014, 9:43:12 AM
to Werner Saar, Jeff Hammond, C. Bergström, openbla...@googlegroups.com
As far as I know, the ODROID-U2, like the one Xianyi has, is equipped with
a MALI 400.

Werner Saar

Jan 14, 2014, 9:48:14 AM
to José Luis García Pallero, Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
Hi,

A MALI T6xx GPU is required; older MALI GPUs don't support OpenCL.

Best Regards
Werner

Werner Saar

Jan 14, 2014, 10:15:04 AM
to Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
Hi,

A very cheap solution would be to buy a Samsung Chromebook with a MALI T604
GPU. Instructions to install Ubuntu or Fedora are available. When the
development is finished, you can use this system as a laptop.

There are also some development boards with a Cortex-A15 processor and a
MALI T6XX GPU available. But buying such a board from Europe or China
seems to be impossible.

Best regards
Werner

José Luis García Pallero

Jan 14, 2014, 10:24:33 AM
to Werner Saar, Jeff Hammond, C. Bergström, openbla...@googlegroups.com
2014/1/14 Werner Saar <wern...@googlemail.com>:
Hello,

Another good solution could be the ODROID-XU family. They are equipped
with two ARMs, one of them an Exynos5 Octa with a PowerVR
SGX544MP3 GPU, which is OpenCL 1.1 capable. The price starts at $139
for the cheapest system
(http://hardkernel.com/main/products/prdt_info.php?g_code=G138503207322),
and it comes from South Korea.

Werner Saar

Jan 14, 2014, 10:41:35 AM
to José Luis García Pallero, Jeff Hammond, "C. Bergström", openbla...@googlegroups.com
On 14.01.2014 16:24, José Luis García Pallero wrote:
Hi,

For the moment, I want to begin with the MALI GPU, because OpenCL
for this GPU is very well documented. Integration of other GPUs will be
done at a later time.

Best regards
Werner


François Bissey

Jan 14, 2014, 3:44:38 PM
to openbla...@googlegroups.com
On Tue, 14 Jan 2014 14:27:36 you wrote:
> Hello,
>
> I've installed BLIS apparently without errors, but I can not find the
> cblas.h file for cblas interface. Includes BLIS a CBLAS interface?
>
Hi José,

Unfortunately it doesn't look like BLIS ships a CBLAS interface, only
a "gnu fortran style" interface.
By that I mean that the BLIS library includes objects like
"dgemm_", which proves how ignorant of Fortran they are. I also
tried to compile LAPACK against it, since it should use the Fortran BLAS.
That was a dismal failure, so that interface is broken anyway.
I looked for cblas.h too.

I was excited because it has support for the POWER7 architecture, which
is not supported in OpenBLAS. OpenBLAS has support for POWER6, but
I have a test case showing it is broken on POWER7.

Francois

Jeff Hammond

Jan 14, 2014, 4:04:16 PM
to François Bissey, openbla...@googlegroups.com, blis-discuss
On Tue, Jan 14, 2014 at 2:44 PM, François Bissey
<francoi...@canterbury.ac.nz> wrote:
> On Tue, 14 Jan 2014 14:27:36 you wrote:
>> Hello,
>>
>> I've installed BLIS apparently without errors, but I can not find the
>> cblas.h file for cblas interface. Includes BLIS a CBLAS interface?
>>
> Hi José,
>
> Unfortunately it doesn't look like blis ship a cblas interface only
> a "gnu fortran style" interface.
> By that I mean that that the blis library includes object like
> "dgemm_" which proves how ignorant of fortran they are. I also
> tried to compile lapack against it, since it should use fortran blas.
> That's a dismal failure, so that interface it broken anyway.
> I looked for cblas.h too.

I'm sure that many of the BLIS developers would accept your attempt at
criticism as a compliment. I believe I am the only user of the BLIS
Fortran interface right now and I'm satisfied with how they've done
it. If I find a case where it doesn't work for me, I'll fix it and
contribute the patch. All the common Fortran compilers are amenable
to the dgemm_ symbol convention.

I didn't have any problem compiling LAPACK against BLIS when I tried.

You can use the GSL CBLAS interface on top of the BLIS Fortran
interface if you absolutely must have that. It is a reasonable
feature request for someone to implement the canonical CBLAS interface
on top of the BLIS C API, but I don't know anyone who has time for that
right now.

> I was excited because it has support for power7 architecture which
> is not supported in openblas. openblas has support for power6 but
> I have a test case showing it is broken on power7.

I ran BLIS on POWER7 just yesterday and it was fine. Please send bug
reports to blis-...@googlegroups.com if you believe that BLIS is
deficient. Given that I have never encountered the problems you claim
to see, it is possible that you are doing something incorrectly.

José Luis García Pallero

Jan 14, 2014, 5:27:32 PM
to François Bissey, openbla...@googlegroups.com
2014/1/14, François Bissey <francoi...@canterbury.ac.nz>:
> On Tue, 14 Jan 2014 14:27:36 you wrote:
>> Hello,
>>
>> I've installed BLIS apparently without errors, but I can not find the
>> cblas.h file for cblas interface. Includes BLIS a CBLAS interface?
>>
> Hi José,
>
> Unfortunately it doesn't look like blis ship a cblas interface only
> a "gnu fortran style" interface.
> By that I mean that that the blis library includes object like
> "dgemm_" which proves how ignorant of fortran they are. I also
> tried to compile lapack against it, since it should use fortran blas.
> That's a dismal failure, so that interface it broken anyway.
> I looked for cblas.h too.

The Fortran interface works perfectly for me on ARM. Yes, it has no CBLAS
interface, but I remembered that one can be created from the Fortran
interface using http://www.netlib.org/blas/blast-forum/cblas.tgz. Try
using it. By default, it uses the Fortran interface xxxxx_, which is
the one provided by BLIS.
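
To illustrate how a row-major, CBLAS-style call can sit on top of a column-major, Fortran-style dgemm without copying any data, here is a minimal pure-Python sketch. Both function names are hypothetical: `dgemm_colmajor` is only a reference stand-in for a Fortran-convention dgemm_ with no transposes, not the BLIS or netlib code, and `dgemm_rowmajor` shows the operand-swapping trick that wrappers of this kind commonly rely on.

```python
def dgemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference column-major C := alpha*A*B + beta*C (no transposes).

    Hypothetical stand-in for a Fortran-style dgemm_ with
    TRANSA = TRANSB = 'N'. a, b and c are flat lists; element (i, j)
    of a matrix with leading dimension ld lives at index i + j*ld.
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]


def dgemm_rowmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Row-major C := alpha*A*B + beta*C built on the column-major routine.

    A row-major m x n matrix has the same memory layout as the
    column-major n x m matrix holding its transpose. Because
    C^T = B^T * A^T, swapping the two operands and the m/n dimensions
    yields the row-major product in place.
    """
    dgemm_colmajor(n, m, k, alpha, b, ldb, a, lda, beta, c, ldc)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0,
         4.0, 5.0, 6.0]       # 2 x 3, row-major
    b = [7.0, 8.0,
         9.0, 10.0,
         11.0, 12.0]          # 3 x 2, row-major
    c = [0.0] * 4             # 2 x 2, row-major
    dgemm_rowmajor(2, 2, 3, 1.0, a, 3, b, 2, 0.0, c, 2)
    print(c)  # [58.0, 64.0, 139.0, 154.0]
```

A real wrapper would forward to the library's dgemm_ symbol instead of the reference loop and would also have to map the transpose flags, which this sketch leaves out.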

Zhifei Wang

Oct 19, 2014, 9:47:03 PM
to openbla...@googlegroups.com, jgpa...@gmail.com, jeff.s...@gmail.com, cberg...@pathscale.com, wern...@googlemail.com
Hi all,

Any progress on Mali 600+ hardware?

Best Regards,
Phil

Zhang Xianyi

Oct 19, 2014, 10:34:09 PM
to Zhifei Wang, openbla...@googlegroups.com, José Luis García Pallero, jeff.s...@gmail.com, C Bergström, Werner Saar
Hi Phil,

Do you mean implementing OpenCL kernels on ARM?

Xianyi

Zhifei Wang

Oct 19, 2014, 11:16:29 PM
to openbla...@googlegroups.com, wzf...@gmail.com, jgpa...@gmail.com, jeff.s...@gmail.com, cberg...@pathscale.com, wern...@googlemail.com
Hi Xianyi,

Yes. And, are you still waiting for Mali 600+ hardware?

Thanks,
Phil

Zhang Xianyi

Oct 19, 2014, 11:20:12 PM
to Zhifei Wang, openbla...@googlegroups.com, José Luis García Pallero, jeff.s...@gmail.com, C Bergström, Werner Saar
Hi Phil,

We didn't purchase Mali 600+ yet. 

For OpenCL, we plan to develop them on AMD HSA APU first.

Zhifei Wang

Oct 19, 2014, 11:35:33 PM
to openbla...@googlegroups.com, wzf...@gmail.com, jgpa...@gmail.com, jeff.s...@gmail.com, cberg...@pathscale.com, wern...@googlemail.com
Hi Xianyi,

Thanks a lot. Nice to know the road map.
How about AArch64? It is experimental, right?

Zhang Xianyi

Oct 20, 2014, 5:04:40 AM
to Zhifei Wang, openbla...@googlegroups.com, José Luis García Pallero, Jeff Hammond, C Bergström, Werner Saar
It's a naive C implementation for AArch64. Because we didn't have the hardware, we didn't optimize the kernels.

Xianyi

Phil Wang

Oct 20, 2014, 5:14:50 AM
to openbla...@googlegroups.com, wzf...@gmail.com, jgpa...@gmail.com, jeff.s...@gmail.com, cberg...@pathscale.com, wern...@googlemail.com
Thanks again.

Olivier Grisel

Nov 1, 2014, 12:16:35 PM
to Phil Wang, openbla...@googlegroups.com, jgpa...@gmail.com, jeff.s...@gmail.com, cberg...@pathscale.com, wern...@googlemail.com
For those interested it's possible to get a free early beta access to
ARMv7 machines on this cloud host:

http://labs.online.net/dghd

They have experimental Docker support as well, with many base images
rebuilt for the armhf platform:

https://blog.cloud.online.net/2014/10/27/docker-on-c1/

Here is the content of /proc/cpuinfo on the host I am currently connected to.

processor : 0
model name : ARMv7 Processor rev 2 (v7l)
Features : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva
idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x584
CPU revision : 2

processor : 1
model name : ARMv7 Processor rev 2 (v7l)
Features : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva
idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x584
CPU revision : 2

processor : 2
model name : ARMv7 Processor rev 2 (v7l)
Features : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva
idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x584
CPU revision : 2

processor : 3
model name : ARMv7 Processor rev 2 (v7l)
Features : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva
idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x584
CPU revision : 2

Hardware : Marvell Armada 370/XP (Device Tree)
Revision : 0000
Serial : 0000000000000000



--
Olivier

Michael Görgens

Jul 9, 2020, 4:05:22 PM
to OpenBLAS-users
@Werner

What happened to the plans to use MALI for OpenBLAS speedup on ARM?

Best regards,
Michael

martin-frbg

Jul 12, 2020, 5:11:06 PM
to OpenBLAS-users
Unfortunately Werner has not been seen on this project (or GitHub in general) since early 2017. Some development happened in a separate clOpenBLAS project in the 2014/15 timeframe, but as far as I know it never progressed beyond a partial implementation of sgemm/dgemm for NVIDIA GPUs of the time.