NEON assembly much faster than intrinsics

Olivier Guilyardi

Sep 1, 2011, 9:31:35 AM
to andro...@googlegroups.com
Hi everyone,

Not sure if this has already been mentioned, but according to the following
article the GCC NEON intrinsics perform badly when compared to using NEON
instructions directly in assembly: http://hilbert-space.de/?p=22

With the test described in this article, the intrinsics only run 1.5 times
faster than basic C, whereas assembly provides a 7.5x speedup.

If that makes it possible to consume 7.5 times less power, then NEON is a pretty
awesome thing. I thought that might interest some of you.
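
For readers who have not used them, here is a rough sketch of what "basic C" versus "NEON intrinsics" can look like for the same loop. This is not the code from the linked article: the kernel (an RGB-to-grayscale weighted sum), the names gray_scalar and gray_simd, and the weights 77/151/28 are all assumptions chosen for illustration. The NEON path is only compiled when the compiler defines __ARM_NEON, and falls back to the plain C loop otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Plain C reference: weighted sum of R, G, B using 8-bit fixed-point
   weights (77 + 151 + 28 = 256), then a shift right by 8. */
static void gray_scalar(uint8_t *dst, const uint8_t *rgb, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned y = 77u * rgb[3 * i] + 151u * rgb[3 * i + 1] + 28u * rgb[3 * i + 2];
        dst[i] = (uint8_t)(y >> 8);
    }
}

/* Same computation, 8 pixels per iteration with NEON intrinsics when
   available. Leftover pixels (and non-NEON builds) use the scalar loop. */
static void gray_simd(uint8_t *dst, const uint8_t *rgb, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    const uint8x8_t wr = vdup_n_u8(77);
    const uint8x8_t wg = vdup_n_u8(151);
    const uint8x8_t wb = vdup_n_u8(28);
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);    /* de-interleave R, G, B */
        uint16x8_t acc = vmull_u8(px.val[0], wr); /* widen to 16 bits */
        acc = vmlal_u8(acc, px.val[1], wg);
        acc = vmlal_u8(acc, px.val[2], wb);
        vst1_u8(dst + i, vshrn_n_u16(acc, 8));    /* >> 8, narrow to 8 bits */
    }
#endif
    gray_scalar(dst + i, rgb + 3 * i, n - i);
}
```

How close the intrinsics version gets to hand-written assembly depends on the compiler and its version, so any speedup figure should be measured on the target device rather than assumed.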

--
Olivier

Shervin Emami

Sep 2, 2011, 5:37:12 AM
to android-ndk
Yes, in general the GCC compiler produces poor ARM and NEON code, even if
you use intrinsics (and sometimes even if you use inline assembly
code!), compared to writing pure assembly code. But the speed and power
difference depends a lot on what your C code is trying to do, how you
implemented it in C, and how you implemented it in assembly. For
example, if most of the execution time is spent waiting for memory
access, then it won't matter much whether you use C, intrinsics, or
assembly. But if most of the delay is in the calculations, then NEON
assembly is probably going to be at least twice and perhaps 5 times as
fast as NEON intrinsics. It might take you a lot longer to write
the assembly code, though, so you should only do it if you really need to
optimize that piece of code as much as possible. If you want more
info, I have written various notes and tutorials about ARM, NEON, and
assembly code at http://www.shervinemami.info/armAssembly.html

Cheers,
Shervin Emami.

Olivier Guilyardi

Sep 6, 2011, 5:19:33 AM
to andro...@googlegroups.com
Thanks Shervin for these clarifications.

However, at the moment I'm more concerned with saving power than performance. Is
it safe to assume that if I use NEON and achieve say 7x speedup, it will
actually consume 7 times less power?

Cheers,

Olivier

David Turner

Sep 6, 2011, 5:26:37 AM
to andro...@googlegroups.com
On Tue, Sep 6, 2011 at 11:19 AM, Olivier Guilyardi <li...@samalyse.com> wrote:
> Thanks Shervin for these clarifications.
>
> However, at the moment I'm more concerned with saving power than performance. Is
> it safe to assume that if I use NEON and achieve say 7x speedup, it will
> actually consume 7 times less power?

Certainly not. Power consumption doesn't scale linearly with speed or instruction count anyway.
You can assume that it will take less power to do the same job 7x faster, but how much less depends a lot on what you're trying to do.
 


mic _

Sep 6, 2011, 5:35:20 AM
to andro...@googlegroups.com
Assuming that you will get a 7x improvement for any algorithm is incorrect to begin with. The example you linked to was a very simple one.
When you use NEON you activate additional hardware blocks, and thus use more power for the period that you're using it. Since the computation time will likely be shorter you'll probably end up saving power, but not necessarily a great deal of it.
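
To put rough numbers on this, the sketch below models energy over a fixed time window as active power times busy time plus idle power times the remaining time. Every figure in the usage note is hypothetical (not measured on any device), and the function name window_energy_joules is an assumption made for this illustration.

```c
/* Energy (joules) used over a fixed window: the CPU draws active_w watts
   while busy and idle_w watts for the rest of the window.
   All inputs are caller-supplied assumptions, not measured values. */
static double window_energy_joules(double active_w, double idle_w,
                                   double busy_s, double window_s)
{
    return active_w * busy_s + idle_w * (window_s - busy_s);
}
```

With made-up numbers: over a 1 s window, a plain C path that is busy for 0.7 s at 1.0 W with 0.2 W idle uses about 0.76 J. A NEON path that is 7x faster (busy for 0.1 s) but draws 1.5 W while active uses about 0.33 J. Under these assumptions a 7x speedup gives roughly 2.3x less energy, not 7x.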

/Michael

Olivier Guilyardi

Sep 6, 2011, 5:49:08 AM
to andro...@googlegroups.com

On 09/06/2011 11:26 AM, David Turner wrote:
>
>
> On Tue, Sep 6, 2011 at 11:19 AM, Olivier Guilyardi <li...@samalyse.com
> <mailto:li...@samalyse.com>> wrote:
>
> Thanks Shervin for these clarifications.
>
> However, at the moment I'm more concerned with saving power than
> performance. Is
> it safe to assume that if I use NEON and achieve say 7x speedup, it will
> actually consume 7 times less power?
>
> Certainly not, power consumption doesn't scale linearly with speed and
> instruction count anyway.
> You can assume that it will take less power to do the same job 7x
> faster, but by how much depends a lot on what you're trying to do.

That's what I sensed.

So, basically, I guess that to save power, I need to do less, not faster, which
actually makes sense in general :)

Thanks

--
Olivier

Dianne Hackborn

Sep 7, 2011, 1:56:16 AM
to andro...@googlegroups.com
On Tue, Sep 6, 2011 at 2:49 AM, Olivier Guilyardi <li...@samalyse.com> wrote:
> So, basically, I guess that to save power, I need to do less, not faster, which
> actually makes sense in general :)

Well, generally you go faster by doing less, so they are closely related.

There is just no simple rule. For example, using NEON may cause the CPU to bring up parts that aren't otherwise running and so take a little more power. This should be well offset by the reduced amount of time spent doing the work... that is, if this actually reduces the amount of time you are running the CPU and doesn't just let you get a better frame rate.

--
Dianne Hackborn
Android framework engineer
hac...@android.com

Note: please don't send private questions to me, as I don't have time to provide private support, and so won't reply to such e-mails.  All such questions should be posted on public forums, where I and others can see and answer them.

Olivier Guilyardi

Sep 7, 2011, 3:12:50 PM
to andro...@googlegroups.com
On 09/07/2011 07:56 AM, Dianne Hackborn wrote:
> On Tue, Sep 6, 2011 at 2:49 AM, Olivier Guilyardi <li...@samalyse.com
> <mailto:li...@samalyse.com>> wrote:
>
> So, basically, I guess that to save power, I need to do less, not
> faster, which
> actually makes sense in general :)
>
>
> Well generally you go faster by doing less, so they are closely related.
>
> There is just no simple rule. For example using NEON may cause the CPU
> to bring up parts that aren't otherwise running and so take a little
> more power. This should be well offset by the reduced amount of time
> spent doing the work... that is if this is actually reducing the amount
> of time you are running the CPU and not just letting you get a better
> frame rate.

Hmm, well, at the moment, a bottleneck in my app is the computations that occur
in the OpenGL thread. This consumes quite a lot of CPU. And actually, if I
make this faster, it may not help save power, since it might just result in shorter
cycles. That would provide a higher frame rate, which I don't need.

Basically, I'm recomputing and redrawing everything at each OpenGL cycle. That's
pretty bad compared to the redraw-when-changed UI paradigm. I'm thinking
about skipping the computations before drawing when they are unnecessary, but
there's a good deal of complexity in that in my particular case. So I thought that if
NEON can provide a 7x speedup, then that could be a good first move.
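
The "avoid computations when they are unnecessary" idea can be sketched with a simple dirty flag. This is only a minimal illustration, not code from my app: the Scene struct, its fields, and the stand-in expensive() function are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical scene state with a cached result and a dirty flag. */
typedef struct {
    float input;      /* parameter the expensive computation depends on */
    float cached;     /* last computed result */
    bool  dirty;      /* true when input changed since cached was computed */
    int   recomputes; /* counter, only here to make the effect observable */
} Scene;

static void scene_init(Scene *s)
{
    s->input = 0.0f;
    s->cached = 0.0f;
    s->dirty = true;   /* nothing computed yet */
    s->recomputes = 0;
}

/* Mark the scene dirty only when the input really changes. */
static void scene_set_input(Scene *s, float v)
{
    if (v != s->input) {
        s->input = v;
        s->dirty = true;
    }
}

/* Stand-in for the real per-frame computation. */
static float expensive(float x)
{
    return x * x + 1.0f;
}

/* Called once per frame: recompute only if something changed. */
static float scene_value(Scene *s)
{
    if (s->dirty) {
        s->cached = expensive(s->input);
        s->dirty = false;
        s->recomputes++;
    }
    return s->cached;
}
```

On the drawing side, Android's GLSurfaceView also has RENDERMODE_WHEN_DIRTY together with requestRender(), which avoids redrawing frames that have not changed, though whether that fits depends on the app.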

That said, I've looked at my code again, and some optimizations are possible in
pure C. After this I suppose I could maybe do a little ARM assembly, which
wouldn't involve extra hardware/CPU parts, and maybe only after that think about
NEON.

I'm pretty sure that I can reduce power consumption a lot, but I'm not there
yet, and also I'm not very used to thinking in terms of power saving. But, when
you're on the go, it's clearly a central concern, as I experienced recently ;)

--
Olivier
