> Hello everyone,
> I may be asking a stupid question, but I'm having all sorts of trouble
> using the inline assembly to speed up my software with NEON. Would
> anyone be able to tell me why the NEON piece of code breaks my
> program?
You shouldn't use inline asm with NEON. It's almost impossible to get
it right.
> [...]
>
> Specifically, it looks like the result of the function is fine, but
> the program does not execute in the same way afterwards.. it's like
> some clobbered register is not restored.. I don't understand.
Look at the assembler generated by the compiler. That should tell you
what's going wrong.
--
Måns Rullgård
ma...@mansr.com
> 2009/10/26 Måns Rullgård <ma...@mansr.com>:
>>
>> mic <michele...@gmail.com> writes:
>>
>>> Hello everyone,
>>> I may be asking a stupid question, but I'm having all sorts of trouble
>>> using the inline assembly to speed up my software with NEON. Would
>>> anyone be able to tell me why the NEON piece of code breaks my
>>> program?
>>
>> You shouldn't use inline asm with NEON. It's almost impossible to get
>> it right.
>
> Any particular reason? Why does this stuff have to be so hard ...
Why does gcc have to suck? Nobody knows...
--
Måns Rullgård
ma...@mansr.com
>
> Any particular reason? Why does this stuff have to be so hard ...
I suspect register allocation, but I'm only repeating what people say
on IRC.
regards,
Koen
And instruction scheduling too.
Laurent
This instruction updates %[R] register.
> ::[R]"r" (out), [A]"r" (a), [B]"r" (b)
       ^^^^^^^^^
And this tells gcc that %[R] is a constant input argument. Same for [A] and
[B].
> "memory","q0","q1","q2","q3","q4","q5","q6","q7","q8","q9","q10","q11");
^^^^^^^^^^^
CodeSourcery toolchain 2007q3 has a bug in handling 'q' registers in the
clobber list. I'm not sure whether it affects you, but to be on the safe side
it is better to replace them with the equivalent 'd' registers (each qN
overlaps d(2N) and d(2N+1)).
> Specifically, it looks like the result of the function is fine, but
> the program does not execute in the same way afterwards.. it's like
> some clobbered register is not restored.. I don't understand.
Yes, the constraints are wrong.
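The original function is not reproduced in this thread, so the following is only a minimal sketch of what read-write constraints and a 'd'-register clobber list could look like. The function name `neon_add4` and its four-float workload are hypothetical. The non-ARM branch is a plain C fallback added so that the sketch compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] + b[i] for i = 0..3. */
void neon_add4(float *out, const float *a, const float *b)
{
    assert(out != NULL && a != NULL && b != NULL);
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    __asm__ volatile(
        /* The "!" post-increment writes the updated address back into the
         * pointer register, so each pointer is modified by the asm. */
        "vld1.32  {d0, d1}, [%[A]]!   \n\t"
        "vld1.32  {d2, d3}, [%[B]]!   \n\t"
        "vadd.f32 q0, q0, q1          \n\t"
        "vst1.32  {d0, d1}, [%[R]]!   \n\t"
        /* "+r" marks the operands as read-write; a plain "r" input would
         * promise the compiler that the registers are left unchanged. */
        : [R] "+r" (out), [A] "+r" (a), [B] "+r" (b)
        :
        /* q0 and q1 are listed as their 'd' halves: d0-d1 and d2-d3. */
        : "memory", "d0", "d1", "d2", "d3");
#else
    /* Portable fallback (an assumption of this sketch, not from the thread). */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

With the operands declared read-write, the compiler knows that `out`, `a` and `b` no longer hold their original values after the asm statement and will not reuse those registers as if they did.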
--
Best regards,
Siarhei Siamashka
From a suggestion Gerald made, I've created the RoboticsBus project on
beagleboard.org. As I think about this and consider it more, it occurs to me
that this project does not have to be just for robotics expansion boards, but
can be used as a generalized expansion bus for Beagle. For now, though, I won't
change the name of the project (RoboticsBus).
You can find the URL to the project wiki in my signature. Please feel free to
read, comment, add to, and generally contribute in any way you can. I want
this to be a completely Open Source project, both in hardware and software.
When this project is far enough along to start working on software, I'll
create a repository on github.com and add whoever wants to help with software
as a contributor. I believe we can even use this for schematics to track
versions.
I've been writing quite a lot on this project on my Wiki, and have been
trying to keep things in a logical format.
8-Dale
--
I can handle complexity. It's the simple things that confound me.
Open your mind, Read, Learn, Think, Apply. 73, from N7PKT!
http://www.thedynaplex.info - Blog, Wiki, and Forums
http://www.thedynaplex.info/wiki/index.php/RoboticsBus - Robotics Bus
> To Siarhei Siamashka:
> In theory it's right, but changing the last piece to
>
> :[R]"+r" (out), [A]"+r" (a), [B]"+r" (b)
> ::"memory","q0","q1","q2","q3","q4","q5","q6","q7","q8","q9","q10","q11");
>
> makes no difference at all. Please note that I've tried CS 2009q3, not
> 2007q3.
>
> To Mans Rullgard:
> Oh dear! Should I not use inline assembly with NEON??? Should I use
> gcc intrinsics then? Is writing .S files the only (scary) option?
Intrinsics are worse than inline asm. Use .S files. They have the
additional advantage of allowing you to compile the C code with any
compiler, not only gcc.
--
Måns Rullgård
ma...@mansr.com
I'm in the process of looking at exactly which signals and in what
combinations they might be used. I'm still brand new to Beagle, so still have
quite a learning curve to go through. I know for sure we want I2C and SPI,
along with as many UARTs as can be made usable. I'm more interested in the
various communications methods available on Beagle, and less interested
in working directly with GPIOs and such, although this might be an
option also. I think Beagle would be better used for things like its main
processing power, graphics, vision processing, etc.
My original idea is to use smaller micros, like expansion boards with AVRs,
to handle most of the direct interfacing with sensors and GPIOs. We may need
some of Beagle's GPIOs for control signals. I haven't gotten far enough into
thinking about all this yet to know for sure. I'm proposing more of a
communication bus for communication between Beagle and smart expansion boards.
To effectively use the Beagle's GPIOs for digital I/Os and sensors in robotics
would require adding 3-pin headers with power and ground buses next to the
signal pins. I'm not sure I want to use board real estate for these connected
to Beagle, but it might be an option if there is interest.
Expansion boards based on micros like AVRs can easily interface with the 3.3V
and 5V components most often found and used in robotics. They also have much
needed things like analog inputs, more UARTs, etc. I'd connect these to Beagle
through a bus type interface and let Beagle do all the heavy processing where
it's required to use the data these boards provide. I don't think it would
really be appropriate to connect servos directly to a Beagle, for instance.
We can certainly discuss topics like GPIOs and such though, and see where
things go from there. This expansion bus is to be completely open and Open
Source, both in software and hardware, so anyone can feel free to contribute
in any way they want and add anything they want as long as everything works
together. I won't be able to actually get Linux up and running on my Beagle
for a couple weeks, because I need to get a USB Hub and SDHC card reader. I
think I have everything else I want for my Beagle except a Zippy.
I do have a Wiki started for this as well as other projects of mine. I've
also set up forums that allow attachments to be included as well as code within
postings. I also have my Blog, which I've been writing on for a while now. They
are all available at http://www.thedynaplex.info now. Unfortunately, there is
no integration between the three packages, so you have to create an account
on each one you want access to. I'm also considering putting something like
Drupal online for this, which has all these features rolled into a single
package.
As an additional experiment, try running the following command as root
before your test (the value 4 asks the kernel to send SIGBUS on an unaligned
access rather than fix it up):
# echo 4 > /proc/cpu/alignment
The kernel may behave very strangely when it tries to emulate unaligned NEON
memory accesses.
It doesn't try, it just spins on the faulting instruction.
--
Måns Rullgård
ma...@mansr.com
It spins if you disable emulation.
But when emulation is enabled, it just decodes NEON instructions wrong and
interprets them as something else. Sometimes with weird side effects (like ARM
registers getting modified).
Ouch. I always run with unaligned fixups entirely disabled using that
patch RMK refuses to talk about. I want that SIGBUS.
--
Måns Rullgård
ma...@mansr.com
I haven't checked the latest kernels yet, so don't know whether this issue
still exists. But it was present in 2.6.28 at least.
Anyway, considering the use of VLDM instructions instead of VLD1 in the posted
code snippet, alignment-related problems could potentially be involved. Let's
wait for a reply from Michele to see whether that really was the case.
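One way to rule alignment in or out from the C side is to check the pointers before they reach the assembly. The helper below is a hypothetical sketch, not part of the posted code. It relies on the assumption that VLDM needs a word-aligned (4-byte) address, whereas VLD1 without an alignment qualifier is more tolerant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `alignment` bytes, 0 otherwise.
 * `alignment` must be a non-zero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling, for example, `assert(is_aligned(a, 4));` just before the asm block would turn a silent misbehaviour into an immediate, debuggable failure.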
> I always run with unaligned fixups entirely disabled using that
> patch RMK refuses to talk about. I want that SIGBUS.
I also have fixups disabled (from one of the scripts that runs early at boot;
I did not bother to patch the kernel). I hope that even if the kernel keeps
alignment fixup as the default, Linux distros at least will use something more
reasonable.